[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40381:
URL: https://github.com/apache/spark/pull/40381#issuecomment-1465158915

   **The maintainers of the library contend that the application's trust would 
already have had to be compromised or established and therefore dispute the 
risk associated with this issue on the basis that there is a high bar for 
exploitation. Thus, no fix is expected.**
   
   https://security.snyk.io/vuln/SNYK-JAVA-ORGYAML-3152153
   
   
   This is part of the snakeyaml release notes:
   
   2.0 (2023-02-26)
   
   Fix #570: SafeConstructor ignores LoaderOptions setCodePointLimit() (thanks 
to Robert Patrick)
   
   Update #565: (Backwards-incompatible) Do not allow global tags by default to 
fix CVE-2022-1471 (thanks to Jonathan Leitschuh)
   
   
   1.32 (2022-09-12)
   
   Fix #547: Set the limit for incoming data to prevent a CVE report in NIST. 
By default it is 3MB
   
   https://bitbucket.org/snakeyaml/snakeyaml/wiki/Changes





[GitHub] [spark] MaxGekk commented on pull request #39239: [SPARK-41730][PYTHON] Set tz to UTC while converting of timestamps to python's datetime

2023-03-12 Thread via GitHub


MaxGekk commented on PR #39239:
URL: https://github.com/apache/spark/pull/39239#issuecomment-1465148669

   @HyukjinKwon @cloud-fan A problem with PySpark's timestamp_ltz is that it is a local timestamp, not the physical timestamp that timestamp_ltz is supposed to be. Let's look at even the Java 7 timestamp (it also has some issues, but it is still a physical time point, not a local timestamp):
   ```
   ➜  ~ TZ=America/Los_Angeles ./spark-3.3/bin/spark-shell
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 3.3.2
         /_/

   scala> spark.conf.set("spark.sql.session.timeZone", "Europe/Moscow")

   scala> val df = sql("select timestamp'1970-01-01T00:00:00+00:00'")
   df: org.apache.spark.sql.DataFrame = [TIMESTAMP '1970-01-01 03:00:00': timestamp]

   scala> df.collect()(0).getTimestamp(0).toGMTString
   res1: String = 1 Jan 1970 00:00:00 GMT
   ```
   The timestamp still knows how to show itself in the UTC time zone because it 
contains an offset from the epoch, and can render itself in any time zone.
   
   Let's look at the PySpark timestamp:
   ```
   $ TZ=America/Los_Angeles ./spark-3.3/bin/pyspark
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /__ / .__/\_,_/_/ /_/\_\   version 3.3.2
         /_/

   >>> spark.conf.set("spark.sql.session.timeZone", "Europe/Moscow")
   >>>
   >>> df = sql("select timestamp'1970-01-01T00:00:00+00:00'")
   >>> df.collect()[0][0].utctimetuple()
   time.struct_time(tm_year=1969, tm_mon=12, tm_mday=31, tm_hour=16, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=365, tm_isdst=0)
   ```
   Since Python's timestamp becomes a local timestamp during the conversion:
   ```python
   return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
   ```
   it cannot render itself in the UTC time zone correctly.
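   A minimal standalone sketch (not from the original comment; `ts` is an assumed microsecond epoch value) contrasting the naive conversion above with a timezone-aware one that preserves the physical time point:
   ```python
   import datetime

   ts = 0  # microseconds since the epoch, i.e. 1970-01-01T00:00:00 UTC

   # Naive conversion (as in PySpark above): the result carries no tzinfo,
   # so it is pinned to the process-local wall clock (America/Los_Angeles here).
   naive = datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)

   # Timezone-aware conversion: keeps the offset from the epoch and can be
   # rendered in any time zone, like the Java timestamp above.
   aware = datetime.datetime.fromtimestamp(ts // 1000000, tz=datetime.timezone.utc)

   print(naive.utctimetuple().tm_hour)  # 16 under TZ=America/Los_Angeles (wrong)
   print(aware.utctimetuple().tm_hour)  # 0, the correct UTC hour
   ```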





[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40381:
URL: https://github.com/apache/spark/pull/40381#issuecomment-1465141874

   Well, the comment that you are referring to has a link, but I can't get in.
   
![image](https://user-images.githubusercontent.com/47577197/224536606-58b733ab-cfb9-47e6-bf19-485fae5e3f2c.png)
   
   Three weeks later they merged a PR https://github.com/fabric8io/kubernetes-client/commit/43b04f6cc2cde0b8cebb76c842c09de30c236780 that fixes this issue.
   
   And yesterday Snyk opened a PR to my repo for this issue: https://github.com/bjornjorgensen/spark/pull/102
   I can always change the text of this PR, but I haven't seen anything that makes me believe that kubernetes-client is not affected by this CVE.
   
   





[GitHub] [spark] yabola commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-12 Thread via GitHub


yabola commented on PR #39950:
URL: https://github.com/apache/spark/pull/39950#issuecomment-1465215240

   @sunchao Hi~ Could you take a look at this PR? I think it will be useful when there are joined tables and the filter conditions are only on a few of the tables.





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465291603

   cc @wangyum and @kenny-ddd 





[GitHub] [spark] wangyum commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


wangyum commented on PR #40365:
URL: https://github.com/apache/spark/pull/40365#issuecomment-1465210746

   Merged to master.





[GitHub] [spark] bjornjorgensen commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465304288

   @dongjoon-hyun Thank you. Yes, this PR looks ok. 





[GitHub] [spark] bjornjorgensen commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465316569

   Seems to be working now :)





[GitHub] [spark] github-actions[bot] commented on pull request #38887: [SPARK-41368][SQL] Reorder the window partition expressions by expression stats

2023-03-12 Thread via GitHub


github-actions[bot] commented on PR #38887:
URL: https://github.com/apache/spark/pull/38887#issuecomment-1465343346

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!





[GitHub] [spark] github-actions[bot] closed pull request #38674: [SPARK-41160][YARN] Fix error when submitting a task to the yarn that enabled the timeline service

2023-03-12 Thread via GitHub


github-actions[bot] closed pull request #38674: [SPARK-41160][YARN] Fix error 
when submitting a task to the yarn that enabled the timeline service
URL: https://github.com/apache/spark/pull/38674





[GitHub] [spark] dongjoon-hyun commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40365:
URL: https://github.com/apache/spark/pull/40365#issuecomment-1465290797

   Oh, it seems to break Scala style, @wangyum .
   ```
   [error] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushDownThroughWindow.scala:33: File line length exceeds 100 characters
   ```





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on code in PR #40365:
URL: https://github.com/apache/spark/pull/40365#discussion_r1133318969


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushDownThroughWindow.scala:
##
@@ -30,7 +30,7 @@ import org.apache.spark.sql.catalyst.trees.TreePattern.{LIMIT, WINDOW}
  * }}}
  */
 object LimitPushDownThroughWindow extends Rule[LogicalPlan] {
-  // The window frame of RankLike and RowNumberLike can only be UNBOUNDED PRECEDING to CURRENT ROW.
+  // The window frame of Rank, DenseRank and RowNumber can only be UNBOUNDED PRECEDING to CURRENT ROW.

Review Comment:
   This seems to exceed the 100-character line limit, and this PR didn't pass the CI.






[GitHub] [spark] dongjoon-hyun commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40381:
URL: https://github.com/apache/spark/pull/40381#issuecomment-1465293982

   BTW, I have two additional questions about the PR you referred to.
   
   
   1. Do you mean it's evidence of the previous assessment?
   
   > SafeConstructor ignores LoaderOptions setCodePointLimit() (thanks to 
Robert Patrick)
   
   2. Do you think the following major version change is safe for us?
   ```
   - 1.33
   + 2.5
   ```





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465298354

   Scala linter passed. 
   
   ![Screenshot 2023-03-12 at 1 59 52 PM](https://user-images.githubusercontent.com/9700541/224573370-cbd15aa7-0cdc-43d6-8d01-9f35d77547f6.png)
   





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465298419

   Hi, @bjornjorgensen . Could you review this PR?





[GitHub] [spark] dongjoon-hyun commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40365:
URL: https://github.com/apache/spark/pull/40365#issuecomment-1465332828

   Oh, no problem at all. I just wanted to share my old mistakes.





[GitHub] [spark] StevenChenDatabricks opened a new pull request, #40385: [SPARK-42753] ReusedExchange refers to non-existent nodes

2023-03-12 Thread via GitHub


StevenChenDatabricks opened a new pull request, #40385:
URL: https://github.com/apache/spark/pull/40385

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40381:
URL: https://github.com/apache/spark/pull/40381#issuecomment-1465294879

   ok, I didn't know who this user was. I have updated the PR text now.





[GitHub] [spark] dongjoon-hyun commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40365:
URL: https://github.com/apache/spark/pull/40365#issuecomment-1465294691

   To @wangyum: sometimes the contributor's GitHub Actions don't work properly. I also made similar mistakes before by forgetting the check. We, the committers, are responsible for checking it manually. :)
   
   ![Screenshot 2023-03-12 at 1 39 08 PM](https://user-images.githubusercontent.com/9700541/224572230-7ba55e19-ff83-48dc-923f-1fd72c5389e9.png)
   





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465305631

   No, only the PR author can do that because we use the author's GitHub 
Actions for that PR.
   > I have seen this one time before. Can you restart tests on the PR that 
fails?
   
   For the Apache Spark branches, of course, you can, because you are a committer.





[GitHub] [spark] dongjoon-hyun opened a new pull request, #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun opened a new pull request, #40386:
URL: https://github.com/apache/spark/pull/40386

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465300338

   Could you review this in order to recover the master branch, @sunchao?





[GitHub] [spark] wangyum commented on pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


wangyum commented on PR #40365:
URL: https://github.com/apache/spark/pull/40365#issuecomment-1465332554

   @dongjoon-hyun Sorry. I will pay attention and check it next time.





[GitHub] [spark] dongjoon-hyun closed pull request #40381: [SPARK-42761][BUILD][K8S] Upgrade `kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


dongjoon-hyun closed pull request #40381: [SPARK-42761][BUILD][K8S] Upgrade 
`kubernetes-client` to 6.5.0
URL: https://github.com/apache/spark/pull/40381





[GitHub] [spark] srowen commented on pull request #40380: [SPARK-42760][DOCS][PYTHON] provide one format for writing to kafka

2023-03-12 Thread via GitHub


srowen commented on PR #40380:
URL: https://github.com/apache/spark/pull/40380#issuecomment-1465301926

   Wrong JIRA is linked, please adjust





[GitHub] [spark] bjornjorgensen commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465304673

   I have seen this once before. Can you restart the tests on the PR that fails?





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465304619

   Thank you so much, @bjornjorgensen .





[GitHub] [spark] dongjoon-hyun commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40381:
URL: https://github.com/apache/spark/pull/40381#issuecomment-1465293060

   @bjornjorgensen. Do you mean you cannot trust the comment from `winniegy`, the fabric8io community member? He closed the issue after that comment.
   ![Screenshot 2023-03-12 at 1 26 33 PM](https://user-images.githubusercontent.com/9700541/224571715-ee1bd3d2-07c6-4097-9f6f-09e08d9c920f.png)
   
   Hence, the migration happened independently of the CVE, just for a future release.
   
   In short, the following claim is wrong according to the context.
   > 3 weeks later they merged a PR 
https://github.com/fabric8io/kubernetes-client/commit/43b04f6cc2cde0b8cebb76c842c09de30c236780
 that fix this issue.





[GitHub] [spark] bjornjorgensen commented on pull request #40381: [SPARK-42761][BUILD] Upgrade `fabric8:kubernetes-client` to 6.5.0

2023-03-12 Thread via GitHub


bjornjorgensen commented on PR #40381:
URL: https://github.com/apache/spark/pull/40381#issuecomment-1465295815

   It now turns out that the information I was given is not correct. This changes the whole picture of why we should include this PR. I think we can wait until we are done with 3.4, so that we are on the safe side.





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465302694

   Since this is a comment-only change, let me merge it to recover the master branch and PRs.
   
   ![Screenshot 2023-03-12 at 2 20 11 PM](https://user-images.githubusercontent.com/9700541/224574385-6ab72bd1-4d16-42cf-a42a-72c0459f9a73.png)
   





[GitHub] [spark] dongjoon-hyun closed pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun closed pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle 
in LimitPushDownThroughWindow
URL: https://github.com/apache/spark/pull/40386





[GitHub] [spark] dongjoon-hyun commented on pull request #40386: [MINOR][SQL][FOLLOWUP] Fix scalastyle in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40386:
URL: https://github.com/apache/spark/pull/40386#issuecomment-1465302800

   Merged to master.





[GitHub] [spark] wangyum closed pull request #40365: [MINOR][SQL] Fix incorrect comment in LimitPushDownThroughWindow

2023-03-12 Thread via GitHub


wangyum closed pull request #40365: [MINOR][SQL] Fix incorrect comment in 
LimitPushDownThroughWindow
URL: https://github.com/apache/spark/pull/40365





[GitHub] [spark] ivoson commented on pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-12 Thread via GitHub


ivoson commented on PR #40286:
URL: https://github.com/apache/spark/pull/40286#issuecomment-1465226363

   > Just a doc change I had missed last time around. Rest looks good to me - 
can you check the proposed change, and reformulate it to something similar ? I 
will merge it once done
   
   Hi @mridulm, thanks for the suggestion. Updated.





[GitHub] [spark] mridulm commented on pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry

2023-03-12 Thread via GitHub


mridulm commented on PR #40286:
URL: https://github.com/apache/spark/pull/40286#issuecomment-1465327881

   Can you update to the latest, @ivoson?
   The style failure is not related to your change, but it blocks the build.





[GitHub] [spark] zhengruifeng commented on a diff in pull request #40375: [SPARK-42755][CONNECT] Factor literal value conversion out to `connect-common`

2023-03-12 Thread via GitHub


zhengruifeng commented on code in PR #40375:
URL: https://github.com/apache/spark/pull/40375#discussion_r1133354786


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala:
##
@@ -107,7 +107,7 @@ object LiteralValueProtoConverter {
          ArrayType(DataTypeProtoConverter.toCatalystType(lit.getArray.getElementType)))
 
      case _ =>
-        throw InvalidPlanInput(
+        throw new UnsupportedOperationException(
          s"Unsupported Literal Type: ${lit.getLiteralTypeCase.getNumber}" +
            s"(${lit.getLiteralTypeCase.name})")

Review Comment:
   +1, let me revert here






[GitHub] [spark] zhengruifeng commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub


zhengruifeng commented on code in PR #40376:
URL: https://github.com/apache/spark/pull/40376#discussion_r1133369721


##
python/pyspark/sql/connect/expressions.py:
##
@@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType:
     def _from_value(cls, value: Any) -> "LiteralExpression":
         return LiteralExpression(value=value, dataType=LiteralExpression._infer_type(value))
 
+    @classmethod
+    def _to_value(cls, literal: "proto.Expression.Literal") -> Any:
+        if literal.HasField("null"):
+            return None
+        elif literal.HasField("binary"):
+            return literal.binary
+        elif literal.HasField("boolean"):
+            return literal.boolean
+        elif literal.HasField("byte"):
+            return literal.byte
+        elif literal.HasField("short"):
+            return literal.short
+        elif literal.HasField("integer"):
+            return literal.integer
+        elif literal.HasField("long"):
+            return literal.long
+        elif literal.HasField("float"):
+            return literal.float
+        elif literal.HasField("double"):
+            return literal.double
+        elif literal.HasField("decimal"):
+            return decimal.Decimal(literal.decimal.value)
+        elif literal.HasField("string"):
+            return literal.string
+        elif literal.HasField("date"):
+            return DateType().fromInternal(literal.date)
+        elif literal.HasField("timestamp"):
+            return TimestampType().fromInternal(literal.timestamp)
+        elif literal.HasField("timestamp_ntz"):
+            return TimestampNTZType().fromInternal(literal.timestamp_ntz)
+        elif literal.HasField("day_time_interval"):
+            return DayTimeIntervalType().fromInternal(literal.day_time_interval)
+        elif literal.HasField("array"):
+            return [LiteralExpression._to_value(v) for v in literal.array.elements]

Review Comment:
   that is fine






[GitHub] [spark] zhengruifeng commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub


zhengruifeng commented on code in PR #40376:
URL: https://github.com/apache/spark/pull/40376#discussion_r1133369822


##
python/pyspark/sql/connect/expressions.py:
##
@@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType:
     def _from_value(cls, value: Any) -> "LiteralExpression":
         return LiteralExpression(value=value, dataType=LiteralExpression._infer_type(value))
 
+    @classmethod
+    def _to_value(cls, literal: "proto.Expression.Literal") -> Any:
+        if literal.HasField("null"):
+            return None

Review Comment:
   `None` here is a valid value.






[GitHub] [spark] ulysses-you opened a new pull request, #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub


ulysses-you opened a new pull request, #40390:
URL: https://github.com/apache/spark/pull/40390

   
   
   ### What changes were proposed in this pull request?
   
   This PR enables `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` by default.
   
   ### Why are the changes needed?
   
   We have fixed all known issues with enabling cache + AQE since SPARK-42101. There is no reason to skip AQE optimization of cached plans.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the default value of the config changed.
   
   ### How was this patch tested?
   
   Pass CI
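
   For readers unfamiliar with the flag, a minimal PySpark sketch (the config name is from this PR; the toy query is an assumption) of how one would toggle it and observe the effect:
   ```python
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()

   # With the flag on, AQE may re-optimize the physical plan of a cached
   # DataFrame (e.g. coalescing small shuffle partitions) instead of freezing
   # the output partitioning chosen at cache time.
   spark.conf.set("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "true")

   df = spark.range(1_000_000).groupBy("id").count().cache()
   df.count()    # materializes the cache
   df.explain()  # look for AdaptiveSparkPlan in the cached plan
   ```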





[GitHub] [spark] ulysses-you commented on a diff in pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-12 Thread via GitHub


ulysses-you commented on code in PR #40262:
URL: https://github.com/apache/spark/pull/40262#discussion_r1133461951


##
sql/core/src/main/scala/org/apache/spark/sql/execution/DriverSortExec.scala:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.concurrent.TimeUnit.NANOSECONDS
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, NamedExpression, 
SortOrder, UnsafeProjection}
+import 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
SinglePartition}
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+
+/**
+ * A special operator for global [[SortExec]] if it is the root node 
with/without [[ProjectExec]].
+ */
+case class DriverSortExec(

Review Comment:
   We have a rule `EliminateLimits` that removes a limit if it is large enough, i.e., the actual row count is less than or equal to the limit number. So if the `Sort` is the root node, that means either:
   1. the user did not specify a limit, or
   2. the limit had no effect and has been eliminated.
   
   I think if the data size is small enough, `DriverSortExec` is faster than `TakeOrderedAndProjectExec` (see the sketch below):
   - `DriverSortExec` only sorts the data once, on the driver side
   - `TakeOrderedAndProjectExec` sorts the data on the executor side and then merges the sorted data on the driver side
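
   A minimal PySpark sketch (illustrative only, not from the review thread) of the two plan shapes being compared — a global sort as the root node versus a sort under a small limit, which Spark plans as `TakeOrderedAndProject`:
   ```python
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.getOrCreate()
   df = spark.range(100)

   # Global sort as the root node: no limit, so a plain Sort stays
   # at the top of the physical plan.
   df.orderBy("id").explain()

   # Sort under a small limit: planned as TakeOrderedAndProject, which
   # sorts per partition on executors and merges the results on the driver.
   df.orderBy("id").limit(10).explain()
   ```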






[GitHub] [spark] zhengruifeng commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub


zhengruifeng commented on code in PR #40376:
URL: https://github.com/apache/spark/pull/40376#discussion_r1133373330


##
python/pyspark/sql/connect/expressions.py:
##
@@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType:
     def _from_value(cls, value: Any) -> "LiteralExpression":
         return LiteralExpression(value=value, dataType=LiteralExpression._infer_type(value))
 
+    @classmethod
+    def _to_value(cls, literal: "proto.Expression.Literal") -> Any:
+        if literal.HasField("null"):
+            return None
+        elif literal.HasField("binary"):
+            return literal.binary
+        elif literal.HasField("boolean"):
+            return literal.boolean
+        elif literal.HasField("byte"):
+            return literal.byte
+        elif literal.HasField("short"):
+            return literal.short
+        elif literal.HasField("integer"):
+            return literal.integer
+        elif literal.HasField("long"):
+            return literal.long
+        elif literal.HasField("float"):
+            return literal.float
+        elif literal.HasField("double"):
+            return literal.double
+        elif literal.HasField("decimal"):
+            return decimal.Decimal(literal.decimal.value)
+        elif literal.HasField("string"):
+            return literal.string
+        elif literal.HasField("date"):
+            return DateType().fromInternal(literal.date)
+        elif literal.HasField("timestamp"):
+            return TimestampType().fromInternal(literal.timestamp)
+        elif literal.HasField("timestamp_ntz"):
+            return TimestampNTZType().fromInternal(literal.timestamp_ntz)
+        elif literal.HasField("day_time_interval"):
+            return DayTimeIntervalType().fromInternal(literal.day_time_interval)
+        elif literal.HasField("array"):
+            return [LiteralExpression._to_value(v) for v in literal.array.elements]

Review Comment:
   let me add one






[GitHub] [spark] navinvishy commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


navinvishy commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1465390080

   > @navinvishy sorry, it seems late for 3.4.0. would you mind changing the 
version from `3.4.0` to `3.5.0`?
   > 
   > I am going to merge this PR to master this week if no more comments.
   
   I've updated the version. Thanks @zhengruifeng !
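
   For context, a minimal usage sketch of the function this PR adds (assuming the API lands as proposed in SPARK-41233; the example DataFrame is illustrative):
   ```python
   from pyspark.sql import SparkSession
   from pyspark.sql import functions as F

   spark = SparkSession.builder.getOrCreate()
   df = spark.createDataFrame([([2, 3, 4],)], ["values"])

   # array_prepend(col, value) returns a new array with `value` placed
   # before all existing elements of `col`.
   df.select(F.array_prepend(df.values, 1).alias("prepended")).show()
   # +------------+
   # |   prepended|
   # +------------+
   # |[1, 2, 3, 4]|
   # +------------+
   ```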





[GitHub] [spark] xinrong-meng opened a new pull request, #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-12 Thread via GitHub


xinrong-meng opened a new pull request, #40388:
URL: https://github.com/apache/spark/pull/40388

   ### What changes were proposed in this pull request?
   Regulate the import path of `pandas_udf`.
   
   ### Why are the changes needed?
   Usability.
   
   ### Does this PR introduce _any_ user-facing change?
   No. Only the error message is improved, as shown below.
   
   FROM
   ```sh
   >>> pyspark.sql.connect.functions.pandas_udf()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/functions.py", line 2470, in pandas_udf
       raise NotImplementedError("pandas_udf() is not implemented.")
   NotImplementedError: pandas_udf() is not implemented.
   ```
   
   TO
   ```sh
   >>> pyspark.sql.connect.functions.pandas_udf()
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/Users/xinrong.meng/spark/python/pyspark/sql/connect/functions.py", line 2470, in pandas_udf
       raise NotImplementedError("Please import pandas_udf from pyspark.sql.functions.")
   NotImplementedError: Please import pandas_udf from pyspark.sql.functions
   ```
   
   ### How was this patch tested?
   Unit test.
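
   For reference, a minimal sketch of the supported import path that the new message points to (the toy UDF itself is illustrative):
   ```python
   import pandas as pd
   from pyspark.sql.functions import pandas_udf  # the supported import path
   from pyspark.sql.types import LongType

   # A trivial pandas UDF; with Spark Connect, pandas_udf must be imported
   # from pyspark.sql.functions, not pyspark.sql.connect.functions.
   @pandas_udf(LongType())
   def plus_one(s: pd.Series) -> pd.Series:
       return s + 1
   ```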
   





[GitHub] [spark] liang3zy22 commented on a diff in pull request #40347: [SPARK-42711][BUILD]Update usage info for sbt tool

2023-03-12 Thread via GitHub


liang3zy22 commented on code in PR #40347:
URL: https://github.com/apache/spark/pull/40347#discussion_r1133395930


##
build/sbt:
##
@@ -17,7 +17,7 @@
 # limitations under the License.
 #
 
-SELF=$(cd $(dirname $0) && pwd)
+SELF=$(cd "$(dirname "$0")" && pwd)

Review Comment:
   This is a shellcheck warning fix (an unquoted command substitution is subject to word splitting). I think we can keep it.






[GitHub] [spark] liang3zy22 commented on a diff in pull request #40347: [SPARK-42711][BUILD]Update usage info for sbt tool

2023-03-12 Thread via GitHub


liang3zy22 commented on code in PR #40347:
URL: https://github.com/apache/spark/pull/40347#discussion_r1133397703


##
build/sbt:
##
@@ -62,7 +63,7 @@ Usage: $script_name [options]
   -sbt-rc   use an RC version of sbt
   -sbt-snapshot use a snapshot version of sbt
 
-  # java version (default: java from PATH, currently $(java -version 2>&1 | grep version))
+  # JAVA_HOME (this will override default JAVA_HOME)

Review Comment:
   Well, my wording was not clear. I will roll back to the old hint info.






[GitHub] [spark] LuciferYang commented on pull request #40389: [SPARK-42767][CONNECT][TESTS] Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly

2023-03-12 Thread via GitHub


LuciferYang commented on PR #40389:
URL: https://github.com/apache/spark/pull/40389#issuecomment-1465482691

   Will update the PR description later.
   
   





[GitHub] [spark] LuciferYang opened a new pull request, #40389: [SPARK-42767][CONNECT][TESTS] Add check condition to start connect server fallback with `in-memory` and auto ignored some tests strongly

2023-03-12 Thread via GitHub


LuciferYang opened a new pull request, #40389:
URL: https://github.com/apache/spark/pull/40389

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   





[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-12 Thread via GitHub


aokolnychyi commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1133467299


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
 } else {
   v2Write
 }
-
-  case u: UpdateTable if !u.skipSchemaResolution && u.resolved =>

Review Comment:
   The only reason I would consider adding a config is to avoid an implicit 
behavior change as we currently check field compatibility for INSERTS during 
analysis. What is the community policy on such things? Is it okay to just 
transition to runtime checks in 3.5?









[GitHub] [spark] dongjoon-hyun commented on pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub


dongjoon-hyun commented on PR #40387:
URL: https://github.com/apache/spark/pull/40387#issuecomment-1465358231

   Could you review this PR, @viirya ?





[GitHub] [spark] viirya commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes

2023-03-12 Thread via GitHub


viirya commented on code in PR #40308:
URL: https://github.com/apache/spark/pull/40308#discussion_r1133355436


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.analysis
+
+import scala.collection.mutable
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.SQLConfHelper
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, CreateNamedStruct, Expression, ExtractValue, GetStructField, Literal, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.logical.Assignment
+import org.apache.spark.sql.catalyst.util.CharVarcharUtils
+import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._
+import org.apache.spark.sql.errors.QueryCompilationErrors
+import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy
+import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}
+
+object AssignmentUtils extends SQLConfHelper with CastSupport {
+
+  private case class ColumnUpdate(ref: Seq[String], expr: Expression)
+
+  /**
+   * Aligns assignments to match table columns.
+   *
+   * This method processes and reorders given assignments so that each target column gets
+   * an expression it should be set to. If a column does not have a matching assignment,
+   * it will be set to its current value. For example, if one passes table attributes c1, c2
+   * and an assignment c2 = 1, this method will return c1 = c1, c2 = 1.
+   *
+   * This method also handles updates to nested columns. If there is an assignment to a particular
+   * nested field, this method will construct a new struct with one field updated preserving other
+   * fields that have not been modified. For example, if one passes table attributes c1, c2
+   * where c2 is a struct with fields n1 and n2 and an assignment c2.n2 = 1, this method will
+   * return c1 = c1, c2 = struct(c2.n1, 1).
+   *
+   * @param attrs table attributes
+   * @param assignments assignments to align
+   * @return aligned assignments that match table columns

Review Comment:
   nit: a consistent description
   
   ```suggestion
  * @param attrs table attributes
  * @param assignments assignments to align
  * @return aligned assignments that match table attributes
   ```
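
   As a quick illustration of the alignment rule described in the doc comment, here is a minimal, dependency-free Scala sketch. The types and the `align` helper are illustrative stand-ins, not the real `AssignmentUtils` API; nested-struct handling, casts, and error reporting are omitted.

   ```scala
   object AlignSketch {
     case class Column(name: String)
     case class Assignment(target: String, expr: String)

     // For every table column, pick the user-supplied expression if one
     // exists; otherwise fall back to an identity assignment (col = col).
     def align(columns: Seq[Column], assignments: Seq[Assignment]): Seq[Assignment] =
       columns.map { col =>
         assignments.find(_.target == col.name)
           .getOrElse(Assignment(col.name, col.name))
       }

     def main(args: Array[String]): Unit = {
       val cols = Seq(Column("c1"), Column("c2"))
       // c2 = 1  =>  c1 = c1, c2 = 1 (the doc comment's own example)
       println(align(cols, Seq(Assignment("c2", "1"))))
     }
   }
   ```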



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: 
CatalogManager) extends RuleExecutor
 } else {
   v2Write
 }
-
-  case u: UpdateTable if !u.skipSchemaResolution && u.resolved =>

Review Comment:
   Hmm, I feel an additional config for this runtime-error/analysis-error behavior is a bit more than required. It sounds like an internal decision for databases; I'm not sure we need it to be configurable, and I don't see much benefit in such a config.
   
   Ideally it should be a unified behavior; I agree with that too. If there is already some inconsistency, I think we can address it incrementally.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -319,6 +319,7 @@ class Analyzer(override val catalogManager: CatalogManager) 
extends RuleExecutor
   ResolveRandomSeed ::
   ResolveBinaryArithmetic ::
   ResolveUnion ::
+  AlignRowLevelCommandAssignments ::

Review Comment:
   Maybe add a comment that the position of this rule in the batch cannot be changed for now.



##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala:
##
@@ -0,0 +1,275 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *

[GitHub] [spark] zhengruifeng commented on pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


zhengruifeng commented on PR #38947:
URL: https://github.com/apache/spark/pull/38947#issuecomment-1465385482

   @navinvishy  sorry, it seems too late for 3.4.0.
   would you mind changing the version from `3.4.0` to `3.5.0`?
   
   I am going to merge this PR to master this week if there are no more comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub


zhengruifeng commented on PR #40376:
URL: https://github.com/apache/spark/pull/40376#issuecomment-1465436796

   > Do we intentionally ignore `YearMonthIntervalType`?
   
   it seems that we do not support `YearMonthIntervalType` in vanilla PySpark yet, and the Python Client reuses PySpark's types.
   So it seems that we should add `YearMonthIntervalType` to PySpark first.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-12 Thread via GitHub


cloud-fan commented on PR #40359:
URL: https://github.com/apache/spark/pull/40359#issuecomment-1465448190

   thanks, merged to master! can you open a backport PR for 3.4?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #40262: [SPARK-42651][SQL] Optimize global sort to driver sort

2023-03-12 Thread via GitHub


cloud-fan commented on code in PR #40262:
URL: https://github.com/apache/spark/pull/40262#discussion_r1133447713


##
sql/core/src/main/scala/org/apache/spark/sql/execution/DriverSortExec.scala:
##
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.util.concurrent.TimeUnit.NANOSECONDS
+
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.{Attribute, NamedExpression, 
SortOrder, UnsafeProjection}
+import 
org.apache.spark.sql.catalyst.expressions.codegen.LazilyGeneratedOrdering
+import org.apache.spark.sql.catalyst.plans.physical.{Partitioning, 
SinglePartition}
+import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
+
+/**
+ * A special operator for global [[SortExec]] if it is the root node 
with/without [[ProjectExec]].
+ */
+case class DriverSortExec(

Review Comment:
   Can we use `TakeOrderedAndProjectExec` to replace it? They seem to be 
similar. What if we give a large enough limit to `TakeOrderedAndProjectExec`?
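
   As a side note on why a "large enough limit" works: top-K with K at least the input size degenerates into a full sort. A tiny plain-Scala sketch of that equivalence (illustrative only, not Spark's physical-plan API):

   ```scala
   object TopKAsSort {
     // Top-K: sort the input, then keep the first `limit` rows.
     def takeOrdered[T](input: Seq[T], limit: Int)(implicit ord: Ordering[T]): Seq[T] =
       input.sorted.take(limit)

     def main(args: Array[String]): Unit = {
       val rows = Seq(3, 1, 2)
       // With limit >= rows.size, top-K returns the fully sorted input.
       assert(takeOrdered(rows, Int.MaxValue) == Seq(1, 2, 3))
       println(takeOrdered(rows, Int.MaxValue))
     }
   }
   ```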



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub


ulysses-you commented on PR #40390:
URL: https://github.com/apache/spark/pull/40390#issuecomment-1465534836

   thank you @dongjoon-hyun, I'm fine with holding this until the next release. Separately, I want to make sure all tests can pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should also stop SparkContext when exit program in yarn mode and pass exitCode to AM side

2023-03-12 Thread via GitHub


cloud-fan commented on PR #40314:
URL: https://github.com/apache/spark/pull/40314#issuecomment-1465541487

   @AngersZh can you comment on the original discussion thread and convince the people involved to add back `sc.stop`? 
https://github.com/apache/spark/pull/32081#discussion_r663434289


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun opened a new pull request, #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub


dongjoon-hyun opened a new pull request, #40387:
URL: https://github.com/apache/spark/pull/40387

   … 
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-12 Thread via GitHub


cloud-fan commented on PR #39624:
URL: https://github.com/apache/spark/pull/39624#issuecomment-1465444732

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a diff in pull request #40375: [SPARK-42755][CONNECT] Factor literal value conversion out to `connect-common`

2023-03-12 Thread via GitHub


zhengruifeng commented on code in PR #40375:
URL: https://github.com/apache/spark/pull/40375#discussion_r1133419950


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala:
##
@@ -192,8 +173,7 @@ object LiteralValueProtoConverter {
 } else if (elementType.hasArray) {
   makeArrayData(v => toArrayData(v.getArray))
 } else {
-  throw InvalidPlanInput(s"Unsupported Literal Type: $elementType)")
+  throw new UnsupportedOperationException(s"Unsupported Literal Type: 
$elementType)")

Review Comment:
   thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133450986


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: 
Expression)
 copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array, element) - Add the element at the beginning of the array 
passed as first
+  argument. Type of element should be similar to type of the elements of 
the array.
+  Null element is also prepended to the array. But if the array passed is 
NULL
+  output is NULL
+""",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+   ["d","b","d","c","a"]
+  > SELECT _FUNC_(array(1, 2, 3, null), null);
+   [null,1,2,3,null]
+  > SELECT _FUNC_(CAST(null as Array), 2);
+   NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+with ImplicitCastInputTypes
+with ComplexTypeMergingExpression
+with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+val value1 = left.eval(input)
+if (value1 == null) {
+  null
+} else {
+  val value2 = right.eval(input)
+  nullSafeEval(value1, value2)
+}
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+val arrayData = arr.asInstanceOf[ArrayData]
+val numberOfElements = arrayData.numElements() + 1
+if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  throw 
QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+}
+val finalData = new Array[Any](numberOfElements)
+finalData.update(0, elementData)
+arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, 
v))
+new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val f = (arr: String, value: String) => {
+  val newArraySize = ctx.freshName("newArraySize")
+  val newArray = ctx.freshName("newArray")
+  val i = ctx.freshName("i")
+  val iPlus1 = s"$i+1"
+  val zero = "0"
+  val allocation = CodeGenerator.createArrayData(
+newArray,
+elementType,
+newArraySize,

Review Comment:
   do we really need this variable? can we just pass `s"$arr.numElements() + 
1"`?
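
   For context on the suggestion: the codegen assembles Java source from interpolated Scala strings, so the size expression can be spliced in directly instead of being bound to a fresh temporary first. A schematic sketch (plain Scala; the helper names are illustrative, not `CodeGenerator` API):

   ```scala
   object InlineSizeExprSketch {
     // With a temp variable: emit a declaration, then reference its name.
     def withTempVar(arr: String, sizeVar: String): String =
       s"""int $sizeVar = $arr.numElements() + 1;
          |// ... allocation sized by $sizeVar ...""".stripMargin

     // Inlined, as suggested: splice the size expression in directly.
     def inlined(arr: String): String =
       s"// ... allocation sized by $arr.numElements() + 1 ..."

     def main(args: Array[String]): Unit = {
       println(withTempVar("arr_0", "newArraySize_0"))
       println(inlined("arr_0"))
     }
   }
   ```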



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] xinrong-meng commented on pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub


xinrong-meng commented on PR #40376:
URL: https://github.com/apache/spark/pull/40376#issuecomment-1465422914

   Do we intentionally ignore `YearMonthIntervalType`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #39624: [SPARK-42101][SQL] Make AQE support InMemoryTableScanExec

2023-03-12 Thread via GitHub


cloud-fan closed pull request #39624: [SPARK-42101][SQL] Make AQE support 
InMemoryTableScanExec
URL: https://github.com/apache/spark/pull/39624


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-12 Thread via GitHub


beliefer commented on PR #40359:
URL: https://github.com/apache/spark/pull/40359#issuecomment-1465460479

   > thanks, merged to master! can you open a backport PR for 3.4?
   
   Thank you! I will create it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #40372: [SPARK-42752][PYSPARK][SQL] Make PySpark exceptions printable during initialization

2023-03-12 Thread via GitHub


zhengruifeng commented on PR #40372:
URL: https://github.com/apache/spark/pull/40372#issuecomment-1465459996

   cc @itholic @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] beliefer commented on a diff in pull request #40375: [SPARK-42755][CONNECT] Factor literal value conversion out to `connect-common`

2023-03-12 Thread via GitHub


beliefer commented on code in PR #40375:
URL: https://github.com/apache/spark/pull/40375#discussion_r1133416489


##
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralExpressionProtoConverter.scala:
##
@@ -192,8 +173,7 @@ object LiteralValueProtoConverter {
 } else if (elementType.hasArray) {
   makeArrayData(v => toArrayData(v.getArray))
 } else {
-  throw InvalidPlanInput(s"Unsupported Literal Type: $elementType)")
+  throw new UnsupportedOperationException(s"Unsupported Literal Type: 
$elementType)")

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub


dongjoon-hyun commented on code in PR #40387:
URL: https://github.com/apache/spark/pull/40387#discussion_r1133444982


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##
@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends 
Logging {
 clientMode = true)
 
   var driver: RpcEndpointRef = null
-  val nTries = 3
+  val nTries = 
sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", 3).toInt

Review Comment:
   Thank you for the review. This is an internal parameter, like an `.internal()` config. However, this code path runs in order to fetch `SparkConf`, so we cannot use a `SparkConf` value here. We may document this env variable later in some debug section.
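
   A minimal sketch of reading such a retry count from the environment before `SparkConf` is available. One detail worth noting: since `sys.env` is a `Map[String, String]`, the fallback should be the string `"3"` rather than the integer `3` (an `Int` default widens the result to `Any`, on which `.toInt` would not compile). The retry loop body is a placeholder, not the real driver-props fetch.

   ```scala
   object EnvRetrySketch {
     // String default, then convert: sys.env is a Map[String, String].
     private val maxAttempts: Int =
       sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", "3").toInt

     def main(args: Array[String]): Unit = {
       var attempt = 0
       var fetched = false
       while (!fetched && attempt < maxAttempts) {
         attempt += 1
         fetched = true // placeholder for the actual RPC setup against the driver
       }
       println(s"driver props fetched after $attempt of $maxAttempts attempt(s)")
     }
   }
   ```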



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you commented on pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-03-12 Thread via GitHub


ulysses-you commented on PR #39037:
URL: https://github.com/apache/spark/pull/39037#issuecomment-1465513263

   close this in favor of https://github.com/apache/spark/pull/39624


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ulysses-you closed pull request #39037: [SPARK-41214][SQL] Fix AQE cache does not update plan and metrics

2023-03-12 Thread via GitHub


ulysses-you closed pull request #39037: [SPARK-41214][SQL] Fix AQE cache does 
not update plan and metrics
URL: https://github.com/apache/spark/pull/39037


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133448219


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: 
Expression)
 copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array, element) - Add the element at the beginning of the array 
passed as first
+  argument. Type of element should be similar to type of the elements of 
the array.

Review Comment:
   ```suggestion
      argument. Type of element should be the same as the type of the elements of the array.
   ```
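
   Separately from the wording, the null-handling semantics spelled out in the usage text are easy to capture in a tiny executable sketch (plain Scala; `arrayPrepend` below models the documented behavior and is not the Catalyst expression itself):

   ```scala
   object ArrayPrependSemantics {
     // A null element is still prepended; a null array input yields null.
     def arrayPrepend[T](arr: Seq[T], elem: T): Seq[T] =
       if (arr == null) null else elem +: arr

     def main(args: Array[String]): Unit = {
       println(arrayPrepend(Seq("b", "d", "c", "a"), "d"))  // List(d, b, d, c, a)
       println(arrayPrepend(Seq[Any](1, 2, 3, null), null)) // List(null, 1, 2, 3, null)
       println(arrayPrepend(null: Seq[Int], 2))             // null
     }
   }
   ```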



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40390: [SPARK-42768][SQL] Enable cached plan apply AQE by default

2023-03-12 Thread via GitHub


dongjoon-hyun commented on code in PR #40390:
URL: https://github.com/apache/spark/pull/40390#discussion_r1133456778


##
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##
@@ -1493,15 +1493,14 @@ object SQLConf {
 
   val CAN_CHANGE_CACHED_PLAN_OUTPUT_PARTITIONING =
 buildConf("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning")
-  .internal()

Review Comment:
   BTW, you don't need to expose this. As you can see from most legacy configs, `.internal()` is fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40376: [SPARK-42756][CONNECT][PYTHON] Helper function to convert proto literal to value in Python Client

2023-03-12 Thread via GitHub


WeichenXu123 commented on code in PR #40376:
URL: https://github.com/apache/spark/pull/40376#discussion_r1133366703


##
python/pyspark/sql/connect/expressions.py:
##
@@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType:
 def _from_value(cls, value: Any) -> "LiteralExpression":
 return LiteralExpression(value=value, 
dataType=LiteralExpression._infer_type(value))
 
+@classmethod
+def _to_value(cls, literal: "proto.Expression.Literal") -> Any:
+if literal.HasField("null"):
+return None

Review Comment:
   Shall we raise an error in this case?



##
python/pyspark/sql/connect/expressions.py:
##
@@ -308,6 +308,43 @@ def _infer_type(cls, value: Any) -> DataType:
 def _from_value(cls, value: Any) -> "LiteralExpression":
 return LiteralExpression(value=value, 
dataType=LiteralExpression._infer_type(value))
 
+@classmethod
+def _to_value(cls, literal: "proto.Expression.Literal") -> Any:
+if literal.HasField("null"):
+return None
+elif literal.HasField("binary"):
+return literal.binary
+elif literal.HasField("boolean"):
+return literal.boolean
+elif literal.HasField("byte"):
+return literal.byte
+elif literal.HasField("short"):
+return literal.short
+elif literal.HasField("integer"):
+return literal.integer
+elif literal.HasField("long"):
+return literal.long
+elif literal.HasField("float"):
+return literal.float
+elif literal.HasField("double"):
+return literal.double
+elif literal.HasField("decimal"):
+return decimal.Decimal(literal.decimal.value)
+elif literal.HasField("string"):
+return literal.string
+elif literal.HasField("date"):
+return DateType().fromInternal(literal.date)
+elif literal.HasField("timestamp"):
+return TimestampType().fromInternal(literal.timestamp)
+elif literal.HasField("timestamp_ntz"):
+return TimestampNTZType().fromInternal(literal.timestamp_ntz)
+elif literal.HasField("day_time_interval"):
+return 
DayTimeIntervalType().fromInternal(literal.day_time_interval)
+elif literal.HasField("array"):
+return [LiteralExpression._to_value(v) for v in 
literal.array.elements]

Review Comment:
   Q: shall we verify the `element_type` field?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on a diff in pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub


viirya commented on code in PR #40387:
URL: https://github.com/apache/spark/pull/40387#discussion_r1133366833


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##
@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends 
Logging {
 clientMode = true)
 
   var driver: RpcEndpointRef = null
-  val nTries = 3
+  val nTries = 
sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", 3).toInt

Review Comment:
   Do we need to document this parameter? Or it is just one internal parameter 
for test purpose only?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng closed pull request #40382: [SPARK-42679][CONNECT][PYTHON] createDataFrame doesn't work with non-nullable schema

2023-03-12 Thread via GitHub


zhengruifeng closed pull request #40382: [SPARK-42679][CONNECT][PYTHON] 
createDataFrame doesn't work with non-nullable schema
URL: https://github.com/apache/spark/pull/40382


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #40382: [SPARK-42679][CONNECT][PYTHON] createDataFrame doesn't work with non-nullable schema

2023-03-12 Thread via GitHub


zhengruifeng commented on PR #40382:
URL: https://github.com/apache/spark/pull/40382#issuecomment-1465447635

   merged to master/branch-3.4


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan closed pull request #40359: [SPARK-42740][SQL] Fix the bug that pushdown offset or paging is invalid for some built-in dialect

2023-03-12 Thread via GitHub


cloud-fan closed pull request #40359: [SPARK-42740][SQL] Fix the bug that 
pushdown offset or paging is invalid for some built-in dialect
URL: https://github.com/apache/spark/pull/40359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133450131


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: 
Expression)
 copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array, element) - Add the element at the beginning of the array 
passed as first
+  argument. Type of element should be similar to type of the elements of 
the array.

Review Comment:
   We don't need to document the type coercion behavior in each function. We 
have a dedicated doc to explain what "similar" type is: 
https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#type-promotion-and-precedence



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-12 Thread via GitHub


dongjoon-hyun commented on code in PR #40388:
URL: https://github.com/apache/spark/pull/40388#discussion_r1133455875


##
python/pyspark/sql/connect/functions.py:
##
@@ -2467,7 +2467,7 @@ def udf(
 
 
 def pandas_udf(*args: Any, **kwargs: Any) -> None:

Review Comment:
   +1 for @zhengruifeng 's comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on pull request #40367: [SPARK-42747][ML] Fix incorrect internal status of LoR and AFT

2023-03-12 Thread via GitHub


zhengruifeng commented on PR #40367:
URL: https://github.com/apache/spark/pull/40367#issuecomment-1465353652

   @srowen Thank you for the reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40387: [SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend

2023-03-12 Thread via GitHub


dongjoon-hyun commented on code in PR #40387:
URL: https://github.com/apache/spark/pull/40387#discussion_r1133357991


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala:
##
@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends 
Logging {
 clientMode = true)
 
   var driver: RpcEndpointRef = null
-  val nTries = 3
+  val nTries = 
sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", 3)

Review Comment:
   This env name came from the name of this RPCEnv.
   
https://github.com/apache/spark/blob/71a54f0b4456028226d0867081e6becd4935d33f/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesExecutorBackend.scala#L75



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a diff in pull request #40388: [SPARK-42765][CONNECT][PYTHON] Regulate the import path of `pandas_udf`

2023-03-12 Thread via GitHub


zhengruifeng commented on code in PR #40388:
URL: https://github.com/apache/spark/pull/40388#discussion_r1133413599


##
python/pyspark/sql/connect/functions.py:
##
@@ -2467,7 +2467,7 @@ def udf(
 
 
 def pandas_udf(*args: Any, **kwargs: Any) -> None:

Review Comment:
   What about adding a few comments here, so that contributors are aware that this is already implemented?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-12 Thread via GitHub


cloud-fan commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1465459756

   The auto-generated alias name is fragile and we are trying to improve it at 
https://github.com/apache/spark/pull/40126
   
   Can you give some examples of how the new update changes the alias name? If 
it's not reasonable, we should keep the previous code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] sunchao commented on pull request #39950: [SPARK-42388][SQL] Avoid parquet footer reads twice when no filters in vectorized reader

2023-03-12 Thread via GitHub


sunchao commented on PR #39950:
URL: https://github.com/apache/spark/pull/39950#issuecomment-1465508828

   > Sorry, it might be necessary to read the footer twice if there are filters. We should read the schema in the footer metadata first to determine which filters need to be pushed down. After that we set the pushdown info (https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L261) and read the filtered RowGroups with the filter configuration.
   
   Hmm, why? When we read the footer for the first time, the result already contains all the row groups. We just need to pass these to `ParquetFileReader`, which will apply the filters we pushed down on these row groups and return a list of filtered ones. See [here](https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L769).
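   
   Schematically, the single-footer-read flow argued for here looks like the sketch below. Every type and helper in it is a deliberately fake stand-in (none of this is parquet-mr's API): parse the footer once, derive pushdown from the schema it already contains, and hand the already-parsed row groups to the reader for filtering.
   
   ```scala
   object FooterOnceSketch {
     case class Footer(schemaFields: Set[String], rowGroups: Seq[String])
   
     // Stub standing in for the one and only footer read.
     def readFooter(path: String): Footer =
       Footer(Set("a", "b"), Seq("rg0", "rg1", "rg2"))
   
     def main(args: Array[String]): Unit = {
       val footer = readFooter("/tmp/example.parquet")
       // Pushdown is decided from the schema we already parsed.
       val pushed = Seq("a" -> "> 1").filter { case (col, _) => footer.schemaFields(col) }
       // The reader prunes the row groups we pass in; no second footer read.
       val kept = footer.rowGroups.filterNot(_ == "rg1") // pretend rg1 is pruned
       println(s"pushed=$pushed keptRowGroups=$kept")
     }
   }
   ```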
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-12 Thread via GitHub


shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1465517104

   Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please review this PR or direct it to someone familiar with this code?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-12 Thread via GitHub


shrprasa commented on PR #40128:
URL: https://github.com/apache/spark/pull/40128#issuecomment-1465517975

   Gentle ping @holdenk 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] holdenk commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-12 Thread via GitHub


holdenk commented on code in PR #40128:
URL: https://github.com/apache/spark/pull/40128#discussion_r1133449379


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala:
##
@@ -143,6 +144,9 @@ private[spark] class Client(
 logError("Please check \"kubectl auth can-i create [resource]\" 
first." +
   " It should be yes. And please also check your feature step 
implementation.")
 kubernetesClient.resourceList(preKubernetesResources: _*).delete()
+// register shutdownhook for cleaning up the upload dir only

Review Comment:
   ah interesting. What if we only register it in client mode?
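
   A minimal sketch of that suggestion: only wire the cleanup hook up in client mode. The `deployMode` check and `cleanup` callback are illustrative; the real code would go through Spark's ShutdownHookManager and config plumbing.

   ```scala
   object ClientModeCleanupSketch {
     // Register the shutdown hook only when running in client mode.
     def maybeRegisterCleanup(deployMode: String)(cleanup: => Unit): Unit =
       if (deployMode == "client") {
         sys.addShutdownHook(cleanup)
       }

     def main(args: Array[String]): Unit =
       maybeRegisterCleanup("client") {
         println("cleaning up the upload dir")
       }
   }
   ```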



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133454113


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: 
Expression)
 copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array, element) - Add the element at the beginning of the array 
passed as first
+  argument. Type of element should be similar to type of the elements of 
the array.
+  Null element is also prepended to the array. But if the array passed is 
NULL
+  output is NULL
+""",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+   ["d","b","d","c","a"]
+  > SELECT _FUNC_(array(1, 2, 3, null), null);
+   [null,1,2,3,null]
+  > SELECT _FUNC_(CAST(null as Array), 2);
+   NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+with ImplicitCastInputTypes
+with ComplexTypeMergingExpression
+with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+val value1 = left.eval(input)
+if (value1 == null) {
+  null
+} else {
+  val value2 = right.eval(input)
+  nullSafeEval(value1, value2)
+}
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+val arrayData = arr.asInstanceOf[ArrayData]
+val numberOfElements = arrayData.numElements() + 1
+if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  throw 
QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+}
+val finalData = new Array[Any](numberOfElements)
+finalData.update(0, elementData)
+arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, 
v))
+new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val f = (arr: String, value: String) => {
+  val newArraySize = ctx.freshName("newArraySize")
+  val newArray = ctx.freshName("newArray")
+  val i = ctx.freshName("i")
+  val iPlus1 = s"$i+1"
+  val zero = "0"
+  val allocation = CodeGenerator.createArrayData(
+newArray,
+elementType,
+newArraySize,
+s" $prettyName failed.")
+  val assignment =
+CodeGenerator.createArrayAssignment(newArray, elementType, arr, 
iPlus1, i, false)
+  val newElemAssignment =
+CodeGenerator.setArrayElement(newArray, elementType, zero, value, 
Some(rightGen.isNull))
+  s"""
+ |int $newArraySize = $arr.numElements() + 1;
+ |$allocation
+ |$newElemAssignment
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  $assignment
+ |}
+ |${ev.value} = $newArray;
+ |""".stripMargin
+}
+val resultCode = f(leftGen.value, rightGen.value)
+if(nullable) {
+  val nullSafeEval = leftGen.code + rightGen.code + 
ctx.nullSafeExec(nullable, leftGen.isNull) {
+s"""
+   |${ev.isNull} = false;
+   |${resultCode}
+   |""".stripMargin
+  }
+  ev.copy(code =
+code"""
+boolean ${ev.isNull} = true;
+${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+$nullSafeEval
+  """)

Review Comment:
   please use the same code style as
   ```
   s"""
  |...
  |...
  |""".stripMargin
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a diff in pull request #38947: [SPARK-41233][SQL][PYTHON] Add `array_prepend` function

2023-03-12 Thread via GitHub


cloud-fan commented on code in PR #38947:
URL: https://github.com/apache/spark/pull/38947#discussion_r1133454186


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala:
##
@@ -1399,6 +1399,151 @@ case class ArrayContains(left: Expression, right: 
Expression)
 copy(left = newLeft, right = newRight)
 }
 
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array, element) - Add the element at the beginning of the array 
passed as first
+  argument. Type of element should be similar to type of the elements of 
the array.
+  Null element is also prepended to the array. But if the array passed is 
NULL
+  output is NULL
+""",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array('b', 'd', 'c', 'a'), 'd');
+   ["d","b","d","c","a"]
+  > SELECT _FUNC_(array(1, 2, 3, null), null);
+   [null,1,2,3,null]
+  > SELECT _FUNC_(CAST(null as Array), 2);
+   NULL
+  """,
+  group = "array_funcs",
+  since = "3.5.0")
+case class ArrayPrepend(left: Expression, right: Expression)
+  extends BinaryExpression
+with ImplicitCastInputTypes
+with ComplexTypeMergingExpression
+with QueryErrorsBase {
+
+  override def nullable: Boolean = left.nullable
+
+  @transient protected lazy val elementType: DataType =
+inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def eval(input: InternalRow): Any = {
+val value1 = left.eval(input)
+if (value1 == null) {
+  null
+} else {
+  val value2 = right.eval(input)
+  nullSafeEval(value1, value2)
+}
+  }
+  override def nullSafeEval(arr: Any, elementData: Any): Any = {
+val arrayData = arr.asInstanceOf[ArrayData]
+val numberOfElements = arrayData.numElements() + 1
+if (numberOfElements > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+  throw 
QueryExecutionErrors.concatArraysWithElementsExceedLimitError(numberOfElements)
+}
+val finalData = new Array[Any](numberOfElements)
+finalData.update(0, elementData)
+arrayData.foreach(elementType, (i: Int, v: Any) => finalData.update(i + 1, 
v))
+new GenericArrayData(finalData)
+  }
+  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): 
ExprCode = {
+val leftGen = left.genCode(ctx)
+val rightGen = right.genCode(ctx)
+val f = (arr: String, value: String) => {
+  val newArraySize = ctx.freshName("newArraySize")
+  val newArray = ctx.freshName("newArray")
+  val i = ctx.freshName("i")
+  val iPlus1 = s"$i+1"
+  val zero = "0"
+  val allocation = CodeGenerator.createArrayData(
+newArray,
+elementType,
+newArraySize,
+s" $prettyName failed.")
+  val assignment =
+CodeGenerator.createArrayAssignment(newArray, elementType, arr, 
iPlus1, i, false)
+  val newElemAssignment =
+CodeGenerator.setArrayElement(newArray, elementType, zero, value, 
Some(rightGen.isNull))
+  s"""
+ |int $newArraySize = $arr.numElements() + 1;
+ |$allocation
+ |$newElemAssignment
+ |for (int $i = 0; $i < $arr.numElements(); $i ++) {
+ |  $assignment
+ |}
+ |${ev.value} = $newArray;
+ |""".stripMargin
+}
+val resultCode = f(leftGen.value, rightGen.value)
+if(nullable) {
+  val nullSafeEval = leftGen.code + rightGen.code + 
ctx.nullSafeExec(nullable, leftGen.isNull) {
+s"""
+   |${ev.isNull} = false;
+   |${resultCode}
+   |""".stripMargin
+  }
+  ev.copy(code =
+code"""
+boolean ${ev.isNull} = true;
+${CodeGenerator.javaType(dataType)} ${ev.value} = 
${CodeGenerator.defaultValue(dataType)};
+$nullSafeEval
+  """)
+} else {
+  ev.copy(code =
+code"""

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org