Re: [PR] [SPARK-48116][INFRA][FOLLOW-UP] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46381:
URL: https://github.com/apache/spark/pull/46381#issuecomment-2094024562

   ![Screenshot 2024-05-04 at 2 21 43 
PM](https://github.com/apache/spark/assets/6477701/1d669562-fbf2-4666-8554-7816757f127a)
   
   vs
   
   ![Screenshot 2024-05-04 at 2 22 15 
PM](https://github.com/apache/spark/assets/6477701/91afb74f--4832-bff9-7f067b0def4f)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOW-UP] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46381:
URL: https://github.com/apache/spark/pull/46381#issuecomment-2094022030

   Tested at 
https://github.com/HyukjinKwon/spark/actions/runs/8948215777/job/24581131687
   
   cc @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46380:
URL: https://github.com/apache/spark/pull/46380#issuecomment-2094017688

   Oh, thank you for the fast recovery. Looking forward to seeing the new try.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46380:
URL: https://github.com/apache/spark/pull/46380#issuecomment-2094016895

   Reverted. I will make a new try with trying within my own fork. I 
cherry-picked yours and merged it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


HyukjinKwon closed pull request #46380: [SPARK-48116][INFRA][FOLLOWUP] Simplify 
the build with fixing the if condition
URL: https://github.com/apache/spark/pull/46380


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46380:
URL: https://github.com/apache/spark/pull/46380#issuecomment-2094015219

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46376: [SPARK-48127][INFRA] Fix 
`dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules
URL: https://github.com/apache/spark/pull/46376


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46376:
URL: https://github.com/apache/spark/pull/46376#issuecomment-2094014606

   Thank you, @viirya !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Fix `if` statement to check repository [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46379:
URL: https://github.com/apache/spark/pull/46379#issuecomment-2094014537

   Of course.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Fix `if` statement to check repository [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46379: [SPARK-48116][INFRA][FOLLOWUP] Fix 
`if` statement to check repository
URL: https://github.com/apache/spark/pull/46379


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


viirya commented on PR #46376:
URL: https://github.com/apache/spark/pull/46376#issuecomment-2094014506

   Good catch!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46380:
URL: https://github.com/apache/spark/pull/46380#issuecomment-2094014497

   Feel free to merge at any time when this PR is ready ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA][FOLLOWUP] Fix `if` statement to check repository [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46379:
URL: https://github.com/apache/spark/pull/46379#issuecomment-2094014303

   @dongjoon-hyun Can we maybe just merge 
https://github.com/apache/spark/pull/46380 instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Simplify the build with fixing the if condition [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46380:
URL: https://github.com/apache/spark/pull/46380#issuecomment-2094013225

   cc @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46376:
URL: https://github.com/apache/spark/pull/46376#issuecomment-2094007120

   Could you review this PR, @viirya ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46376:
URL: https://github.com/apache/spark/pull/46376#issuecomment-2094006942

   All tests passed.
   
   https://github.com/apache/spark/assets/9700541/0865ffb8-9bfa-467b-9bde-9bd7ef703b8f;>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-48116][INFRA][FOLLOWUP] Fix `if` statement to check repository [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun opened a new pull request, #46379:
URL: https://github.com/apache/spark/pull/46379

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2094000682

   Merged to master~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46367: [SPARK-48116][INFRA] Run 
`pyspark-pandas*` only in PR builder and Daily Python CIs
URL: https://github.com/apache/spark/pull/46367


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2094000544

   Thank you for the suggestion. Yes, I hope to do the deduplication later~
   Let me merge this to monitor the reduced usage. Thank you, @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46297:
URL: https://github.com/apache/spark/pull/46297#issuecomment-2093996045

   Thanks for fixing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2093995678

   LGTM,one comment but I can followup.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


HyukjinKwon commented on code in PR #46367:
URL: https://github.com/apache/spark/pull/46367#discussion_r1589877623


##
.github/workflows/build_and_test.yml:
##
@@ -347,12 +353,139 @@ jobs:
 pyspark-core, pyspark-errors, pyspark-streaming
   - >-
 pyspark-mllib, pyspark-ml, pyspark-ml-connect
+  - >-

Review Comment:
   I'm fine with sikpping itself but wonder if we can deduplicate this by 
directly adding an if to the module based on the JSON input above. I can follow 
up.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2093993468

   Could you review this PR, @gengliangwang ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46376:
URL: https://github.com/apache/spark/pull/46376#issuecomment-2093993155

   cc @panbingkun and @gengliangwang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48056][PYTHON][CONNECT][FOLLOW-UP] Use `assertEqual` instead of `assertEquals` for Python 3.12 [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46377: 
[SPARK-48056][PYTHON][CONNECT][FOLLOW-UP] Use `assertEqual` instead of 
`assertEquals` for Python 3.12
URL: https://github.com/apache/spark/pull/46377


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48056][PYTHON][CONNECT][FOLLOW-UP] Use `assertEqual` instead of `assertEquals` for Python 3.12 [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46377:
URL: https://github.com/apache/spark/pull/46377#issuecomment-2093992824

   Let me merge this to recover the Python CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47969][PYTHON][TESTS][FOLLOWUP] Make Test `test_creation_index` deterministic [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46378:
URL: https://github.com/apache/spark/pull/46378#issuecomment-2093991946

   Merged to master.
   Thank you, @zhengruifeng .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47969][PYTHON][TESTS][FOLLOWUP] Make Test `test_creation_index` deterministic [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46378: 
[SPARK-47969][PYTHON][TESTS][FOLLOWUP]  Make Test `test_creation_index` 
deterministic
URL: https://github.com/apache/spark/pull/46378


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48056][CONNECT][PYTHON] Re-execute plan if a SESSION_NOT_FOUND error is raised and no partial response was received [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46297:
URL: https://github.com/apache/spark/pull/46297#issuecomment-2093955454

   This seems to be missed in another PR too.
   - #45242


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-47969][PYTHON][TESTS][FOLLOWUP] Make Test `test_creation_index` deterministic - followup [spark]

2024-05-03 Thread via GitHub


zhengruifeng opened a new pull request, #46378:
URL: https://github.com/apache/spark/pull/46378

   ### What changes were proposed in this pull request?
   followup https://github.com/apache/spark/pull/46200
   
   
   ### Why are the changes needed?
   there is still non-deterministic codes in this test:
   ```
   Traceback (most recent call last):
 File "/home/jenkins/python/pyspark/testing/pandasutils.py", line 91, in 
_assert_pandas_equal
   assert_frame_equal(
 File 
"/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py",
 line 1257, in assert_frame_equal
   assert_index_equal(
 File 
"/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py",
 line 407, in assert_index_equal
   raise_assert_detail(obj, msg, left, right)
 File 
"/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py",
 line 665, in raise_assert_detail
   raise AssertionError(msg)
   AssertionError: DataFrame.index are different
   DataFrame.index values are different (75.0 %)
   [left]:  DatetimeIndex(['2022-09-02', '2022-09-03', '2022-08-31', 
'2022-09-05'], dtype='datetime64[ns]', freq=None)
   [right]: DatetimeIndex(['2022-08-31', '2022-09-02', '2022-09-03', 
'2022-09-05'], dtype='datetime64[ns]', freq=None)
   
   ```
   
   
   ### Does this PR introduce _any_ user-facing change?
   no, test only
   
   
   ### How was this patch tested?
   ci
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-48056][PYTHON][CONNECT][FOLLOW-UP] Use `assertEqual` instead of `assertEquals` for Python 3.12 [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun opened a new pull request, #46377:
URL: https://github.com/apache/spark/pull/46377

   … 
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2093951179

   ASF Infra team requests GitHub Action usage control to Apache Spark PMC.
   - https://lists.apache.org/thread/rgy1cg17tkd3yox7qfq87ht12sqclkbg


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2093950583

   Could you review this PR, @zhengruifeng ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46022:
URL: https://github.com/apache/spark/pull/46022#discussion_r1589845846


##
connector/profiler/src/main/scala/org/apache/spark/executor/profiler/ExecutorJVMProfiler.scala:
##
@@ -25,7 +25,8 @@ import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, 
Path}
 
 import org.apache.spark.SparkConf
 import org.apache.spark.deploy.SparkHadoopUtil
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.LogKey.PATH
+import org.apache.spark.internal.{Logging, MDC}

Review Comment:
   Unfortunately, this import ordering issue was missed because 
`dev/scalastyle` didn't include this module.
   
   Here is the fix for `dev/scalastyle` and this.
   - https://github.com/apache/spark/pull/46376



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46022:
URL: https://github.com/apache/spark/pull/46022#discussion_r1589845864


##
connector/profiler/src/main/scala/org/apache/spark/executor/profiler/ExecutorProfilerPlugin.scala:
##
@@ -23,7 +23,8 @@ import scala.util.Random
 
 import org.apache.spark.SparkConf
 import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, 
PluginContext, SparkPlugin}
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+import org.apache.spark.internal.{Logging, MDC}

Review Comment:
   ditto. Import ordering issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48058][SPARK-43727][PYTHON][CONNECT] `UserDefinedFunction.returnType` parse the DDL string [spark]

2024-05-03 Thread via GitHub


zhengruifeng commented on PR #46300:
URL: https://github.com/apache/spark/pull/46300#issuecomment-2093941093

   thanks @HyukjinKwon and @dongjoon-hyun for reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-48127][INFRA] Fix `dev/scalastyle` to check `hadoop-cloud` and `jvm-profiler` modules [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun opened a new pull request, #46376:
URL: https://github.com/apache/spark/pull/46376

   … 
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-03 Thread via GitHub


panbingkun commented on PR #46288:
URL: https://github.com/apache/spark/pull/46288#issuecomment-2093917835

   Currently, this GA has turned green.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48049][BUILD] Upgrade Scala to 2.13.14 [spark]

2024-05-03 Thread via GitHub


panbingkun commented on PR #46288:
URL: https://github.com/apache/spark/pull/46288#issuecomment-2093915493

   > Thank you for the analysis and sharing, @panbingkun and @LuciferYang .
   
   
   
   > [INFO] Restricted to JDK 17 yet org.jline:jline:jar:3.25.1:compile 
contains org/jline/terminal/impl/ffm/CLibrary$termios.class targeted to 65.-257
   > 
   > Oh... It seems that `CLibrary$termios.class` is not compatible with Java 8.
   > 
   > In jline 3.25.1, CLibrary references some implementations that were moved 
to the `java.lang.foreign` package in Java 21, but in Java 17, they were still 
in the `jdk.incubator.foreign` package.
   > 
   > However, in jline 3.24.1, CLibrary also import classes in 
`java.lang.foreign` package, it's just that the final jline jar does not 
include the `terminal-ffm` module, so it won't trigger a compilation error from 
the `maven-enforcer-plugin`.
   > 
   > So it seems that using jline 3.25.1 should not pose any issues. It's 
unclear whether it's possible to workaround this by modifying the rules of the 
`maven-enforcer-plugin`. @panbingkun
   
   Let me try, We should be able to refer to the following:
   https://github.com/mojohaus/extra-enforcer-rules/issues/46
   https://github.com/apache/spark/assets/15246973/db42b4e1-650e-44c2-9cfb-d9f6dc9914b8;>
   
   But I'm not sure we really want to do this? 
   We have 2 options:
   1. Use `org.jline:jline:3.14.1` (exclude `org.jline:jline:3.15.1` from 
`org.scala-lang:scala-compiler:jar:2.13.14`)
   2. Use `org.jline:jline:3.15.1` (Modify the rule `enforceBytecodeVersion` by 
configuring `ignoreClasses ` and bypass it ) 
   WDYT @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-03 Thread via GitHub


gengliangwang commented on PR #46301:
URL: https://github.com/apache/spark/pull/46301#issuecomment-2093912796

   @panbingkun Awesome, thanks for the work!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-03 Thread via GitHub


panbingkun commented on PR #46301:
URL: https://github.com/apache/spark/pull/46301#issuecomment-2093912354

   > Thanks, merging to master
   
   Thank you very much.
   Based on this, I will do log migration and auxiliary work on the java side.
   (eg, after the migration is completed, we can add a `rule` in `checkstyle` 
that prohibits importing `org.slf4j.Logger` and `org.slf4j.LoggerFactory`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework [spark]

2024-05-03 Thread via GitHub


dtenedor commented on PR #46309:
URL: https://github.com/apache/spark/pull/46309#issuecomment-2093911859

   The two remaining test failures appear to be flaky/unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46813][CORE] Don't set the executor id to "driver" when SparkContext is created by the executor side [spark]

2024-05-03 Thread via GitHub


github-actions[bot] commented on PR #44855:
URL: https://github.com/apache/spark/pull/44855#issuecomment-2093908111

   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [WIP][SQL] Avoid parquet footer reads twice [spark]

2024-05-03 Thread via GitHub


github-actions[bot] closed pull request #44853: [WIP][SQL] Avoid parquet footer 
reads twice
URL: https://github.com/apache/spark/pull/44853


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46738][CONNECT][PYTHON] Make the display of `cast` in `Regular Spark` and `Spark Connect` consistent [spark]

2024-05-03 Thread via GitHub


github-actions[bot] closed pull request #44829: [SPARK-46738][CONNECT][PYTHON] 
Make the display of `cast` in `Regular Spark` and `Spark Connect` consistent
URL: https://github.com/apache/spark/pull/44829


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-46714][SQL] Overwrite a partition with custom location [spark]

2024-05-03 Thread via GitHub


github-actions[bot] closed pull request #44725: [SPARK-46714][SQL] Overwrite a 
partition with custom location
URL: https://github.com/apache/spark/pull/44725


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-42669][CONNECT] Short circuit local relation RPCs [spark]

2024-05-03 Thread via GitHub


github-actions[bot] closed pull request #40782: [SPARK-42669][CONNECT] Short 
circuit local relation RPCs
URL: https://github.com/apache/spark/pull/40782


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47484][SQL] Allow trailing comma in column definition list [spark]

2024-05-03 Thread via GitHub


srielau commented on PR #45593:
URL: https://github.com/apache/spark/pull/45593#issuecomment-2093902007

   
   
   
   > Sorry for being late, @MaxGekk .
   > 
   > * Is this ANSI SQL syntax?
   > * Do you have any reference for this syntax in other references?
   
   I only know of this for the SELECT list, where it is popularized by duckDB 
and now Snowflake.
   This is NOT ANSI SQL syntax and I'm not in favor of starting down this path 
for two reasons:
   1. This will increase pressure on trailing commas in SELECT which we cannot 
do without a hard breaking change
   2. Generally I think "nothing" after a comma is associated with an unset 
value (i.e. a NULL). E.g. (,,) should, if we were to support this be parsed as 
(NULL, NULL, NULL).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to `1 minute` for `interrupt tag` test [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46374:
URL: https://github.com/apache/spark/pull/46374#issuecomment-2093895463

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to `1 minute` for `interrupt tag` test [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46374: 
[SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to `1 minute` for 
`interrupt tag` test
URL: https://github.com/apache/spark/pull/46374


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to `1 minute` for `interrupt tag` test [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46374:
URL: https://github.com/apache/spark/pull/46374#issuecomment-2093895254

   Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


gengliangwang commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589812045


##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {
+  val LOG_SCHEMA = """|ts TIMESTAMP,

Review Comment:
   Updated



##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {
+  val LOG_SCHEMA = """|ts TIMESTAMP,
+| level STRING,

Review Comment:
   Thanks, updated



##
sql/core/src/test/scala/org/apache/spark/sql/LogQuerySuite.scala:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.internal.{Logging, LogKeys, MDC}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.util.LogUtils.LOG_SCHEMA
+
+/**
+ * Test suite for querying Spark logs using SQL.
+ */
+class LogQuerySuite extends QueryTest
+with SharedSparkSession with BeforeAndAfterAll with Logging {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


gengliangwang commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589811109


##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {

Review Comment:
   Sure, updated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


ueshin commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589805454


##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {
+  val LOG_SCHEMA = """|ts TIMESTAMP,
+| level STRING,

Review Comment:
   nit:
   
   ```suggestion
 val LOG_SCHEMA = """
   | ts TIMESTAMP,
   | level STRING,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to 1 minute for `interrupt tag` test [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46374:
URL: https://github.com/apache/spark/pull/46374#issuecomment-2093886085

   Could you review this test PR, @gengliangwang ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589807167


##
sql/core/src/test/scala/org/apache/spark/sql/LogQuerySuite.scala:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.internal.{Logging, LogKeys, MDC}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.util.LogUtils.LOG_SCHEMA
+
+/**
+ * Test suite for querying Spark logs using SQL.
+ */
+class LogQuerySuite extends QueryTest
+with SharedSparkSession with BeforeAndAfterAll with Logging {

Review Comment:
   Please remove `with BeforeAndAfterAll` because we have it already.
   - 
https://github.com/apache/spark/blob/5c01f196afc3ba75f10c4aedf2c8405b6f59336a/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala#L226



##
sql/core/src/test/scala/org/apache/spark/sql/LogQuerySuite.scala:
##
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.File
+
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.spark.internal.{Logging, LogKeys, MDC}
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.util.LogUtils.LOG_SCHEMA
+
+/**
+ * Test suite for querying Spark logs using SQL.
+ */
+class LogQuerySuite extends QueryTest
+with SharedSparkSession with BeforeAndAfterAll with Logging {

Review Comment:
   Please remove `with BeforeAndAfterAll` because we have it already.
   
   
https://github.com/apache/spark/blob/5c01f196afc3ba75f10c4aedf2c8405b6f59336a/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala#L226



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


ueshin commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589805454


##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {
+  val LOG_SCHEMA = """|ts TIMESTAMP,
+| level STRING,

Review Comment:
   nit: Just my preference, you can ignore:
   
   ```suggestion
 val LOG_SCHEMA = """
   | ts TIMESTAMP,
   | level STRING,
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-03 Thread via GitHub


gengliangwang closed pull request #46301: [SPARK-48059][CORE] Implement the 
structured log framework on the java side
URL: https://github.com/apache/spark/pull/46301


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-03 Thread via GitHub


gengliangwang commented on PR #46301:
URL: https://github.com/apache/spark/pull/46301#issuecomment-2093883660

   Thanks, merging to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48059][CORE] Implement the structured log framework on the java side [spark]

2024-05-03 Thread via GitHub


gengliangwang commented on code in PR #46301:
URL: https://github.com/apache/spark/pull/46301#discussion_r1589805794


##
common/utils/src/test/java/org/apache/spark/util/LoggerSuiteBase.java:
##
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.List;
+
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.logging.log4j.Level;
+import org.junit.jupiter.api.Test;
+
+import org.apache.spark.internal.Logger;
+import org.apache.spark.internal.LogKeys;
+import org.apache.spark.internal.MDC;
+
+public abstract class LoggerSuiteBase {

Review Comment:
   nit: ideally, we can have a followup to reduce the duplicated test code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589805435


##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {

Review Comment:
   Could you add `DeveloperApi` and `Since` tag officially?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46375:
URL: https://github.com/apache/spark/pull/46375#discussion_r1589805250


##
common/utils/src/main/scala/org/apache/spark/util/LogUtils.scala:
##
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.util
+
+object LogUtils {
+  val LOG_SCHEMA = """|ts TIMESTAMP,

Review Comment:
   This looks a little weird. Could you reformat this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48121][K8S] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46373:
URL: https://github.com/apache/spark/pull/46373#issuecomment-2093881734

   Merged to master for Apache Spark 4.0.0-preview.
   Thank you, @jiangzho .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48121][K8S] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46373: [SPARK-48121][K8S] Promote 
`KubernetesDriverConf` to `DeveloperApi`
URL: https://github.com/apache/spark/pull/46373


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-48123][Core] Provide a constant table schema for querying structured logs [spark]

2024-05-03 Thread via GitHub


gengliangwang opened a new pull request, #46375:
URL: https://github.com/apache/spark/pull/46375

   
   
   ### What changes were proposed in this pull request?
   
   Providing a table schema LOG_SCHEMA, so that users can load structured logs 
with the following code:
   
   ```
   import org.apache.spark.util.LogUtils.LOG_SCHEMA
   
   val logDf = spark.read.schema(LOG_SCHEMA).json("path/to/logs")
   ```
   
   ### Why are the changes needed?
   
   Provide a convenient way to query Spark logs using Spark SQL.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   New UT
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework [spark]

2024-05-03 Thread via GitHub


dtenedor commented on code in PR #46309:
URL: https://github.com/apache/spark/pull/46309#discussion_r1589791573


##
core/src/test/scala/org/apache/spark/deploy/RPackageUtilsSuite.scala:
##
@@ -127,7 +127,7 @@ class RPackageUtilsSuite
   RPackageUtils.checkAndBuildRPackage(jar.getAbsolutePath, new 
BufferPrintStream,
 verbose = true)
   val output = lineBuffer.mkString("\n")
-  assert(output.contains(RPackageUtils.RJarDoc))
+  assert(output.contains("Building R package"))

Review Comment:
   Created https://issues.apache.org/jira/browse/SPARK-48122 to track it 
separately.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48121][K8S] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46373:
URL: https://github.com/apache/spark/pull/46373#discussion_r1589790988


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -78,7 +79,16 @@ private[spark] abstract class KubernetesConf(val sparkConf: 
SparkConf) {
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
-private[spark] class KubernetesDriverConf(
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Used for K8s operations internally and Spark K8s operator.
+ */
+@Unstable
+@DeveloperApi
+@Since("4.0.0")

Review Comment:
   Ya, I hope we can make it stable from 4.0.0.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48121][K8S] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46373:
URL: https://github.com/apache/spark/pull/46373#discussion_r1589790586


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -78,7 +79,16 @@ private[spark] abstract class KubernetesConf(val sparkConf: 
SparkConf) {
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
-private[spark] class KubernetesDriverConf(
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Used for K8s operations internally and Spark K8s operator.
+ */
+@Unstable
+@DeveloperApi
+@Since("4.0.0")

Review Comment:
   Is this correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework [spark]

2024-05-03 Thread via GitHub


dtenedor commented on code in PR #46309:
URL: https://github.com/apache/spark/pull/46309#discussion_r1589790627


##
core/src/test/scala/org/apache/spark/deploy/RPackageUtilsSuite.scala:
##
@@ -127,7 +127,7 @@ class RPackageUtilsSuite
   RPackageUtils.checkAndBuildRPackage(jar.getAbsolutePath, new 
BufferPrintStream,
 verbose = true)
   val output = lineBuffer.mkString("\n")
-  assert(output.contains(RPackageUtils.RJarDoc))
+  assert(output.contains("Building R package"))

Review Comment:
   You are right, this change is not too great.
   I reverted the changes to `RPackageUtils.scala` and this file for now.
   I will figure out how to get the R packages installed on my local machine 
and fix this file in a next step.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


HeartSaVioR commented on PR #46351:
URL: https://github.com/apache/spark/pull/46351#issuecomment-2093861697

   > @HeartSaVioR - do we also need to port to older Spark versions ? 3.5 maybe 
?
   
   Ideally all non-EOL version lines, 3.5/3.4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48121][K8S] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on code in PR #46373:
URL: https://github.com/apache/spark/pull/46373#discussion_r1589790522


##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -78,7 +79,16 @@ private[spark] abstract class KubernetesConf(val sparkConf: 
SparkConf) {
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
-private[spark] class KubernetesDriverConf(
+

Review Comment:
   Please remove this redundant empty line addition.



##
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala:
##
@@ -78,7 +79,16 @@ private[spark] abstract class KubernetesConf(val sparkConf: 
SparkConf) {
   def getOption(key: String): Option[String] = sparkConf.getOption(key)
 }
 
-private[spark] class KubernetesDriverConf(
+
+/**
+ * :: DeveloperApi ::
+ *
+ * Used for K8s operations internally and Spark K8s operator.
+ */
+@Unstable
+@DeveloperApi
+@Since("4.0.0")

Review Comment:
   Is this correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48120] Enable autolink to SPARK jira issue [spark-kubernetes-operator]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #11: [SPARK-48120] Enable autolink to SPARK 
jira issue
URL: https://github.com/apache/spark-kubernetes-operator/pull/11


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48120] Enable autolink to SPARK jira issue [spark-kubernetes-operator]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #11:
URL: 
https://github.com/apache/spark-kubernetes-operator/pull/11#issuecomment-2093859360

   Thank you for review and approval, @jiangzho !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48019][SQL][FOLLOWUP] Use primitive arrays over object arrays when nulls exist [spark]

2024-05-03 Thread via GitHub


gene-db commented on PR #46372:
URL: https://github.com/apache/spark/pull/46372#issuecomment-2093857652

   > Please revise the PR title and description.
   
   @dongjoon-hyun Updated the title and description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46373:
URL: https://github.com/apache/spark/pull/46373#issuecomment-2093854600

   One more thing. If you have some time, please review the following PR on 
`Apache Spark K8s Operator` repository.
   - https://github.com/apache/spark-kubernetes-operator/pull/11


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48019][SQL][FOLLOWUP] Set nulls correctly for primitive arrays [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46372:
URL: https://github.com/apache/spark/pull/46372#issuecomment-2093851288

   Please revise the PR title and description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48019][SQL][FOLLOWUP] Set nulls correctly for primitive arrays [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46372:
URL: https://github.com/apache/spark/pull/46372#issuecomment-2093850534

   To @gene-db , for the following purpose, you should not use `correctly` in 
the PR title because it's misleading.
   > faster than the object ones.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-47097][CONNECT][TESTS][FOLLOWUP] Increase timeout to 1 minute for `interrupt tag` test [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun opened a new pull request, #46374:
URL: https://github.com/apache/spark/pull/46374

   ### What changes were proposed in this pull request?
   
   This is a follow-up.
   - #45173
   
   ### Why are the changes needed?
   
   To reduce the flakiness more
   - https://github.com/apache/spark/actions/runs/8944948827/job/24572965877
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] Promote `KubernetesDriverConf` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


jiangzho opened a new pull request, #46373:
URL: https://github.com/apache/spark/pull/46373

   
   
   ### What changes were proposed in this pull request?
   
   
   This PR aims to promote `KubernetesDriverConf` to `DeveloperApi`
   
   ### Why are the changes needed?
   
   
   Since Apache Spark Kubernetes Operator requires this, we had better maintain 
it as a developer API officially from Apache Spark 4.0.0.
   
   https://github.com/apache/spark-kubernetes-operator/pull/10
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   Pass the CIs
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48019][SQL][FOLLOWUP] Set nulls correctly for primitive arrays [spark]

2024-05-03 Thread via GitHub


gene-db commented on PR #46372:
URL: https://github.com/apache/spark/pull/46372#issuecomment-2093845240

   @dongjoon-hyun 
   > * Could you provide a test case to validate your contribution and to 
prevent any future regression at this area?
   
   Yes, the previous PR added many 
[tests](https://github.com/apache/spark/blob/b42d235c29302b9faa4254d07db1282207345f69/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala#L503)
 in this area, to make sure it is correct.
   
   The reason I am updating in this PR is to go back to using primitive arrays 
(and still produce correct results with the tests), which are faster than the 
object ones.
   
   > * I'd like to recommend to use `[SQL]` in SQL module PR title next time.
   
   Ok, I will add the appropriate module name next time. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47578][CORE] Migrate logWarning with variables to structured logging framework [spark]

2024-05-03 Thread via GitHub


gengliangwang commented on code in PR #46309:
URL: https://github.com/apache/spark/pull/46309#discussion_r1589773774


##
core/src/test/scala/org/apache/spark/deploy/RPackageUtilsSuite.scala:
##
@@ -127,7 +127,7 @@ class RPackageUtilsSuite
   RPackageUtils.checkAndBuildRPackage(jar.getAbsolutePath, new 
BufferPrintStream,
 verbose = true)
   val output = lineBuffer.mkString("\n")
-  assert(output.contains(RPackageUtils.RJarDoc))
+  assert(output.contains("Building R package"))

Review Comment:
   This fix seems hacky. Is there a better fix for this one?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48116][INFRA] Run `pyspark-pandas*` only in PR builder and Daily Python CIs [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46367:
URL: https://github.com/apache/spark/pull/46367#issuecomment-2093836257

   Could you review this PR, @HyukjinKwon ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


HeartSaVioR commented on code in PR #46351:
URL: https://github.com/apache/spark/pull/46351#discussion_r1589764171


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:
##
@@ -351,7 +351,7 @@ private[sql] class HDFSBackedStateStoreProvider extends 
StateStoreProvider with
   }
 
   override def close(): Unit = {
-synchronized { loadedMaps.values.asScala.foreach(_.clear()) }
+synchronized { 
loadedMaps.keySet().asScala.toSeq.foreach(loadedMaps.remove) }

Review Comment:
   > so i think it just resets for the root and won't clear the values
   
   If we have belief of the implementation of GC then this shouldn't matter at 
all. The entry instance referenced by root is dereferenced, losing reference 
count, and will be GC-ed. cascading effect would apply to the whole tree and 
effectively all of them will be GC-ed.
   
   It might be still a best effort of freeing memory from these entries 
earlier, likewise we clear the value previously, but in theory they are 
semantically same. I'm OK with keeping best effort of it though. Not a big 
deal. Probably one-liner code comment on intention?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


HeartSaVioR commented on code in PR #46351:
URL: https://github.com/apache/spark/pull/46351#discussion_r1589764171


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:
##
@@ -351,7 +351,7 @@ private[sql] class HDFSBackedStateStoreProvider extends 
StateStoreProvider with
   }
 
   override def close(): Unit = {
-synchronized { loadedMaps.values.asScala.foreach(_.clear()) }
+synchronized { 
loadedMaps.keySet().asScala.toSeq.foreach(loadedMaps.remove) }

Review Comment:
   > so i think it just resets for the root and won't clear the values
   
   If we have belief of the implementation of GC then this shouldn't matter at 
all. The entry instance referenced by root is dereferenced, losing reference 
count, and will be GC-ed. cascading effect would apply to the whole tree and 
effectively all of them will be GC-ed.
   
   It might be still a best effort of freeing memory from these entries 
earlier, likewise we clear the value previously, but in theory they are 
semantically same. I'm OK with keeping best effort of it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48114][CORE] Precompile template regex to avoid unnecessary work [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46365: [SPARK-48114][CORE] Precompile 
template regex to avoid unnecessary work
URL: https://github.com/apache/spark/pull/46365


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


HeartSaVioR commented on code in PR #46351:
URL: https://github.com/apache/spark/pull/46351#discussion_r1589764171


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:
##
@@ -351,7 +351,7 @@ private[sql] class HDFSBackedStateStoreProvider extends 
StateStoreProvider with
   }
 
   override def close(): Unit = {
-synchronized { loadedMaps.values.asScala.foreach(_.clear()) }
+synchronized { 
loadedMaps.keySet().asScala.toSeq.foreach(loadedMaps.remove) }

Review Comment:
   > so i think it just resets for the root and won't clear the values
   
   If we have belief of the implementation of GC then this shouldn't matter at 
all. the entry instance referenced by root is dereferenced, losing reference 
count, and will be GC-ed. cascading effect would apply to the whole tree and 
effectively all of them will be GC-ed.
   
   It might be still a best effort of freeing memory from these entries 
earlier, likewise we clear the value previously, but in theory they are 
semantically same. I'm OK with keeping best effort of it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48019][FOLLOWUP] Set nulls correctly for primitive arrays [spark]

2024-05-03 Thread via GitHub


gene-db commented on PR #46372:
URL: https://github.com/apache/spark/pull/46372#issuecomment-2093825065

   @cloud-fan Could you take a look at this followup to improve the primitive 
array with nulls issue. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48120] Enable autolink to SPARK jira issue [spark-kubernetes-operator]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #11:
URL: 
https://github.com/apache/spark-kubernetes-operator/pull/11#issuecomment-2093823794

   Could you review this, @jiangzho ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-48120] Enable autolink to SPARK jira issue [spark-kubernetes-operator]

2024-05-03 Thread via GitHub


dongjoon-hyun opened a new pull request, #11:
URL: https://github.com/apache/spark-kubernetes-operator/pull/11

   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48119][K8S] Promote `KubernetesDriverSpec` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46371:
URL: https://github.com/apache/spark/pull/46371#issuecomment-2093819636

   Merged to master for Apache Spark 4.0.0-preview.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48119][K8S] Promote `KubernetesDriverSpec` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46371: [SPARK-48119][K8S] Promote 
`KubernetesDriverSpec` to `DeveloperApi`
URL: https://github.com/apache/spark/pull/46371


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48118][SQL] Support `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46369:
URL: https://github.com/apache/spark/pull/46369#issuecomment-2093817733

   Merged to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48118][SQL] Support `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun closed pull request #46369: [SPARK-48118][SQL] Support 
`SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable
URL: https://github.com/apache/spark/pull/46369


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48118][SQL] Support `SPARK_SQL_LEGACY_CREATE_HIVE_TABLE` env variable [spark]

2024-05-03 Thread via GitHub


dongjoon-hyun commented on PR #46369:
URL: https://github.com/apache/spark/pull/46369#issuecomment-2093816952

   Thank you, @viirya !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


huanliwang-db commented on PR #46351:
URL: https://github.com/apache/spark/pull/46351#issuecomment-2093814026

   > @huanliwang-db - actually do we need clear in the HDFSStateStoreMap 
implementation any more - are there any more callers for that function ? I 
guess we can remove that interface and the corresponding implementation too ?
   
   @anishshri-db there is no other caller, we can remove it. cc: @HeartSaVioR 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


anishshri-db commented on PR #46351:
URL: https://github.com/apache/spark/pull/46351#issuecomment-2093810520

   @huanliwang-db - actually do we need `clear` in the HDFSStateStoreMap 
implementation any more - are there any more callers for that function ? I 
guess we can remove that interface and the corresponding implementation too ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[PR] [SPARK-48119][K8S] Promote ` KubernetesDriverSpec` to `DeveloperApi` [spark]

2024-05-03 Thread via GitHub


jiangzho opened a new pull request, #46371:
URL: https://github.com/apache/spark/pull/46371

   
   
   ### What changes were proposed in this pull request?
   
   
   This PR aims to promote ` KubernetesDriverSpec` to `DeveloperApi`
   
   ### Why are the changes needed?
   
   
   Since Apache Spark Kubernetes Operator requires this, we had better maintain 
it as a developer API officially from Apache Spark 4.0.0.
   
   https://github.com/apache/spark-kubernetes-operator/pull/10
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   No
   
   ### How was this patch tested?
   
   
   Pass the CIs
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


anishshri-db commented on PR #46351:
URL: https://github.com/apache/spark/pull/46351#issuecomment-2093801768

   @HeartSaVioR - do we also need to port to older Spark versions ? 3.5 maybe ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48105][SS] Fix the race condition between state store unloading and snapshotting [spark]

2024-05-03 Thread via GitHub


HeartSaVioR commented on code in PR #46351:
URL: https://github.com/apache/spark/pull/46351#discussion_r1589746596


##
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala:
##
@@ -351,7 +351,7 @@ private[sql] class HDFSBackedStateStoreProvider extends 
StateStoreProvider with
   }
 
   override def close(): Unit = {
-synchronized { loadedMaps.values.asScala.foreach(_.clear()) }
+synchronized { 
loadedMaps.keySet().asScala.toSeq.foreach(loadedMaps.remove) }

Review Comment:
   It may not be very huge as we are only copying the "map" and the memory 
usage of UnsafeRow would be the same, but the proposed way is still an 
optimized one (removing the overhead of copying) so I agree it's better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   >