Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Asif Shahid
Thanks for the explanation. Regards Asif On Tue, Jan 28, 2025 at 10:00 AM Herman van Hovell wrote: > There are many factors: > >- Typically it is a race between multiple PRs, where they all pass CI >without the other changes, and get merged at the same time. >- Differences between

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Herman van Hovell
There are many factors: - Typically it is a race between multiple PRs, where they all pass CI without the other changes, and get merged at the same time. - Differences between (the nightly job and the PR job) environments (e.g. size of the machine) can also cause these issues. - In

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-28 Thread Asif Shahid
I am genuinely curious to know how those commits which reliably fail the build end up in master. Is there some window of race where two PRs that conflict in terms of logic tend to mess up the final state in master? I have seen in the past few months, while syncing up my open PRs, f

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-27 Thread Dongjoon Hyun
Did you see the PR, Martin? SBT is also broken like the following and we've been waiting for actions over two days on the original PR. $ build/sbt clean "catalyst/testOnly org.apache.spark.sql.catalyst.encoders.EncoderResolutionSuite" ... [info] *** 1 SUITE ABORTED *** [error] Error during tests:

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-27 Thread Martin Grund
Would it not have been mindful to wait for the original author to investigate the PR and do a forward fix instead of reverting such a big change? Since this was only blocking the Maven test we could have waited probably a few more days without any issues. On Mon, Jan 27, 2025 at 8:32 PM Dongjoon H

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-27 Thread Dongjoon Hyun
This is reverted from branch-4.0 via the following. - https://github.com/apache/spark/pull/49696 Revert "[SPARK-49700][CONNECT][SQL] Unified Scala Interface for Connect and Classic" Dongjoon. On 2025/01/26 16:58:45 Dongjoon Hyun wrote: > Thank you! > > Dongjoon > > On Sat, Jan 25, 2025 at 20:

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-26 Thread Dongjoon Hyun
Thank you! Dongjoon On Sat, Jan 25, 2025 at 20:01 Yang Jie wrote: > I reported a test issue that is suspected to be related to this pr: > > - https://github.com/apache/spark/pull/48818/files#r1929652392 > > and it seems to be causing the failure of the Maven daily test. > > Thanks, > Jie Yang >

Re: FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-25 Thread Yang Jie
I reported a test issue that is suspected to be related to this pr: - https://github.com/apache/spark/pull/48818/files#r1929652392 and it seems to be causing the failure of the Maven daily test. Thanks, Jie Yang On 2025/01/24 20:24:57 Dongjoon Hyun wrote: > Hi, All. > > SPARK-49700 landed one

FYI: SPARK-49700 Unified Scala Interface for Connect and Classic

2025-01-24 Thread Dongjoon Hyun
Hi, All. SPARK-49700 landed one hour ago. Since this is another huge package redesign across 399 files in Spark 4.0, please check whether you are accidentally affected. Best Regards, Dongjoon.
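
For context, a minimal sketch of what the unified interface aims to allow: the same org.apache.spark.sql code targeting either the Classic or the Connect backend. This is only an illustration assuming the Spark 4 builder API, not code from the PR itself, and the remote endpoint shown is hypothetical.

    import org.apache.spark.sql.SparkSession

    object UnifiedInterfaceSketch {
      def main(args: Array[String]): Unit = {
        // One org.apache.spark.sql.SparkSession type is meant to cover both backends.
        // Classic, in-process session:
        val spark = SparkSession.builder().master("local[*]").getOrCreate()
        // Connect session against a remote server (hypothetical endpoint), assumed API:
        //   SparkSession.builder().remote("sc://localhost:15002").getOrCreate()

        spark.range(5).selectExpr("id * 2 AS doubled").show()
        spark.stop()
      }
    }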

RE: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-23 Thread Balaji Sudharsanam V
Do we have a Java Client for Spark Connect which is something like PySpark? From: Mich Talebzadeh Sent: 22 January 2025 15:05 To: Hyukjin Kwon Cc: Martin Grund ; Holden Karau ; Dongjoon Hyun ; dev Subject: [EXTERNAL] Re: FYI: A Hallucination about Spark Connect Stability in Spark 4 CI

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Mich Talebzadeh
o they work well with Apache Spark 4? I'm wondering if there is >>>> any clue for the Apache Spark community to do assessment? >>>> >>>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is >>>> stable or ready in Spark 4? From my perspective, this is still actively >>>> under development with an open end. >>>> >>>> The bottom line is `Spark Connect` needs more community love in order >>>> to be claimed as Stable in Apache Spark 4. I'm looking forward to seeing >>>> the healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >>>> stable in `Spark Connect` and what is not yet. >>>> >>>> Best Regards, >>>> Dongjoon. >>>> >>>> PS. >>>> This is a seperate thread from the previous flakiness issues. >>>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >>>> ([FYI] Known `Spark Connect` Test Suite Flakiness) >>>> >>>

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Hyukjin Kwon
, and (3), how can we make sure that `Spark Connect` is >>> stable or ready in Spark 4? From my perspective, this is still actively >>> under development with an open end. >>> >>> The bottom line is `Spark Connect` needs more community love in order to >>> be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the >>> healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >>> stable in `Spark Connect` and what is not yet. >>> >>> Best Regards, >>> Dongjoon. >>> >>> PS. >>> This is a seperate thread from the previous flakiness issues. >>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >>> ([FYI] Known `Spark Connect` Test Suite Flakiness) >>> >>

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-22 Thread Martin Grund
in Spark 4? From my perspective, this is still actively >> under development with an open end. >> >> The bottom line is `Spark Connect` needs more community love in order to >> be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the >> healthy Spark Con

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Jules Damji
t;>>> https://github.com/apache/spark/actions/workflows/build_python_connect35.yml >>>>> (Spark Connect Python-only:master-server, 35-client) >>>>> >>>>> 3. What about the stability and the feature parities in different >>>>> languages? Do they work well with Apache Spark 4? I'm wondering if there >>>>> is >>>>> any clue for the Apache Spark community to do assessment? >>>>> >>>>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is >>>>> stable or ready in Spark 4? From my perspective, this is still actively >>>>> under development with an open end. >>>>> >>>>> The bottom line is `Spark Connect` needs more community love in order >>>>> to be claimed as Stable in Apache Spark 4. I'm looking forward to seeing >>>>> the healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >>>>> stable in `Spark Connect` and what is not yet. >>>>> >>>>> Best Regards, >>>>> Dongjoon. >>>>> >>>>> PS. >>>>> This is a seperate thread from the previous flakiness issues. >>>>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >>>>> ([FYI] Known `Spark Connect` Test Suite Flakiness) >>>>> >>>>

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Ángel
t;>>>> 2. >>>>> https://github.com/apache/spark/actions/workflows/build_python_connect35.yml >>>>> (Spark Connect Python-only:master-server, 35-client) >>>>> >>>>> 3. What about the stability and the feature parities in differe

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
rk/actions/workflows/build_python_connect35.yml >>>>> (Spark Connect Python-only:master-server, 35-client) >>>>> >>>>> 3. What about the stability and the feature parities in different >>>>> languages? Do they work well with Apache Spark 4? I'm wondering if there >>>>> is >>>>> any clue for the Apache Spark community to do assessment? >>>>> >>>>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is >>>>> stable or ready in Spark 4? From my perspective, this is still actively >>>>> under development with an open end. >>>>> >>>>> The bottom line is `Spark Connect` needs more community love in order >>>>> to be claimed as Stable in Apache Spark 4. I'm looking forward to seeing >>>>> the healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >>>>> stable in `Spark Connect` and what is not yet. >>>>> >>>>> Best Regards, >>>>> Dongjoon. >>>>> >>>>> PS. >>>>> This is a seperate thread from the previous flakiness issues. >>>>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >>>>> ([FYI] Known `Spark Connect` Test Suite Flakiness) >>>>> >>>>

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
ment? >>>> >>>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is >>>> stable or ready in Spark 4? From my perspective, this is still actively >>>> under development with an open end. >>>> >>>> The bottom line is `Spark Connect` needs more community love in order >>>> to be claimed as Stable in Apache Spark 4. I'm looking forward to seeing >>>> the healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >>>> stable in `Spark Connect` and what is not yet. >>>> >>>> Best Regards, >>>> Dongjoon. >>>> >>>> PS. >>>> This is a seperate thread from the previous flakiness issues. >>>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >>>> ([FYI] Known `Spark Connect` Test Suite Flakiness) >>>> >>>

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Hyukjin Kwon
` is >>> stable or ready in Spark 4? From my perspective, this is still actively >>> under development with an open end. >>> >>> The bottom line is `Spark Connect` needs more community love in order to >>> be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the >>> healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >>> stable in `Spark Connect` and what is not yet. >>> >>> Best Regards, >>> Dongjoon. >>> >>> PS. >>> This is a seperate thread from the previous flakiness issues. >>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >>> ([FYI] Known `Spark Connect` Test Suite Flakiness) >>> >>

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Mich Talebzadeh
under development with an open end. >> >> The bottom line is `Spark Connect` needs more community love in order to >> be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the >> healthy Spark Connect CI in Spark 4. Until then, let's clarify what is >> stable in `Spark Connect` and what is not yet. >> >> Best Regards, >> Dongjoon. >> >> PS. >> This is a seperate thread from the previous flakiness issues. >> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq >> ([FYI] Known `Spark Connect` Test Suite Flakiness) >> >

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
; healthy Spark Connect CI in Spark 4. Until then, let's clarify what is > > stable in `Spark Connect` and what is not yet. > > > > Best Regards, > > Dongjoon. > > > > PS. > > This is a seperate thread from the previous flakiness issues. > > https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq > > ([FYI] Known `Spark Connect` Test Suite Flakiness) > > > - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Re: FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Holden Karau
` needs more community love in order to > be claimed as Stable in Apache Spark 4. I'm looking forward to seeing the > healthy Spark Connect CI in Spark 4. Until then, let's clarify what is > stable in `Spark Connect` and what is not yet. > > Best Regards, > Dongjoon. >

FYI: A Hallucination about Spark Connect Stability in Spark 4

2025-01-21 Thread Dongjoon Hyun
le in `Spark Connect` and what is not yet. Best Regards, Dongjoon. PS. This is a separate thread from the previous flakiness issues. https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq ([FYI] Known `Spark Connect` Test Suite Flakiness)

Re: [FYI] Known `Spark Connect` Test Suite Flakiness

2025-01-20 Thread Dongjoon Hyun
Thank you, Paddy. Dongjoon. On Mon, Jan 20, 2025 at 2:32 AM Paddy Xu wrote: > I have worked on tests related to “interrupt”. Not sure about SPARK-50888: > > My findings: > 1. These test failures only occur in the GitHub CI. > 2. The failure is due to the thread pool we created in CI having on

RE: [FYI] Known `Spark Connect` Test Suite Flakiness

2025-01-20 Thread Paddy Xu
I have worked on tests related to “interrupt”. Not sure about SPARK-50888: My findings: 1. These test failures only occur in the GitHub CI. 2. The failure is due to the thread pool we created in CI having only two threads, while our tests require three concurrent threads to run. To work around th
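
A toy illustration (not the actual Spark Connect test code) of the failure mode described above: when three tasks must run concurrently but the pool only has two threads, the third task never starts and the wait times out.

    import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

    object PoolSizeSketch {
      def main(args: Array[String]): Unit = {
        val pool = Executors.newFixedThreadPool(2) // CI-like pool: one thread too few
        val allRunning = new CountDownLatch(3)     // needs 3 tasks running at once

        (1 to 3).foreach { i =>
          pool.submit(new Runnable {
            override def run(): Unit = {
              allRunning.countDown()
              allRunning.await() // blocks until all three tasks are running
              println(s"task $i proceeded")
            }
          })
        }

        pool.shutdown()
        // With only two threads the latch never reaches zero, so this prints false.
        println("finished = " + pool.awaitTermination(2, TimeUnit.SECONDS))
      }
    }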

[FYI] Known `Spark Connect` Test Suite Flakiness

2025-01-18 Thread Dongjoon Hyun
Hi, All. This is a heads-up as part of Apache Spark 4.0.0 preparation. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) It would be great if we are able to fix long-standing `Spark Connect` test flakiness together during the QA period (2025-02-01 ~) in orde

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-27 Thread Hussein Awala
>> > Date: Friday, April 26, 2024 15:05 >> > To: Xinrong Meng >> > Cc: Dongjoon Hyun , "dev@spark.apache.org" < >> dev@spark.apache.org> >> > Subject: Re: [FYI] SPARK-47993: Drop Python 3.8 >> > >> > >> > >> > +1

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread John Zhuge
+1 On Fri, Apr 26, 2024 at 8:41 AM Kent Yao wrote: > +1 > > yangjie01 wrote on Fri, Apr 26, 2024 at 17:16: > > > > +1 > > > > > > > > From: Ruifeng Zheng > > Date: Friday, April 26, 2024 15:05 > > To: Xinrong Meng > > Cc: Dongjoon Hyun , "dev@spark

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread Kent Yao
+1 yangjie01 wrote on Fri, Apr 26, 2024 at 17:16: > > +1 > > > > From: Ruifeng Zheng > Date: Friday, April 26, 2024 15:05 > To: Xinrong Meng > Cc: Dongjoon Hyun , "dev@spark.apache.org" > > Subject: Re: [FYI] SPARK-47993: Drop Python 3.8 > > > > +1 > > >

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-26 Thread yangjie01
+1 From: Ruifeng Zheng Date: Friday, April 26, 2024 15:05 To: Xinrong Meng Cc: Dongjoon Hyun , "dev@spark.apache.org" Subject: Re: [FYI] SPARK-47993: Drop Python 3.8 +1 On Fri, Apr 26, 2024 at 10:26 AM Xinrong Meng <xinr...@apache.org> wrote: +1 On Thu, Apr 25, 2024 at 2:08

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Ruifeng Zheng
>>> Web: https://zero323.net >>> PGP: A30CEF0C31A501EC >>> >>> On 4/25/24 6:21 PM, Reynold Xin wrote: >>> >>> +1 >>> >>> On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale >>> >>> wrote: >>>

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Denny Lee
>>> >>> Web: https://zero323.net >>> PGP: A30CEF0C31A501EC >>> >>> On 4/25/24 6:21 PM, Reynold Xin wrote: >>> >>> +1 >>> >>> On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale >>> >>> wrote: >>>

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Xinrong Meng
>> >> On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale >> >> wrote: >> >>> +1 >>> >>> On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun >>> wrote: >>> >>>> FYI, there is a proposal to drop Python 3.8 because its EO

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread L. C. Hsieh
e > wrote: >> >> +1 >> >> On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote: >>> >>> FYI, there is a proposal to drop Python 3.8 because its EOL is October 2024. >>> >>> https://github.com/apache/spark/pull/46228 >>> [SPARK-47993]

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Holden Karau
gards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 4/25/24 6:21 PM, Reynold Xin wrote: > > +1 > > On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale > > wrote: > >> +1 >> >> On Thu, Apr 25, 2024, 5:41 PM Don

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web: https://zero323.net PGP: A30CEF0C31A501EC On 4/25/24 6:21 PM, Reynold Xin wrote: +1 On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale wrote: +1 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote: FYI, there is a proposal to drop

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Reynold Xin
+1 On Thu, Apr 25, 2024 at 9:01 AM Santosh Pingale wrote: > +1 > > On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun > wrote: > >> FYI, there is a proposal to drop Python 3.8 because its EOL is October >> 2024. >> >> https://github.com/apache/spark/pull/46228 &

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Santosh Pingale
+1 On Thu, Apr 25, 2024, 5:41 PM Dongjoon Hyun wrote: > FYI, there is a proposal to drop Python 3.8 because its EOL is October > 2024. > > https://github.com/apache/spark/pull/46228 > [SPARK-47993][PYTHON] Drop Python 3.8 > > Since it's still alive and there will

[FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Dongjoon Hyun
FYI, there is a proposal to drop Python 3.8 because its EOL is October 2024. https://github.com/apache/spark/pull/46228 [SPARK-47993][PYTHON] Drop Python 3.8 Since it's still alive and there will be an overlap between the lifecycle of Python 3.8 and Apache Spark 4.0.0, please give us

[FYI] SPARK-47046: Apache Spark 4.0.0 Dependency Audit and Cleanup

2024-04-21 Thread Dongjoon Hyun
Hi, All. As a part of Apache Spark 4.0.0 (SPARK-44111), we have been doing dependency audits. Today, we want to share the current readiness of Apache Spark 4.0.0 and get your feedback for further completeness. https://issues.apache.org/jira/browse/SPARK-44111 Prepare Apache Spark 4.0.0 Dependency

Re: [FYI] SPARK-45981: Improve Python language test coverage

2023-12-02 Thread Hyukjin Kwon
Awesome! On Sat, Dec 2, 2023 at 2:33 PM Dongjoon Hyun wrote: > Hi, All. > > As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community > starts to have test coverage for all supported Python versions from Today. > > - https://github.com/apache/spark/actions/runs/7061665420 > > Her

[FYI] SPARK-45981: Improve Python language test coverage

2023-12-01 Thread Dongjoon Hyun
Hi, All. As a part of Apache Spark 4.0.0 (SPARK-44111), the Apache Spark community starts to have test coverage for all supported Python versions from Today. - https://github.com/apache/spark/actions/runs/7061665420 Here is a summary. 1. Main CI: All PRs and commits on `master` branch are teste

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-16 Thread Wenchen Fan
Great job! On Sat, Nov 13, 2021 at 11:18 AM Hyukjin Kwon wrote: > Awesome! > > On Sat, Nov 13, 2021 at 12:04 PM Xiao Li wrote: > >> Thank you! Great job! >> >> Xiao >> >> >> On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan >> wrote: >> >>> >>> Nice job ! >>> There are some nice API's which

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Hyukjin Kwon
Awesome! On Sat, Nov 13, 2021 at 12:04 PM Xiao Li wrote: > Thank you! Great job! > > Xiao > > > On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan > wrote: > >> >> Nice job ! >> There are some nice API's which should be interesting to explore with JDK >> 17 :-) >> >> Regards. >> Mridul >> >> O

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Xiao Li
Thank you! Great job! Xiao On Fri, Nov 12, 2021 at 7:02 PM Mridul Muralidharan wrote: > > Nice job ! > There are some nice API's which should be interesting to explore with JDK > 17 :-) > > Regards. > Mridul > > On Fri, Nov 12, 2021 at 7:08 PM Yuming Wang wrote: > >> Cool, thank you Dongjoon.

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Mridul Muralidharan
Nice job! There are some nice APIs which should be interesting to explore with JDK 17 :-) Regards. Mridul On Fri, Nov 12, 2021 at 7:08 PM Yuming Wang wrote: > Cool, thank you Dongjoon. > > On Sat, Nov 13, 2021 at 4:09 AM shane knapp ☠ wrote: > >> woot! nice work everyone! :) >> >> On Fri,

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Yuming Wang
Cool, thank you Dongjoon. On Sat, Nov 13, 2021 at 4:09 AM shane knapp ☠ wrote: > woot! nice work everyone! :) > > On Fri, Nov 12, 2021 at 11:37 AM Dongjoon Hyun > wrote: > >> Hi, All. >> >> Apache Spark community has been working on Java 17 support under the >> following JIRA. >> >> https

Re: [FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread shane knapp ☠
woot! nice work everyone! :) On Fri, Nov 12, 2021 at 11:37 AM Dongjoon Hyun wrote: > Hi, All. > > Apache Spark community has been working on Java 17 support under the > following JIRA. > > https://issues.apache.org/jira/browse/SPARK-33772 > > As of today, Apache Spark starts to have daily

[FYI] Build and run tests on Java 17 for Apache Spark 3.3

2021-11-12 Thread Dongjoon Hyun
Hi, All. Apache Spark community has been working on Java 17 support under the following JIRA. https://issues.apache.org/jira/browse/SPARK-33772 As of today, Apache Spark starts to have daily Java 17 test coverage via GitHub Action jobs for Apache Spark 3.3. https://github.com/apache/spark/

[FYI] Scala 2.13 Maven Artifacts

2021-01-27 Thread Dongjoon Hyun
Hi, All. Apache Spark community starts to publish Scala 2.13 Maven artifacts daily. https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-core_2.13/3.2.0-SNAPSHOT/ It aims to encourage more tests on Scala 2.13 (and Scala 3) and to identify issues in advance for Apa
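
A hedged build.sbt sketch for anyone who wants to try those nightly artifacts; the snapshot coordinates follow the repository URL in the announcement, and the Scala patch version below is an assumption.

    // build.sbt
    ThisBuild / scalaVersion := "2.13.4" // assumed patch version

    resolvers += "apache-snapshots" at
      "https://repository.apache.org/content/repositories/snapshots/"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "3.2.0-SNAPSHOT"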

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-09 Thread Hyukjin Kwon
To share the update about GitHub Actions: - I am informed that having more resources is now being discussed at the org level. Hopefully the situation improves. - Duplicated workflows will be canceled to save resources, see https://github.com/apache/spark/pull/31104 cc @Holden Karau @Xiao Li too

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Hyukjin Kwon
For the GitHub resources of the ASF repo, I contacted GitHub a few days ago to address the issue. This is not a repo-level problem cc @Sean Owen . The ASF organisation in GitHub already has too many repos, and we should have a way to increase the limit, or set a separate limit specifically for the

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp ☠
no, i don't think that'd be a good idea... adding additional dependencies to our cluster won't scale one bit. On Fri, Jan 8, 2021 at 2:16 PM Dongjoon Hyun wrote: > BTW, Shane, do you think we can utilize some of UCB machines as GitHub > Action runners? > > Bests, > Dongjoon. > > On Fri, Jan 8,

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp ☠
hmm, the ubuntu16 machines are acting up. i pinned the sbt master builds to ubuntu20 and they're happily building while i investigate wtf is up. On Fri, Jan 8, 2021 at 2:15 PM Dongjoon Hyun wrote: > The followings? > > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Dongjoon Hyun
BTW, Shane, do you think we can utilize some of UCB machines as GitHub Action runners? Bests, Dongjoon. On Fri, Jan 8, 2021 at 2:14 PM Dongjoon Hyun wrote: > The followings? > > > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/18

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Dongjoon Hyun
The followings? https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/1836/console https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/1887/console On Fri, Jan 8, 2021 at 2:13 P

Re: [FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread shane knapp ☠
> > 1. Jenkins machines start to fail with the following recently. > (master branch) > > Python versions prior to 3.6 are not supported. > Build step 'Execute shell' marked build as failure > > examples please? -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research /

[FYI] CI Infra issues (in both GitHub Action and Jenkins)

2021-01-08 Thread Dongjoon Hyun
Hi, All. There are two issues currently. 1. Jenkins machines start to fail with the following recently. (master branch) Python versions prior to 3.6 are not supported. Build step 'Execute shell' marked build as failure 2. Lack of GitHub Action backend machines. There is a JIRA

Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-12-01 Thread Ryan Blue
Wenchen, could you start a new thread? Many people have probably already muted this one, and it isn't really on topic. The question that needs to be discussed is whether this is a safe change for the 3.1 release, and reusing an old thread is not a great way to get people's attention about somethin

Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-12-01 Thread Wenchen Fan
I'm reviving this thread because this feature was reverted before the 3.0 release, and now we are trying to add it back since the CREATE TABLE syntax is unified. The benefits are pretty clear: CREATE TABLE by default (without USING or STORED AS) should create native tables that work best with Spar
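
A minimal sketch of the behavior being proposed, assuming a session with Hive support and using hypothetical table names: a plain CREATE TABLE picks up spark.sql.sources.default (Parquet unless overridden), while Hive-format tables stay available explicitly via STORED AS.

    import org.apache.spark.sql.SparkSession

    object CreateTableDefaultSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .enableHiveSupport()
          .getOrCreate()

        // Native data source table, using spark.sql.sources.default:
        spark.sql("CREATE TABLE t_native (a INT)")
        // Hive SerDe table, requested explicitly:
        spark.sql("CREATE TABLE t_hive (a INT) STORED AS PARQUET")

        // The 'Provider' / 'Serde Library' rows show which path each table took.
        spark.sql("DESCRIBE TABLE EXTENDED t_native").show(100, truncate = false)
        spark.stop()
      }
    }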

Re: [FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-10-07 Thread Dongjoon Hyun
There is a new thread here. https://lists.apache.org/thread.html/ra2418b75ac276861a598e7ec943750e2b038c2f8ba49f41db57e5ae9%40%3Cdev.spark.apache.org%3E Could you share your use case of Hive 1.2 here, Koert? Bests, Dongjoon. On Wed, Oct 7, 2020 at 1:04 PM Koert Kuipers wrote: > i am a little c

Re: [FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-10-07 Thread Koert Kuipers
i am a little confused about this. i assumed spark would no longer make a distribution with hive 1.x, but the hive-1.2 profile remains. yet i see the hive-1.2 profile has been removed from pom.xml? On Wed, Sep 23, 2020 at 6:58 PM Dongjoon Hyun wrote: > Hi, All. > > Since Apache Spark 3.0.0, Apa

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-10-04 Thread Dongjoon Hyun
Sure, German. Please add your comment on that issue. If possible, please provide a reproducible example of what you did. On Sat, Oct 3, 2020 at 10:31 PM German Schiavon wrote: > Hi! > I just ran into this same issue while testing k8s in local mode > > https://issues.apache.org/jira/browse/SPARK-3

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-10-03 Thread German Schiavon
Hi! I just ran into this same issue while testing k8s in local mode https://issues.apache.org/jira/browse/SPARK-31800 Note that the title shouldn't be "*Unable to disable Kerberos when submitting jobs to Kubernetes" *(based on the comments) and should be something more related to the spark.kubernetes.fil

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-29 Thread Dongjoon Hyun
Thank you! Bests, Dongjoon On Mon, Sep 28, 2020 at 8:07 PM Dr. Kent Yao wrote: > Thanks, Dongjon, > >I pined two long-standing issues to the umbrella. > > > >https://issues.apache.org/jira/browse/SPARK-28895 > >https://issues.apache.org/jira/browse/SPARK-28992 > > > >This helps

Re: [FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-28 Thread Dr. Kent Yao
Thanks, Dongjoon, I pinned two long-standing issues to the umbrella. https://issues.apache.org/jira/browse/SPARK-28895 https://issues.apache.org/jira/browse/SPARK-28992 This helps fix some problems with running Spark-on-K8s applications with HDFS. Hoping to get them reviewed soon Best

[FYI] Kubernetes GA Preparation (SPARK-33005)

2020-09-28 Thread Dongjoon Hyun
Hi, All. K8s GA preparation is on the way like the following. https://issues.apache.org/jira/browse/SPARK-33005 Apache Spark 3.1/3.2 is scheduled for December 2020 and mid of 2021 (TBD). If you hit K8s issues, please file a JIRA issue. To give more visibility to your issue, you can create yo

[FYI] Removing `spark-3.1.0-bin-hadoop2.7-hive1.2.tgz` from Apache Spark 3.1 distribution

2020-09-23 Thread Dongjoon Hyun
Hi, All. Since Apache Spark 3.0.0, Apache Hive 2.3.7 is the default Hive execution library. The forked Hive 1.2.1 library is not recommended because it's not maintained properly. In Apache Spark 3.1, in December 2020, we are going to remove it from our official distribution. https://github.co

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Reynold Xin
I had a wrong assumption for > the implication of that "(2) FYI: SPARK-30098 Use default datasource as > provider for CREATE TABLE syntax", Reynold. I admit that. You may not feel > in the similar way. However, it was a lot to me. Also, switching > `convertMetastoreOrc` at 2.

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
Technically, I have been suffering with (1) `CREATE TABLE` due to many differences for a long time (since 2017). So, I had a wrong assumption about the implication of "(2) FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", Reynold. I admit that. You may not f

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Reynold Xin
ference and effects are informed widely and > discussed in many ways twice. > > First, this was shared on last December. > >     "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE > syntax", 2019/12/06 >    https://lists.apache.org/thread.

Re: FYI: The evolution on `CHAR` type behavior

2020-03-19 Thread Dongjoon Hyun
+1 for Wenchen's suggestion. I believe that the differences and effects have been communicated widely and discussed in many ways, twice. First, this was shared last December. "FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax", 2019/12/06 https://l

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Wenchen Fan
OK let me put a proposal here: 1. Permanently ban CHAR for native data source tables, and only keep it for Hive compatibility. It's OK to forget about padding like what Snowflake and MySQL have done. But it's hard for Spark to require consistent behavior about CHAR type in all data sources. Since
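
A toy reproduction of the inconsistency under discussion, with a hypothetical table name and assuming Hive support: whether a short value read back from a CHAR(3) column is blank-padded (length 3) or not (length 1) depends on the provider and the convertMetastore* flags, as the quoted spark-sql sessions in the replies below show.

    import org.apache.spark.sql.SparkSession

    object CharPaddingSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE TABLE t_char (a CHAR(3)) STORED AS ORC")
        spark.sql("INSERT INTO t_char VALUES ('a')")
        // Padded behavior returns ('a  ', 3); unpadded behavior returns ('a', 1).
        spark.sql("SELECT a, length(a) FROM t_char").show()
        spark.stop()
      }
    }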

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Michael Armbrust
> > What I'd oppose is to just ban char for the native data sources, and do > not have a plan to address this problem systematically. > +1 > Just forget about padding, like what Snowflake and MySQL have done. > Document that char(x) is just an alias for string. And then move on. Almost > no work

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Maryann Xue

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Wenchen Fan
>>>>>>>>>>> "Revert SPARK-30098 U

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Stephen Coy
policy we voted. The recommendation is always using Apache Spark's native type `String`. Bests, Dongjoon. References: 1. "CHAR implementation?", 2017/09/15 https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.o

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
>> Please see the following for the context. >>>>>>>>>>> >>>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-31136 ( >>>

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
> >>>>>>>>>> For new users, depending on whether the underlying metastore >>>>>>>>>> char(3) is either supported but different from ansi Sql (which is >>>>>>>>>> not that >>>>>>>>>&g

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
. >>>>>>>>> >>>>>>>>> On Sat, Mar 14, 2020 at 17:54 Reynold Xin <r...@databricks.com> wrote: >>>>>>>>> >>>>>>>>>

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
>>>>>>>> following >>>>>>>>> is the summary. >>>>>>>>> >>>>>>>>> With 1.6.x ~ 2.3.x, `STORED PARQUET` has the following different >>>>>>>>> result. >>>>>>>>> (`spark.

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
>> >>>>>>>>> Hi, All. >>>>>>>>> >>>>>>>>> Apache Spark has been suffered from a known consistency issue on >>>>>>>>> `CHAR` >>>>>>>>> type behavior among its usage

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
.6.x ~ 2.3.x, `STORED PARQUET` has the following different >>>>>>>> result. >>>>>>>> (`spark.sql.hive.convertMetastoreParquet=false` provides a fallback to >>>>>>>> Hive behavior.) >>>>>>>> >>>&

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
gt;> a 2 >>>>>> >>>>>> Since 3.0.0-preview2, `CREATE TABLE` (without `STORED AS` clause) >>>>>> became consistent. >>>>>> (`spark.sql.legacy.createHiveTableByDefault.enabled=true` provides a >>

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Stephen Coy
https://lists.apache.org/thread.html/96b004331d9762e356053b5c8c97e953e398e489d15e1b49e775702f%40%3Cdev.spark.apache.org%3E

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Reynold Xin
;>>     a   3 >>>>>>     spark-sql> SELECT a, length(a) FROM t2; >>>>>>     a   3 >>>>>>     spark-sql> SELECT a, length(a) FROM t3; >>>>>>     a 2 >>>>>> >>&

Re: FYI: The evolution on `CHAR` type behavior

2020-03-16 Thread Dongjoon Hyun
sql.hive.convertMetastoreOrc=false` provides a fallback to >>>>> Hive behavior.) >>>>> >>>>> spark-sql> SELECT a, length(a) FROM t1; >>>>> a 3 >>>>> spark-sql> SELECT a, length(a) FROM t2; >>>>>

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Reynold Xin
gt; >>>     spark-sql> SELECT a, length(a) FROM t1; >>>     a 2 >>>     spark-sql> SELECT a, length(a) FROM t2; >>>     a 2 >>>     spark-sql> SELECT a, length(a) FROM t3; >>>     a 2 >>> >>> In addition, in 3.0.0, SP

Re: FYI: The evolution on `CHAR` type behavior

2020-03-15 Thread Dongjoon Hyun
gt;> a 2 >> >> In addition, in 3.0.0, SPARK-31147 aims to ban `CHAR/VARCHAR` type in the >> following syntax to be safe. >> >> CREATE TABLE t(a CHAR(3)); >> https://github.com/apache/spark/pull/27902 >> >> This email is sent out to infor

Re: FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Reynold Xin
; following syntax to be safe. > > CREATE TABLE t(a CHAR(3)); > https://github.com/apache/spark/pull/27902 > > This email is sent out to inform you based on the new policy we voted. > The recommendation is always using Apache Spark's native type `String`. > > B

FYI: The evolution on `CHAR` type behavior

2020-03-14 Thread Dongjoon Hyun
//github.com/apache/spark/pull/27902 This email is sent out to inform you based on the new policy we voted. The recommendation is always using Apache Spark's native type `String`. Bests, Dongjoon. References: 1. "CHAR implementation?", 2017/09/15 https://lists.apache.org/thread.htm

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-02 Thread Wenchen Fan
Thanks for cleaning this up! On Sun, Feb 2, 2020 at 2:08 PM Xiao Li wrote: > Thanks! Dongjoon. > > Xiao > > On Sat, Feb 1, 2020 at 5:15 PM Hyukjin Kwon wrote: > >> Thanks Dongjoon. >> >> On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: >> >>> Hi, All. >>> >>> From Today, we have `branch-3.0` as

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Xiao Li
Thanks! Dongjoon. Xiao On Sat, Feb 1, 2020 at 5:15 PM Hyukjin Kwon wrote: > Thanks Dongjoon. > > On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: > >> Hi, All. >> >> From Today, we have `branch-3.0` as a tool of `Feature Freeze`. >> >> https://github.com/apache/spark/tree/branch-3.0 >> >> A

Re: [FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Hyukjin Kwon
Thanks Dongjoon. On Sun, 2 Feb 2020, 09:08 Dongjoon Hyun, wrote: > Hi, All. > > From Today, we have `branch-3.0` as a tool of `Feature Freeze`. > > https://github.com/apache/spark/tree/branch-3.0 > > All open JIRA issues whose type is `Improvement` or `New Feature` and had > `3.0.0` as a `Ta

[FYI] `Target Version` on `Improvement`/`New Feature` JIRA issues

2020-02-01 Thread Dongjoon Hyun
Hi, All. From today, we have `branch-3.0` as a tool of `Feature Freeze`. https://github.com/apache/spark/tree/branch-3.0 All open JIRA issues whose type is `Improvement` or `New Feature` and had `3.0.0` as a `Target Version` are changed accordingly first. - Most of them are re-targeted

Re: [FYI] SBT Build Failure

2020-01-18 Thread Manu Zhang
Thanks Dongjoon for the information and the fix in https://github.com/apache/spark/pull/27242 On Fri, Jan 17, 2020 at 6:58 AM Sean Owen wrote: > Ah. The Maven build already long since points at https:// for > resolution for security. I tried just overriding the resolver for the > SBT build, but

Re: [FYI] SBT Build Failure

2020-01-16 Thread Sean Owen
Ah. The Maven build already long since points at https:// for resolution for security. I tried just overriding the resolver for the SBT build, but it doesn't seem to work. I don't understand the SBT build well enough to debug right now. I think it's possible to override resolvers with local config
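
The kind of sbt-level override Sean describes would look roughly like the sketch below, under the assumption that pointing resolution at the HTTPS Central URL is enough; it is not the actual change that landed (see the PR linked in the reply above), and as noted it did not immediately work in his attempt.

    // build.sbt (or a local .sbt override)
    // Prepend an HTTPS Maven Central resolver so the broken HTTP endpoint is skipped.
    externalResolvers := Seq(
      "central-https" at "https://repo1.maven.org/maven2/"
    ) ++ externalResolvers.value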

[FYI] SBT Build Failure

2020-01-16 Thread Dongjoon Hyun
Hi, All. As of now, Apache Spark sbt build is broken by the Maven Central repository policy. - https://stackoverflow.com/questions/59764749/requests-to-http-repo1-maven-org-maven2-return-a-501-https-required-status-an > Effective January 15, 2020, The Central Maven Repository no longer supports

Re: FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Takeshi Yamamuro
Oh, looks nice. Thanks for sharing, Dongjoon. Bests, Takeshi On Sat, Dec 7, 2019 at 3:35 AM Dongjoon Hyun wrote: > Hi, All. > > I want to share the following change to the community. > > SPARK-30098 Use default datasource as provider for CREATE TABLE syntax > > This is merged today and n

FYI: SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2019-12-06 Thread Dongjoon Hyun
Hi, All. I want to share the following change to the community. SPARK-30098 Use default datasource as provider for CREATE TABLE syntax This is merged today and now Spark's `CREATE TABLE` is using Spark's default data sources instead of `hive` provider. This is a good and big improvement for

Re: FYI - filed bunch of issues for flaky tests in recent CI builds

2019-09-18 Thread Gabor Somogyi
Had a look at the Kafka test (SPARK-29136) and commented. BR, G On Wed, Sep 18, 2019 at 7:54 AM Jungtaek Lim wrote: > Hi devs, > > I've found a bunch of test failures (intermittently) in both the CI build for > the master branch as well as the PR builder (o
