Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
UPDATE: I've successfully uploaded the release packages: https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc1-bin/ (I skipped SparkR because I was not able to fix the errors; I'll get back to it later.) However, there is a new issue with doc building:

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
Please re-try to upload, Wenchen. ASF Infra team bumped up our upload limit based on our request. > Your upload limit has been increased to 650MB Dongjoon. On Thu, May 9, 2024 at 8:12 AM Wenchen Fan wrote: > I've created a ticket: https://issues.apache.org/jira/browse/INFRA-25776 > > On

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
I've created a ticket: https://issues.apache.org/jira/browse/INFRA-25776 On Thu, May 9, 2024 at 11:06 PM Dongjoon Hyun wrote: > In addition, FYI, I was the latest release manager with Apache Spark 3.4.3 > (2024-04-15 Vote) > > According to my work log, I uploaded the following binaries to SVN

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
In addition, FYI, I was the latest release manager with Apache Spark 3.4.3 (2024-04-15 Vote) According to my work log, I uploaded the following binaries to SVN from EC2 (us-west-2) without any issues. -rw-r--r--. 1 centos centos 311384003 Apr 15 01:29 pyspark-3.4.3.tar.gz -rw-r--r--. 1 centos

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>> >>>>>>> >>>>>>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek >>>>>>> wrote: >>>>>>> >>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
CD process on a build server? >>>>>>> >>>>>>> Thanks, >>>>>>> Nimrod >>>>>>> >>>>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan >>>>>>> wrote: >>>>>>> >

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Holden Karau
>>>> >>>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan >>>>>> wrote: >>>>>> >>>>>>> UPDATE: >>>>>>> >>>>>>> Unfortunately, it took me quite some time to set up my laptop and >

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Erik Krogen
>>>>> wrote: >>>>> >>>>>> UPDATE: >>>>>> >>>>>> Unfortunately, it took me quite some time to set up my laptop and get >>>>>> it ready for the release process (docker desktop doesn't work

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Nimrod Ofek
at my tomorrow. Thanks >>>>> for your patience! >>>>> >>>>> Wenchen >>>>> >>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> >>>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
> for your patience! >>>> >>>> Wenchen >>>> >>>> On Fri, May 3, 2024 at 7:47 AM yangjie01 wrote: >>>> >>>>> +1 >>>>> >>>>> >>>>> >>>>> *From:* Jungtaek Lim >

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Nimrod Ofek
ie01 wrote: >>> >>>> +1 >>>> >>>> >>>> >>>> *From:* Jungtaek Lim >>>> *Date:* Thursday, May 2, 2024, 10:21 >>>> *To:* Holden Karau >>>> *Cc:* Chao Sun, Xiao Li, >>>> Tathagata Das, Wenche

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
gtaek Lim >>> *Date:* Thursday, May 2, 2024, 10:21 >>> *To:* Holden Karau >>> *Cc:* Chao Sun, Xiao Li, >>> Tathagata Das, Wenchen Fan < >>> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas < >>> nicholas.cham...@gmail.com>, D

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Dongjoon Hyun
olden Karau >> *Cc:* Chao Sun, Xiao Li, >> Tathagata Das, Wenchen Fan < >> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas < >> nicholas.cham...@gmail.com>, Dongjoon Hyun, >> Cheng Pan, Spark dev list, >> Anish Shrigondekar >> *Subject:* Re

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Nimrod Ofek
>> *From:* Jungtaek Lim >> *Date:* Thursday, May 2, 2024, 10:21 >> *To:* Holden Karau >> *Cc:* Chao Sun, Xiao Li, >> Tathagata Das, Wenchen Fan < >> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas < >> nicholas.cham...@gmail.com>, Dong

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Wenchen Fan
Dongjoon Hyun, > Cheng Pan, Spark dev list, > Anish Shrigondekar > *Subject:* Re: [DISCUSS] Spark 4.0.0 release > > > > +1 love to see it! > > > > On Thu, May 2, 2024 at 10:08 AM Holden Karau > wrote: > > +1 :) yay previews > > > > On Wed, May 1, 20

Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread yangjie01
+1 From: Jungtaek Lim Date: Thursday, May 2, 2024, 10:21 To: Holden Karau Cc: Chao Sun, Xiao Li, Tathagata Das, Wenchen Fan, Cheng Pan, Nicholas Chammas, Dongjoon Hyun, Cheng Pan, Spark dev list, Anish Shrigondekar Subject: Re: [DISCUSS] Spark 4.0.0 release +1 love to see it! On Thu, May 2

Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread Mich Talebzadeh
- Integration with additional external data sources or systems, say Hive - Enhancements to the Spark UI for improved monitoring and debugging - Enhancements to machine learning (MLlib) algorithms and capabilities, like TensorFlow or PyTorch (if any are in the pipeline) HTH Mich

Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread Steve Loughran
There's a new parquet RC up this week which would be good to pull in. On Thu, 2 May 2024 at 03:20, Jungtaek Lim wrote: > +1 love to see it! > > On Thu, May 2, 2024 at 10:08 AM Holden Karau > wrote: > >> +1 :) yay previews >> >> On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: >> >>> +1 >>> >>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Jungtaek Lim
+1 love to see it! On Thu, May 2, 2024 at 10:08 AM Holden Karau wrote: > +1 :) yay previews > > On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: > >> +1 >> >> On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: >> >>> +1 for next Monday. >>> >>> We can do more previews when the other features are

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Holden Karau
+1 :) yay previews On Wed, May 1, 2024 at 5:36 PM Chao Sun wrote: > +1 > > On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: > >> +1 for next Monday. >> >> We can do more previews when the other features are ready for preview. >> >> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote: >> >>> Next week sounds

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Chao Sun
+1 On Wed, May 1, 2024 at 5:23 PM Xiao Li wrote: > +1 for next Monday. > > We can do more previews when the other features are ready for preview. > > On Wed, May 1, 2024 at 08:46, Tathagata Das wrote: > >> Next week sounds great! Thank you Wenchen! >> >> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan wrote: >>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Hyukjin Kwon
SGTM On Thu, 2 May 2024 at 02:06, Dongjoon Hyun wrote: > +1 for next Monday. > > Dongjoon. > > On Wed, May 1, 2024 at 8:46 AM Tathagata Das > wrote: > >> Next week sounds great! Thank you Wenchen! >> >> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan wrote: >> >>> Yea I think a preview release

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Xiao Li
+1 for next Monday. We can do more previews when the other features are ready for preview. On Wed, May 1, 2024 at 08:46, Tathagata Das wrote: > Next week sounds great! Thank you Wenchen! > > On Wed, May 1, 2024 at 11:16 AM Wenchen Fan wrote: > >> Yea I think a preview release won't hurt (without a branch

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Dongjoon Hyun
+1 for next Monday. Dongjoon. On Wed, May 1, 2024 at 8:46 AM Tathagata Das wrote: > Next week sounds great! Thank you Wenchen! > > On Wed, May 1, 2024 at 11:16 AM Wenchen Fan wrote: > >> Yea I think a preview release won't hurt (without a branch cut). We don't >> need to wait for all the

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Tathagata Das
Next week sounds great! Thank you Wenchen! On Wed, May 1, 2024 at 11:16 AM Wenchen Fan wrote: > Yea I think a preview release won't hurt (without a branch cut). We don't > need to wait for all the ongoing projects to be ready. How about we do a > 4.0 preview release based on the current master

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Wenchen Fan
Yea I think a preview release won't hurt (without a branch cut). We don't need to wait for all the ongoing projects to be ready. How about we do a 4.0 preview release based on the current master branch next Monday? On Wed, May 1, 2024 at 11:06 PM Tathagata Das wrote: > Hey all, > > Reviving

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Tathagata Das
Hey all, Reviving this thread, but Spark master has already accumulated a huge amount of changes. As a downstream project maintainer, I want to really start testing the new features and other breaking changes, and it's hard to do that without a Preview release. So the sooner we make a Preview

Re: [DISCUSS] Spark 4.0.0 release

2024-04-17 Thread Wenchen Fan
Thank you all for the replies! To @Nicholas Chammas: Thanks for cleaning up the error terminology and documentation! I've merged the first PR and let's finish others before the 4.0 release. To @Dongjoon Hyun: Thanks for driving the ANSI on by default effort! Now that the vote has passed, let's

Re: [DISCUSS] Spark 4.0.0 release

2024-04-16 Thread Cheng Pan
Will we have a preview release for 4.0.0, like we did for 2.0.0 and 3.0.0? Thanks, Cheng Pan > On Apr 15, 2024, at 09:58, Jungtaek Lim wrote: > > W.r.t. state data source - reader (SPARK-45511), there are several follow-up > tickets, but we don't plan to address them soon. The current

Re: [DISCUSS] Spark 4.0.0 release

2024-04-14 Thread Jungtaek Lim
W.r.t. state data source - reader (SPARK-45511), there are several follow-up tickets, but we don't plan to address them soon. The current implementation is the final shape for Spark 4.0.0, unless there are demands on the follow-up tickets. We

Re: [DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Dongjoon Hyun
Thank you for volunteering, Wenchen. Dongjoon. On 2024/04/12 15:11:04 Wenchen Fan wrote: > Hi all, > > It's close to the previously proposed 4.0.0 release date (June 2024), and I > think it's time to prepare for it and discuss the ongoing projects: > >- ANSI by default >- Spark Connect

[DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Wenchen Fan
Hi all, It's close to the previously proposed 4.0.0 release date (June 2024), and I think it's time to prepare for it and discuss the ongoing projects: - ANSI by default - Spark Connect GA - Structured Logging - Streaming state store data source - new data type VARIANT - STRING