Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
UPDATE:

I've successfully uploaded the release packages:
https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc1-bin/
(I skipped SparkR because I was not able to fix the errors; I'll get back to it
later.)

However, there is a new issue with doc building:
https://github.com/apache/spark/pull/44628#discussion_r1595718574

I'll continue after the issue is fixed.

On Fri, May 10, 2024 at 12:29 AM Dongjoon Hyun 
wrote:

> Please re-try to upload, Wenchen. ASF Infra team bumped up our upload
> limit based on our request.
>
> > Your upload limit has been increased to 650MB
>
> Dongjoon.
>
>
>
> On Thu, May 9, 2024 at 8:12 AM Wenchen Fan  wrote:
>
>> I've created a ticket: https://issues.apache.org/jira/browse/INFRA-25776
>>
>> On Thu, May 9, 2024 at 11:06 PM Dongjoon Hyun 
>> wrote:
>>
>>> In addition, FYI, I was the latest release manager with Apache Spark
>>> 3.4.3 (2024-04-15 Vote)
>>>
>>> According to my work log, I uploaded the following binaries to SVN from
>>> EC2 (us-west-2) without any issues.
>>>
>>> -rw-r--r--.  1 centos centos 311384003 Apr 15 01:29 pyspark-3.4.3.tar.gz
>>> -rw-r--r--.  1 centos centos 397870995 Apr 15 00:44 spark-3.4.3-bin-hadoop3-scala2.13.tgz
>>> -rw-r--r--.  1 centos centos 388930980 Apr 15 01:29 spark-3.4.3-bin-hadoop3.tgz
>>> -rw-r--r--.  1 centos centos 300786123 Apr 15 01:04 spark-3.4.3-bin-without-hadoop.tgz
>>> -rw-r--r--.  1 centos centos  32219044 Apr 15 00:23 spark-3.4.3.tgz
>>> -rw-r--r--.  1 centos centos    356749 Apr 15 01:29 SparkR_3.4.3.tar.gz
>>>
>>> Since Apache Spark 4.0.0-preview doesn't have a Scala 2.12 combination,
>>> the total size should be smaller than the 3.4.3 binaries.
>>>
>>> Given that, if there is any INFRA change, that could happen after 4/15.
>>>
>>> Dongjoon.
>>>
>>> On Thu, May 9, 2024 at 7:57 AM Dongjoon Hyun 
>>> wrote:
>>>
>>>> Could you file an INFRA JIRA issue with the error message and context
>>>> first, Wenchen?
>>>>
>>>> As you know, if we see something, we had better file a JIRA issue
>>>> because it could be not only an Apache Spark project issue but also all ASF
>>>> project issues.
>>>>
>>>> Dongjoon.
>>>>
>>>>
>>>> On Thu, May 9, 2024 at 12:28 AM Wenchen Fan
>>>> wrote:
>>>>
>>>>> UPDATE:
>>>>>
>>>>> After resolving a few issues in the release scripts, I can finally
>>>>> build the release packages. However, I can't upload them to the staging SVN
>>>>> repo due to a transmitting error, and it seems like a limitation from the
>>>>> server side. I tried it on both my local laptop and remote AWS instance,
>>>>> but neither works. These package binaries are like 300-400 MBs, and we just
>>>>> did a release last month. Not sure if this is a new limitation due to cost
>>>>> saving.
>>>>>
>>>>> While I'm looking for help to get unblocked, I'm wondering if we can
>>>>> upload release packages to a public git repo instead, under the Apache
>>>>> account?
>>>>>
>>
>>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
Please re-try to upload, Wenchen. ASF Infra team bumped up our upload limit
based on our request.

> Your upload limit has been increased to 650MB

Dongjoon.



On Thu, May 9, 2024 at 8:12 AM Wenchen Fan  wrote:

> I've created a ticket: https://issues.apache.org/jira/browse/INFRA-25776
>
> On Thu, May 9, 2024 at 11:06 PM Dongjoon Hyun 
> wrote:
>
>> In addition, FYI, I was the latest release manager with Apache Spark
>> 3.4.3 (2024-04-15 Vote)
>>
>> According to my work log, I uploaded the following binaries to SVN from
>> EC2 (us-west-2) without any issues.
>>
>> -rw-r--r--.  1 centos centos 311384003 Apr 15 01:29 pyspark-3.4.3.tar.gz
>> -rw-r--r--.  1 centos centos 397870995 Apr 15 00:44 spark-3.4.3-bin-hadoop3-scala2.13.tgz
>> -rw-r--r--.  1 centos centos 388930980 Apr 15 01:29 spark-3.4.3-bin-hadoop3.tgz
>> -rw-r--r--.  1 centos centos 300786123 Apr 15 01:04 spark-3.4.3-bin-without-hadoop.tgz
>> -rw-r--r--.  1 centos centos  32219044 Apr 15 00:23 spark-3.4.3.tgz
>> -rw-r--r--.  1 centos centos    356749 Apr 15 01:29 SparkR_3.4.3.tar.gz
>>
>> Since Apache Spark 4.0.0-preview doesn't have a Scala 2.12 combination, the
>> total size should be smaller than the 3.4.3 binaries.
>>
>> Given that, if there is any INFRA change, that could happen after 4/15.
>>
>> Dongjoon.
>>
>> On Thu, May 9, 2024 at 7:57 AM Dongjoon Hyun 
>> wrote:
>>
>>> Could you file an INFRA JIRA issue with the error message and context
>>> first, Wenchen?
>>>
>>> As you know, if we see something, we had better file a JIRA issue
>>> because it could be not only an Apache Spark project issue but also all ASF
>>> project issues.
>>>
>>> Dongjoon.
>>>
>>>
>>> On Thu, May 9, 2024 at 12:28 AM Wenchen Fan  wrote:
>>>
>>>> UPDATE:
>>>>
>>>> After resolving a few issues in the release scripts, I can finally
>>>> build the release packages. However, I can't upload them to the staging SVN
>>>> repo due to a transmitting error, and it seems like a limitation from the
>>>> server side. I tried it on both my local laptop and remote AWS instance,
>>>> but neither works. These package binaries are like 300-400 MBs, and we just
>>>> did a release last month. Not sure if this is a new limitation due to cost
>>>> saving.
>>>>
>>>> While I'm looking for help to get unblocked, I'm wondering if we can
>>>> upload release packages to a public git repo instead, under the Apache
>>>> account?

>
>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
I've created a ticket: https://issues.apache.org/jira/browse/INFRA-25776

On Thu, May 9, 2024 at 11:06 PM Dongjoon Hyun 
wrote:

> In addition, FYI, I was the latest release manager with Apache Spark 3.4.3
> (2024-04-15 Vote)
>
> According to my work log, I uploaded the following binaries to SVN from
> EC2 (us-west-2) without any issues.
>
> -rw-r--r--.  1 centos centos 311384003 Apr 15 01:29 pyspark-3.4.3.tar.gz
> -rw-r--r--.  1 centos centos 397870995 Apr 15 00:44 spark-3.4.3-bin-hadoop3-scala2.13.tgz
> -rw-r--r--.  1 centos centos 388930980 Apr 15 01:29 spark-3.4.3-bin-hadoop3.tgz
> -rw-r--r--.  1 centos centos 300786123 Apr 15 01:04 spark-3.4.3-bin-without-hadoop.tgz
> -rw-r--r--.  1 centos centos  32219044 Apr 15 00:23 spark-3.4.3.tgz
> -rw-r--r--.  1 centos centos    356749 Apr 15 01:29 SparkR_3.4.3.tar.gz
>
> Since Apache Spark 4.0.0-preview doesn't have a Scala 2.12 combination, the
> total size should be smaller than the 3.4.3 binaries.
>
> Given that, if there is any INFRA change, that could happen after 4/15.
>
> Dongjoon.
>
> On Thu, May 9, 2024 at 7:57 AM Dongjoon Hyun 
> wrote:
>
>> Could you file an INFRA JIRA issue with the error message and context
>> first, Wenchen?
>>
>> As you know, if we see something, we had better file a JIRA issue because
>> it could be not only an Apache Spark project issue but also all ASF project
>> issues.
>>
>> Dongjoon.
>>
>>
>> On Thu, May 9, 2024 at 12:28 AM Wenchen Fan  wrote:
>>
>>> UPDATE:
>>>
>>> After resolving a few issues in the release scripts, I can finally build
>>> the release packages. However, I can't upload them to the staging SVN repo
>>> due to a transmitting error, and it seems like a limitation from the server
>>> side. I tried it on both my local laptop and remote AWS instance, but
>>> neither works. These package binaries are like 300-400 MBs, and we just did
>>> a release last month. Not sure if this is a new limitation due to cost
>>> saving.
>>>
>>> While I'm looking for help to get unblocked, I'm wondering if we can
>>> upload release packages to a public git repo instead, under the Apache
>>> account?
>>>




Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
In addition, FYI, I was the latest release manager with Apache Spark 3.4.3
(2024-04-15 Vote)

According to my work log, I uploaded the following binaries to SVN from EC2
(us-west-2) without any issues.

-rw-r--r--.  1 centos centos 311384003 Apr 15 01:29 pyspark-3.4.3.tar.gz
-rw-r--r--.  1 centos centos 397870995 Apr 15 00:44 spark-3.4.3-bin-hadoop3-scala2.13.tgz
-rw-r--r--.  1 centos centos 388930980 Apr 15 01:29 spark-3.4.3-bin-hadoop3.tgz
-rw-r--r--.  1 centos centos 300786123 Apr 15 01:04 spark-3.4.3-bin-without-hadoop.tgz
-rw-r--r--.  1 centos centos  32219044 Apr 15 00:23 spark-3.4.3.tgz
-rw-r--r--.  1 centos centos    356749 Apr 15 01:29 SparkR_3.4.3.tar.gz

Since Apache Spark 4.0.0-preview doesn't have a Scala 2.12 combination, the
total size should be smaller than the 3.4.3 binaries.

Given that, if there is any INFRA change, that could happen after 4/15.
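
For reference, the sizes in the listing above can be compared against an upload limit with a short script. This is only a sketch under stated assumptions: the 650MB figure is the limit quoted elsewhere in this thread, reading it as a per-file limit of 650 * 10**6 bytes is an assumption, and the names `SIZES`, `LIMIT_BYTES` and `over_limit` are hypothetical.

```python
# Sizes in bytes, copied from the `ls -l` listing above.
SIZES = {
    "pyspark-3.4.3.tar.gz": 311384003,
    "spark-3.4.3-bin-hadoop3-scala2.13.tgz": 397870995,
    "spark-3.4.3-bin-hadoop3.tgz": 388930980,
    "spark-3.4.3-bin-without-hadoop.tgz": 300786123,
    "spark-3.4.3.tgz": 32219044,
    "SparkR_3.4.3.tar.gz": 356749,
}

# Assumption: the limit applies per file and "MB" means 10**6 bytes.
LIMIT_BYTES = 650 * 10**6


def over_limit(sizes, limit=LIMIT_BYTES):
    """Return the sorted names of files whose size exceeds the limit."""
    return sorted(name for name, size in sizes.items() if size > limit)


if __name__ == "__main__":
    # Print the files from largest to smallest, then a summary.
    for name, size in sorted(SIZES.items(), key=lambda kv: -kv[1]):
        print(f"{size / 10**6:8.1f} MB  {name}")
    print("total bytes:", sum(SIZES.values()))
    print("over limit:", over_limit(SIZES))
```

Under these assumptions every 3.4.3 artifact fits individually, which is consistent with the uploads on 4/15 having gone through.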

Dongjoon.

On Thu, May 9, 2024 at 7:57 AM Dongjoon Hyun 
wrote:

> Could you file an INFRA JIRA issue with the error message and context
> first, Wenchen?
>
> As you know, if we see something, we had better file a JIRA issue because
> it could be not only an Apache Spark project issue but also all ASF project
> issues.
>
> Dongjoon.
>
>
> On Thu, May 9, 2024 at 12:28 AM Wenchen Fan  wrote:
>
>> UPDATE:
>>
>> After resolving a few issues in the release scripts, I can finally build
>> the release packages. However, I can't upload them to the staging SVN repo
>> due to a transmitting error, and it seems like a limitation from the server
>> side. I tried it on both my local laptop and remote AWS instance, but
>> neither works. These package binaries are like 300-400 MBs, and we just did
>> a release last month. Not sure if this is a new limitation due to cost
>> saving.
>>
>> While I'm looking for help to get unblocked, I'm wondering if we can
>> upload release packages to a public git repo instead, under the Apache
>> account?
>>
>>>
>>>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Dongjoon Hyun
>>>>>>> safe
>>>>>>> (there was some concern from earlier release processes).
>>>>>>>
>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Sorry for the novice question, Wenchen - the release is done
>>>>>>>> manually from a laptop? Not using a CI CD process on a build server?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Nimrod
>>>>>>>>
>>>>>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> UPDATE:
>>>>>>>>>
>>>>>>>>> Unfortunately, it took me quite some time to set up my laptop and
>>>>>>>>> get it ready for the release process (docker desktop doesn't work anymore,
>>>>>>>>> my pgp key is lost, etc.). I'll start the RC process at my tomorrow. Thanks
>>>>>>>>> for your patience!
>>>>>>>>>
>>>>>>>>> Wenchen
>>>>>>>>>
>>>>>>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *From**: *Jungtaek Lim
>>>>>>>>>> *Date**: *Thursday, May 2, 2024 10:21
>>>>>>>>>> *To**: *Holden Karau
>>>>>>>>>> *Cc**: *Chao Sun , Xiao Li <
>>>>>>>>>> gatorsm...@gmail.com>, Tathagata Das ,
>>>>>>>>>> Wenchen Fan , Cheng Pan ,
>>>>>>>>>> Nicholas Chammas , Dongjoon Hyun <
>>>>>>>>>> dongjoon.h...@gmail.com>, Cheng Pan , Spark
>>>>>>>>>> dev list , Anish Shrigondekar <
>>>>>>>>>> anish.shrigonde...@databricks.com>
>>>>>>>>>> *Subject**: *Re: [DISCUSS] Spark 4.0.0 release
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> +1 love to see it!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, May 2, 2024 at 10:08 AM Holden Karau <
>>>>>>>>>> holden.ka...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 :) yay previews
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 for next Monday.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We can do more previews when the other features are ready for
>>>>>>>>>> preview.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>>>>>>>>>
>>>>>>>>>> Next week sounds great! Thank you Wenchen!
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
>>>>>>> Sorry for the novice question, Wenchen - the release is done
>>>>>>> manually from a laptop? Not using a CI CD process on a build server?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Nimrod
>>>>>>>
>>>>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> UPDATE:
>>>>>>>>
>>>>>>>> Unfortunately, it took me quite some time to set up my laptop and
>>>>>>>> get it ready for the release process (docker desktop doesn't work anymore,
>>>>>>>> my pgp key is lost, etc.). I'll start the RC process at my tomorrow. Thanks
>>>>>>>> for your patience!
>>>>>>>>
>>>>>>>> Wenchen
>>>>>>>>
>>>>>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *From**: *Jungtaek Lim
>>>>>>>>> *Date**: *Thursday, May 2, 2024 10:21
>>>>>>>>> *To**: *Holden Karau
>>>>>>>>> *Cc**: *Chao Sun , Xiao Li <
>>>>>>>>> gatorsm...@gmail.com>, Tathagata Das ,
>>>>>>>>> Wenchen Fan , Cheng Pan ,
>>>>>>>>> Nicholas Chammas , Dongjoon Hyun <
>>>>>>>>> dongjoon.h...@gmail.com>, Cheng Pan , Spark
>>>>>>>>> dev list , Anish Shrigondekar <
>>>>>>>>> anish.shrigonde...@databricks.com>
>>>>>>>>> *Subject**: *Re: [DISCUSS] Spark 4.0.0 release
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> +1 love to see it!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, May 2, 2024 at 10:08 AM Holden Karau <
>>>>>>>>> holden.ka...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> +1 :) yay previews
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> +1 for next Monday.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> We can do more previews when the other features are ready for
>>>>>>>>> preview.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>>>>>>>>
>>>>>>>>> Next week sounds great! Thank you Wenchen!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Yea I think a preview release won't hurt (without a branch cut).
>>>>>>>>> We don't need to wait for all the ongoing projects to be ready. How about
>>>>>>>>> we do a 4.0 preview release based on the current master branch next Monday?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>>>>>>>>> tathagata.das1...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hey all,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Holden Karau
That looks cool, maybe let’s split off a thread on how to improve our
release processes?

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Wed, May 8, 2024 at 9:31 AM Erik Krogen  wrote:

> On that note, GitHub recently released (public preview) a new feature
> called Artifact Attestations which may be relevant/useful here: Introducing
> Artifact Attestations–now in public beta - The GitHub Blog
> <https://github.blog/2024-05-02-introducing-artifact-attestations-now-in-public-beta/>
>
> On Wed, May 8, 2024 at 9:06 AM Nimrod Ofek  wrote:
>
>> I have no permissions so I can't do it but I'm happy to help (although I
>> am more familiar with Gitlab CICD than Github Actions).
>> Is there some point of contact that can provide me needed context and
>> permissions?
>> I'd also love to see why the costs are high and see how we can reduce
>> them...
>>
>> Thanks,
>> Nimrod
>>
>> On Wed, May 8, 2024 at 8:26 AM Holden Karau 
>> wrote:
>>
>>> I think signing the artifacts produced from a secure CI sounds like a
>>> good idea. I know we’ve been asked to reduce our GitHub action usage but
>>> perhaps someone interested could volunteer to set that up.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Tue, May 7, 2024 at 9:43 PM Nimrod Ofek 
>>> wrote:
>>>
>>>> Hi,
>>>> Thanks for the reply.
>>>>
>>>> From my experience, a build on a build server would be much more
>>>> predictable and less error prone than building on some laptop- and of
>>>> course much faster to have builds, snapshots, release candidates, early
>>>> previews releases, release candidates or final releases.
>>>> It will enable us to have a preview version with current changes-
>>>> snapshot version, either automatically every day or if we need to save
>>>> costs (although build is really not expensive) - with a click of a button.
>>>>
>>>> Regarding keys for signing. - that's what vaults are for, all across
>>>> the industry we are using vaults (such as hashicorp vault)- but if the
>>>> build will be automated and the only thing which will be manual is to sign
>>>> the release for security reasons that would be reasonable.
>>>>
>>>> Thanks,
>>>> Nimrod
>>>>
>>>>
>>>> On Wed, May 8, 2024 at 00:54, Holden Karau <
>>>> holden.ka...@gmail.com> wrote:
>>>>
>>>>> Indeed. We could conceivably build the release in CI/CD but the final
>>>>> verification / signing should be done locally to keep the keys safe (there
>>>>> was some concern from earlier release processes).
>>>>>
>>>>> Twitter: https://twitter.com/holdenkarau
>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>
>>>>>
>>>>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek 
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Sorry for the novice question, Wenchen - the release is done manually
>>>>>> from a laptop? Not using a CI CD process on a build server?
>>>>>>
>>>>>> Thanks,
>>>>>> Nimrod
>>>>>>
>>>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan 
>>>>>> wrote:
>>>>>>
>>>>>>> UPDATE:
>>>>>>>
>>>>>>> Unfortunately, it took me quite some time to set up my laptop and
>>>>>>> get it ready for the release process (docker desktop doesn't work anymore,
>>>>>>> my pgp key is lost, etc.). I'll start the RC process at my tomorrow. Thanks
>>>>>>> for your patience!
>>>>>>>
>>>>>>> Wenchen
>>>>>>>
>>>>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01 
>>>>>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Erik Krogen
On that note, GitHub recently released (public preview) a new feature
called Artifact Attestations which may be relevant/useful here: Introducing
Artifact Attestations–now in public beta - The GitHub Blog
<https://github.blog/2024-05-02-introducing-artifact-attestations-now-in-public-beta/>

On Wed, May 8, 2024 at 9:06 AM Nimrod Ofek  wrote:

> I have no permissions so I can't do it but I'm happy to help (although I
> am more familiar with Gitlab CICD than Github Actions).
> Is there some point of contact that can provide me needed context and
> permissions?
> I'd also love to see why the costs are high and see how we can reduce
> them...
>
> Thanks,
> Nimrod
>
> On Wed, May 8, 2024 at 8:26 AM Holden Karau 
> wrote:
>
>> I think signing the artifacts produced from a secure CI sounds like a
>> good idea. I know we’ve been asked to reduce our GitHub action usage but
>> perhaps someone interested could volunteer to set that up.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Tue, May 7, 2024 at 9:43 PM Nimrod Ofek  wrote:
>>
>>> Hi,
>>> Thanks for the reply.
>>>
>>> From my experience, a build on a build server would be much more
>>> predictable and less error prone than building on some laptop- and of
>>> course much faster to have builds, snapshots, release candidates, early
>>> previews releases, release candidates or final releases.
>>> It will enable us to have a preview version with current changes-
>>> snapshot version, either automatically every day or if we need to save
>>> costs (although build is really not expensive) - with a click of a button.
>>>
>>> Regarding keys for signing. - that's what vaults are for, all across the
>>> industry we are using vaults (such as hashicorp vault)- but if the build
>>> will be automated and the only thing which will be manual is to sign the
>>> release for security reasons that would be reasonable.
>>>
>>> Thanks,
>>> Nimrod
>>>
>>>
>>> On Wed, May 8, 2024 at 00:54, Holden Karau <
>>> holden.ka...@gmail.com> wrote:
>>>
>>>> Indeed. We could conceivably build the release in CI/CD but the final
>>>> verification / signing should be done locally to keep the keys safe (there
>>>> was some concern from earlier release processes).
>>>>
>>>> Twitter: https://twitter.com/holdenkarau
>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>
>>>>
>>>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry for the novice question, Wenchen - the release is done manually
>>>>> from a laptop? Not using a CI CD process on a build server?
>>>>>
>>>>> Thanks,
>>>>> Nimrod
>>>>>
>>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan 
>>>>> wrote:
>>>>>
>>>>>> UPDATE:
>>>>>>
>>>>>> Unfortunately, it took me quite some time to set up my laptop and get
>>>>>> it ready for the release process (docker desktop doesn't work anymore, my
>>>>>> pgp key is lost, etc.). I'll start the RC process at my tomorrow. Thanks
>>>>>> for your patience!
>>>>>>
>>>>>> Wenchen
>>>>>>
>>>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From**: *Jungtaek Lim
>>>>>>> *Date**: *Thursday, May 2, 2024 10:21
>>>>>>> *To**: *Holden Karau
>>>>>>> *Cc**: *Chao Sun , Xiao Li ,
>>>>>>> Tathagata Das , Wenchen Fan <
>>>>>>> cloud0...@gmail.com>, Cheng Pan , Nicholas
>>>>>>> Chammas , Dongjoon Hyun <
>>>>>>> dongjoon.h...@gmail.com>, Cheng Pan , Spark
>>>>>>> dev list , Anish Shrigondekar <
>>>>>>> anish.shrigonde...@databricks.com>
>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-08 Thread Nimrod Ofek
I don't have the permissions to do it myself, but I'm happy to help (although I am
more familiar with GitLab CI/CD than GitHub Actions).
Is there a point of contact who can provide me with the needed context and
permissions?
I'd also love to understand why the costs are high and see how we can reduce
them...

Thanks,
Nimrod

On Wed, May 8, 2024 at 8:26 AM Holden Karau  wrote:

> I think signing the artifacts produced from a secure CI sounds like a good
> idea. I know we’ve been asked to reduce our GitHub action usage but perhaps
> someone interested could volunteer to set that up.
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Tue, May 7, 2024 at 9:43 PM Nimrod Ofek  wrote:
>
>> Hi,
>> Thanks for the reply.
>>
>> From my experience, a build on a build server would be much more
>> predictable and less error prone than building on some laptop- and of
>> course much faster to have builds, snapshots, release candidates, early
>> previews releases, release candidates or final releases.
>> It will enable us to have a preview version with current changes-
>> snapshot version, either automatically every day or if we need to save
>> costs (although build is really not expensive) - with a click of a button.
>>
>> Regarding keys for signing. - that's what vaults are for, all across the
>> industry we are using vaults (such as hashicorp vault)- but if the build
>> will be automated and the only thing which will be manual is to sign the
>> release for security reasons that would be reasonable.
>>
>> Thanks,
>> Nimrod
>>
>>
>> On Wed, May 8, 2024 at 00:54, Holden Karau <
>> holden.ka...@gmail.com> wrote:
>>
>>> Indeed. We could conceivably build the release in CI/CD but the final
>>> verification / signing should be done locally to keep the keys safe (there
>>> was some concern from earlier release processes).
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
>>>
>>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry for the novice question, Wenchen - the release is done manually
>>>> from a laptop? Not using a CI CD process on a build server?
>>>>
>>>> Thanks,
>>>> Nimrod
>>>>
>>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan  wrote:
>>>>
>>>>> UPDATE:
>>>>>
>>>>> Unfortunately, it took me quite some time to set up my laptop and get
>>>>> it ready for the release process (docker desktop doesn't work anymore, my
>>>>> pgp key is lost, etc.). I'll start the RC process at my tomorrow. Thanks
>>>>> for your patience!
>>>>>
>>>>> Wenchen
>>>>>
>>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From**: *Jungtaek Lim
>>>>>> *Date**: *Thursday, May 2, 2024 10:21
>>>>>> *To**: *Holden Karau
>>>>>> *Cc**: *Chao Sun , Xiao Li ,
>>>>>> Tathagata Das , Wenchen Fan <
>>>>>> cloud0...@gmail.com>, Cheng Pan , Nicholas
>>>>>> Chammas , Dongjoon Hyun <
>>>>>> dongjoon.h...@gmail.com>, Cheng Pan , Spark dev
>>>>>> list , Anish Shrigondekar <
>>>>>> anish.shrigonde...@databricks.com>
>>>>>> *Subject**: *Re: [DISCUSS] Spark 4.0.0 release
>>>>>>
>>>>>>
>>>>>>
>>>>>> +1 love to see it!
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>>>>>> wrote:
>>>>>>
>>>>>> +1 :) yay previews
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>>>>>
>>>>>> +1
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>>>>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
I think signing the artifacts produced from a secure CI sounds like a good
idea. I know we’ve been asked to reduce our GitHub Actions usage, but perhaps
someone interested could volunteer to set that up.

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Tue, May 7, 2024 at 9:43 PM Nimrod Ofek  wrote:

> Hi,
> Thanks for the reply.
>
> From my experience, a build on a build server would be much more
> predictable and less error prone than building on some laptop- and of
> course much faster to have builds, snapshots, release candidates, early
> previews releases, release candidates or final releases.
> It will enable us to have a preview version with current changes- snapshot
> version, either automatically every day or if we need to save costs
> (although build is really not expensive) - with a click of a button.
>
> Regarding keys for signing. - that's what vaults are for, all across the
> industry we are using vaults (such as hashicorp vault)- but if the build
> will be automated and the only thing which will be manual is to sign the
> release for security reasons that would be reasonable.
>
> Thanks,
> Nimrod
>
>
> On Wed, May 8, 2024 at 00:54, Holden Karau <
> holden.ka...@gmail.com> wrote:
>
>> Indeed. We could conceivably build the release in CI/CD but the final
>> verification / signing should be done locally to keep the keys safe (there
>> was some concern from earlier release processes).
>>
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>>
>> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek 
>> wrote:
>>
>>> Hi,
>>>
>>> Sorry for the novice question, Wenchen - the release is done manually
>>> from a laptop? Not using a CI CD process on a build server?
>>>
>>> Thanks,
>>> Nimrod
>>>
>>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan  wrote:
>>>
>>>> UPDATE:
>>>>
>>>> Unfortunately, it took me quite some time to set up my laptop and get
>>>> it ready for the release process (docker desktop doesn't work anymore, my
>>>> pgp key is lost, etc.). I'll start the RC process at my tomorrow. Thanks
>>>> for your patience!
>>>>
>>>> Wenchen
>>>>
>>>> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>>>>
>>>>> +1
>>>>>
>>>>>
>>>>>
>>>>> *From**: *Jungtaek Lim
>>>>> *Date**: *Thursday, May 2, 2024 10:21
>>>>> *To**: *Holden Karau
>>>>> *Cc**: *Chao Sun , Xiao Li ,
>>>>> Tathagata Das , Wenchen Fan <
>>>>> cloud0...@gmail.com>, Cheng Pan , Nicholas Chammas
>>>>> , Dongjoon Hyun ,
>>>>> Cheng Pan , Spark dev list ,
>>>>> Anish Shrigondekar
>>>>> *Subject**: *Re: [DISCUSS] Spark 4.0.0 release
>>>>>
>>>>>
>>>>>
>>>>> +1 love to see it!
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>>>>> wrote:
>>>>>
>>>>> +1 :) yay previews
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>>>>
>>>>> +1
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>>>>
>>>>> +1 for next Monday.
>>>>>
>>>>>
>>>>>
>>>>> We can do more previews when the other features are ready for preview.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>>>>
>>>>> Next week sounds great! Thank you Wenchen!
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan 
>>>>> wrote:
>>>>>
>>>>> Yea I think a preview release won't hurt (without a branch cut). We
>>>>> don't need to wait for all the ongoing projects to be ready. How about we
>>>>> do a 4.0 preview release based on the current master branch next Monday?
>>>>>
>>>>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Nimrod Ofek
Hi,
Thanks for the reply.

From my experience, a build on a build server would be much more
predictable and less error-prone than building on a laptop, and of
course it would be much faster to produce builds, snapshots, early
preview releases, release candidates, or final releases.
It would let us publish a preview (snapshot) version with the current changes,
either automatically every day or, if we need to save costs
(although a build is really not expensive), with the click of a button.

Regarding keys for signing: that's what vaults are for. All across the
industry we use vaults (such as HashiCorp Vault). But if the build
is automated and the only manual step is signing the
release, for security reasons, that would be reasonable.

Thanks,
Nimrod


On Wed, May 8, 2024 at 00:54, Holden Karau wrote:

> Indeed. We could conceivably build the release in CI/CD but the final
> verification / signing should be done locally to keep the keys safe (there
> was some concern from earlier release processes).
>
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>
> On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek  wrote:
>
>> Hi,
>>
>> Sorry for the novice question, Wenchen - the release is done manually
>> from a laptop? Not using a CI CD process on a build server?
>>
>> Thanks,
>> Nimrod
>>
>> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan  wrote:
>>
>>> UPDATE:
>>>
>>> Unfortunately, it took me quite some time to set up my laptop and get it
>>> ready for the release process (docker desktop doesn't work anymore, my pgp
>>> key is lost, etc.). I'll start the RC process at my tomorrow. Thanks for
>>> your patience!
>>>
>>> Wenchen
>>>
>>> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> *From**: *Jungtaek Lim
>>>> *Date**: *Thursday, May 2, 2024 10:21
>>>> *To**: *Holden Karau
>>>> *Cc**: *Chao Sun , Xiao Li ,
>>>> Tathagata Das , Wenchen Fan <
>>>> cloud0...@gmail.com>, Cheng Pan , Nicholas Chammas <
>>>> nicholas.cham...@gmail.com>, Dongjoon Hyun ,
>>>> Cheng Pan , Spark dev list ,
>>>> Anish Shrigondekar
>>>> *Subject**: *Re: [DISCUSS] Spark 4.0.0 release
>>>>
>>>>
>>>>
>>>> +1 love to see it!
>>>>
>>>>
>>>>
>>>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>>>> wrote:
>>>>
>>>> +1 :) yay previews
>>>>
>>>>
>>>>
>>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>>>
>>>> +1
>>>>
>>>>
>>>>
>>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>>>
>>>> +1 for next Monday.
>>>>
>>>>
>>>>
>>>> We can do more previews when the other features are ready for preview.
>>>>
>>>>
>>>>
>>>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>>>
>>>> Next week sounds great! Thank you Wenchen!
>>>>
>>>>
>>>>
>>>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan 
>>>> wrote:
>>>>
>>>> Yea I think a preview release won't hurt (without a branch cut). We
>>>> don't need to wait for all the ongoing projects to be ready. How about we
>>>> do a 4.0 preview release based on the current master branch next Monday?
>>>>
>>>>
>>>>
>>>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>>>> tathagata.das1...@gmail.com> wrote:
>>>>
>>>> Hey all,
>>>>
>>>>
>>>>
>>>> Reviving this thread, but Spark master has already accumulated a huge
>>>> amount of changes.  As a downstream project maintainer, I want to really
>>>> start testing the new features and other breaking changes, and it's hard to
>>>> do that without a Preview release. So the sooner we make a Preview release,
>>>> the faster we can start getting feedback for fixing things for a great
>>>> Spark 4.0 final release.
>>>>
>>>>
>>>>
>>>> So I urge the community to produce a Spark 4.0 Preview soon even if
>>>> certain features targeting the Delta 4.0 release are st

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Holden Karau
Indeed. We could conceivably build the release in CI/CD but the final
verification / signing should be done locally to keep the keys safe (there
was some concern from earlier release processes).

Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


On Tue, May 7, 2024 at 10:55 AM Nimrod Ofek  wrote:

> Hi,
>
> Sorry for the novice question, Wenchen: is the release done manually from
> a laptop, rather than through a CI/CD process on a build server?
>
> Thanks,
> Nimrod
>
> On Tue, May 7, 2024 at 8:50 PM Wenchen Fan  wrote:
>
>> UPDATE:
>>
>> Unfortunately, it took me quite some time to set up my laptop and get it
>> ready for the release process (docker desktop doesn't work anymore, my pgp
>> key is lost, etc.). I'll start the RC process tomorrow, my time. Thanks for
>> your patience!
>>
>> Wenchen
>>
>> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>>
>>> +1
>>>
>>>
>>>
>>> From: Jungtaek Lim
>>> Date: Thursday, May 2, 2024 10:21
>>> To: Holden Karau
>>> Cc: Chao Sun, Xiao Li,
>>> Tathagata Das, Wenchen Fan <
>>> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas <
>>> nicholas.cham...@gmail.com>, Dongjoon Hyun,
>>> Cheng Pan, Spark dev list,
>>> Anish Shrigondekar
>>> Subject: Re: [DISCUSS] Spark 4.0.0 release
>>>
>>>
>>>
>>> +1 love to see it!
>>>
>>>
>>>
>>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>>> wrote:
>>>
>>> +1 :) yay previews
>>>
>>>
>>>
>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>>
>>> +1
>>>
>>>
>>>
>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>>
>>> +1 for next Monday.
>>>
>>>
>>>
>>> We can do more previews when the other features are ready for preview.
>>>
>>>
>>>
>>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>>
>>> Next week sounds great! Thank you Wenchen!
>>>
>>>
>>>
>>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>>>
>>> Yea I think a preview release won't hurt (without a branch cut). We
>>> don't need to wait for all the ongoing projects to be ready. How about we
>>> do a 4.0 preview release based on the current master branch next Monday?
>>>
>>>
>>>
>>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
>>> Hey all,
>>>
>>>
>>>
>>> Reviving this thread, but Spark master has already accumulated a huge
>>> amount of changes.  As a downstream project maintainer, I want to really
>>> start testing the new features and other breaking changes, and it's hard to
>>> do that without a Preview release. So the sooner we make a Preview release,
>>> the faster we can start getting feedback for fixing things for a great
>>> Spark 4.0 final release.
>>>
>>>
>>>
>>> So I urge the community to produce a Spark 4.0 Preview soon even if
>>> certain features targeting the Delta 4.0 release are still incomplete.
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>>>
>>> Thank you all for the replies!
>>>
>>>
>>>
>>> To @Nicholas Chammas  : Thanks for cleaning
>>> up the error terminology and documentation! I've merged the first PR and
>>> let's finish others before the 4.0 release.
>>>
>>> To @Dongjoon Hyun  : Thanks for driving the
>>> ANSI on by default effort! Now the vote has passed, let's flip the config
>>> and finish the DataFrame error context feature before 4.0.
>>>
>>> To @Jungtaek Lim  : Ack. We can treat the
>>> Streaming state store data source as completed for 4.0 then.
>>>
>>> To @Cheng Pan  : Yea we definitely should have a
>>> preview release. Let's collect more feedback on the ongoing projects and
>>> then we can propose a date for the preview release.
>>>
>>>
>>>
>>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>>
>>> Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>>>
>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Dongjoon Hyun
Thank you so much for the update, Wenchen!

Dongjoon.

On Tue, May 7, 2024 at 10:49 AM Wenchen Fan  wrote:

> UPDATE:
>
> Unfortunately, it took me quite some time to set up my laptop and get it
> ready for the release process (docker desktop doesn't work anymore, my pgp
> key is lost, etc.). I'll start the RC process tomorrow, my time. Thanks for
> your patience!
>
> Wenchen
>
> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>
>> +1
>>
>>
>>
>> From: Jungtaek Lim
>> Date: Thursday, May 2, 2024 10:21
>> To: Holden Karau
>> Cc: Chao Sun, Xiao Li,
>> Tathagata Das, Wenchen Fan <
>> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas <
>> nicholas.cham...@gmail.com>, Dongjoon Hyun,
>> Cheng Pan, Spark dev list,
>> Anish Shrigondekar
>> Subject: Re: [DISCUSS] Spark 4.0.0 release
>>
>>
>>
>> +1 love to see it!
>>
>>
>>
>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>> wrote:
>>
>> +1 :) yay previews
>>
>>
>>
>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>
>> +1
>>
>>
>>
>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>
>> +1 for next Monday.
>>
>>
>>
>> We can do more previews when the other features are ready for preview.
>>
>>
>>
>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>
>> Next week sounds great! Thank you Wenchen!
>>
>>
>>
>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>>
>> Yea I think a preview release won't hurt (without a branch cut). We don't
>> need to wait for all the ongoing projects to be ready. How about we do a
>> 4.0 preview release based on the current master branch next Monday?
>>
>>
>>
>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>> Hey all,
>>
>>
>>
>> Reviving this thread, but Spark master has already accumulated a huge
>> amount of changes.  As a downstream project maintainer, I want to really
>> start testing the new features and other breaking changes, and it's hard to
>> do that without a Preview release. So the sooner we make a Preview release,
>> the faster we can start getting feedback for fixing things for a great
>> Spark 4.0 final release.
>>
>>
>>
>> So I urge the community to produce a Spark 4.0 Preview soon even if
>> certain features targeting the Delta 4.0 release are still incomplete.
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>>
>> Thank you all for the replies!
>>
>>
>>
>> To @Nicholas Chammas  : Thanks for cleaning
>> up the error terminology and documentation! I've merged the first PR and
>> let's finish others before the 4.0 release.
>>
>> To @Dongjoon Hyun  : Thanks for driving the
>> ANSI on by default effort! Now the vote has passed, let's flip the config
>> and finish the DataFrame error context feature before 4.0.
>>
>> To @Jungtaek Lim  : Ack. We can treat the
>> Streaming state store data source as completed for 4.0 then.
>>
>> To @Cheng Pan  : Yea we definitely should have a
>> preview release. Let's collect more feedback on the ongoing projects and
>> then we can propose a date for the preview release.
>>
>>
>>
>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>
>> Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Apr 15, 2024, at 09:58, Jungtaek Lim 
>> wrote:
>> >
>> > W.r.t. state data source - reader (SPARK-45511), there are several
>> follow-up tickets, but we don't plan to address them soon. The current
>> implementation is the final shape for Spark 4.0.0, unless there are demands
>> on the follow-up tickets.
>> >
>> > We may want to check the plan for transformWithState - my understanding
>> is that we want to release the feature to 4.0.0, but there are several
>> remaining works to be done. While the tentative timeline for releasing is
>> June 2024, what would be the tentative timeline for the RC cut?
>> > (cc. Anish to add more context on the plan for transformWithState)
>> >
>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>> wrote:
>> > Hi all,
>> >
>> > It's close to the previously proposed 4.0.0 release date (June 2024),
>> and I think it's time to prepare for it and 

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Nimrod Ofek
Hi,

Sorry for the novice question, Wenchen: is the release done manually from
a laptop, rather than through a CI/CD process on a build server?

Thanks,
Nimrod

On Tue, May 7, 2024 at 8:50 PM Wenchen Fan  wrote:

> UPDATE:
>
> Unfortunately, it took me quite some time to set up my laptop and get it
> ready for the release process (docker desktop doesn't work anymore, my pgp
> key is lost, etc.). I'll start the RC process tomorrow, my time. Thanks for
> your patience!
>
> Wenchen
>
> On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:
>
>> +1
>>
>>
>>
>> From: Jungtaek Lim
>> Date: Thursday, May 2, 2024 10:21
>> To: Holden Karau
>> Cc: Chao Sun, Xiao Li,
>> Tathagata Das, Wenchen Fan <
>> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas <
>> nicholas.cham...@gmail.com>, Dongjoon Hyun,
>> Cheng Pan, Spark dev list,
>> Anish Shrigondekar
>> Subject: Re: [DISCUSS] Spark 4.0.0 release
>>
>>
>>
>> +1 love to see it!
>>
>>
>>
>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>> wrote:
>>
>> +1 :) yay previews
>>
>>
>>
>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>
>> +1
>>
>>
>>
>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>
>> +1 for next Monday.
>>
>>
>>
>> We can do more previews when the other features are ready for preview.
>>
>>
>>
>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>
>> Next week sounds great! Thank you Wenchen!
>>
>>
>>
>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>>
>> Yea I think a preview release won't hurt (without a branch cut). We don't
>> need to wait for all the ongoing projects to be ready. How about we do a
>> 4.0 preview release based on the current master branch next Monday?
>>
>>
>>
>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>> Hey all,
>>
>>
>>
>> Reviving this thread, but Spark master has already accumulated a huge
>> amount of changes.  As a downstream project maintainer, I want to really
>> start testing the new features and other breaking changes, and it's hard to
>> do that without a Preview release. So the sooner we make a Preview release,
>> the faster we can start getting feedback for fixing things for a great
>> Spark 4.0 final release.
>>
>>
>>
>> So I urge the community to produce a Spark 4.0 Preview soon even if
>> certain features targeting the Delta 4.0 release are still incomplete.
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>>
>> Thank you all for the replies!
>>
>>
>>
>> To @Nicholas Chammas  : Thanks for cleaning
>> up the error terminology and documentation! I've merged the first PR and
>> let's finish others before the 4.0 release.
>>
>> To @Dongjoon Hyun  : Thanks for driving the
>> ANSI on by default effort! Now the vote has passed, let's flip the config
>> and finish the DataFrame error context feature before 4.0.
>>
>> To @Jungtaek Lim  : Ack. We can treat the
>> Streaming state store data source as completed for 4.0 then.
>>
>> To @Cheng Pan  : Yea we definitely should have a
>> preview release. Let's collect more feedback on the ongoing projects and
>> then we can propose a date for the preview release.
>>
>>
>>
>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>
>> Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Apr 15, 2024, at 09:58, Jungtaek Lim 
>> wrote:
>> >
>> > W.r.t. state data source - reader (SPARK-45511), there are several
>> follow-up tickets, but we don't plan to address them soon. The current
>> implementation is the final shape for Spark 4.0.0, unless there are demands
>> on the follow-up tickets.
>> >
>> > We may want to check the plan for transformWithState - my understanding
>> is that we want to release the feature to 4.0.0, but there are several
>> remaining works to be done. While the tentative timeline for releasing is
>> June 2024, what would be the tentative timeline for the RC cut?
>> > (cc. Anish to add more context on the plan for transformWithState)
>> >
>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>> wrote:
>> > Hi all,
>> >
>> > It's close to the previously pr

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Wenchen Fan
UPDATE:

Unfortunately, it took me quite some time to set up my laptop and get it
ready for the release process (docker desktop doesn't work anymore, my pgp
key is lost, etc.). I'll start the RC process tomorrow, my time. Thanks for
your patience!

Wenchen

On Fri, May 3, 2024 at 7:47 AM yangjie01  wrote:

> +1
>
>
>
> From: Jungtaek Lim
> Date: Thursday, May 2, 2024 10:21
> To: Holden Karau
> Cc: Chao Sun, Xiao Li,
> Tathagata Das, Wenchen Fan <
> cloud0...@gmail.com>, Cheng Pan, Nicholas Chammas <
> nicholas.cham...@gmail.com>, Dongjoon Hyun,
> Cheng Pan, Spark dev list,
> Anish Shrigondekar
> Subject: Re: [DISCUSS] Spark 4.0.0 release
>
>
>
> +1 love to see it!
>
>
>
> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
> wrote:
>
> +1 :) yay previews
>
>
>
> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>
> +1
>
>
>
> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>
> +1 for next Monday.
>
>
>
> We can do more previews when the other features are ready for preview.
>
>
>
> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>
> Next week sounds great! Thank you Wenchen!
>
>
>
> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>
> Yea I think a preview release won't hurt (without a branch cut). We don't
> need to wait for all the ongoing projects to be ready. How about we do a
> 4.0 preview release based on the current master branch next Monday?
>
>
>
> On Wed, May 1, 2024 at 11:06 PM Tathagata Das 
> wrote:
>
> Hey all,
>
>
>
> Reviving this thread, but Spark master has already accumulated a huge
> amount of changes.  As a downstream project maintainer, I want to really
> start testing the new features and other breaking changes, and it's hard to
> do that without a Preview release. So the sooner we make a Preview release,
> the faster we can start getting feedback for fixing things for a great
> Spark 4.0 final release.
>
>
>
> So I urge the community to produce a Spark 4.0 Preview soon even if
> certain features targeting the Delta 4.0 release are still incomplete.
>
>
>
> Thanks!
>
>
>
>
>
> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>
> Thank you all for the replies!
>
>
>
> To @Nicholas Chammas  : Thanks for cleaning
> up the error terminology and documentation! I've merged the first PR and
> let's finish others before the 4.0 release.
>
> To @Dongjoon Hyun  : Thanks for driving the ANSI
> on by default effort! Now the vote has passed, let's flip the config and
> finish the DataFrame error context feature before 4.0.
>
> To @Jungtaek Lim  : Ack. We can treat the
> Streaming state store data source as completed for 4.0 then.
>
> To @Cheng Pan  : Yea we definitely should have a
> preview release. Let's collect more feedback on the ongoing projects and
> then we can propose a date for the preview release.
>
>
>
> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>
> Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>
> Thanks,
> Cheng Pan
>
>
> > On Apr 15, 2024, at 09:58, Jungtaek Lim 
> wrote:
> >
> > W.r.t. state data source - reader (SPARK-45511), there are several
> follow-up tickets, but we don't plan to address them soon. The current
> implementation is the final shape for Spark 4.0.0, unless there are demands
> on the follow-up tickets.
> >
> > We may want to check the plan for transformWithState - my understanding
> is that we want to release the feature to 4.0.0, but there are several
> remaining works to be done. While the tentative timeline for releasing is
> June 2024, what would be the tentative timeline for the RC cut?
> > (cc. Anish to add more context on the plan for transformWithState)
> >
> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan  wrote:
> > Hi all,
> >
> > It's close to the previously proposed 4.0.0 release date (June 2024),
> and I think it's time to prepare for it and discuss the ongoing projects:
> > • ANSI by default
> > • Spark Connect GA
> > • Structured Logging
> > • Streaming state store data source
> > • new data type VARIANT
> > • STRING collation support
> > • Spark k8s operator versioning
> > Please help to add more items to this list that are missed here. I would
> like to volunteer as the release manager for Apache Spark 4.0.0 if there is
> no objection. Thank you all for the great work that fills Spark 4.0!
> >
> > Wenchen Fan
>
>
>
>
> --
>
> Twitter: https://twitter.com/holdenkarau
>
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
>
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread yangjie01
+1

From: Jungtaek Lim
Date: Thursday, May 2, 2024 10:21
To: Holden Karau
Cc: Chao Sun, Xiao Li, Tathagata Das, Wenchen Fan, Cheng Pan,
Nicholas Chammas, Dongjoon Hyun, Cheng Pan, Spark dev list,
Anish Shrigondekar
Subject: Re: [DISCUSS] Spark 4.0.0 release

+1 love to see it!

On Thu, May 2, 2024 at 10:08 AM Holden Karau <holden.ka...@gmail.com> wrote:
+1 :) yay previews

On Wed, May 1, 2024 at 5:36 PM Chao Sun <sunc...@apache.org> wrote:
+1

On Wed, May 1, 2024 at 5:23 PM Xiao Li <gatorsm...@gmail.com> wrote:
+1 for next Monday.

We can do more previews when the other features are ready for preview.

On Wed, May 1, 2024 at 08:46, Tathagata Das <tathagata.das1...@gmail.com> wrote:
Next week sounds great! Thank you Wenchen!

On Wed, May 1, 2024 at 11:16 AM Wenchen Fan <cloud0...@gmail.com> wrote:
Yea I think a preview release won't hurt (without a branch cut). We don't need 
to wait for all the ongoing projects to be ready. How about we do a 4.0 preview 
release based on the current master branch next Monday?

On Wed, May 1, 2024 at 11:06 PM Tathagata Das <tathagata.das1...@gmail.com> wrote:
Hey all,

Reviving this thread, but Spark master has already accumulated a huge amount of 
changes.  As a downstream project maintainer, I want to really start testing 
the new features and other breaking changes, and it's hard to do that without a 
Preview release. So the sooner we make a Preview release, the faster we can 
start getting feedback for fixing things for a great Spark 4.0 final release.

So I urge the community to produce a Spark 4.0 Preview soon even if certain 
features targeting the Delta 4.0 release are still incomplete.

Thanks!


On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan <cloud0...@gmail.com> wrote:
Thank you all for the replies!

To @Nicholas Chammas <nicholas.cham...@gmail.com>: Thanks for cleaning
up the error terminology and documentation! I've merged the first PR and let's
finish others before the 4.0 release.
To @Dongjoon Hyun <dongjoon.h...@gmail.com>: Thanks for driving the ANSI
on by default effort! Now the vote has passed, let's flip the config and finish
the DataFrame error context feature before 4.0.
To @Jungtaek Lim <kabhwan.opensou...@gmail.com>: Ack. We can treat the
Streaming state store data source as completed for 4.0 then.
To @Cheng Pan <cheng...@apache.org>: Yea we definitely should have a
preview release. Let's collect more feedback on the ongoing projects and then
we can propose a date for the preview release.

On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan <pan3...@gmail.com> wrote:
Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?

Thanks,
Cheng Pan


> On Apr 15, 2024, at 09:58, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:
>
> W.r.t. state data source - reader (SPARK-45511), there are several follow-up 
> tickets, but we don't plan to address them soon. The current implementation 
> is the final shape for Spark 4.0.0, unless there are demands on the follow-up 
> tickets.
>
> We may want to check the plan for transformWithState - my understanding is 
> that we want to release the feature to 4.0.0, but there are several remaining 
> works to be done. While the tentative timeline for releasing is June 2024, 
> what would be the tentative timeline for the RC cut?
> (cc. Anish to add more context on the plan for transformWithState)
>
> On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan <cloud0...@gmail.com> wrote:
> Hi all,
>
> It's close to the previously proposed 4.0.0 release date (June 2024), and I 
> think it's time to prepare for it and discuss the ongoing projects:
> • ANSI by default
> • Spark Connect GA
> • Structured Logging
> • Streaming state store data source
> • new data type VARIANT
> • STRING collation support
> • Spark k8s operator versioning
> Please help to add more items to this list that are missed here. I would like 
> to volunteer as the release manager for Apache Spark 4.0.0 if there is no 
> objection. Thank you all for the great work that fills Spark 4.0!
>
> Wenchen Fan


--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread Mich Talebzadeh
   - Integration with additional external data sources or systems, say Hive
   - Enhancements to the Spark UI for improved monitoring and debugging
   - Enhancements to machine learning (MLlib) algorithms and capabilities,
   like TensorFlow or PyTorch (if any are in the pipeline)

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one-thousand
expert opinions" (Wernher von Braun).


On Thu, 2 May 2024 at 17:02, Steve Loughran 
wrote:

> There's a new parquet RC up this week which would be good to pull in.
>
> On Thu, 2 May 2024 at 03:20, Jungtaek Lim 
> wrote:
>
>> +1 love to see it!
>>
>> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
>> wrote:
>>
>>> +1 :) yay previews
>>>
>>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>>
 +1

 On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:

> +1 for next Monday.
>
> We can do more previews when the other features are ready for preview.
>
> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>
>> Next week sounds great! Thank you Wenchen!
>>
>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan 
>> wrote:
>>
>>> Yea I think a preview release won't hurt (without a branch cut). We
>>> don't need to wait for all the ongoing projects to be ready. How about 
>>> we
>>> do a 4.0 preview release based on the current master branch next Monday?
>>>
>>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
 Hey all,

 Reviving this thread, but Spark master has already accumulated a
 huge amount of changes.  As a downstream project maintainer, I want to
 really start testing the new features and other breaking changes, and 
 it's
 hard to do that without a Preview release. So the sooner we make a 
 Preview
 release, the faster we can start getting feedback for fixing things 
 for a
 great Spark 4.0 final release.

 So I urge the community to produce a Spark 4.0 Preview soon even if
 certain features targeting the Delta 4.0 release are still incomplete.

 Thanks!


 On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan 
 wrote:

> Thank you all for the replies!
>
> To @Nicholas Chammas  : Thanks for
> cleaning up the error terminology and documentation! I've merged the 
> first
> PR and let's finish others before the 4.0 release.
> To @Dongjoon Hyun  : Thanks for driving
> the ANSI on by default effort! Now the vote has passed, let's flip the
> config and finish the DataFrame error context feature before 4.0.
> To @Jungtaek Lim  : Ack. We can
> treat the Streaming state store data source as completed for 4.0 then.
> To @Cheng Pan  : Yea we definitely should
> have a preview release. Let's collect more feedback on the ongoing 
> projects
> and then we can propose a date for the preview release.
>
> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan 
> wrote:
>
>> Will we have a preview release for 4.0.0 like we did for 2.0.0 and
>> 3.0.0?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> >
>> > W.r.t. state data source - reader (SPARK-45511), there are
>> several follow-up tickets, but we don't plan to address them soon. 
>> The
>> current implementation is the final shape for Spark 4.0.0, unless 
>> there are
>> demands on the follow-up tickets.
>> >
>> > We may want to check the plan for transformWithState - my
>> understanding is that we want to release the feature to 4.0.0, but 
>> there
>> are several remaining works to be done. While the tentative timeline 
>> for
>> releasing is June 2024, what would be the tentative timeline for the 
>> RC cut?
>> > (cc. Anish to add more context on the plan for
>> transformWithState)
>> >
>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan <
>> cloud0...@gmail.com> wrote:
>> > Hi all,
>> >
>> > It's close to the previously proposed 4.0.0 release date (June
>> 2024), and I think it's time to prepare for it and discuss the 

Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread Steve Loughran
There's a new parquet RC up this week which would be good to pull in.

On Thu, 2 May 2024 at 03:20, Jungtaek Lim 
wrote:

> +1 love to see it!
>
> On Thu, May 2, 2024 at 10:08 AM Holden Karau 
> wrote:
>
>> +1 :) yay previews
>>
>> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>>
>>> +1
>>>
>>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>>
 +1 for next Monday.

 We can do more previews when the other features are ready for preview.

 On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:

> Next week sounds great! Thank you Wenchen!
>
> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan 
> wrote:
>
>> Yea I think a preview release won't hurt (without a branch cut). We
>> don't need to wait for all the ongoing projects to be ready. How about we
>> do a 4.0 preview release based on the current master branch next Monday?
>>
>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> Reviving this thread, but Spark master has already accumulated a
>>> huge amount of changes.  As a downstream project maintainer, I want to
>>> really start testing the new features and other breaking changes, and 
>>> it's
>>> hard to do that without a Preview release. So the sooner we make a 
>>> Preview
>>> release, the faster we can start getting feedback for fixing things for 
>>> a
>>> great Spark 4.0 final release.
>>>
>>> So I urge the community to produce a Spark 4.0 Preview soon even if
>>> certain features targeting the Delta 4.0 release are still incomplete.
>>>
>>> Thanks!
>>>
>>>
>>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan 
>>> wrote:
>>>
 Thank you all for the replies!

 To @Nicholas Chammas  : Thanks for
 cleaning up the error terminology and documentation! I've merged the 
 first
 PR and let's finish others before the 4.0 release.
 To @Dongjoon Hyun  : Thanks for driving
 the ANSI on by default effort! Now the vote has passed, let's flip the
 config and finish the DataFrame error context feature before 4.0.
 To @Jungtaek Lim  : Ack. We can
 treat the Streaming state store data source as completed for 4.0 then.
 To @Cheng Pan  : Yea we definitely should
 have a preview release. Let's collect more feedback on the ongoing 
 projects
 and then we can propose a date for the preview release.

 On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan 
 wrote:

> Will we have a preview release for 4.0.0 like we did for 2.0.0 and
> 3.0.0?
>
> Thanks,
> Cheng Pan
>
>
> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> >
> > W.r.t. state data source - reader (SPARK-45511), there are
> several follow-up tickets, but we don't plan to address them soon. The
> current implementation is the final shape for Spark 4.0.0, unless 
> there are
> demands on the follow-up tickets.
> >
> > We may want to check the plan for transformWithState - my
> understanding is that we want to release the feature to 4.0.0, but 
> there
> are several remaining works to be done. While the tentative timeline 
> for
> releasing is June 2024, what would be the tentative timeline for the 
> RC cut?
> > (cc. Anish to add more context on the plan for
> transformWithState)
> >
> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
> wrote:
> > Hi all,
> >
> > It's close to the previously proposed 4.0.0 release date (June
> 2024), and I think it's time to prepare for it and discuss the ongoing
> projects:
> > • ANSI by default
> > • Spark Connect GA
> > • Structured Logging
> > • Streaming state store data source
> > • new data type VARIANT
> > • STRING collation support
> > • Spark k8s operator versioning
> > Please help to add more items to this list that are missed here.
> I would like to volunteer as the release manager for Apache Spark 
> 4.0.0 if
> there is no objection. Thank you all for the great work that fills 
> Spark
> 4.0!
> >
> > Wenchen Fan
>
>
>>
>> --
>> Twitter: https://twitter.com/holdenkarau
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>
>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Jungtaek Lim
+1 love to see it!

On Thu, May 2, 2024 at 10:08 AM Holden Karau  wrote:

> +1 :) yay previews
>
> On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:
>
>> +1
>>
>> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>>
>>> +1 for next Monday.
>>>
>>> We can do more previews when the other features are ready for preview.
>>>
>>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>>
 Next week sounds great! Thank you Wenchen!

 On Wed, May 1, 2024 at 11:16 AM Wenchen Fan 
 wrote:

> Yea I think a preview release won't hurt (without a branch cut). We
> don't need to wait for all the ongoing projects to be ready. How about we
> do a 4.0 preview release based on the current master branch next Monday?
>
> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
> tathagata.das1...@gmail.com> wrote:
>
>> Hey all,
>>
>> Reviving this thread, but Spark master has already accumulated a huge
>> amount of changes.  As a downstream project maintainer, I want to really
>> start testing the new features and other breaking changes, and it's hard to
>> do that without a Preview release. So the sooner we make a Preview release,
>> the faster we can start getting feedback for fixing things for a great
>> Spark 4.0 final release.
>>
>> So I urge the community to produce a Spark 4.0 Preview soon even if
>> certain features targeting the Delta 4.0 release are still incomplete.
>>
>> Thanks!
>>
>>
>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan 
>> wrote:
>>
>>> Thank you all for the replies!
>>>
>>> To @Nicholas Chammas  : Thanks for
>>> cleaning up the error terminology and documentation! I've merged the first
>>> PR and let's finish others before the 4.0 release.
>>> To @Dongjoon Hyun  : Thanks for driving
>>> the ANSI on by default effort! Now the vote has passed, let's flip the
>>> config and finish the DataFrame error context feature before 4.0.
>>> To @Jungtaek Lim  : Ack. We can treat
>>> the Streaming state store data source as completed for 4.0 then.
>>> To @Cheng Pan  : Yea we definitely should have
>>> a preview release. Let's collect more feedback on the ongoing projects and
>>> then we can propose a date for the preview release.
>>>
>>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>>
 will we have preview release for 4.0.0 like we did for 2.0.0 and
 3.0.0?

 Thanks,
 Cheng Pan


 > On Apr 15, 2024, at 09:58, Jungtaek Lim <
 kabhwan.opensou...@gmail.com> wrote:
 >
 > W.r.t. state data source - reader (SPARK-45511), there are
 several follow-up tickets, but we don't plan to address them soon. The
 current implementation is the final shape for Spark 4.0.0, unless there are
 demands on the follow-up tickets.
 >
 > We may want to check the plan for transformWithState - my
 understanding is that we want to release the feature to 4.0.0, but there
 are several remaining works to be done. While the tentative timeline for
 releasing is June 2024, what would be the tentative timeline for the RC cut?
 > (cc. Anish to add more context on the plan for transformWithState)
 >
 > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
 wrote:
 > Hi all,
 >
 > It's close to the previously proposed 4.0.0 release date (June
 2024), and I think it's time to prepare for it and discuss the ongoing
 projects:
 > • ANSI by default
 > • Spark Connect GA
 > • Structured Logging
 > • Streaming state store data source
 > • new data type VARIANT
 > • STRING collation support
 > • Spark k8s operator versioning
 > Please help to add more items to this list that are missed here.
 I would like to volunteer as the release manager for Apache Spark 4.0.0 if
 there is no objection. Thank you all for the great work that fills Spark
 4.0!
 >
 > Wenchen Fan




Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Holden Karau
+1 :) yay previews

On Wed, May 1, 2024 at 5:36 PM Chao Sun  wrote:

> +1
>
> On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:
>
>> +1 for next Monday.
>>
>> We can do more previews when the other features are ready for preview.
>>
>> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>>
>>> Next week sounds great! Thank you Wenchen!
>>>
>>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>>>
 Yea I think a preview release won't hurt (without a branch cut). We
 don't need to wait for all the ongoing projects to be ready. How about we
 do a 4.0 preview release based on the current master branch next Monday?

 On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
 tathagata.das1...@gmail.com> wrote:

> Hey all,
>
> Reviving this thread, but Spark master has already accumulated a huge
> amount of changes.  As a downstream project maintainer, I want to really
> start testing the new features and other breaking changes, and it's hard to
> do that without a Preview release. So the sooner we make a Preview release,
> the faster we can start getting feedback for fixing things for a great
> Spark 4.0 final release.
>
> So I urge the community to produce a Spark 4.0 Preview soon even if
> certain features targeting the Delta 4.0 release are still incomplete.
>
> Thanks!
>
>
> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan 
> wrote:
>
>> Thank you all for the replies!
>>
>> To @Nicholas Chammas  : Thanks for
>> cleaning up the error terminology and documentation! I've merged the first
>> PR and let's finish others before the 4.0 release.
>> To @Dongjoon Hyun  : Thanks for driving the
>> ANSI on by default effort! Now the vote has passed, let's flip the config
>> and finish the DataFrame error context feature before 4.0.
>> To @Jungtaek Lim  : Ack. We can treat
>> the Streaming state store data source as completed for 4.0 then.
>> To @Cheng Pan  : Yea we definitely should have
>> a preview release. Let's collect more feedback on the ongoing projects and
>> then we can propose a date for the preview release.
>>
>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>
>>> will we have preview release for 4.0.0 like we did for 2.0.0 and
>>> 3.0.0?
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>> >
>>> > W.r.t. state data source - reader (SPARK-45511), there are several
>>> follow-up tickets, but we don't plan to address them soon. The current
>>> implementation is the final shape for Spark 4.0.0, unless there are demands
>>> on the follow-up tickets.
>>> >
>>> > We may want to check the plan for transformWithState - my
>>> understanding is that we want to release the feature to 4.0.0, but there
>>> are several remaining works to be done. While the tentative timeline for
>>> releasing is June 2024, what would be the tentative timeline for the RC cut?
>>> > (cc. Anish to add more context on the plan for transformWithState)
>>> >
>>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>>> wrote:
>>> > Hi all,
>>> >
>>> > It's close to the previously proposed 4.0.0 release date (June
>>> 2024), and I think it's time to prepare for it and discuss the ongoing
>>> projects:
>>> > • ANSI by default
>>> > • Spark Connect GA
>>> > • Structured Logging
>>> > • Streaming state store data source
>>> > • new data type VARIANT
>>> > • STRING collation support
>>> > • Spark k8s operator versioning
>>> > Please help to add more items to this list that are missed here. I
>>> would like to volunteer as the release manager for Apache Spark 4.0.0 if
>>> there is no objection. Thank you all for the great work that fills Spark
>>> 4.0!
>>> >
>>> > Wenchen Fan
>>>
>>>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Chao Sun
+1

On Wed, May 1, 2024 at 5:23 PM Xiao Li  wrote:

> +1 for next Monday.
>
> We can do more previews when the other features are ready for preview.
>
> On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:
>
>> Next week sounds great! Thank you Wenchen!
>>
>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>>
>>> Yea I think a preview release won't hurt (without a branch cut). We
>>> don't need to wait for all the ongoing projects to be ready. How about we
>>> do a 4.0 preview release based on the current master branch next Monday?
>>>
>>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
 Hey all,

 Reviving this thread, but Spark master has already accumulated a huge
 amount of changes.  As a downstream project maintainer, I want to really
 start testing the new features and other breaking changes, and it's hard to
 do that without a Preview release. So the sooner we make a Preview release,
 the faster we can start getting feedback for fixing things for a great
 Spark 4.0 final release.

 So I urge the community to produce a Spark 4.0 Preview soon even if
 certain features targeting the Delta 4.0 release are still incomplete.

 Thanks!


 On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan 
 wrote:

> Thank you all for the replies!
>
> To @Nicholas Chammas  : Thanks for
> cleaning up the error terminology and documentation! I've merged the first
> PR and let's finish others before the 4.0 release.
> To @Dongjoon Hyun  : Thanks for driving the
> ANSI on by default effort! Now the vote has passed, let's flip the config
> and finish the DataFrame error context feature before 4.0.
> To @Jungtaek Lim  : Ack. We can treat
> the Streaming state store data source as completed for 4.0 then.
> To @Cheng Pan  : Yea we definitely should have a
> preview release. Let's collect more feedback on the ongoing projects and
> then we can propose a date for the preview release.
>
> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>
>> will we have preview release for 4.0.0 like we did for 2.0.0 and
>> 3.0.0?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> >
>> > W.r.t. state data source - reader (SPARK-45511), there are several
>> follow-up tickets, but we don't plan to address them soon. The current
>> implementation is the final shape for Spark 4.0.0, unless there are demands
>> on the follow-up tickets.
>> >
>> > We may want to check the plan for transformWithState - my
>> understanding is that we want to release the feature to 4.0.0, but there
>> are several remaining works to be done. While the tentative timeline for
>> releasing is June 2024, what would be the tentative timeline for the RC cut?
>> > (cc. Anish to add more context on the plan for transformWithState)
>> >
>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>> wrote:
>> > Hi all,
>> >
>> > It's close to the previously proposed 4.0.0 release date (June
>> 2024), and I think it's time to prepare for it and discuss the ongoing
>> projects:
>> > • ANSI by default
>> > • Spark Connect GA
>> > • Structured Logging
>> > • Streaming state store data source
>> > • new data type VARIANT
>> > • STRING collation support
>> > • Spark k8s operator versioning
>> > Please help to add more items to this list that are missed here. I
>> would like to volunteer as the release manager for Apache Spark 4.0.0 if
>> there is no objection. Thank you all for the great work that fills Spark
>> 4.0!
>> >
>> > Wenchen Fan
>>
>>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Hyukjin Kwon
SGTM

On Thu, 2 May 2024 at 02:06, Dongjoon Hyun  wrote:

> +1 for next Monday.
>
> Dongjoon.
>
> On Wed, May 1, 2024 at 8:46 AM Tathagata Das 
> wrote:
>
>> Next week sounds great! Thank you Wenchen!
>>
>> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>>
>>> Yea I think a preview release won't hurt (without a branch cut). We
>>> don't need to wait for all the ongoing projects to be ready. How about we
>>> do a 4.0 preview release based on the current master branch next Monday?
>>>
>>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>>> tathagata.das1...@gmail.com> wrote:
>>>
 Hey all,

 Reviving this thread, but Spark master has already accumulated a huge
 amount of changes.  As a downstream project maintainer, I want to really
 start testing the new features and other breaking changes, and it's hard to
 do that without a Preview release. So the sooner we make a Preview release,
 the faster we can start getting feedback for fixing things for a great
 Spark 4.0 final release.

 So I urge the community to produce a Spark 4.0 Preview soon even if
 certain features targeting the Delta 4.0 release are still incomplete.

 Thanks!


 On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan 
 wrote:

> Thank you all for the replies!
>
> To @Nicholas Chammas  : Thanks for
> cleaning up the error terminology and documentation! I've merged the first
> PR and let's finish others before the 4.0 release.
> To @Dongjoon Hyun  : Thanks for driving the
> ANSI on by default effort! Now the vote has passed, let's flip the config
> and finish the DataFrame error context feature before 4.0.
> To @Jungtaek Lim  : Ack. We can treat
> the Streaming state store data source as completed for 4.0 then.
> To @Cheng Pan  : Yea we definitely should have a
> preview release. Let's collect more feedback on the ongoing projects and
> then we can propose a date for the preview release.
>
> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>
>> will we have preview release for 4.0.0 like we did for 2.0.0 and
>> 3.0.0?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
>> kabhwan.opensou...@gmail.com> wrote:
>> >
>> > W.r.t. state data source - reader (SPARK-45511), there are several
>> follow-up tickets, but we don't plan to address them soon. The current
>> implementation is the final shape for Spark 4.0.0, unless there are demands
>> on the follow-up tickets.
>> >
>> > We may want to check the plan for transformWithState - my
>> understanding is that we want to release the feature to 4.0.0, but there
>> are several remaining works to be done. While the tentative timeline for
>> releasing is June 2024, what would be the tentative timeline for the RC cut?
>> > (cc. Anish to add more context on the plan for transformWithState)
>> >
>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>> wrote:
>> > Hi all,
>> >
>> > It's close to the previously proposed 4.0.0 release date (June
>> 2024), and I think it's time to prepare for it and discuss the ongoing
>> projects:
>> > • ANSI by default
>> > • Spark Connect GA
>> > • Structured Logging
>> > • Streaming state store data source
>> > • new data type VARIANT
>> > • STRING collation support
>> > • Spark k8s operator versioning
>> > Please help to add more items to this list that are missed here. I
>> would like to volunteer as the release manager for Apache Spark 4.0.0 if
>> there is no objection. Thank you all for the great work that fills Spark
>> 4.0!
>> >
>> > Wenchen Fan
>>
>>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Xiao Li
+1 for next Monday.

We can do more previews when the other features are ready for preview.

On Wed, May 1, 2024 at 08:46, Tathagata Das wrote:

> Next week sounds great! Thank you Wenchen!
>
> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>
>> Yea I think a preview release won't hurt (without a branch cut). We don't
>> need to wait for all the ongoing projects to be ready. How about we do a
>> 4.0 preview release based on the current master branch next Monday?
>>
>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> Reviving this thread, but Spark master has already accumulated a huge
>>> amount of changes.  As a downstream project maintainer, I want to really
>>> start testing the new features and other breaking changes, and it's hard to
>>> do that without a Preview release. So the sooner we make a Preview release,
>>> the faster we can start getting feedback for fixing things for a great
>>> Spark 4.0 final release.
>>>
>>> So I urge the community to produce a Spark 4.0 Preview soon even if
>>> certain features targeting the Delta 4.0 release are still incomplete.
>>>
>>> Thanks!
>>>
>>>
>>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>>>
 Thank you all for the replies!

 To @Nicholas Chammas  : Thanks for
 cleaning up the error terminology and documentation! I've merged the first
 PR and let's finish others before the 4.0 release.
 To @Dongjoon Hyun  : Thanks for driving the
 ANSI on by default effort! Now the vote has passed, let's flip the config
 and finish the DataFrame error context feature before 4.0.
 To @Jungtaek Lim  : Ack. We can treat
 the Streaming state store data source as completed for 4.0 then.
 To @Cheng Pan  : Yea we definitely should have a
 preview release. Let's collect more feedback on the ongoing projects and
 then we can propose a date for the preview release.

 On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:

> will we have preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>
> Thanks,
> Cheng Pan
>
>
> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> >
> > W.r.t. state data source - reader (SPARK-45511), there are several
> follow-up tickets, but we don't plan to address them soon. The current
> implementation is the final shape for Spark 4.0.0, unless there are demands
> on the follow-up tickets.
> >
> > We may want to check the plan for transformWithState - my
> understanding is that we want to release the feature to 4.0.0, but there
> are several remaining works to be done. While the tentative timeline for
> releasing is June 2024, what would be the tentative timeline for the RC cut?
> > (cc. Anish to add more context on the plan for transformWithState)
> >
> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
> wrote:
> > Hi all,
> >
> > It's close to the previously proposed 4.0.0 release date (June
> 2024), and I think it's time to prepare for it and discuss the ongoing
> projects:
> > • ANSI by default
> > • Spark Connect GA
> > • Structured Logging
> > • Streaming state store data source
> > • new data type VARIANT
> > • STRING collation support
> > • Spark k8s operator versioning
> > Please help to add more items to this list that are missed here. I
> would like to volunteer as the release manager for Apache Spark 4.0.0 if
> there is no objection. Thank you all for the great work that fills Spark
> 4.0!
> >
> > Wenchen Fan
>
>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Dongjoon Hyun
+1 for next Monday.

Dongjoon.

On Wed, May 1, 2024 at 8:46 AM Tathagata Das 
wrote:

> Next week sounds great! Thank you Wenchen!
>
> On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:
>
>> Yea I think a preview release won't hurt (without a branch cut). We don't
>> need to wait for all the ongoing projects to be ready. How about we do a
>> 4.0 preview release based on the current master branch next Monday?
>>
>> On Wed, May 1, 2024 at 11:06 PM Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> Hey all,
>>>
>>> Reviving this thread, but Spark master has already accumulated a huge
>>> amount of changes.  As a downstream project maintainer, I want to really
>>> start testing the new features and other breaking changes, and it's hard to
>>> do that without a Preview release. So the sooner we make a Preview release,
>>> the faster we can start getting feedback for fixing things for a great
>>> Spark 4.0 final release.
>>>
>>> So I urge the community to produce a Spark 4.0 Preview soon even if
>>> certain features targeting the Delta 4.0 release are still incomplete.
>>>
>>> Thanks!
>>>
>>>
>>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>>>
 Thank you all for the replies!

 To @Nicholas Chammas  : Thanks for
 cleaning up the error terminology and documentation! I've merged the first
 PR and let's finish others before the 4.0 release.
 To @Dongjoon Hyun  : Thanks for driving the
 ANSI on by default effort! Now the vote has passed, let's flip the config
 and finish the DataFrame error context feature before 4.0.
 To @Jungtaek Lim  : Ack. We can treat
 the Streaming state store data source as completed for 4.0 then.
 To @Cheng Pan  : Yea we definitely should have a
 preview release. Let's collect more feedback on the ongoing projects and
 then we can propose a date for the preview release.

 On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:

> will we have preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>
> Thanks,
> Cheng Pan
>
>
> > On Apr 15, 2024, at 09:58, Jungtaek Lim <
> kabhwan.opensou...@gmail.com> wrote:
> >
> > W.r.t. state data source - reader (SPARK-45511), there are several
> follow-up tickets, but we don't plan to address them soon. The current
> implementation is the final shape for Spark 4.0.0, unless there are demands
> on the follow-up tickets.
> >
> > We may want to check the plan for transformWithState - my
> understanding is that we want to release the feature to 4.0.0, but there
> are several remaining works to be done. While the tentative timeline for
> releasing is June 2024, what would be the tentative timeline for the RC cut?
> > (cc. Anish to add more context on the plan for transformWithState)
> >
> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
> wrote:
> > Hi all,
> >
> > It's close to the previously proposed 4.0.0 release date (June
> 2024), and I think it's time to prepare for it and discuss the ongoing
> projects:
> > • ANSI by default
> > • Spark Connect GA
> > • Structured Logging
> > • Streaming state store data source
> > • new data type VARIANT
> > • STRING collation support
> > • Spark k8s operator versioning
> > Please help to add more items to this list that are missed here. I
> would like to volunteer as the release manager for Apache Spark 4.0.0 if
> there is no objection. Thank you all for the great work that fills Spark
> 4.0!
> >
> > Wenchen Fan
>
>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Tathagata Das
Next week sounds great! Thank you Wenchen!

On Wed, May 1, 2024 at 11:16 AM Wenchen Fan  wrote:

> Yea I think a preview release won't hurt (without a branch cut). We don't
> need to wait for all the ongoing projects to be ready. How about we do a
> 4.0 preview release based on the current master branch next Monday?
>
> On Wed, May 1, 2024 at 11:06 PM Tathagata Das 
> wrote:
>
>> Hey all,
>>
>> Reviving this thread, but Spark master has already accumulated a huge
>> amount of changes.  As a downstream project maintainer, I want to really
>> start testing the new features and other breaking changes, and it's hard to
>> do that without a Preview release. So the sooner we make a Preview release,
>> the faster we can start getting feedback for fixing things for a great
>> Spark 4.0 final release.
>>
>> So I urge the community to produce a Spark 4.0 Preview soon even if
>> certain features targeting the Delta 4.0 release are still incomplete.
>>
>> Thanks!
>>
>>
>> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>>
>>> Thank you all for the replies!
>>>
>>> To @Nicholas Chammas  : Thanks for cleaning
>>> up the error terminology and documentation! I've merged the first PR and
>>> let's finish others before the 4.0 release.
>>> To @Dongjoon Hyun  : Thanks for driving the
>>> ANSI on by default effort! Now the vote has passed, let's flip the config
>>> and finish the DataFrame error context feature before 4.0.
>>> To @Jungtaek Lim  : Ack. We can treat the
>>> Streaming state store data source as completed for 4.0 then.
>>> To @Cheng Pan  : Yea we definitely should have a
>>> preview release. Let's collect more feedback on the ongoing projects and
>>> then we can propose a date for the preview release.
>>>
>>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>>
 will we have preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?

 Thanks,
 Cheng Pan


 > On Apr 15, 2024, at 09:58, Jungtaek Lim 
 wrote:
 >
 > W.r.t. state data source - reader (SPARK-45511), there are several
 follow-up tickets, but we don't plan to address them soon. The current
 implementation is the final shape for Spark 4.0.0, unless there are demands
 on the follow-up tickets.
 >
 > We may want to check the plan for transformWithState - my
 understanding is that we want to release the feature to 4.0.0, but there
 are several remaining works to be done. While the tentative timeline for
 releasing is June 2024, what would be the tentative timeline for the RC cut?
 > (cc. Anish to add more context on the plan for transformWithState)
 >
 > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
 wrote:
 > Hi all,
 >
 > It's close to the previously proposed 4.0.0 release date (June 2024),
 and I think it's time to prepare for it and discuss the ongoing projects:
 > • ANSI by default
 > • Spark Connect GA
 > • Structured Logging
 > • Streaming state store data source
 > • new data type VARIANT
 > • STRING collation support
 > • Spark k8s operator versioning
 > Please help to add more items to this list that are missed here. I
 would like to volunteer as the release manager for Apache Spark 4.0.0 if
 there is no objection. Thank you all for the great work that fills Spark
 4.0!
 >
 > Wenchen Fan




Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Wenchen Fan
Yea I think a preview release won't hurt (without a branch cut). We don't
need to wait for all the ongoing projects to be ready. How about we do a
4.0 preview release based on the current master branch next Monday?

On Wed, May 1, 2024 at 11:06 PM Tathagata Das 
wrote:

> Hey all,
>
> Reviving this thread, but Spark master has already accumulated a huge
> amount of changes.  As a downstream project maintainer, I want to really
> start testing the new features and other breaking changes, and it's hard to
> do that without a Preview release. So the sooner we make a Preview release,
> the faster we can start getting feedback for fixing things for a great
> Spark 4.0 final release.
>
> So I urge the community to produce a Spark 4.0 Preview soon even if
> certain features targeting the Delta 4.0 release are still incomplete.
>
> Thanks!
>
>
> On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:
>
>> Thank you all for the replies!
>>
>> To @Nicholas Chammas  : Thanks for cleaning
>> up the error terminology and documentation! I've merged the first PR and
>> let's finish others before the 4.0 release.
>> To @Dongjoon Hyun  : Thanks for driving the
>> ANSI on by default effort! Now the vote has passed, let's flip the config
>> and finish the DataFrame error context feature before 4.0.
>> To @Jungtaek Lim  : Ack. We can treat the
>> Streaming state store data source as completed for 4.0 then.
>> To @Cheng Pan  : Yea we definitely should have a
>> preview release. Let's collect more feedback on the ongoing projects and
>> then we can propose a date for the preview release.
>>
>> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>>
>>> will we have preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>>>
>>> Thanks,
>>> Cheng Pan
>>>
>>>
>>> > On Apr 15, 2024, at 09:58, Jungtaek Lim 
>>> wrote:
>>> >
>>> > W.r.t. state data source - reader (SPARK-45511), there are several
>>> follow-up tickets, but we don't plan to address them soon. The current
>>> implementation is the final shape for Spark 4.0.0, unless there are demands
>>> on the follow-up tickets.
>>> >
>>> > We may want to check the plan for transformWithState - my
>>> understanding is that we want to release the feature to 4.0.0, but there
>>> are several remaining works to be done. While the tentative timeline for
>>> releasing is June 2024, what would be the tentative timeline for the RC cut?
>>> > (cc. Anish to add more context on the plan for transformWithState)
>>> >
>>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>>> wrote:
>>> > Hi all,
>>> >
>>> > It's close to the previously proposed 4.0.0 release date (June 2024),
>>> and I think it's time to prepare for it and discuss the ongoing projects:
>>> > • ANSI by default
>>> > • Spark Connect GA
>>> > • Structured Logging
>>> > • Streaming state store data source
>>> > • new data type VARIANT
>>> > • STRING collation support
>>> > • Spark k8s operator versioning
>>> > Please help to add more items to this list that are missed here. I
>>> would like to volunteer as the release manager for Apache Spark 4.0.0 if
>>> there is no objection. Thank you all for the great work that fills Spark
>>> 4.0!
>>> >
>>> > Wenchen Fan
>>>
>>>


Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Tathagata Das
Hey all,

Reviving this thread, but Spark master has already accumulated a huge
amount of changes.  As a downstream project maintainer, I want to really
start testing the new features and other breaking changes, and it's hard to
do that without a Preview release. So the sooner we make a Preview release,
the faster we can start getting feedback for fixing things for a great
Spark 4.0 final release.

So I urge the community to produce a Spark 4.0 Preview soon even if certain
features targeting the Delta 4.0 release are still incomplete.

Thanks!


On Wed, Apr 17, 2024 at 8:35 AM Wenchen Fan  wrote:

> Thank you all for the replies!
>
> To @Nicholas Chammas  : Thanks for cleaning
> up the error terminology and documentation! I've merged the first PR and
> let's finish others before the 4.0 release.
> To @Dongjoon Hyun  : Thanks for driving the ANSI
> on by default effort! Now the vote has passed, let's flip the config and
> finish the DataFrame error context feature before 4.0.
> To @Jungtaek Lim  : Ack. We can treat the
> Streaming state store data source as completed for 4.0 then.
> To @Cheng Pan  : Yea we definitely should have a
> preview release. Let's collect more feedback on the ongoing projects and
> then we can propose a date for the preview release.
>
> On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan  wrote:
>
>> will we have preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>>
>> Thanks,
>> Cheng Pan
>>
>>
>> > On Apr 15, 2024, at 09:58, Jungtaek Lim 
>> wrote:
>> >
>> > W.r.t. state data source - reader (SPARK-45511), there are several
>> follow-up tickets, but we don't plan to address them soon. The current
>> implementation is the final shape for Spark 4.0.0, unless there are demands
>> on the follow-up tickets.
>> >
>> > We may want to check the plan for transformWithState - my understanding
>> is that we want to release the feature to 4.0.0, but there are several
>> remaining works to be done. While the tentative timeline for releasing is
>> June 2024, what would be the tentative timeline for the RC cut?
>> > (cc. Anish to add more context on the plan for transformWithState)
>> >
>> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan 
>> wrote:
>> > Hi all,
>> >
>> > It's close to the previously proposed 4.0.0 release date (June 2024),
>> and I think it's time to prepare for it and discuss the ongoing projects:
>> > • ANSI by default
>> > • Spark Connect GA
>> > • Structured Logging
>> > • Streaming state store data source
>> > • new data type VARIANT
>> > • STRING collation support
>> > • Spark k8s operator versioning
>> > Please help to add more items to this list that are missed here. I
>> would like to volunteer as the release manager for Apache Spark 4.0.0 if
>> there is no objection. Thank you all for the great work that fills Spark
>> 4.0!
>> >
>> > Wenchen Fan
>>
>>


Re: [DISCUSS] Spark 4.0.0 release

2024-04-17 Thread Wenchen Fan
Thank you all for the replies!

To @Nicholas Chammas  : Thanks for cleaning up
the error terminology and documentation! I've merged the first PR and let's
finish others before the 4.0 release.
To @Dongjoon Hyun  : Thanks for driving the ANSI
on by default effort! Now the vote has passed, let's flip the config and
finish the DataFrame error context feature before 4.0.
To @Jungtaek Lim  : Ack. We can treat the
Streaming state store data source as completed for 4.0 then.
To @Cheng Pan  : Yea we definitely should have a
preview release. Let's collect more feedback on the ongoing projects and
then we can propose a date for the preview release.

On Wed, Apr 17, 2024 at 1:22 PM Cheng Pan wrote:

> Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?
>
> Thanks,
> Cheng Pan
>
>
> > On Apr 15, 2024, at 09:58, Jungtaek Lim wrote:
> >
> > W.r.t. state data source - reader (SPARK-45511), there are several
> follow-up tickets, but we don't plan to address them soon. The current
> implementation is the final shape for Spark 4.0.0, unless there are demands
> on the follow-up tickets.
> >
> > We may want to check the plan for transformWithState - my understanding
> is that we want to release the feature to 4.0.0, but there are several
> remaining works to be done. While the tentative timeline for releasing is
> June 2024, what would be the tentative timeline for the RC cut?
> > (cc. Anish to add more context on the plan for transformWithState)
> >
> > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan wrote:
> > Hi all,
> >
> > It's close to the previously proposed 4.0.0 release date (June 2024),
> and I think it's time to prepare for it and discuss the ongoing projects:
> > • ANSI by default
> > • Spark Connect GA
> > • Structured Logging
> > • Streaming state store data source
> > • new data type VARIANT
> > • STRING collation support
> > • Spark k8s operator versioning
> > Please help to add more items to this list that are missed here. I would
> like to volunteer as the release manager for Apache Spark 4.0.0 if there is
> no objection. Thank you all for the great work that fills Spark 4.0!
> >
> > Wenchen Fan
>
>


Re: [DISCUSS] Spark 4.0.0 release

2024-04-16 Thread Cheng Pan
Will we have a preview release for 4.0.0 like we did for 2.0.0 and 3.0.0?

Thanks,
Cheng Pan


> On Apr 15, 2024, at 09:58, Jungtaek Lim wrote:
> 
> W.r.t. state data source - reader (SPARK-45511), there are several follow-up 
> tickets, but we don't plan to address them soon. The current implementation 
> is the final shape for Spark 4.0.0, unless there are demands on the follow-up 
> tickets.
> 
> We may want to check the plan for transformWithState - my understanding is 
> that we want to release the feature to 4.0.0, but there are several remaining 
> works to be done. While the tentative timeline for releasing is June 2024, 
> what would be the tentative timeline for the RC cut?
> (cc. Anish to add more context on the plan for transformWithState)
> 
> On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan wrote:
> Hi all,
> 
> It's close to the previously proposed 4.0.0 release date (June 2024), and I 
> think it's time to prepare for it and discuss the ongoing projects:
> • ANSI by default
> • Spark Connect GA
> • Structured Logging
> • Streaming state store data source
> • new data type VARIANT
> • STRING collation support
> • Spark k8s operator versioning
> Please help to add more items to this list that are missed here. I would like 
> to volunteer as the release manager for Apache Spark 4.0.0 if there is no 
> objection. Thank you all for the great work that fills Spark 4.0!
> 
> Wenchen Fan


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] Spark 4.0.0 release

2024-04-14 Thread Jungtaek Lim
W.r.t. state data source - reader (SPARK-45511), there are several
follow-up tickets, but we don't plan to address them soon. The current
implementation is the final shape for Spark 4.0.0, unless there are demands
on the follow-up tickets.

We may want to check the plan for transformWithState - my understanding is
that we want to release the feature in 4.0.0, but several work items
remain. While the tentative timeline for releasing is June 2024, what
would be the tentative timeline for the RC cut?
(cc. Anish to add more context on the plan for transformWithState)

On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan wrote:

> Hi all,
>
> It's close to the previously proposed 4.0.0 release date (June 2024), and
> I think it's time to prepare for it and discuss the ongoing projects:
>
>    - ANSI by default
>    - Spark Connect GA
>    - Structured Logging
>    - Streaming state store data source
>    - new data type VARIANT
>    - STRING collation support
>    - Spark k8s operator versioning
>
> Please help to add more items to this list that are missed here. I would
> like to volunteer as the release manager for Apache Spark 4.0.0 if there is
> no objection. Thank you all for the great work that fills Spark 4.0!
>
> Wenchen Fan
>


Re: [DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Dongjoon Hyun
Thank you for volunteering, Wenchen.

Dongjoon.

On 2024/04/12 15:11:04 Wenchen Fan wrote:
> Hi all,
> 
> It's close to the previously proposed 4.0.0 release date (June 2024), and I
> think it's time to prepare for it and discuss the ongoing projects:
> 
>    - ANSI by default
>    - Spark Connect GA
>    - Structured Logging
>    - Streaming state store data source
>    - new data type VARIANT
>    - STRING collation support
>    - Spark k8s operator versioning
> 
> Please help to add more items to this list that are missed here. I would
> like to volunteer as the release manager for Apache Spark 4.0.0 if there is
> no objection. Thank you all for the great work that fills Spark 4.0!
> 
> Wenchen Fan
> 
