Re: Time for Spark 3.4.0 release?

2023-01-24 Thread yangjie01
Thanks Xinrong,

From: Dongjoon Hyun
Date: Wednesday, January 25, 2023, 15:49
To: Hyukjin Kwon
Cc: Xinrong Meng, "dev@spark.apache.org"
Subject: Re: Time for Spark 3.4.0 release?

Great! Thank you so much, Xinrong!

Dongjoon


Re: Time for Spark 3.4.0 release?

2023-01-24 Thread Dongjoon Hyun
Great! Thank you so much, Xinrong!

Dongjoon

Re: Time for Spark 3.4.0 release?

2023-01-24 Thread Hyukjin Kwon
Thanks Xinrong.

Re: Time for Spark 3.4.0 release?

2023-01-24 Thread Xinrong Meng
Hi All,

Apache Spark 3.4 is cut as https://github.com/apache/spark/tree/branch-3.4.

Thanks,

Xinrong Meng

On Wed, Jan 18, 2023 at 3:45 PM Hyukjin Kwon  wrote:

> Yeah, these look more like something we should discuss around RC timing.
> See "Spark 3.4 release window" in
> https://spark.apache.org/versioning-policy.html
>
> On Wed, 18 Jan 2023 at 16:28, Enrico Minack 
> wrote:
>
>> You are saying the RCs are cut from that branch at a later point? What is
>> the estimated deadline for that?
>>
>> Enrico
>>
>>
>> On 18 Jan 2023 at 07:59, Hyukjin Kwon wrote:
>>
>> These look like issues we can fix after the branch cut, so it should be fine.
>>
>> On Wed, 18 Jan 2023 at 15:57, Enrico Minack 
>> wrote:
>>
>>> Hi Xinrong,
>>>
>>> what about regression issue
>>> https://issues.apache.org/jira/browse/SPARK-40819
>>> and correctness issue https://issues.apache.org/jira/browse/SPARK-40885?
>>>
>>> The latter gets fixed by either
>>> https://issues.apache.org/jira/browse/SPARK-41959 or
>>> https://issues.apache.org/jira/browse/SPARK-42049.
>>>
>>> Are those considered important?
>>>
>>> Cheers,
>>> Enrico
>>>
>>>
>>> On 18 Jan 2023 at 04:29, Xinrong Meng wrote:
>>>
>>> Hi All,
>>>
>>> Considering there are still important issues unresolved (some are shown
>>> below), I would suggest we be conservative and delay the branch-3.4 cut
>>> by one week.
>>>
>>> https://issues.apache.org/jira/browse/SPARK-39375
>>> https://issues.apache.org/jira/browse/SPARK-41589
>>> https://issues.apache.org/jira/browse/SPARK-42075
>>> https://issues.apache.org/jira/browse/SPARK-25299
>>> https://issues.apache.org/jira/browse/SPARK-41053
>>>
>>> I plan to cut *branch-3.4* at *18:30 PT, January 24, 2023*. Please
>>> ensure your changes for Apache Spark 3.4 are ready by that time.
>>>
>>> Feel free to reply to the email if you have other ongoing big items for
>>> Spark 3.4.
>>>
>>> Thanks,
>>>
>>> Xinrong Meng
>>>
>>> On Sat, Jan 7, 2023 at 9:16 AM Hyukjin Kwon  wrote:
>>>
 Thanks Xinrong.

 On Sat, Jan 7, 2023 at 9:18 AM Xinrong Meng 
 wrote:

> The release window for Apache Spark 3.4.0 is updated per
> https://github.com/apache/spark-website/pull/430.
>
> Thank you all!
>
> On Thu, Jan 5, 2023 at 2:10 PM Maxim Gekk 
> wrote:
>
>> +1
>>
>> On Thu, Jan 5, 2023 at 12:25 AM huaxin gao 
>> wrote:
>>
>>> +1 Thanks!
>>>
>>> On Wed, Jan 4, 2023 at 10:19 AM L. C. Hsieh 
>>> wrote:
>>>
 +1

 Thank you!

 On Wed, Jan 4, 2023 at 9:13 AM Chao Sun  wrote:

> +1, thanks!
>
> Chao
>
> On Wed, Jan 4, 2023 at 1:56 AM Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>>
>> +1, Thanks !
>>
>> Regards,
>> Mridul
>>
>> On Wed, Jan 4, 2023 at 2:20 AM Gengliang Wang 
>> wrote:
>>
>>> +1, thanks for driving the release!
>>>
>>>
>>> Gengliang
>>>
>>> On Tue, Jan 3, 2023 at 10:55 PM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>>
 +1

 Thank you!

 Dongjoon

 On Tue, Jan 3, 2023 at 9:44 PM Rui Wang 
 wrote:

> +1 to cut the branch starting from a workday!
>
> Great to see this is happening!
>
> Thanks Xinrong!
>
> -Rui
>
> On Tue, Jan 3, 2023 at 9:21 PM 416161...@qq.com <
> ruife...@foxmail.com> wrote:
>
>> +1, thank you Xinrong for driving this release!
>>
>> --
>> Ruifeng Zheng
>> ruife...@foxmail.com
>>
>> 
>>
>>
>>
>> -- Original --
>> *From:* "Hyukjin Kwon" ;
>> *Date:* Wed, Jan 4, 2023 01:15 PM
>> *To:* "Xinrong Meng";
>> *Cc:* "dev";
>> *Subject:* Re: Time for Spark 3.4.0 release?
>>
>> SGTM +1
>>
>> On Wed, Jan 4, 2023 at 2:13 PM Xinrong Meng <
>> xinrong.apa...@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> Shall we cut *branch-3.4* on *January 16th, 2023*? We
>>> proposed January 15th per
>>> https://spark.apache.org/versioning-policy.html, but I
>>> would suggest we postpone one day since January 15th is a 
>>> Sunday.
>>>
>>> I would like to volunteer as the 

Pandas UDF cogroup.applyInPandas with multiple dataframes

2023-01-24 Thread Santosh Pingale
Hey all

I have an interesting problem at hand. We have cases where we want to pass
multiple (20 to 30) data frames to the cogroup.applyInPandas function.

RDD currently supports cogroup with up to 4 dataframes (ZippedPartitionsRDD4),
whereas cogroup with pandas can handle only 2 dataframes (with
ZippedPartitionsRDD2). In our use case, we do not have much control over how
many data frames we may need in the cogroup.applyInPandas function.

To achieve this, we can:
(a) Implement ZippedPartitionsRDD5, ..., ZippedPartitionsRDD30, ...,
ZippedPartitionsRDD50 with the respective iterators, serializers and so on.
This keeps type safety intact, but it requires a lot more boilerplate code.
(b) Do not use cogroup.applyInPandas; instead use RDD.keyBy.cogroup and then
getItem in a nested fashion, and convert the data to a pandas df in the Python
function. This looks like a good workaround, but it is very easy to make
mistakes. We also don't get type safety here from the user's point of view.
(c) Implement ZippedPartitionsRDDN and NaryLike with the childrenNodes type set
to Seq[T], which allows an arbitrary number of children to be set. Here we have
very little boilerplate, but we sacrifice type safety.
(d) ... some new suggestions... ?

I have done preliminary work on option (c). It works like a charm, but before I
proceed: is my concern about sacrificed type safety overblown, and do we have
an approach (d)?
(a) is too much of an investment to be useful. (b) is an okay workaround, but
it is not very efficient.
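
To make the trade-off in option (c) concrete, here is a minimal pure-Python
sketch of what an n-ary cogroup does semantically. It is not Spark code and
not the preliminary implementation mentioned above; cogroup_n and
apply_per_key are hypothetical names used only for illustration, and the
sample data is made up.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, object]


def cogroup_n(*keyed: Iterable[Row]) -> Dict[Hashable, Tuple[List[object], ...]]:
    """Group any number of (key, value) collections by key.

    Hypothetical helper, not a Spark API. For each key it returns one list
    per input, in input order, with an empty list where an input has no rows.
    """
    n = len(keyed)
    # One independent list per input for every newly seen key.
    groups = defaultdict(lambda: tuple([] for _ in range(n)))
    for position, rows in enumerate(keyed):
        for key, value in rows:
            groups[key][position].append(value)
    return dict(groups)


def apply_per_key(func: Callable, *keyed: Iterable[Row]) -> Dict[Hashable, object]:
    """Call func(key, groups) once per key, with one group per input."""
    return {key: func(key, groups) for key, groups in cogroup_n(*keyed).items()}


# Three inputs keyed by id; the number of inputs is only known at call time.
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
payments = [(1, 30.0)]
returns = [(2, "pen")]

sizes = apply_per_key(
    lambda key, groups: [len(g) for g in groups], orders, payments, returns
)
print(sizes)  # {1: [2, 1, 0], 2: [1, 0, 1]}
```

Every group reaches the user function with the same generic element type,
which is the same loss of per-input typing that Seq[T] implies in option (c).
In exchange, nothing needs to be written per arity.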


