Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Dongjoon Hyun
Although it's irrelevant to the Apache Spark 3.3.1 release discussion, because
3.3.1 is a maintenance release of 3.3.0, you may want to lead that effort for
Apache Spark 3.4 in a separate thread. For your info, Apache Spark 3.3.1 RC1
does not include Hadoop 3.3.4 either.

Previously, since we didn't want to introduce any risks (or regressions) due
to the new Hadoop 3 changes, we started to distribute a
Hadoop 3 distribution additionally and have been enhancing it. As of today,
we recommend using the Hadoop 3 distributions in all environments, or building
custom distributions tailored to the user environment (if the Hadoop 3
distribution is not applicable).
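
If you are unsure which Hadoop version a given distribution bundles, you can
check it from that distribution's spark-shell. A minimal sketch, assuming only
the stock Hadoop VersionInfo utility that is already on Spark's classpath:

    // Run inside the spark-shell of the distribution in question.
    // VersionInfo is a standard Hadoop utility; getVersion reports the
    // Hadoop version the distribution was built against.
    import org.apache.hadoop.util.VersionInfo

    println(s"Hadoop version: ${VersionInfo.getVersion}")
    // e.g. 3.3.2 for the Spark 3.3.0 hadoop3 build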

The Apache Spark community has been highly interested in any blockers that
prevent users from using the official Hadoop 3 distribution in their
environments. Please let us know if such issues exist.

Dongjoon.


On Wed, Sep 14, 2022 at 11:42 AM Bjørn Jørgensen 
wrote:

> At least we should upgrade Hadoop to the latest version:
> https://hadoop.apache.org/release/2.10.2.html
>
> Are there any special reasons why we have a Hadoop version that is 7
> years old?
>
On Wed, 14 Sep 2022 at 20:25, Dongjoon Hyun wrote:
>
>> Ya, +1 for Sean's comment.
>>
>> In addition, all of Apache Spark's Maven artifacts already depend on Hadoop
>> 3.3.x.
>>
>>
>> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.3.0
>>
>> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.3.0
>>
>> Apache Spark has been moving away from Hadoop 2 for many reasons.
>>
>> Dongjoon.
>>
>>
>> On Wed, Sep 14, 2022 at 10:54 AM Sean Owen  wrote:
>>
>>> Yeah we're not going to make convenience binaries for all possible
>>> combinations. It's a pretty good assumption that anyone moving to later
>>> Scala versions is also off old Hadoop versions.
>>> You can of course build the combo you like.
>>>
>>> On Wed, Sep 14, 2022 at 11:26 AM Denis Bolshakov <
>>> bolshakov.de...@gmail.com> wrote:
>>>
 Unfortunately it's for Hadoop 3 only.

 On Wed, 14 Sep 2022 at 19:04, Dongjoon Hyun wrote:

> Hi, Denis.
>
> The Apache Spark community already provides both Scala 2.12 and 2.13
> pre-built distributions.
> Please check the distribution site and Apache Spark download page.
>
> https://dlcdn.apache.org/spark/spark-3.3.0/
>
> spark-3.3.0-bin-hadoop3-scala2.13.tgz
> spark-3.3.0-bin-hadoop3.tgz
>
> [image: Screenshot 2022-09-14 at 9.03.27 AM.png]
>
> Dongjoon.
>
> On Wed, Sep 14, 2022 at 12:31 AM Denis Bolshakov <
> bolshakov.de...@gmail.com> wrote:
>
>> Hello,
>>
>> It would be great if it were possible to provide a Spark distro for both
>> Scala 2.12 and Scala 2.13.
>>
>> It would encourage Spark users to switch to Scala 2.13.
>>
>> I know that Spark jar artifacts are available for both Scala versions,
>> but it does not make sense to migrate to Scala 2.13 while there is no
>> Spark distro for this version.
>>
>> Kind regards,
>> Denis
>>
>> On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:
>>
>>> Thank you all.
>>>
>>> I will be preparing 3.3.1 RC1 soon.
>>>
>>> On Tue, Sep 13, 2022 at 12:09 PM John Zhuge 
>>> wrote:
>>>
 +1

 On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
 wrote:

> +1
>
>
>
> Thanks Yuming ~
>
>
>
> *From:* Hyukjin Kwon 
> *Date:* Tuesday, September 13, 2022 08:19
> *To:* Gengliang Wang 
> *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
> dongjoon.h...@gmail.com>, Yuming Wang , dev <
> dev@spark.apache.org>
> *Subject:* Re: Time for Spark 3.3.1 release?
>
>
>
> +1
>
>
>
> On Tue, 13 Sept 2022 at 06:45, Gengliang Wang 
> wrote:
>
> +1.
>
> Thank you, Yuming!
>
>
>
> On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh 
> wrote:
>
> +1
>
> Thanks Yuming!
>
> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >
> > +1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang 
> wrote:
> >>
> >> Hi, All.
> >>
> >>
> >>
> >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
> including 7 correctness patches, have arrived in branch-3.3.
> >>
> >>
> >>
> >> Shall we make a new release, Apache Spark 3.3.1, as the second
> release from branch-3.3? I'd like to volunteer as the release manager
> for Apache Spark 3.3.1.
> >>
> >>
> >>
> >> All changes:
> >>
> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3

Re: Jupyter notebook on Dataproc versus GKE

2022-09-14 Thread Bjørn Jørgensen
Mich: see "Why I'm switching from Jupyter Notebooks to JupyterLab... Such a
better experience!" (DegreeTutors.com)

On Tue, 6 Sep 2022 at 20:28, Holden Karau wrote:

> I’ve used Argo for K8s scheduling; for a while it was also what Kubeflow used
> underneath for scheduling.
>
> On Tue, Sep 6, 2022 at 10:01 AM Mich Talebzadeh 
> wrote:
>
>> Thank you all.
>>
>> Has anyone used Argo as a k8s scheduler, by any chance?
>>
>> On Tue, 6 Sep 2022 at 13:41, Bjørn Jørgensen 
>> wrote:
>>
>>> "*JupyterLab is the next-generation user interface for Project Jupyter
>>> offering all the familiar building blocks of the classic Jupyter Notebook
>>> (notebook, terminal, text editor, file browser, rich outputs, etc.) in a
>>> flexible and powerful user interface.*"
>>> https://github.com/jupyterlab/jupyterlab
>>>
>>> You will find them both at https://jupyter.org
>>>
>>> On Mon, 5 Sep 2022 at 23:40, Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Thanks Bjørn,

 What are the differences, and what functionality does JupyterLab bring on
 top of the Jupyter notebook?



view my Linkedin profile
 


  https://en.everybodywiki.com/Mich_Talebzadeh



 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.




 On Mon, 5 Sept 2022 at 20:58, Bjørn Jørgensen 
 wrote:

> Jupyter notebook is being replaced by JupyterLab :)
>
> On Mon, 5 Sep 2022 at 21:10, Holden Karau wrote:
>
>>
>>
>> On Mon, Sep 5, 2022 at 9:00 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Thanks for that.
>>>
>>> How do you rate the performance of Jupyter w/ Spark on K8s compared
>>> to the same on a cluster of VMs (e.g., Dataproc)?
>>>
>>> Also, a somewhat related question (maybe naive as well). For example,
>>> Google offers a lot of standard ML libraries built into a data
>>> warehouse like BigQuery. What does the Jupyter notebook offer that
>>> others don't?
>>>
>> Jupyter notebook doesn’t offer any particular set of libraries,
>> although you can add your own to the container etc.
>>
>>>
>>>
>>>
>>>view my Linkedin profile
>>> 
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>> for any loss, damage or destruction of data or any other property which 
>>> may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary 
>>> damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Mon, 5 Sept 2022 at 12:47, Holden Karau 
>>> wrote:
>>>
 I’ve run Jupyter w/Spark on K8s, haven’t tried it with Dataproc
 personally.

 The Spark K8s pod scheduler is now more pluggable, so Yunikorn and
 Volcano can be used with less effort.
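
A hypothetical illustration of that pluggability (not code from the thread):
on Spark 3.3+ the scheduler is chosen via configuration. A minimal sketch of
driving Spark on K8s with Volcano from a notebook session, assuming a Spark
build with Volcano support; the API server URL, namespace, and image tag below
are placeholders:

    // Hypothetical sketch: Spark on K8s with the Volcano scheduler (Spark 3.3+,
    // built with Volcano support). Endpoint, namespace, and image are placeholders.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("jupyter-on-k8s-sketch")
      .master("k8s://https://kubernetes.default.svc:443")
      .config("spark.kubernetes.namespace", "spark")
      .config("spark.kubernetes.container.image", "apache/spark:v3.3.0")
      .config("spark.kubernetes.scheduler.name", "volcano")
      .config("spark.kubernetes.driver.pod.featureSteps",
        "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
      .config("spark.kubernetes.executor.pod.featureSteps",
        "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
      .getOrCreate()

    println(spark.range(1000L).count())  // quick smoke test
    spark.stop()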

 On Mon, Sep 5, 2022 at 7:44 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

>
> Hi,
>
>
> Has anyone got experience of running Jupyter on Dataproc versus
> Jupyter notebook on GKE (k8s)?
>
>
> I have not looked at this for a while, but my understanding is that
> Spark on GKE/k8s is not yet performant. This is classic Spark with
> Python/PySpark.
>
>
> Also, I would like to know the state of Spark with Volcano. Has
> progress been made on that front?
>
>
> Regards,
>
>
> Mich
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility
> for any loss, damage or destruction of data or any other property 
> which may
> arise from relying on this email's technical content is explicitly
> disclaimed. The author will in no case be liable for any monetary 
> damages
> arising from such loss, damage or destruction.
>
>
>
 --
 Twitter: https://twitter.com/holdenkarau
 Books (Learning Spark, High Performance Spark, etc.):
 

Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Bjørn Jørgensen
At least we should upgrade Hadoop to the latest version:
https://hadoop.apache.org/release/2.10.2.html

Are there any special reasons why we have a Hadoop version that is 7
years old?

On Wed, 14 Sep 2022 at 20:25, Dongjoon Hyun wrote:

> Ya, +1 for Sean's comment.
>
> In addition, all of Apache Spark's Maven artifacts already depend on Hadoop
> 3.3.x.
>
>
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.3.0
>
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.3.0
>
> Apache Spark has been moving away from Hadoop 2 for many reasons.
>
> Dongjoon.
>
>
> On Wed, Sep 14, 2022 at 10:54 AM Sean Owen  wrote:
>
>> Yeah we're not going to make convenience binaries for all possible
>> combinations. It's a pretty good assumption that anyone moving to later
>> Scala versions is also off old Hadoop versions.
>> You can of course build the combo you like.
>>
>> On Wed, Sep 14, 2022 at 11:26 AM Denis Bolshakov <
>> bolshakov.de...@gmail.com> wrote:
>>
>>> Unfortunately it's for Hadoop 3 only.
>>>
>>> On Wed, 14 Sep 2022 at 19:04, Dongjoon Hyun wrote:
>>>
 Hi, Denis.

 The Apache Spark community already provides both Scala 2.12 and 2.13
 pre-built distributions.
 Please check the distribution site and Apache Spark download page.

 https://dlcdn.apache.org/spark/spark-3.3.0/

 spark-3.3.0-bin-hadoop3-scala2.13.tgz
 spark-3.3.0-bin-hadoop3.tgz

 [image: Screenshot 2022-09-14 at 9.03.27 AM.png]

 Dongjoon.

 On Wed, Sep 14, 2022 at 12:31 AM Denis Bolshakov <
 bolshakov.de...@gmail.com> wrote:

> Hello,
>
> It would be great if it were possible to provide a Spark distro for both
> Scala 2.12 and Scala 2.13.
>
> It would encourage Spark users to switch to Scala 2.13.
>
> I know that Spark jar artifacts are available for both Scala versions,
> but it does not make sense to migrate to Scala 2.13 while there is no
> Spark distro for this version.
>
> Kind regards,
> Denis
>
> On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:
>
>> Thank you all.
>>
>> I will be preparing 3.3.1 RC1 soon.
>>
>> On Tue, Sep 13, 2022 at 12:09 PM John Zhuge 
>> wrote:
>>
>>> +1
>>>
>>> On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
>>> wrote:
>>>
 +1



 Thanks Yuming ~



 *From:* Hyukjin Kwon 
 *Date:* Tuesday, September 13, 2022 08:19
 *To:* Gengliang Wang 
 *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
 dongjoon.h...@gmail.com>, Yuming Wang , dev <
 dev@spark.apache.org>
 *Subject:* Re: Time for Spark 3.3.1 release?



 +1



 On Tue, 13 Sept 2022 at 06:45, Gengliang Wang 
 wrote:

 +1.

 Thank you, Yuming!



 On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh 
 wrote:

 +1

 Thanks Yuming!

 On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun <
 dongjoon.h...@gmail.com> wrote:
 >
 > +1
 >
 > Thanks,
 > Dongjoon.
 >
 > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang 
 wrote:
 >>
 >> Hi, All.
 >>
 >>
 >>
 >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
 including 7 correctness patches, have arrived in branch-3.3.
 >>
 >>
 >>
 >> Shall we make a new release, Apache Spark 3.3.1, as the second
 release from branch-3.3? I'd like to volunteer as the release manager for
 Apache Spark 3.3.1.
 >>
 >>
 >>
 >> All changes:
 >>
 >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
 
 >>
 >>
 >>
 >> Correctness issues:
 >>
 >> SPARK-40149: Propagate metadata columns through Project
 >>
 >> SPARK-40002: Don't push down limit through window using ntile
 >>
 >> SPARK-39976: ArrayIntersect should handle null in left
 expression correctly
 >>
 >> SPARK-39833: Disable Parquet column index in DSv1 to fix a
 correctness issue in the case of overlapping partition and data columns
 >>
 >> SPARK-39061: Set nullable correctly for Inline output attributes
 >>
 >> SPARK-39887: RemoveRedundantAliases should keep aliases that
 make the output of projection nodes unique
 >>
 >> SPARK-38614: Don't push down limit through window that's using
 percent_rank
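
For context on the two window/limit items above (SPARK-40002, SPARK-38614),
the affected query shape is roughly the following; a hypothetical spark-shell
illustration, not code from the thread. Since ntile and percent_rank depend
on every row in the window, pushing the LIMIT below the window would change
their values:

    // Hypothetical illustration of the ntile/percent_rank + limit shape.
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{ntile, percent_rank}

    val df = spark.range(100).withColumnRenamed("id", "ts") // spark: the shell session
    val w  = Window.orderBy("ts")

    df.withColumn("bucket", ntile(4).over(w))   // bucket boundaries need all 100 rows
      .withColumn("pr", percent_rank().over(w)) // so the limit must stay on top
      .limit(10)
      .show()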


 

Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Dongjoon Hyun
Ya, +1 for Sean's comment.

In addition, all of Apache Spark's Maven artifacts already depend on Hadoop
3.3.x.


https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.3.0

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.13/3.3.0

Apache Spark has been moving away from Hadoop 2 for many reasons.
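
To make that concrete, a minimal build.sbt sketch with the coordinates as
published on Maven Central; the Hadoop 3.3.x client jars then arrive
transitively:

    // build.sbt sketch: Spark 3.3.0 artifacts for Scala 2.13.
    // %% appends the Scala binary suffix (_2.13) to the artifact name;
    // hadoop-client-api / hadoop-client-runtime 3.3.x come in transitively.
    scalaVersion := "2.13.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "3.3.0",
      "org.apache.spark" %% "spark-sql"  % "3.3.0"
    )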

Dongjoon.


On Wed, Sep 14, 2022 at 10:54 AM Sean Owen  wrote:

> Yeah we're not going to make convenience binaries for all possible
> combinations. It's a pretty good assumption that anyone moving to later
> Scala versions is also off old Hadoop versions.
> You can of course build the combo you like.
>
> On Wed, Sep 14, 2022 at 11:26 AM Denis Bolshakov <
> bolshakov.de...@gmail.com> wrote:
>
>> Unfortunately it's for Hadoop 3 only.
>>
>> On Wed, 14 Sep 2022 at 19:04, Dongjoon Hyun wrote:
>>
>>> Hi, Denis.
>>>
>>> The Apache Spark community already provides both Scala 2.12 and 2.13
>>> pre-built distributions.
>>> Please check the distribution site and Apache Spark download page.
>>>
>>> https://dlcdn.apache.org/spark/spark-3.3.0/
>>>
>>> spark-3.3.0-bin-hadoop3-scala2.13.tgz
>>> spark-3.3.0-bin-hadoop3.tgz
>>>
>>> [image: Screenshot 2022-09-14 at 9.03.27 AM.png]
>>>
>>> Dongjoon.
>>>
>>> On Wed, Sep 14, 2022 at 12:31 AM Denis Bolshakov <
>>> bolshakov.de...@gmail.com> wrote:
>>>
 Hello,

 It would be great if it were possible to provide a Spark distro for both
 Scala 2.12 and Scala 2.13.

 It would encourage Spark users to switch to Scala 2.13.

 I know that Spark jar artifacts are available for both Scala versions,
 but it does not make sense to migrate to Scala 2.13 while there is no
 Spark distro for this version.

 Kind regards,
 Denis

 On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:

> Thank you all.
>
> I will be preparing 3.3.1 RC1 soon.
>
> On Tue, Sep 13, 2022 at 12:09 PM John Zhuge  wrote:
>
>> +1
>>
>> On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> Thanks Yuming ~
>>>
>>>
>>>
>>> *From:* Hyukjin Kwon 
>>> *Date:* Tuesday, September 13, 2022 08:19
>>> *To:* Gengliang Wang 
>>> *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
>>> dongjoon.h...@gmail.com>, Yuming Wang , dev <
>>> dev@spark.apache.org>
>>> *Subject:* Re: Time for Spark 3.3.1 release?
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> On Tue, 13 Sept 2022 at 06:45, Gengliang Wang 
>>> wrote:
>>>
>>> +1.
>>>
>>> Thank you, Yuming!
>>>
>>>
>>>
>>> On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh 
>>> wrote:
>>>
>>> +1
>>>
>>> Thanks Yuming!
>>>
>>> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun <
>>> dongjoon.h...@gmail.com> wrote:
>>> >
>>> > +1
>>> >
>>> > Thanks,
>>> > Dongjoon.
>>> >
>>> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang 
>>> wrote:
>>> >>
>>> >> Hi, All.
>>> >>
>>> >>
>>> >>
>>> >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
>>> including 7 correctness patches, have arrived in branch-3.3.
>>> >>
>>> >>
>>> >>
>>> >> Shall we make a new release, Apache Spark 3.3.1, as the second
>>> release from branch-3.3? I'd like to volunteer as the release manager for
>>> Apache Spark 3.3.1.
>>> >>
>>> >>
>>> >>
>>> >> All changes:
>>> >>
>>> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
>>> 
>>> >>
>>> >>
>>> >>
>>> >> Correctness issues:
>>> >>
>>> >> SPARK-40149: Propagate metadata columns through Project
>>> >>
>>> >> SPARK-40002: Don't push down limit through window using ntile
>>> >>
>>> >> SPARK-39976: ArrayIntersect should handle null in left expression
>>> correctly
>>> >>
>>> >> SPARK-39833: Disable Parquet column index in DSv1 to fix a
>>> correctness issue in the case of overlapping partition and data columns
>>> >>
>>> >> SPARK-39061: Set nullable correctly for Inline output attributes
>>> >>
>>> >> SPARK-39887: RemoveRedundantAliases should keep aliases that make
>>> the output of projection nodes unique
>>> >>
>>> >> SPARK-38614: Don't push down limit through window that's using
>>> percent_rank
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>> --
>> John Zhuge
>>
>

 --
 //with Best Regards
 --Denis Bolshakov
 e-mail: bolshakov.de...@gmail.com

>>>


Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Sean Owen
Yeah we're not going to make convenience binaries for all possible
combinations. It's a pretty good assumption that anyone moving to later
Scala versions is also off old Hadoop versions.
You can of course build the combo you like.
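
For anyone who does want such a combo: as far as I know, the source tree's
standard helper scripts cover it. Running dev/change-scala-version.sh 2.13 and
then dev/make-distribution.sh --tgz with the matching -Pscala-2.13 and Hadoop
profiles should produce a custom distribution; the exact profile names vary by
branch, so check the build documentation for yours.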

On Wed, Sep 14, 2022 at 11:26 AM Denis Bolshakov 
wrote:

> Unfortunately it's for Hadoop 3 only.
>
> On Wed, 14 Sep 2022 at 19:04, Dongjoon Hyun wrote:
>
>> Hi, Denis.
>>
>> The Apache Spark community already provides both Scala 2.12 and 2.13
>> pre-built distributions.
>> Please check the distribution site and Apache Spark download page.
>>
>> https://dlcdn.apache.org/spark/spark-3.3.0/
>>
>> spark-3.3.0-bin-hadoop3-scala2.13.tgz
>> spark-3.3.0-bin-hadoop3.tgz
>>
>> [image: Screenshot 2022-09-14 at 9.03.27 AM.png]
>>
>> Dongjoon.
>>
>> On Wed, Sep 14, 2022 at 12:31 AM Denis Bolshakov <
>> bolshakov.de...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> It would be great if it were possible to provide a Spark distro for both
>>> Scala 2.12 and Scala 2.13.
>>>
>>> It would encourage Spark users to switch to Scala 2.13.
>>>
>>> I know that Spark jar artifacts are available for both Scala versions,
>>> but it does not make sense to migrate to Scala 2.13 while there is no
>>> Spark distro for this version.
>>>
>>> Kind regards,
>>> Denis
>>>
>>> On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:
>>>
 Thank you all.

 I will be preparing 3.3.1 RC1 soon.

 On Tue, Sep 13, 2022 at 12:09 PM John Zhuge  wrote:

> +1
>
> On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
> wrote:
>
>> +1
>>
>>
>>
>> Thanks Yuming ~
>>
>>
>>
>> *From:* Hyukjin Kwon 
>> *Date:* Tuesday, September 13, 2022 08:19
>> *To:* Gengliang Wang 
>> *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
>> dongjoon.h...@gmail.com>, Yuming Wang , dev <
>> dev@spark.apache.org>
>> *Subject:* Re: Time for Spark 3.3.1 release?
>>
>>
>>
>> +1
>>
>>
>>
>> On Tue, 13 Sept 2022 at 06:45, Gengliang Wang 
>> wrote:
>>
>> +1.
>>
>> Thank you, Yuming!
>>
>>
>>
>> On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh 
>> wrote:
>>
>> +1
>>
>> Thanks Yuming!
>>
>> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun <
>> dongjoon.h...@gmail.com> wrote:
>> >
>> > +1
>> >
>> > Thanks,
>> > Dongjoon.
>> >
>> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang 
>> wrote:
>> >>
>> >> Hi, All.
>> >>
>> >>
>> >>
>> >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
>> including 7 correctness patches, have arrived in branch-3.3.
>> >>
>> >>
>> >>
>> >> Shall we make a new release, Apache Spark 3.3.1, as the second
>> release from branch-3.3? I'd like to volunteer as the release manager for
>> Apache Spark 3.3.1.
>> >>
>> >>
>> >>
>> >> All changes:
>> >>
>> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
>> 
>> >>
>> >>
>> >>
>> >> Correctness issues:
>> >>
>> >> SPARK-40149: Propagate metadata columns through Project
>> >>
>> >> SPARK-40002: Don't push down limit through window using ntile
>> >>
>> >> SPARK-39976: ArrayIntersect should handle null in left expression
>> correctly
>> >>
>> >> SPARK-39833: Disable Parquet column index in DSv1 to fix a
>> correctness issue in the case of overlapping partition and data columns
>> >>
>> >> SPARK-39061: Set nullable correctly for Inline output attributes
>> >>
>> >> SPARK-39887: RemoveRedundantAliases should keep aliases that make
>> the output of projection nodes unique
>> >>
>> >> SPARK-38614: Don't push down limit through window that's using
>> percent_rank
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>> --
> John Zhuge
>

>>>
>>> --
>>> //with Best Regards
>>> --Denis Bolshakov
>>> e-mail: bolshakov.de...@gmail.com
>>>
>>


Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Denis Bolshakov
Unfortunately it's for Hadoop 3 only.

On Wed, 14 Sep 2022 at 19:04, Dongjoon Hyun wrote:

> Hi, Denis.
>
> The Apache Spark community already provides both Scala 2.12 and 2.13 pre-built
> distributions.
> Please check the distribution site and Apache Spark download page.
>
> https://dlcdn.apache.org/spark/spark-3.3.0/
>
> spark-3.3.0-bin-hadoop3-scala2.13.tgz
> spark-3.3.0-bin-hadoop3.tgz
>
> [image: Screenshot 2022-09-14 at 9.03.27 AM.png]
>
> Dongjoon.
>
> On Wed, Sep 14, 2022 at 12:31 AM Denis Bolshakov <
> bolshakov.de...@gmail.com> wrote:
>
>> Hello,
>>
>> It would be great if it were possible to provide a Spark distro for both
>> Scala 2.12 and Scala 2.13.
>>
>> It would encourage Spark users to switch to Scala 2.13.
>>
>> I know that Spark jar artifacts are available for both Scala versions,
>> but it does not make sense to migrate to Scala 2.13 while there is no
>> Spark distro for this version.
>>
>> Kind regards,
>> Denis
>>
>> On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:
>>
>>> Thank you all.
>>>
>>> I will be preparing 3.3.1 RC1 soon.
>>>
>>> On Tue, Sep 13, 2022 at 12:09 PM John Zhuge  wrote:
>>>
 +1

 On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
 wrote:

> +1
>
>
>
> Thanks Yuming ~
>
>
>
> *From:* Hyukjin Kwon 
> *Date:* Tuesday, September 13, 2022 08:19
> *To:* Gengliang Wang 
> *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
> dongjoon.h...@gmail.com>, Yuming Wang , dev <
> dev@spark.apache.org>
> *Subject:* Re: Time for Spark 3.3.1 release?
>
>
>
> +1
>
>
>
> On Tue, 13 Sept 2022 at 06:45, Gengliang Wang 
> wrote:
>
> +1.
>
> Thank you, Yuming!
>
>
>
> On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh  wrote:
>
> +1
>
> Thanks Yuming!
>
> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun <
> dongjoon.h...@gmail.com> wrote:
> >
> > +1
> >
> > Thanks,
> > Dongjoon.
> >
> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang 
> wrote:
> >>
> >> Hi, All.
> >>
> >>
> >>
> >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
> including 7 correctness patches, have arrived in branch-3.3.
> >>
> >>
> >>
> >> Shall we make a new release, Apache Spark 3.3.1, as the second
> release from branch-3.3? I'd like to volunteer as the release manager for
> Apache Spark 3.3.1.
> >>
> >>
> >>
> >> All changes:
> >>
> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
> 
> >>
> >>
> >>
> >> Correctness issues:
> >>
> >> SPARK-40149: Propagate metadata columns through Project
> >>
> >> SPARK-40002: Don't push down limit through window using ntile
> >>
> >> SPARK-39976: ArrayIntersect should handle null in left expression
> correctly
> >>
> >> SPARK-39833: Disable Parquet column index in DSv1 to fix a
> correctness issue in the case of overlapping partition and data columns
> >>
> >> SPARK-39061: Set nullable correctly for Inline output attributes
> >>
> >> SPARK-39887: RemoveRedundantAliases should keep aliases that make
> the output of projection nodes unique
> >>
> >> SPARK-38614: Don't push down limit through window that's using
> percent_rank
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
> --
 John Zhuge

>>>
>>
>> --
>> //with Best Regards
>> --Denis Bolshakov
>> e-mail: bolshakov.de...@gmail.com
>>
>


Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Dongjoon Hyun
Hi, Denis.

The Apache Spark community already provides both Scala 2.12 and 2.13 pre-built
distributions.
Please check the distribution site and Apache Spark download page.

https://dlcdn.apache.org/spark/spark-3.3.0/

spark-3.3.0-bin-hadoop3-scala2.13.tgz
spark-3.3.0-bin-hadoop3.tgz

[image: Screenshot 2022-09-14 at 9.03.27 AM.png]

Dongjoon.

On Wed, Sep 14, 2022 at 12:31 AM Denis Bolshakov 
wrote:

> Hello,
>
> It would be great if it were possible to provide a Spark distro for both
> Scala 2.12 and Scala 2.13.
>
> It would encourage Spark users to switch to Scala 2.13.
>
> I know that Spark jar artifacts are available for both Scala versions,
> but it does not make sense to migrate to Scala 2.13 while there is no
> Spark distro for this version.
>
> Kind regards,
> Denis
>
> On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:
>
>> Thank you all.
>>
>> I will be preparing 3.3.1 RC1 soon.
>>
>> On Tue, Sep 13, 2022 at 12:09 PM John Zhuge  wrote:
>>
>>> +1
>>>
>>> On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
>>> wrote:
>>>
 +1



 Thanks Yuming ~



 *From:* Hyukjin Kwon 
 *Date:* Tuesday, September 13, 2022 08:19
 *To:* Gengliang Wang 
 *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
 dongjoon.h...@gmail.com>, Yuming Wang , dev <
 dev@spark.apache.org>
 *Subject:* Re: Time for Spark 3.3.1 release?



 +1



 On Tue, 13 Sept 2022 at 06:45, Gengliang Wang  wrote:

 +1.

 Thank you, Yuming!



 On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh  wrote:

 +1

 Thanks Yuming!

 On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun 
 wrote:
 >
 > +1
 >
 > Thanks,
 > Dongjoon.
 >
 > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang  wrote:
 >>
 >> Hi, All.
 >>
 >>
 >>
 >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
 including 7 correctness patches, have arrived in branch-3.3.
 >>
 >>
 >>
 >> Shall we make a new release, Apache Spark 3.3.1, as the second
 release from branch-3.3? I'd like to volunteer as the release manager for
 Apache Spark 3.3.1.
 >>
 >>
 >>
 >> All changes:
 >>
 >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
 
 >>
 >>
 >>
 >> Correctness issues:
 >>
 >> SPARK-40149: Propagate metadata columns through Project
 >>
 >> SPARK-40002: Don't push down limit through window using ntile
 >>
 >> SPARK-39976: ArrayIntersect should handle null in left expression
 correctly
 >>
 >> SPARK-39833: Disable Parquet column index in DSv1 to fix a
 correctness issue in the case of overlapping partition and data columns
 >>
 >> SPARK-39061: Set nullable correctly for Inline output attributes
 >>
 >> SPARK-39887: RemoveRedundantAliases should keep aliases that make
 the output of projection nodes unique
 >>
 >> SPARK-38614: Don't push down limit through window that's using
 percent_rank

 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

 --
>>> John Zhuge
>>>
>>
>
> --
> //with Best Regards
> --Denis Bolshakov
> e-mail: bolshakov.de...@gmail.com
>


Re: Time for Spark 3.3.1 release?

2022-09-14 Thread Denis Bolshakov
Hello,

It would be great if it were possible to provide a Spark distro for both
Scala 2.12 and Scala 2.13.

It would encourage Spark users to switch to Scala 2.13.

I know that Spark jar artifacts are available for both Scala versions, but it
does not make sense to migrate to Scala 2.13 while there is no Spark distro
for this version.

Kind regards,
Denis

On Tue, 13 Sept 2022 at 17:38, Yuming Wang  wrote:

> Thank you all.
>
> I will be preparing 3.3.1 RC1 soon.
>
> On Tue, Sep 13, 2022 at 12:09 PM John Zhuge  wrote:
>
>> +1
>>
>> On Mon, Sep 12, 2022 at 9:08 PM Yang,Jie(INF) 
>> wrote:
>>
>>> +1
>>>
>>>
>>>
>>> Thanks Yuming ~
>>>
>>>
>>>
>>> *From:* Hyukjin Kwon 
>>> *Date:* Tuesday, September 13, 2022 08:19
>>> *To:* Gengliang Wang 
>>> *Cc:* "L. C. Hsieh" , Dongjoon Hyun <
>>> dongjoon.h...@gmail.com>, Yuming Wang , dev <
>>> dev@spark.apache.org>
>>> *Subject:* Re: Time for Spark 3.3.1 release?
>>>
>>>
>>>
>>> +1
>>>
>>>
>>>
>>> On Tue, 13 Sept 2022 at 06:45, Gengliang Wang  wrote:
>>>
>>> +1.
>>>
>>> Thank you, Yuming!
>>>
>>>
>>>
>>> On Mon, Sep 12, 2022 at 12:10 PM L. C. Hsieh  wrote:
>>>
>>> +1
>>>
>>> Thanks Yuming!
>>>
>>> On Mon, Sep 12, 2022 at 11:50 AM Dongjoon Hyun 
>>> wrote:
>>> >
>>> > +1
>>> >
>>> > Thanks,
>>> > Dongjoon.
>>> >
>>> > On Mon, Sep 12, 2022 at 6:38 AM Yuming Wang  wrote:
>>> >>
>>> >> Hi, All.
>>> >>
>>> >>
>>> >>
>>> >> Since the Apache Spark 3.3.0 tag creation (Jun 10), 138 new patches,
>>> including 7 correctness patches, have arrived in branch-3.3.
>>> >>
>>> >>
>>> >>
>>> >> Shall we make a new release, Apache Spark 3.3.1, as the second
>>> release from branch-3.3? I'd like to volunteer as the release manager for
>>> Apache Spark 3.3.1.
>>> >>
>>> >>
>>> >>
>>> >> All changes:
>>> >>
>>> >> https://github.com/apache/spark/compare/v3.3.0...branch-3.3
>>> 
>>> >>
>>> >>
>>> >>
>>> >> Correctness issues:
>>> >>
>>> >> SPARK-40149: Propagate metadata columns through Project
>>> >>
>>> >> SPARK-40002: Don't push down limit through window using ntile
>>> >>
>>> >> SPARK-39976: ArrayIntersect should handle null in left expression
>>> correctly
>>> >>
>>> >> SPARK-39833: Disable Parquet column index in DSv1 to fix a
>>> correctness issue in the case of overlapping partition and data columns
>>> >>
>>> >> SPARK-39061: Set nullable correctly for Inline output attributes
>>> >>
>>> >> SPARK-39887: RemoveRedundantAliases should keep aliases that make the
>>> output of projection nodes unique
>>> >>
>>> >> SPARK-38614: Don't push down limit through window that's using
>>> percent_rank
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>> --
>> John Zhuge
>>
>

-- 
//with Best Regards
--Denis Bolshakov
e-mail: bolshakov.de...@gmail.com