Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-14 Thread Holden Karau
I’m going to drink a celebratory afternoon coffee :)

On Tue, Jul 14, 2020 at 12:26 PM shane knapp ☠  wrote:

> this is seriously great news!  let's all take a moment and welcome apache
> spark's python support to the present.  ;)
>
> On Mon, Jul 13, 2020 at 7:26 PM Holden Karau  wrote:
>
>> Awesome, thank you for driving this forward :)
>>
>> On Mon, Jul 13, 2020 at 7:25 PM Hyukjin Kwon  wrote:
>>
>>> Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master
>>> branch at https://github.com/apache/spark/pull/28957
>>>
>>> On Fri, Jul 3, 2020 at 10:01 AM Hyukjin Kwon wrote:
>>>
 Thanks Dongjoon. That makes much more sense now!

 On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun wrote:

> Thank you, Hyukjin.
>
> According to the Python community, Python 3.5 also reaches EOL on
> 2020-09-13 (only two months left).
>
> - https://www.python.org/downloads/
>
> So, targeting live Python versions at Apache Spark 3.1.0 (December
> 2020) looks reasonable to me.
>
> For old Python versions, we still have Apache Spark 2.4 LTS and also
> Apache Spark 3.0.x will work.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
> wrote:
>
>> +1, especially Python 2
>>
>> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:
>>
>>> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward.
>>> It will be exciting to get to use more recent Python features. The most
>>> recent Ubuntu LTS ships with 3.7, and while the previous LTS ships with
>>> 3.5, if folks really can’t upgrade there’s conda.
>>>
>>> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>>>
>>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon 
>>> wrote:
>>>
 Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think
 we should make such changes in maintenance releases

 On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:

> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>
> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I would like to discuss dropping deprecated Python versions 2,
>> 3.4 and 3.5 at https://github.com/apache/spark/pull/28957. I
>> assume people support it in general
>> but I am writing this to make sure everybody is happy.
>>
>> Fokko made a very good investigation on it, see
>> https://github.com/apache/spark/pull/28957#issuecomment-652022449
>> .
>> Assuming from the statistics, I think we're pretty safe to drop
>> them.
>> Also note that dropping Python 2 was actually declared at
>> https://python3statement.org/
>>
>> Roughly speaking, there are many main advantages by dropping them:
>>   1. It removes a bunch of hacks we added around 700 lines in
>> PySpark.
>>   2. PyPy2 has a critical bug that causes a flaky test,
>> https://issues.apache.org/jira/browse/SPARK-28358 given my
>> testing and investigation.
>>   3. Users can use Python type hints with Pandas UDFs without
>> thinking about Python version
>>   4. Users can leverage one latest cloudpickle,
>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it
>> can also leverage C pickle.
>>   5. ...
>>
>> So it benefits both users and dev. WDYT guys?
>>
>>
>
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>
-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-14 Thread shane knapp ☠
this is seriously great news!  let's all take a moment and welcome apache
spark's python support to the present.  ;)

On Mon, Jul 13, 2020 at 7:26 PM Holden Karau  wrote:

> Awesome, thank you for driving this forward :)
>
> On Mon, Jul 13, 2020 at 7:25 PM Hyukjin Kwon  wrote:
>
>> Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master branch
>> at https://github.com/apache/spark/pull/28957
>>
>> On Fri, Jul 3, 2020 at 10:01 AM Hyukjin Kwon wrote:
>>
>>> Thanks Dongjoon. That makes much more sense now!
>>>
>>> On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun wrote:
>>>
 Thank you, Hyukjin.

 According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
 (only two months left).

 - https://www.python.org/downloads/

 So, targeting live Python versions at Apache Spark 3.1.0 (December
 2020) looks reasonable to me.

 For old Python versions, we still have Apache Spark 2.4 LTS and also
 Apache Spark 3.0.x will work.

 Bests,
 Dongjoon.


 On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
 wrote:

> +1, especially Python 2
>
> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:
>
>> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward.
>> It will be exciting to get to use more recent Python features. The most
>> recent Ubuntu LTS ships with 3.7, and while the previous LTS ships with
>> 3.5, if folks really can’t upgrade there’s conda.
>>
>> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>>
>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon 
>> wrote:
>>
>>> Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think
>>> we should make such changes in maintenance releases
>>>
>>> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:
>>>
 To be clear the plan is to drop them in Spark 3.1 onwards, yes?

 On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
 wrote:

> Hi all,
>
> I would like to discuss dropping deprecated Python versions 2, 3.4
> and 3.5 at https://github.com/apache/spark/pull/28957. I assume
> people support it in general
> but I am writing this to make sure everybody is happy.
>
> Fokko made a very good investigation on it, see
> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
> Assuming from the statistics, I think we're pretty safe to drop
> them.
> Also note that dropping Python 2 was actually declared at
> https://python3statement.org/
>
> Roughly speaking, there are many main advantages by dropping them:
>   1. It removes a bunch of hacks we added around 700 lines in
> PySpark.
>   2. PyPy2 has a critical bug that causes a flaky test,
> https://issues.apache.org/jira/browse/SPARK-28358 given my
> testing and investigation.
>   3. Users can use Python type hints with Pandas UDFs without
> thinking about Python version
>   4. Users can leverage one latest cloudpickle,
> https://github.com/apache/spark/pull/28950. With Python 3.8+ it
> can also leverage C pickle.
>   5. ...
>
> So it benefits both users and dev. WDYT guys?
>
>
>


-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-13 Thread Holden Karau
Awesome, thank you for driving this forward :)

On Mon, Jul 13, 2020 at 7:25 PM Hyukjin Kwon  wrote:

> Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master branch
> at https://github.com/apache/spark/pull/28957
>
> On Fri, Jul 3, 2020 at 10:01 AM Hyukjin Kwon wrote:
>
>> Thanks Dongjoon. That makes much more sense now!
>>
>> On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun wrote:
>>
>>> Thank you, Hyukjin.
>>>
>>> According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
>>> (only two months left).
>>>
>>> - https://www.python.org/downloads/
>>>
>>> So, targeting live Python versions at Apache Spark 3.1.0 (December 2020)
>>> looks reasonable to me.
>>>
>>> For old Python versions, we still have Apache Spark 2.4 LTS and also
>>> Apache Spark 3.0.x will work.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
>>> wrote:
>>>
 +1, especially Python 2

 On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:

> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward.
> It will be exciting to get to use more recent Python features. The most
> recent Ubuntu LTS ships with 3.7, and while the previous LTS ships with
> 3.5, if folks really can’t upgrade there’s conda.
>
> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>
> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon 
> wrote:
>
>> Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think we
>> should make such changes in maintenance releases
>>
>> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:
>>
>>> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>>>
>>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 I would like to discuss dropping deprecated Python versions 2, 3.4
 and 3.5 at https://github.com/apache/spark/pull/28957. I assume
 people support it in general
 but I am writing this to make sure everybody is happy.

 Fokko made a very good investigation on it, see
 https://github.com/apache/spark/pull/28957#issuecomment-652022449.
 Assuming from the statistics, I think we're pretty safe to drop
 them.
 Also note that dropping Python 2 was actually declared at
 https://python3statement.org/

 Roughly speaking, there are many main advantages by dropping them:
   1. It removes a bunch of hacks we added around 700 lines in
 PySpark.
   2. PyPy2 has a critical bug that causes a flaky test,
 https://issues.apache.org/jira/browse/SPARK-28358 given my testing
 and investigation.
   3. Users can use Python type hints with Pandas UDFs without
 thinking about Python version
   4. Users can leverage one latest cloudpickle,
 https://github.com/apache/spark/pull/28950. With Python 3.8+ it
 can also leverage C pickle.
   5. ...

 So it benefits both users and dev. WDYT guys?


>


-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-13 Thread Hyukjin Kwon
Thank you all. Python 2, 3.4 and 3.5 are dropped now in the master branch
at https://github.com/apache/spark/pull/28957

On Fri, Jul 3, 2020 at 10:01 AM Hyukjin Kwon wrote:

> Thanks Dongjoon. That makes much more sense now!
>
> On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun wrote:
>
>> Thank you, Hyukjin.
>>
>> According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
>> (only two months left).
>>
>> - https://www.python.org/downloads/
>>
>> So, targeting live Python versions at Apache Spark 3.1.0 (December 2020)
>> looks reasonable to me.
>>
>> For old Python versions, we still have Apache Spark 2.4 LTS and also
>> Apache Spark 3.0.x will work.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
>> wrote:
>>
>>> +1, especially Python 2
>>>
>>> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:
>>>
 I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward. It
 will be exciting to get to use more recent Python features. The most recent
 Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
 folks really can’t upgrade there’s conda.

 Is there anyone with a large Python 3.5 fleet who can’t use conda?

 On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon 
 wrote:

> Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think we
> should make such changes in maintenance releases
>
> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:
>
>> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>>
>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
>> wrote:
>>
>>> Hi all,
>>>
>>> I would like to discuss dropping deprecated Python versions 2, 3.4
>>> and 3.5 at https://github.com/apache/spark/pull/28957. I assume
>>> people support it in general
>>> but I am writing this to make sure everybody is happy.
>>>
>>> Fokko made a very good investigation on it, see
>>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>>> Assuming from the statistics, I think we're pretty safe to drop them.
>>> Also note that dropping Python 2 was actually declared at
>>> https://python3statement.org/
>>>
>>> Roughly speaking, there are many main advantages by dropping them:
>>>   1. It removes a bunch of hacks we added around 700 lines in
>>> PySpark.
>>>   2. PyPy2 has a critical bug that causes a flaky test,
>>> https://issues.apache.org/jira/browse/SPARK-28358 given my testing
>>> and investigation.
>>>   3. Users can use Python type hints with Pandas UDFs without
>>> thinking about Python version
>>>   4. Users can leverage one latest cloudpickle,
>>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
>>> also leverage C pickle.
>>>   5. ...
>>>
>>> So it benefits both users and dev. WDYT guys?
>>>
>>>

>>>


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-02 Thread Hyukjin Kwon
Thanks Dongjoon. That makes much more sense now!

On Fri, Jul 3, 2020 at 12:11 AM Dongjoon Hyun wrote:

> Thank you, Hyukjin.
>
> According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
> (only two months left).
>
> - https://www.python.org/downloads/
>
> So, targeting live Python versions at Apache Spark 3.1.0 (December 2020)
> looks reasonable to me.
>
> For old Python versions, we still have Apache Spark 2.4 LTS and also
> Apache Spark 3.0.x will work.
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li 
> wrote:
>
>> +1, especially Python 2
>>
>> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:
>>
>>> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward. It
>>> will be exciting to get to use more recent Python features. The most recent
>>> Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
>>> folks really can’t upgrade there’s conda.
>>>
>>> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>>>
>>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon  wrote:
>>>
 Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think we
 should make such changes in maintenance releases

 On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:

> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>
> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
> wrote:
>
>> Hi all,
>>
>> I would like to discuss dropping deprecated Python versions 2, 3.4
>> and 3.5 at https://github.com/apache/spark/pull/28957. I assume
>> people support it in general
>> but I am writing this to make sure everybody is happy.
>>
>> Fokko made a very good investigation on it, see
>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>> Assuming from the statistics, I think we're pretty safe to drop them.
>> Also note that dropping Python 2 was actually declared at
>> https://python3statement.org/
>>
>> Roughly speaking, there are many main advantages by dropping them:
>>   1. It removes a bunch of hacks we added around 700 lines in PySpark.
>>   2. PyPy2 has a critical bug that causes a flaky test,
>> https://issues.apache.org/jira/browse/SPARK-28358 given my testing
>> and investigation.
>>   3. Users can use Python type hints with Pandas UDFs without
>> thinking about Python version
>>   4. Users can leverage one latest cloudpickle,
>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
>> also leverage C pickle.
>>   5. ...
>>
>> So it benefits both users and dev. WDYT guys?
>>
>>
>>>
>>


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-02 Thread Dongjoon Hyun
Thank you, Hyukjin.

According to the Python community, Python 3.5 also reaches EOL on 2020-09-13
(only two months left).

- https://www.python.org/downloads/

So, targeting live Python versions at Apache Spark 3.1.0 (December 2020)
looks reasonable to me.

For old Python versions, we still have Apache Spark 2.4 LTS, and Apache
Spark 3.0.x will also continue to work.

Bests,
Dongjoon.
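
As a quick check of the "only two months left" estimate, the gap between this message's date and the end-of-life date quoted above can be computed directly. This is only a sketch: the variable names are illustrative, and both dates are simply the ones given in this thread.

```python
from datetime import date

# Dates taken from this thread; the variable names are illustrative.
message_date = date(2020, 7, 2)     # date of this message in the archive
python35_eol = date(2020, 9, 13)    # Python 3.5 end-of-life date quoted above

# Subtracting two dates gives a timedelta; .days is the whole-day difference.
days_left = (python35_eol - message_date).days
print(f"Python 3.5 end-of-life is {days_left} days away "
      f"(about {days_left / 30:.1f} months)")
```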


On Wed, Jul 1, 2020 at 10:50 PM Yuanjian Li  wrote:

> +1, especially Python 2
>
> On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:
>
>> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward. It
>> will be exciting to get to use more recent Python features. The most recent
>> Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
>> folks really can’t upgrade there’s conda.
>>
>> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>>
>> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon  wrote:
>>
>>> Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think we
>>> should make such changes in maintenance releases
>>>
>>> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:
>>>
 To be clear the plan is to drop them in Spark 3.1 onwards, yes?

 On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon 
 wrote:

> Hi all,
>
> I would like to discuss dropping deprecated Python versions 2, 3.4 and
> 3.5 at https://github.com/apache/spark/pull/28957. I assume people
> support it in general
> but I am writing this to make sure everybody is happy.
>
> Fokko made a very good investigation on it, see
> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
> Assuming from the statistics, I think we're pretty safe to drop them.
> Also note that dropping Python 2 was actually declared at
> https://python3statement.org/
>
> Roughly speaking, there are many main advantages by dropping them:
>   1. It removes a bunch of hacks we added around 700 lines in PySpark.
>   2. PyPy2 has a critical bug that causes a flaky test,
> https://issues.apache.org/jira/browse/SPARK-28358 given my testing
> and investigation.
>   3. Users can use Python type hints with Pandas UDFs without thinking
> about Python version
>   4. Users can leverage one latest cloudpickle,
> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
> also leverage C pickle.
>   5. ...
>
> So it benefits both users and dev. WDYT guys?
>
>
>>
>


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Yuanjian Li
+1, especially Python 2

On Thu, Jul 2, 2020 at 10:20 AM Holden Karau wrote:

> I’m ok with us dropping Python 2, 3.4, and 3.5 in Spark 3.1 forward. It
> will be exciting to get to use more recent Python features. The most recent
> Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
> folks really can’t upgrade there’s conda.
>
> Is there anyone with a large Python 3.5 fleet who can’t use conda?
>
> On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon  wrote:
>
>> Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think we
>> should make such changes in maintenance releases
>>
>> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:
>>
>>> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>>>
>>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon  wrote:
>>>
 Hi all,

 I would like to discuss dropping deprecated Python versions 2, 3.4 and
 3.5 at https://github.com/apache/spark/pull/28957. I assume people
 support it in general
 but I am writing this to make sure everybody is happy.

 Fokko made a very good investigation on it, see
 https://github.com/apache/spark/pull/28957#issuecomment-652022449.
 Assuming from the statistics, I think we're pretty safe to drop them.
 Also note that dropping Python 2 was actually declared at
 https://python3statement.org/

 Roughly speaking, there are many main advantages by dropping them:
   1. It removes a bunch of hacks we added around 700 lines in PySpark.
   2. PyPy2 has a critical bug that causes a flaky test,
 https://issues.apache.org/jira/browse/SPARK-28358 given my testing and
 investigation.
   3. Users can use Python type hints with Pandas UDFs without thinking
 about Python version
   4. Users can leverage one latest cloudpickle,
 https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
 also leverage C pickle.
   5. ...

 So it benefits both users and dev. WDYT guys?


>


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Holden Karau
I’m ok with us dropping Python 2, 3.4, and 3.5 from Spark 3.1 onward. It
will be exciting to get to use more recent Python features. The most recent
Ubuntu LTS ships with 3.7, and while the previous LTS ships with 3.5, if
folks really can’t upgrade there’s conda.

Is there anyone with a large Python 3.5 fleet who can’t use conda?

On Wed, Jul 1, 2020 at 7:15 PM Hyukjin Kwon  wrote:

> Yeah, sure. It will be dropped at Spark 3.1 onwards. I don't think we
> should make such changes in maintenance releases
>
> On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:
>
>> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>>
>> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon  wrote:
>>
>>> Hi all,
>>>
>>> I would like to discuss dropping deprecated Python versions 2, 3.4 and
>>> 3.5 at https://github.com/apache/spark/pull/28957. I assume people
>>> support it in general
>>> but I am writing this to make sure everybody is happy.
>>>
>>> Fokko made a very good investigation on it, see
>>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>>> Assuming from the statistics, I think we're pretty safe to drop them.
>>> Also note that dropping Python 2 was actually declared at
>>> https://python3statement.org/
>>>
>>> Roughly speaking, there are many main advantages by dropping them:
>>>   1. It removes a bunch of hacks we added around 700 lines in PySpark.
>>>   2. PyPy2 has a critical bug that causes a flaky test,
>>> https://issues.apache.org/jira/browse/SPARK-28358 given my testing and
>>> investigation.
>>>   3. Users can use Python type hints with Pandas UDFs without thinking
>>> about Python version
>>>   4. Users can leverage one latest cloudpickle,
>>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can
>>> also leverage C pickle.
>>>   5. ...
>>>
>>> So it benefits both users and dev. WDYT guys?
>>>
>>>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Hyukjin Kwon
Yeah, sure. They will be dropped from Spark 3.1 onwards. I don't think we
should make such changes in maintenance releases.
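
For illustration, a feature release could enforce the new minimum with a start-up guard along these lines. This is a minimal sketch and not the code from the pull request: `MIN_PYTHON` and `check_python_version` are hypothetical names, and the 3.6 floor is an assumption inferred from dropping 2, 3.4 and 3.5.

```python
import sys

# Hypothetical minimum: the oldest version left after dropping 2, 3.4 and 3.5.
MIN_PYTHON = (3, 6)


def check_python_version(version_info=sys.version_info):
    """Raise RuntimeError if the given interpreter version is older than MIN_PYTHON."""
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        required = ".".join(str(part) for part in MIN_PYTHON)
        raise RuntimeError(
            f"Python {required}+ is required, but found Python {found}"
        )


# Passes silently when the running interpreter is new enough.
check_python_version()
```

Keeping the check in one function means maintenance branches can simply leave it out, so only releases from the feature line onwards reject old interpreters.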

On Thu, Jul 2, 2020 at 11:13 AM Holden Karau wrote:

> To be clear the plan is to drop them in Spark 3.1 onwards, yes?
>
> On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon  wrote:
>
>> Hi all,
>>
>> I would like to discuss dropping deprecated Python versions 2, 3.4 and
>> 3.5 at https://github.com/apache/spark/pull/28957. I assume people
>> support it in general
>> but I am writing this to make sure everybody is happy.
>>
>> Fokko made a very good investigation on it, see
>> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
>> Assuming from the statistics, I think we're pretty safe to drop them.
>> Also note that dropping Python 2 was actually declared at
>> https://python3statement.org/
>>
>> Roughly speaking, there are many main advantages by dropping them:
>>   1. It removes a bunch of hacks we added around 700 lines in PySpark.
>>   2. PyPy2 has a critical bug that causes a flaky test,
>> https://issues.apache.org/jira/browse/SPARK-28358 given my testing and
>> investigation.
>>   3. Users can use Python type hints with Pandas UDFs without thinking
>> about Python version
>>   4. Users can leverage one latest cloudpickle,
>> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also
>> leverage C pickle.
>>   5. ...
>>
>> So it benefits both users and dev. WDYT guys?
>>
>>
>


Re: [DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Holden Karau
To be clear, the plan is to drop them from Spark 3.1 onwards, yes?

On Wed, Jul 1, 2020 at 7:11 PM Hyukjin Kwon  wrote:

> Hi all,
>
> I would like to discuss dropping deprecated Python versions 2, 3.4 and 3.5
> at https://github.com/apache/spark/pull/28957. I assume people support it
> in general
> but I am writing this to make sure everybody is happy.
>
> Fokko made a very good investigation on it, see
> https://github.com/apache/spark/pull/28957#issuecomment-652022449.
> Assuming from the statistics, I think we're pretty safe to drop them.
> Also note that dropping Python 2 was actually declared at
> https://python3statement.org/
>
> Roughly speaking, there are many main advantages by dropping them:
>   1. It removes a bunch of hacks we added around 700 lines in PySpark.
>   2. PyPy2 has a critical bug that causes a flaky test,
> https://issues.apache.org/jira/browse/SPARK-28358 given my testing and
> investigation.
>   3. Users can use Python type hints with Pandas UDFs without thinking
> about Python version
>   4. Users can leverage one latest cloudpickle,
> https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also
> leverage C pickle.
>   5. ...
>
> So it benefits both users and dev. WDYT guys?
>
>
> --
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


[DISCUSS] Drop Python 2, 3.4 and 3.5

2020-07-01 Thread Hyukjin Kwon
Hi all,

I would like to discuss dropping the deprecated Python versions 2, 3.4 and 3.5
at https://github.com/apache/spark/pull/28957. I assume people support it
in general, but I am writing this to make sure everybody is happy.

Fokko did a very thorough investigation of it, see
https://github.com/apache/spark/pull/28957#issuecomment-652022449.
Judging from the statistics, I think we're pretty safe to drop them.
Also note that dropping Python 2 was actually declared at
https://python3statement.org/

Roughly speaking, there are several main advantages to dropping them:
  1. It removes a bunch of hacks we added, around 700 lines in PySpark.
  2. PyPy2 has a critical bug that causes a flaky test,
https://issues.apache.org/jira/browse/SPARK-28358, according to my testing
and investigation.
  3. Users can use Python type hints with Pandas UDFs without thinking
about the Python version.
  4. Users can rely on a single, up-to-date cloudpickle,
https://github.com/apache/spark/pull/28950. With Python 3.8+ it can also
leverage C pickle.
  5. ...

So it benefits both users and developers. WDYT guys?
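
To make point 4 a little more concrete: Python 3.8 added the `reducer_override` hook to `pickle.Pickler`, which lets a subclass customise how objects are pickled while still building on the C-accelerated pickler when it is available. The sketch below uses only the standard library to show that hook. It is not cloudpickle's code, `ModuleAwarePickler` and `dumps` are made-up names, and it needs Python 3.8 or newer.

```python
import importlib
import io
import math
import pickle
import types


class ModuleAwarePickler(pickle.Pickler):
    """Illustrative pickler that can serialise module objects by name."""

    def reducer_override(self, obj):
        if isinstance(obj, types.ModuleType):
            # Modules are not picklable by default; re-import them on load.
            return importlib.import_module, (obj.__name__,)
        # Fall back to the default pickling behaviour for everything else.
        return NotImplemented


def dumps(obj):
    """Serialise obj to bytes using ModuleAwarePickler."""
    buffer = io.BytesIO()
    ModuleAwarePickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buffer.getvalue()


# The payload can be read back with the plain standard-library loader.
restored = pickle.loads(dumps({"lib": math, "values": [1, 2, 3]}))
print(restored["lib"] is math)
```

Returning `NotImplemented` keeps the built-in fast path for ordinary objects, so only the special cases pay for the Python-level hook.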