Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-28 Thread Shane Knapp
+1 to testing the absolute minimum number of Python variants possible.
;)

On Mon, Oct 28, 2019 at 7:46 PM Hyukjin Kwon  wrote:

> +1 from me as well.
>
> On Tue, Oct 29, 2019 at 5:34 AM Xiangrui Meng wrote:
>
>> +1. And we should start testing 3.7 and maybe 3.8 in Jenkins.
>>
>> On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you for starting the thread.
>>>
>>> In addition to that, we are currently testing only Python 3.6 in the
>>> Apache Spark Jenkins environment.
>>>
>>> Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1 will
>>> start next January
>>> (https://spark.apache.org/versioning-policy.html), I'm +1 for the
>>> deprecation (Python < 3.6) at Apache Spark 3.0.0.
>>>
>>> It's just a deprecation, meant to prepare for the next development cycle.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Thu, Oct 24, 2019 at 1:10 AM Maciej Szymkiewicz <
>>> mszymkiew...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> While the deprecation of Python 2 in 3.0.0 has been announced,
>>>> there is no clear statement about continued support for specific
>>>> Python 3 versions.
>>>>
>>>> Specifically:
>>>>
>>>>    - Python 3.4 was retired this year.
>>>>    - Python 3.5 is already in "security fixes only" mode and is due
>>>>    to be retired in mid-2020.
>>>>
>>>> Continued support of these two blocks the adoption of many new Python
>>>> features (e.g., PEP 468) and is hard to justify beyond 2020.
>>>>
>>>> Should these two be deprecated in 3.0.0 as well?
>>>>
>>>> --
>>>> Best regards,
>>>> Maciej



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
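
A deprecation like the one proposed in this thread typically surfaces as a
runtime warning when PySpark starts. Below is a minimal sketch of such a
guard, assuming a check early in the driver's startup path; the wording and
placement are illustrative, not the actual PySpark implementation:

    import sys
    import warnings

    def warn_on_old_python():
        # Python 2 is already deprecated in Spark 3.0; this thread proposes
        # the same treatment for Python 3 releases older than 3.6.
        if sys.version_info < (3, 6):
            warnings.warn(
                "Support for Python before 3.6 is deprecated as of Spark "
                "3.0; please upgrade your interpreter.",
                DeprecationWarning)

    warn_on_old_python()

Users on an old interpreter would see the warning once at startup while
everything keeps working, which matches the deprecate-now, drop-later intent
of the thread.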


Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-28 Thread Hyukjin Kwon
+1 from me as well.

On Tue, Oct 29, 2019 at 5:34 AM Xiangrui Meng wrote:

> +1. And we should start testing 3.7 and maybe 3.8 in Jenkins.
>
> On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun 
> wrote:
>
>> Thank you for starting the thread.
>>
>> In addition to that, we are currently testing only Python 3.6 in the
>> Apache Spark Jenkins environment.
>>
>> Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1 will
>> start next January
>> (https://spark.apache.org/versioning-policy.html), I'm +1 for the
>> deprecation (Python < 3.6) at Apache Spark 3.0.0.
>>
>> It's just a deprecation, meant to prepare for the next development cycle.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Thu, Oct 24, 2019 at 1:10 AM Maciej Szymkiewicz <
>> mszymkiew...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> While the deprecation of Python 2 in 3.0.0 has been announced,
>>> there is no clear statement about continued support for specific
>>> Python 3 versions.
>>>
>>> Specifically:
>>>
>>>    - Python 3.4 was retired this year.
>>>    - Python 3.5 is already in "security fixes only" mode and is due to
>>>    be retired in mid-2020.
>>>
>>> Continued support of these two blocks the adoption of many new Python
>>> features (e.g., PEP 468) and is hard to justify beyond 2020.
>>>
>>> Should these two be deprecated in 3.0.0 as well?
>>>
>>> --
>>> Best regards,
>>> Maciej
>>>
>>>


Re: [DISCUSS] Deprecate Python < 3.6 in Spark 3.0

2019-10-28 Thread Xiangrui Meng
+1. And we should start testing 3.7 and maybe 3.8 in Jenkins.

On Thu, Oct 24, 2019 at 9:34 AM Dongjoon Hyun 
wrote:

> Thank you for starting the thread.
>
> In addition to that, we are currently testing only Python 3.6 in the
> Apache Spark Jenkins environment.
>
> Given that Python 3.8 is already out and Apache Spark 3.0.0 RC1 will start
> next January
> (https://spark.apache.org/versioning-policy.html), I'm +1 for the
> deprecation (Python < 3.6) at Apache Spark 3.0.0.
>
> It's just a deprecation, meant to prepare for the next development cycle.
>
> Bests,
> Dongjoon.
>
>
> On Thu, Oct 24, 2019 at 1:10 AM Maciej Szymkiewicz 
> wrote:
>
>> Hi everyone,
>>
>> While the deprecation of Python 2 in 3.0.0 has been announced,
>> there is no clear statement about continued support for specific
>> Python 3 versions.
>>
>> Specifically:
>>
>>    - Python 3.4 was retired this year.
>>    - Python 3.5 is already in "security fixes only" mode and is due to
>>    be retired in mid-2020.
>>
>> Continued support of these two blocks the adoption of many new Python
>> features (e.g., PEP 468) and is hard to justify beyond 2020.
>>
>> Should these two be deprecated in 3.0.0 as well?
>>
>> --
>> Best regards,
>> Maciej
>>
>>


Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Dongjoon Hyun
Thank you for the feedback, Sean and Xiao.

Bests,
Dongjoon.

On Mon, Oct 28, 2019 at 12:52 PM Xiao Li  wrote:

> The stability and quality of the Hadoop 3.2 profile are unknown. The
> changes are massive, including Hive execution and a new version of the
> Hive thriftserver.
>
> To reduce the risk, I would like to keep the current default version
> unchanged. When it becomes stable, we can change the default profile to
> Hadoop-3.2.
>
> Cheers,
>
> Xiao
>
> On Mon, Oct 28, 2019 at 12:51 PM Sean Owen  wrote:
>
>> I'm OK with that, but I don't have a strong opinion or info about the
>> implications.
>> That said my guess is we're close to the point where we don't need to
>> support Hadoop 2.x anyway, so, yeah.
>>
>> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun 
>> wrote:
>> >
>> > Hi, All.
>> >
>> > There was a discussion on publishing artifacts built with Hadoop 3.
>> > But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will
>> > be the same because we haven't changed anything yet.
>> >
>> > Technically, we need to change two places for publishing.
>> >
>> > 1. Jenkins Snapshot Publishing
>> >
>> > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>> >
>> > 2. Release Snapshot/Release Publishing
>> >
>> > https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>> >
>> > To minimize the change, we need to switch our default Hadoop profile.
>> >
>> > Currently, the default is the `hadoop-2.7 (2.7.4)` profile, and
>> > `hadoop-3.2 (3.2.0)` is optional.
>> > We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
>> > optionally.
>> >
>> > Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
>> > distribution will use `Hive 1.2.1`, like Apache Spark 2.4.x.
>> >
>> > Bests,
>> > Dongjoon.
>>
>>
>>
>
>


Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Xiao Li
The stability and quality of the Hadoop 3.2 profile are unknown. The
changes are massive, including Hive execution and a new version of the
Hive thriftserver.

To reduce the risk, I would like to keep the current default version
unchanged. When it becomes stable, we can change the default profile to
Hadoop-3.2.

Cheers,

Xiao

On Mon, Oct 28, 2019 at 12:51 PM Sean Owen  wrote:

> I'm OK with that, but I don't have a strong opinion or info about the
> implications.
> That said my guess is we're close to the point where we don't need to
> support Hadoop 2.x anyway, so, yeah.
>
> On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun 
> wrote:
> >
> > Hi, All.
> >
> > There was a discussion on publishing artifacts built with Hadoop 3.
> > But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will be
> > the same because we haven't changed anything yet.
> >
> > Technically, we need to change two places for publishing.
> >
> > 1. Jenkins Snapshot Publishing
> >
> > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> >
> > 2. Release Snapshot/Release Publishing
> >
> > https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
> >
> > To minimize the change, we need to switch our default Hadoop profile.
> >
> > Currently, the default is the `hadoop-2.7 (2.7.4)` profile, and
> > `hadoop-3.2 (3.2.0)` is optional.
> > We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
> > optionally.
> >
> > Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
> > distribution will use `Hive 1.2.1`, like Apache Spark 2.4.x.
> >
> > Bests,
> > Dongjoon.
>
>
>



Re: Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Sean Owen
I'm OK with that, but I don't have a strong opinion or info about the
implications.
That said my guess is we're close to the point where we don't need to
support Hadoop 2.x anyway, so, yeah.

On Mon, Oct 28, 2019 at 2:33 PM Dongjoon Hyun  wrote:
>
> Hi, All.
>
> There was a discussion on publishing artifacts built with Hadoop 3.
> But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will be the
> same because we haven't changed anything yet.
>
> Technically, we need to change two places for publishing.
>
> 1. Jenkins Snapshot Publishing
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
>
> 2. Release Snapshot/Release Publishing
>
> https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh
>
> To minimize the change, we need to switch our default Hadoop profile.
>
> Currently, the default is the `hadoop-2.7 (2.7.4)` profile, and
> `hadoop-3.2 (3.2.0)` is optional.
> We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
> optionally.
>
> Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
> distribution will use `Hive 1.2.1`, like Apache Spark 2.4.x.
>
> Bests,
> Dongjoon.




Use Hadoop-3.2 as a default Hadoop profile in 3.0.0?

2019-10-28 Thread Dongjoon Hyun
Hi, All.

There was a discussion on publishing artifacts built with Hadoop 3.
But we are still publishing with Hadoop 2.7.3, and `3.0-preview` will be
the same because we haven't changed anything yet.

Technically, we need to change two places for publishing.

1. Jenkins Snapshot Publishing

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

2. Release Snapshot/Release Publishing

https://github.com/apache/spark/blob/master/dev/create-release/release-build.sh

To minimize the change, we need to switch our default Hadoop profile.

Currently, the default is the `hadoop-2.7 (2.7.4)` profile, and
`hadoop-3.2 (3.2.0)` is optional.
We had better use the `hadoop-3.2` profile by default and `hadoop-2.7`
optionally.

Note that this means we use Hive 2.3.6 by default. Only the `hadoop-2.7`
distribution will use `Hive 1.2.1`, like Apache Spark 2.4.x.

Bests,
Dongjoon.
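
Since the default profile decides which Hadoop a binary distribution
bundles, it can be handy to confirm what a given build is actually linked
against. A minimal sketch from PySpark, assuming a local install; `_jvm` is
an internal accessor and is used here purely for illustration:

    from pyspark.sql import SparkSession

    # Ask the JVM which Hadoop version this Spark build is linked against.
    spark = SparkSession.builder.appName("hadoop-version-check").getOrCreate()
    version_info = spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo
    print("Hadoop version:", version_info.getVersion())  # e.g. 2.7.4 or 3.2.0
    spark.stop()

Run against a `hadoop-2.7` build this would print 2.7.4, and against a
`hadoop-3.2` build 3.2.0 -- exactly the difference the proposed default
switch makes.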


Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-28 Thread Shane Knapp
we're back up and building!

On Mon, Oct 28, 2019 at 8:35 AM Shane Knapp  wrote:

>> ok, it looks like the colo will have power until monday morning, and
>> it will be shut down from 8am to noon to perform some maintenance.
>>
>> this means jenkins will be up all weekend, but down monday morning.
>>
> jenkins is currently down due to colo maintenance. expect it to return
> in ~3.5 hours.
>
> shane
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-28 Thread Dongjoon Hyun
Thank you for fixing the worker ENVs, Shane.

Bests,
Dongjoon.

On Mon, Oct 28, 2019 at 10:47 AM Shane Knapp  wrote:

> i will need to restart jenkins -- the worker's ENV vars got borked when
> they came back up.
>
> this is happening NOW.
>
> shane
>
> On Mon, Oct 28, 2019 at 10:37 AM Shane Knapp  wrote:
>
>> we're back up and building!
>>
>> On Mon, Oct 28, 2019 at 8:35 AM Shane Knapp  wrote:
>>
>>>> ok, it looks like the colo will have power until monday morning, and
>>>> it will be shut down from 8am to noon to perform some maintenance.
>>>>
>>>> this means jenkins will be up all weekend, but down monday morning.
>>>>
>>> jenkins is currently down due to colo maintenance. expect it to return
>>> in ~3.5 hours.
>>>
>>> shane
>>> --
>>> Shane Knapp
>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>> https://rise.cs.berkeley.edu
>>>
>>
>>
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-28 Thread Shane Knapp
i will need to restart jenkins -- the worker's ENV vars got borked when
they came back up.

this is happening NOW.

shane

On Mon, Oct 28, 2019 at 10:37 AM Shane Knapp  wrote:

> we're back up and building!
>
> On Mon, Oct 28, 2019 at 8:35 AM Shane Knapp  wrote:
>
>>> ok, it looks like the colo will have power until monday morning, and
>>> it will be shut down from 8am to noon to perform some maintenance.
>>>
>>> this means jenkins will be up all weekend, but down monday morning.
>>>
>> jenkins is currently down due to colo maintenance. expect it to return
>> in ~3.5 hours.
>>
>> shane
>> --
>> Shane Knapp
>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> https://rise.cs.berkeley.edu
>>
>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [build system] intermittent network issues + potential power shutoff over the weekend

2019-10-28 Thread Shane Knapp
>
> ok, it looks like the colo will have power until monday morning, and
> it will be shut down from 8am to noon to perform some maintenance.
>
> this means jenkins will be up all weekend, but down monday morning.

jenkins is currently down due to colo maintenance. expect it to return in
~3.5 hours.

shane
-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark 3.0 and S3A

2019-10-28 Thread Sean Owen
There will be a "Hadoop 3.x" version of 3.0, as it's essential for getting
a JDK 11-compatible build; you can see the hadoop-3.2 profile.
hadoop-aws is pulled in via the hadoop-cloud module, I believe, so it bears
checking whether the profile updates the versions there too.

On Mon, Oct 28, 2019 at 10:34 AM Nicholas Chammas
 wrote:
>
> Howdy folks,
>
> I have a question about what is happening with the 3.0 release in relation to 
> Hadoop and hadoop-aws.
>
> Today, among other builds, we release a build of Spark built against Hadoop 
> 2.7 and another one built without Hadoop. In Spark 3+, will we continue to 
> release Hadoop 2.7 builds as one of the primary downloads on the download 
> page? Or will we start building Spark against a newer version of Hadoop?
>
> The reason I ask is because successive versions of hadoop-aws have made 
> significant usability improvements to S3A. To get those, users need to 
> download the Hadoop-free build of Spark and then link Spark to a version of 
> Hadoop newer than 2.7. There are various dependency and runtime issues with 
> trying to pair Spark built against Hadoop 2.7 with hadoop-aws 2.8 or newer.
>
> If we start releasing builds of Spark built against Hadoop 3.2 (or another 
> recent version), users can get the latest S3A improvements via --packages 
> "org.apache.hadoop:hadoop-aws:3.2.1" without needing to download Hadoop 
> separately.
>
> Nick




Spark 3.0 and S3A

2019-10-28 Thread Nicholas Chammas
Howdy folks,

I have a question about what is happening with the 3.0 release in relation
to Hadoop and hadoop-aws.

Today, among other builds, we release a build of Spark built against Hadoop
2.7 and another one built without Hadoop. In Spark 3+, will we continue to
release Hadoop 2.7 builds as one of the primary downloads on the download
page? Or will we start building
Spark against a newer version of Hadoop?

The reason I ask is because successive versions of hadoop-aws have made
significant usability improvements to S3A. To get those, users need to
download the Hadoop-free build of Spark and then link
Spark to a version of Hadoop newer than 2.7. There are various dependency
and runtime issues with trying to pair Spark built against Hadoop 2.7 with
hadoop-aws 2.8 or newer.

If we start releasing builds of Spark built against Hadoop 3.2 (or another
recent version), users can get the latest S3A improvements via --packages
"org.apache.hadoop:hadoop-aws:3.2.1" without needing to download Hadoop
separately.

Nick
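
The --packages route described above looks roughly like this from PySpark.
A minimal sketch, assuming a Spark build whose bundled Hadoop matches the
hadoop-aws version below; the app name and bucket path are placeholders:

    from pyspark.sql import SparkSession

    # Pull hadoop-aws (plus its transitive AWS SDK dependency) at startup,
    # then read directly from S3 through the s3a:// filesystem.
    spark = (SparkSession.builder
             .appName("s3a-example")
             .config("spark.jars.packages",
                     "org.apache.hadoop:hadoop-aws:3.2.1")
             .getOrCreate())

    df = spark.read.parquet("s3a://example-bucket/some/path/")  # placeholder
    df.show()

The dependency mismatches described above surface exactly here: pairing a
newer hadoop-aws with a Spark build linked against Hadoop 2.7 tends to fail
at runtime, which is why a default Hadoop 3.x build would make this
workflow much smoother.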