Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread DB Tsai
I'll +1 removing that legacy mllib code. Many users are confused by the
APIs, and some of them have weird behaviors (for example, in gradient descent,
the intercept is regularized, which it is not supposed to be).
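To make the intercept point concrete, here is a minimal plain-Python sketch. It is not Spark's mllib or ml code. The function name `fit_ridge_gd` and its parameters (`reg`, `lr`, `iters`, `regularize_intercept`) are hypothetical. The sketch runs batch gradient descent on the L2-regularized least-squares objective (1/2n) * sum((w.x + b - y)^2) + (reg/2) * ||w||^2. The usual ridge convention penalizes only the weights. Setting `regularize_intercept=True` reproduces the behavior described above, where the intercept is also shrunk toward zero.

```python
def fit_ridge_gd(xs, ys, reg, lr=0.1, iters=2000, regularize_intercept=False):
    """Hypothetical illustration, not Spark code.

    Batch gradient descent for L2-regularized least squares. The penalty
    (reg/2) * ||w||^2 always applies to the weights; (reg/2) * b^2 is added
    for the intercept only when regularize_intercept is True.
    """
    n = len(xs)
    d = len(xs[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * d
        grad_b = 0.0
        # Gradient of the data term (1/2n) * sum(residual^2).
        for x, y in zip(xs, ys):
            residual = sum(wj * xj for wj, xj in zip(w, x)) + b - y
            for j in range(d):
                grad_w[j] += residual * x[j] / n
            grad_b += residual / n
        # The L2 penalty on the weights is always applied.
        for j in range(d):
            grad_w[j] += reg * w[j]
        # Penalizing the intercept is the surprising behavior being discussed.
        if regularize_intercept:
            grad_b += reg * b
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Centered feature: x has mean 0, so the unpenalized intercept should be mean(ys).
xs = [[-1.0], [0.0], [1.0]]
ys = [8.0, 10.0, 12.0]
w, b = fit_ridge_gd(xs, ys, reg=1.0)                               # slope ~0.8, intercept ~10.0
w2, b2 = fit_ridge_gd(xs, ys, reg=1.0, regularize_intercept=True)  # slope ~0.8, intercept ~5.0
```

The feature here is centered, so the two runs end up with the same slope. Only the intercept differs. When it is left unpenalized, it lands on the mean of `ys`. When it is penalized, it is pulled toward zero even though nothing in the data asks for that.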

DB Tsai  |  Siri Open Source Technologies [not a contribution]  |   Apple, Inc

> On Oct 17, 2018, at 7:42 AM, Erik Erlandson  wrote:
> 
> My understanding was that the legacy mllib API was frozen, with all new dev
> going to ML, but that it was not going to be removed, although removing it
> would get rid of a lot of `OldXxx` shims.


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread Erik Erlandson
My understanding was that the legacy mllib API was frozen, with all new dev
going to ML, but that it was not going to be removed, although removing it
would get rid of a lot of `OldXxx` shims.

On Wed, Oct 17, 2018 at 12:55 AM Marco Gaido  wrote:

> Hi all,
>
> I think a very big topic on this would be: what do we want to do with the
> old mllib API? For a long time I have been told that it was going to be
> removed in 3.0. Is this still the plan?
>
> Thanks,
> Marco


Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-17 Thread Marco Gaido
Hi all,

I think a very big topic on this would be: what do we want to do with the
old mllib API? For a long time I have been told that it was going to be
removed in 3.0. Is this still the plan?

Thanks,
Marco

On Wed, Oct 17, 2018 at 03:11 Marcelo Vanzin wrote:

> Might be good to take a look at things marked "@DeveloperApi" and
> whether they should stay that way.
>
> e.g. I was looking at SparkHadoopUtil and I've always wanted to just
> make it private to Spark. I don't see why apps would need any of those
> methods.


Re: Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Marcelo Vanzin
Might be good to take a look at things marked "@DeveloperApi" and
whether they should stay that way.

e.g. I was looking at SparkHadoopUtil and I've always wanted to just
make it private to Spark. I don't see why apps would need any of those
methods.
On Tue, Oct 16, 2018 at 10:18 AM Sean Owen  wrote:
>
> There was already agreement to delete deprecated things like Flume and
> Kafka 0.8 support in master. I've got several more on my radar, and
> wanted to highlight them and solicit general opinions on where we
> should accept breaking changes.
>
> For example, how about removing accumulator v1?
> https://github.com/apache/spark/pull/22730
>
> Or using the standard Java Optional?
> https://github.com/apache/spark/pull/22383
>
> Or cleaning up some old workarounds and APIs while at it?
> https://github.com/apache/spark/pull/22729 (still in progress)
>
> I think I talked myself out of replacing Java function interfaces with
> java.util.function because...
> https://issues.apache.org/jira/browse/SPARK-25369
>
> There are also, say, old JSON, CSV and Avro reading methods
> deprecated since 1.4. Remove?
> Anything deprecated since 2.0.0?
>
> Interested in general thoughts on these.
>
> Here are some more items targeted to 3.0:
> https://issues.apache.org/jira/browse/SPARK-17875?jql=project%3D%22SPARK%22%20AND%20%22Target%20Version%2Fs%22%3D%223.0.0%22%20ORDER%20BY%20priority%20ASC


-- 
Marcelo




Starting to make changes for Spark 3 -- what can we delete?

2018-10-16 Thread Sean Owen
There was already agreement to delete deprecated things like Flume and
Kafka 0.8 support in master. I've got several more on my radar, and
wanted to highlight them and solicit general opinions on where we
should accept breaking changes.

For example, how about removing accumulator v1?
https://github.com/apache/spark/pull/22730

Or using the standard Java Optional?
https://github.com/apache/spark/pull/22383

Or cleaning up some old workarounds and APIs while at it?
https://github.com/apache/spark/pull/22729 (still in progress)

I think I talked myself out of replacing Java function interfaces with
java.util.function because...
https://issues.apache.org/jira/browse/SPARK-25369

There are also, say, old JSON, CSV and Avro reading methods
deprecated since 1.4. Remove?
Anything deprecated since 2.0.0?

Interested in general thoughts on these.

Here are some more items targeted to 3.0:
https://issues.apache.org/jira/browse/SPARK-17875?jql=project%3D%22SPARK%22%20AND%20%22Target%20Version%2Fs%22%3D%223.0.0%22%20ORDER%20BY%20priority%20ASC
