Re: [DISCUSS] Remove Combinable Annotation from DataSet API

2016-01-15 Thread Fabian Hueske
OK, I think we got a clear picture here.

I'll update the corresponding JIRA issue FLINK-1045.

Thanks, Fabian

2016-01-14 18:53 GMT+01:00 Henry Saputra :

> +1 for approach #1
>
>
>
> - Henry
>
> On Thu, Jan 14, 2016 at 5:35 AM, Chiwan Park 
> wrote:
>
> > I’m also for approach #1. Now is nice time to apply some API-breaking
> > changes.
> >
> > > On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek 
> > wrote:
> > >
> > > I’m also for Approach #1. I like simplifying things.
> > >> On 13 Jan 2016, at 14:25, Vasiliki Kalavri  >
> > wrote:
> > >>
> > >> Hi,
> > >>
> > >> ​+1 for removing the Combinable annotation​. Approach 1 sounds like
> the
> > >> best option to me.
> > >>
> > >>
> > >> On 13 January 2016 at 14:11, Till Rohrmann 
> > wrote:
> > >>
> > >>> Hi Fabian,
> > >>>
> > >>> thanks for bringing this issue up. I agree with you that it would be
> > nice
> > >>> to remove the Combinable annotation if it is not really needed. Now
> if
> > >>> people are not aware of the Combinable interface then they might
> think
> > that
> > >>> they are actually using combiners by simply implementing
> > CombineFunction.
> > >>>
> > >>>
> > >> ​has happened to me :S​
> > >>
> > >>
> > >>
> > >>> I would also be in favour of approach 1 combined with a migration
> guide
> > >>> where we highlight possible problems with a faulty combine
> > implementation.
> > >>>
> > >>>
> > >> Migration guide is a ​good idea​!
> > >>
> > >>
> > >>
> > >>> Cheers,
> > >>> Till
> > >>> ​
> > >>>
> > >>>
> > >> ​-Vasia.​
> > >>
> > >>
> > >>
> > >>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske 
> > wrote:
> > >>>
> >  Hi everybody,
> > 
> > 
> > 
> >  As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
> > 
> >  I would like to start a discussion about removing the Combinable
> > >>> annotation
> >  from the DataSet API.
> > 
> > 
> > 
> >  # The Current State:
> > 
> >  The DataSet API offers two interface for combine functions,
> > >>> CombineFunction
> >  and GroupCombineFunction, which can be added to a
> GroupReduceFunctions
> >  (GRF).
> > 
> > 
> >  However, implementing a combine interface does not make a GRF
> > combinable.
> >  In addition, the GRF class needs to be annotated with the Combinable
> >  annotation. The RichGroupReduceFunction has a default implementation
> > of
> >  combine, which forwards the combine parameters to the reduce method.
> > This
> >  default implementation is not used, unless the class that extends
> RGRF
> > >>> has
> >  the Combinable annotation.
> > 
> > 
> > 
> >  In addition to the Combinable annotation, the DataSet API
> >  GroupReduceOperator features the setCombinable(boolean) method to
> > >>> override
> >  a missing or existing Combinable annotation.
> > 
> > 
> > 
> >  # My Proposal:
> > 
> >  I propose to remove the Combinable annotation because it is not
> > required
> >  and complicates the definition of combiners. Instead, the combine
> > method
> > >>> of
> >  a GroupReduceFunction should be executed if it implements one of the
> >  combine function interfaces. This would require to remove the
> default
> >  combine implementation of the RichGroupReduceFunction as well.
> > 
> >  This would be an API breaking change and has a few implications.
> > 
> > 
> > 
> >  # Possible Implementation:
> > 
> >  There are a few ways to implement this change.
> > 
> >  - Remove Combinable annotation or mark it deprecated (and keep
> effect)
> > 
> >  - Remove default combine method from RichGroupReduceFunction or
> > deprecate
> >  it
> > 
> > 
> > 
> >  Approach 1:
> >  - Remove Combinable annotation
> >  - Remove default combine() method from RichGroupReduceFunction
> >  - Effect:
> >    - All RichGroupReduceFunctions that do either have the Combinable
> >  annotation or implement the combine method do not compile anymore
> >    - GroupReduceFunctions that have the Combinable annotation do not
> >  compile anymore
> >    - GroupReduceFunctions that implement a combine interface without
> >  having the annotation (and not being set to combinable during
> program
> >  construction) will execute the previously not executed combine
> method.
> > >>> This
> >  might change the behavior of the program. In worst case, the program
> > >>> might
> >  silently produce wrong results (or crash) if the combiner
> > implementation
> >  was faulty. In best case, the program executes faster.
> > 
> > 
> > 
> >  Approach 2:
> >  - As Approach 1
> >  - In addition extend both combine interfaces with a deprecated
> marker
> >  method. This will ensure that all 

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

2016-01-14 Thread Henry Saputra
+1 for approach #1



- Henry

On Thu, Jan 14, 2016 at 5:35 AM, Chiwan Park  wrote:

> I’m also for approach #1. Now is nice time to apply some API-breaking
> changes.
>
> > On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek 
> wrote:
> >
> > I’m also for Approach #1. I like simplifying things.
> >> On 13 Jan 2016, at 14:25, Vasiliki Kalavri 
> wrote:
> >>
> >> Hi,
> >>
> >> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
> >> best option to me.
> >>
> >>
> >> On 13 January 2016 at 14:11, Till Rohrmann 
> wrote:
> >>
> >>> Hi Fabian,
> >>>
> >>> thanks for bringing this issue up. I agree with you that it would be
> nice
> >>> to remove the Combinable annotation if it is not really needed. Now if
> >>> people are not aware of the Combinable interface then they might think
> that
> >>> they are actually using combiners by simply implementing
> CombineFunction.
> >>>
> >>>
> >> ​has happened to me :S​
> >>
> >>
> >>
> >>> I would also be in favour of approach 1 combined with a migration guide
> >>> where we highlight possible problems with a faulty combine
> implementation.
> >>>
> >>>
> >> Migration guide is a ​good idea​!
> >>
> >>
> >>
> >>> Cheers,
> >>> Till
> >>> ​
> >>>
> >>>
> >> ​-Vasia.​
> >>
> >>
> >>
> >>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske 
> wrote:
> >>>
>  Hi everybody,
> 
> 
> 
>  As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
> 
>  I would like to start a discussion about removing the Combinable
> >>> annotation
>  from the DataSet API.
> 
> 
> 
>  # The Current State:
> 
>  The DataSet API offers two interface for combine functions,
> >>> CombineFunction
>  and GroupCombineFunction, which can be added to a GroupReduceFunctions
>  (GRF).
> 
> 
>  However, implementing a combine interface does not make a GRF
> combinable.
>  In addition, the GRF class needs to be annotated with the Combinable
>  annotation. The RichGroupReduceFunction has a default implementation
> of
>  combine, which forwards the combine parameters to the reduce method.
> This
>  default implementation is not used, unless the class that extends RGRF
> >>> has
>  the Combinable annotation.
> 
> 
> 
>  In addition to the Combinable annotation, the DataSet API
>  GroupReduceOperator features the setCombinable(boolean) method to
> >>> override
>  a missing or existing Combinable annotation.
> 
> 
> 
>  # My Proposal:
> 
>  I propose to remove the Combinable annotation because it is not
> required
>  and complicates the definition of combiners. Instead, the combine
> method
> >>> of
>  a GroupReduceFunction should be executed if it implements one of the
>  combine function interfaces. This would require to remove the default
>  combine implementation of the RichGroupReduceFunction as well.
> 
>  This would be an API breaking change and has a few implications.
> 
> 
> 
>  # Possible Implementation:
> 
>  There are a few ways to implement this change.
> 
>  - Remove Combinable annotation or mark it deprecated (and keep effect)
> 
>  - Remove default combine method from RichGroupReduceFunction or
> deprecate
>  it
> 
> 
> 
>  Approach 1:
>  - Remove Combinable annotation
>  - Remove default combine() method from RichGroupReduceFunction
>  - Effect:
>    - All RichGroupReduceFunctions that do either have the Combinable
>  annotation or implement the combine method do not compile anymore
>    - GroupReduceFunctions that have the Combinable annotation do not
>  compile anymore
>    - GroupReduceFunctions that implement a combine interface without
>  having the annotation (and not being set to combinable during program
>  construction) will execute the previously not executed combine method.
> >>> This
>  might change the behavior of the program. In worst case, the program
> >>> might
>  silently produce wrong results (or crash) if the combiner
> implementation
>  was faulty. In best case, the program executes faster.
> 
> 
> 
>  Approach 2:
>  - As Approach 1
>  - In addition extend both combine interfaces with a deprecated marker
>  method. This will ensure that all functions that implement a
> combinable
>  interface do not compile anymore and need to be fixed. This could
> prevent
>  the silent failure as in Approach 1, but would also cause an
> additional
> >>> API
>  breaking change once the deprecated marker method is removed again.
> 
> 
> 
>  Approach 3:
>  - Mark Combinable annotation deprecated
>  - Mark combine() method in RichGroupReduceFunction as deprecated
>  - Effect:
>    - There'll be a couple of deprecation 

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

2016-01-14 Thread Chiwan Park
I’m also for approach #1. Now is nice time to apply some API-breaking changes.

> On Jan 14, 2016, at 1:19 AM, Aljoscha Krettek  wrote:
> 
> I’m also for Approach #1. I like simplifying things.
>> On 13 Jan 2016, at 14:25, Vasiliki Kalavri  wrote:
>> 
>> Hi,
>> 
>> ​+1 for removing the Combinable annotation​. Approach 1 sounds like the
>> best option to me.
>> 
>> 
>> On 13 January 2016 at 14:11, Till Rohrmann  wrote:
>> 
>>> Hi Fabian,
>>> 
>>> thanks for bringing this issue up. I agree with you that it would be nice
>>> to remove the Combinable annotation if it is not really needed. Now if
>>> people are not aware of the Combinable interface then they might think that
>>> they are actually using combiners by simply implementing CombineFunction.
>>> 
>>> 
>> ​has happened to me :S​
>> 
>> 
>> 
>>> I would also be in favour of approach 1 combined with a migration guide
>>> where we highlight possible problems with a faulty combine implementation.
>>> 
>>> 
>> Migration guide is a ​good idea​!
>> 
>> 
>> 
>>> Cheers,
>>> Till
>>> ​
>>> 
>>> 
>> ​-Vasia.​
>> 
>> 
>> 
>>> On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske  wrote:
>>> 
 Hi everybody,
 
 
 
 As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
 
 I would like to start a discussion about removing the Combinable
>>> annotation
 from the DataSet API.
 
 
 
 # The Current State:
 
 The DataSet API offers two interface for combine functions,
>>> CombineFunction
 and GroupCombineFunction, which can be added to a GroupReduceFunctions
 (GRF).
 
 
 However, implementing a combine interface does not make a GRF combinable.
 In addition, the GRF class needs to be annotated with the Combinable
 annotation. The RichGroupReduceFunction has a default implementation of
 combine, which forwards the combine parameters to the reduce method. This
 default implementation is not used, unless the class that extends RGRF
>>> has
 the Combinable annotation.
 
 
 
 In addition to the Combinable annotation, the DataSet API
 GroupReduceOperator features the setCombinable(boolean) method to
>>> override
 a missing or existing Combinable annotation.
 
 
 
 # My Proposal:
 
 I propose to remove the Combinable annotation because it is not required
 and complicates the definition of combiners. Instead, the combine method
>>> of
 a GroupReduceFunction should be executed if it implements one of the
 combine function interfaces. This would require to remove the default
 combine implementation of the RichGroupReduceFunction as well.
 
 This would be an API breaking change and has a few implications.
 
 
 
 # Possible Implementation:
 
 There are a few ways to implement this change.
 
 - Remove Combinable annotation or mark it deprecated (and keep effect)
 
 - Remove default combine method from RichGroupReduceFunction or deprecate
 it
 
 
 
 Approach 1:
 - Remove Combinable annotation
 - Remove default combine() method from RichGroupReduceFunction
 - Effect:
   - All RichGroupReduceFunctions that do either have the Combinable
 annotation or implement the combine method do not compile anymore
   - GroupReduceFunctions that have the Combinable annotation do not
 compile anymore
   - GroupReduceFunctions that implement a combine interface without
 having the annotation (and not being set to combinable during program
 construction) will execute the previously not executed combine method.
>>> This
 might change the behavior of the program. In worst case, the program
>>> might
 silently produce wrong results (or crash) if the combiner implementation
 was faulty. In best case, the program executes faster.
 
 
 
 Approach 2:
 - As Approach 1
 - In addition extend both combine interfaces with a deprecated marker
 method. This will ensure that all functions that implement a combinable
 interface do not compile anymore and need to be fixed. This could prevent
 the silent failure as in Approach 1, but would also cause an additional
>>> API
 breaking change once the deprecated marker method is removed again.
 
 
 
 Approach 3:
 - Mark Combinable annotation deprecated
 - Mark combine() method in RichGroupReduceFunction as deprecated
 - Effect:
   - There'll be a couple of deprecation warnings.
   - We face the same problem with silent failures as in Approach 1.
   - We have to check if RichGroupReduceFunction's override combine or
>>> not
 (can be done with reflection). If the method is not overridden we do not
 execute it (unless there is a Combinable annotation) and we are fine. If
>>> it
 is overridden and no Combinable 

Re: [DISCUSS] Remove Combinable Annotation from DataSet API

2016-01-13 Thread Till Rohrmann
Hi Fabian,

thanks for bringing this issue up. I agree with you that it would be nice
to remove the Combinable annotation if it is not really needed. Now if
people are not aware of the Combinable interface then they might think that
they are actually using combiners by simply implementing CombineFunction.

I would also be in favour of approach 1 combined with a migration guide
where we highlight possible problems with a faulty combine implementation.

Cheers,
Till
​

On Wed, Jan 13, 2016 at 1:53 PM, Fabian Hueske  wrote:

> Hi everybody,
>
>
>
> As part of the upcoming 1.0 release we are stabilizing Flink's APIs.
>
> I would like to start a discussion about removing the Combinable annotation
> from the DataSet API.
>
>
>
> # The Current State:
>
> The DataSet API offers two interface for combine functions, CombineFunction
> and GroupCombineFunction, which can be added to a GroupReduceFunctions
> (GRF).
>
>
> However, implementing a combine interface does not make a GRF combinable.
> In addition, the GRF class needs to be annotated with the Combinable
> annotation. The RichGroupReduceFunction has a default implementation of
> combine, which forwards the combine parameters to the reduce method. This
> default implementation is not used, unless the class that extends RGRF has
> the Combinable annotation.
>
>
>
> In addition to the Combinable annotation, the DataSet API
> GroupReduceOperator features the setCombinable(boolean) method to override
> a missing or existing Combinable annotation.
>
>
>
> # My Proposal:
>
> I propose to remove the Combinable annotation because it is not required
> and complicates the definition of combiners. Instead, the combine method of
> a GroupReduceFunction should be executed if it implements one of the
> combine function interfaces. This would require to remove the default
> combine implementation of the RichGroupReduceFunction as well.
>
> This would be an API breaking change and has a few implications.
>
>
>
> # Possible Implementation:
>
> There are a few ways to implement this change.
>
> - Remove Combinable annotation or mark it deprecated (and keep effect)
>
> - Remove default combine method from RichGroupReduceFunction or deprecate
> it
>
>
>
> Approach 1:
> - Remove Combinable annotation
> - Remove default combine() method from RichGroupReduceFunction
> - Effect:
> - All RichGroupReduceFunctions that do either have the Combinable
> annotation or implement the combine method do not compile anymore
> - GroupReduceFunctions that have the Combinable annotation do not
> compile anymore
> - GroupReduceFunctions that implement a combine interface without
> having the annotation (and not being set to combinable during program
> construction) will execute the previously not executed combine method. This
> might change the behavior of the program. In worst case, the program might
> silently produce wrong results (or crash) if the combiner implementation
> was faulty. In best case, the program executes faster.
>
>
>
> Approach 2:
> - As Approach 1
> - In addition extend both combine interfaces with a deprecated marker
> method. This will ensure that all functions that implement a combinable
> interface do not compile anymore and need to be fixed. This could prevent
> the silent failure as in Approach 1, but would also cause an additional API
> breaking change once the deprecated marker method is removed again.
>
>
>
> Approach 3:
> - Mark Combinable annotation deprecated
> - Mark combine() method in RichGroupReduceFunction as deprecated
> - Effect:
> - There'll be a couple of deprecation warnings.
> - We face the same problem with silent failures as in Approach 1.
> - We have to check if RichGroupReduceFunction's override combine or not
> (can be done with reflection). If the method is not overridden we do not
> execute it (unless there is a Combinable annotation) and we are fine. If it
> is overridden and no Combinable annotation has been defined, we have the
> same problem with silent failures as before.
> - After we remove the deprecated annotation and method, we have the
> same effect as with Approach 1.
>
>
>
> There are more alternatives, but these are the most viable, IMO.
>
>
>
> I think, if we want to remove the combinable annotation, we should do it
> now.
>
> Given the three options, would go for Approach 1. Yes, breaks a lot of code
> and yes there is the possibility of computing incorrect results. Approach 2
> is safer but would mean another API breaking change in the future. Approach
> 3 comes with fewer breaking changes but has the same problem of silent
> failures.
>
> IMO, the breaking API changes of Approach 1 are even desirable because they
> will make users aware that this feature changed.
>
>
>
> What do you think?
>
>
>
> Cheers, Fabian
>