[ 
https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maximilian Michels resolved FLINK-1043.
---------------------------------------
    Resolution: Fixed

This is now possible via GroupCombine followed by a GroupReduce. The 
GroupCombineFunction for the GroupCombine is typed from IN to OUT.

http://ci.apache.org/projects/flink/flink-docs-master/dataset_transformations.html#groupcombine-on-a-grouped-dataset

> Alternative combine interface
> -----------------------------
>
>                 Key: FLINK-1043
>                 URL: https://issues.apache.org/jira/browse/FLINK-1043
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Sebastian Kruse
>            Priority: Minor
>
> The GroupReduce allows for the following combination reduce step: 
> {{InputType}} -> combine -> {{InputType}} -> reduce -> {{OutputType}}. 
> However, in the use cases I have stumbled upon so far, it would make more 
> sense to have the following steps: {{InputType}} -> {{OutputType}} -> 
> {{OutputType}}. It seems more intuitive to me to create a set of partial 
> results with the combiners that will finally merged within the reduce phase 
> into an overall result. This sometimes bars me from using a combiner.
> I provide some examples for this intuition.
> * WordCount
> ** If you want to implement WordCount with as a combinable GroupReduce, then 
> you have to preprocess all words as {{Tuple2<String, 1>}}. This could be 
> avoided if the combination result was not necessarily equal to the input type.
> * create a Bloom filter
> ** Bloom filters can be created locally on each node and later on be merged 
> into a final, global Bloom filter, thus lend themselves for a combine-reduce 
> proceeding. Doing this with a combinable GroupReduce would currently require 
> to turn each input element into a singleton Bloom filter before the 
> combination phase.
> Therefore, it would be nice to have the ability to use {{OutputType}} as the 
> combiner result.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to