[ https://issues.apache.org/jira/browse/FLINK-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Maximilian Michels resolved FLINK-1043.
---------------------------------------
    Resolution: Fixed

This is now possible via a GroupCombine followed by a GroupReduce. The GroupCombineFunction used by the GroupCombine is typed from IN to OUT, so the combine result no longer has to match the input type.

http://ci.apache.org/projects/flink/flink-docs-master/dataset_transformations.html#groupcombine-on-a-grouped-dataset

> Alternative combine interface
> -----------------------------
>
>                 Key: FLINK-1043
>                 URL: https://issues.apache.org/jira/browse/FLINK-1043
>             Project: Flink
>          Issue Type: Wish
>            Reporter: Sebastian Kruse
>            Priority: Minor
>
> The GroupReduce allows for the following combine-reduce sequence:
> {{InputType}} -> combine -> {{InputType}} -> reduce -> {{OutputType}}.
> However, in the use cases I have come across so far, it would make more
> sense to have the following steps:
> {{InputType}} -> combine -> {{OutputType}} -> reduce -> {{OutputType}}.
> It seems more intuitive to me to have the combiners create a set of partial
> results that are finally merged into an overall result in the reduce phase.
> The current typing sometimes bars me from using a combiner.
> Here are some examples of this intuition.
> * WordCount
> ** If you want to implement WordCount as a combinable GroupReduce, you first
> have to map every word to a (word, 1) pair, i.e. a
> {{Tuple2<String, Integer>}}. This could be avoided if the combine result did
> not have to be of the input type.
> * Creating a Bloom filter
> ** Bloom filters can be created locally on each node and later be merged
> into a final, global Bloom filter, so they lend themselves to a
> combine-then-reduce procedure. Doing this with a combinable GroupReduce
> currently requires turning each input element into a singleton Bloom filter
> before the combine phase.
> Therefore, it would be nice to be able to use {{OutputType}} as the
> combiner result.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
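
For illustration, the requested flow ({{InputType}} -> combine -> {{OutputType}} -> reduce -> {{OutputType}}) can be modelled in plain Java using the WordCount example from the issue. This is only a sketch under stated assumptions: it does not use Flink at all, and the class name CombineSketch and the static methods combine and reduce are hypothetical names chosen here, not part of any Flink API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "IN -> combine -> OUT -> reduce -> OUT".
// NOT Flink code: all names in this class are hypothetical.
class CombineSketch {

    // Combine step: runs on one partition and turns raw words (IN)
    // into a partial count table (OUT). No (word, 1) preprocessing is needed.
    static Map<String, Integer> combine(List<String> words) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : words) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Reduce step: merges the partial results (OUT) into one overall result (OUT).
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Two partitions stand in for two nodes, each combining locally.
        List<String> partitionA = Arrays.asList("to", "be", "or");
        List<String> partitionB = Arrays.asList("not", "to", "be");
        Map<String, Integer> counts =
                reduce(Arrays.asList(combine(partitionA), combine(partitionB)));
        System.out.println(counts);
    }
}
```

The Bloom filter case from the issue would follow the same shape: combine builds one local filter per partition from the raw elements, and reduce merges the local filters into the global one.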