You can still call parallelDo on a PGroupedTable to map it to a different type. It would just be a new DoFn<Pair<Key, Set<String>>, Pair<Key, Integer>>.
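
To make the two steps concrete, here is a minimal plain-Java sketch of the semantics only; it does not use the Crunch API. The class name UnionSizeSketch and the helpers combine and finish are hypothetical, made up for illustration. combine stands in for the combiner (Iterable<Set<String>> -> Set<String>) and finish stands in for the DoFn applied afterwards (Set<String> -> Integer). In Crunch, as far as I know, combineValues hands back a PTable<K, V>, so the second step would be the parallelDo applied to that table.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnionSizeSketch {

    // Combiner step (hypothetical helper): Iterable<Set<String>> -> Set<String>.
    // Set union is associative and commutative, so it can safely be applied
    // to partial groups on the map side and again on the reduce side.
    static Set<String> combine(Iterable<Set<String>> values) {
        Set<String> union = new HashSet<>();
        for (Set<String> value : values) {
            union.addAll(value);
        }
        return union;
    }

    // Step applied after combining (hypothetical helper): Set<String> -> Integer.
    // This is where the output type changes from the combiner's type.
    static int finish(Set<String> combined) {
        return combined.size();
    }

    public static void main(String[] args) {
        // Two partial groups for the same key, as two map tasks might produce.
        Set<String> partial1 = combine(List.of(Set.of("a", "b"), Set.of("b", "c")));
        Set<String> partial2 = combine(List.of(Set.of("c", "d")));

        // Reduce side: combine the partial results, then map to the size.
        Set<String> merged = combine(List.of(partial1, partial2));
        System.out.println(finish(merged)); // prints 4
    }
}
```

The point of keeping the size computation out of combine is that a count is not safe to merge across partial groups (overlapping elements would be counted twice), while the union is.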

On Tue, May 17, 2016 at 2:01 AM Stan Rosenberg <[email protected]> wrote:

> Hi,
>
> I couldn't seem to find sufficient documentation or examples of using
> combiners in non-trivial ways. Say my map emits values of type Set<String>;
> after grouping by key I want to emit the _size_ of the union of the sets of
> strings, i.e., size(union(Iterable<Set<String>>)). Thus, the combiner's
> type is Iterable<Set<String>> -> Set<String> but the reduce's type is
> Iterable<Set<String>> -> Int
>
> To my knowledge, both MapReduce and Spark allow a combiner to have a
> result type different from reducer's. However, unless I missed something,
> this is not expressible in Crunch. Shouldn't PGroupedTable.combineValues
> return PGroupedTable to allow composition with mapValues?
>
> Thanks,
>
> stan
>
