Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Manu Zhang
Thanks guys. My original confusion was this: if the API allows me to have different types of input and output, why not make that easier? It's clear now. Do you think it would be better to hide some interfaces we don't expect users to use? The Combine API has lured me a lot to do more than it

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Robert Bradshaw
On Mon, Oct 31, 2016 at 8:39 PM, Kenneth Knowles wrote:
> Manu, I think your critique about user interface clarity is valid.
> CombineFn conflates a few operations and is not that clear about what it is
> doing or why. You seem to be concerned about CombineFn versus
>

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Kenneth Knowles
Manu, I think your critique about user interface clarity is valid. CombineFn conflates a few operations and is not that clear about what it is doing or why. You seem to be concerned about CombineFn versus SerializableFunction constructors for the Combine family of transforms. I thought I'd respond

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Manu Zhang
I'm a bit confused here because neither of them requires the same type of input and output. Also, the Javadoc of Globally says "It is common for {@code InputT == OutputT}, but not required". If associativity and commutativity are expected, why don't they have restrictions like

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Lukasz Cwik
GlobalCombineFn and PerKeyCombineFn still expect an associative and commutative function when accumulating. GlobalCombineFn is shorthand for assigning everything to a single key, doing the combine, and then discarding the key and extracting the single output. PerKeyCombineFn is shorthand for doing
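
To make the "single key" shorthand concrete, here is a minimal plain-Java sketch with no Beam dependency. `MiniCombineFn`, `MeanFn`, `GlobalAsPerKeySketch` and the key `"all"` are hypothetical names made up for illustration; the method names are only loosely modeled on an accumulator-style combine function and are not the SDK's actual classes. It also shows that, with an accumulator in between, the input type (Integer) and the output type (Double) do not have to match.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is NOT the Beam CombineFn class.
interface MiniCombineFn<InputT, AccumT, OutputT> {
    AccumT createAccumulator();
    AccumT addInput(AccumT accumulator, InputT input);
    OutputT extractOutput(AccumT accumulator);
}

// Mean of integers: Integer inputs, a {sum, count} accumulator, Double output.
class MeanFn implements MiniCombineFn<Integer, long[], Double> {
    public long[] createAccumulator() {
        return new long[] {0L, 0L};
    }

    public long[] addInput(long[] acc, Integer input) {
        return new long[] {acc[0] + input, acc[1] + 1};
    }

    public Double extractOutput(long[] acc) {
        return acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1];
    }
}

class GlobalAsPerKeySketch {
    // Per-key combine over (key, value) pairs, done in memory.
    static <K, I, A, O> Map<K, O> combinePerKey(
            List<Map.Entry<K, I>> pairs, MiniCombineFn<I, A, O> fn) {
        Map<K, A> accumulators = new HashMap<>();
        for (Map.Entry<K, I> pair : pairs) {
            A acc = accumulators.containsKey(pair.getKey())
                    ? accumulators.get(pair.getKey())
                    : fn.createAccumulator();
            accumulators.put(pair.getKey(), fn.addInput(acc, pair.getValue()));
        }
        Map<K, O> outputs = new HashMap<>();
        for (Map.Entry<K, A> entry : accumulators.entrySet()) {
            outputs.put(entry.getKey(), fn.extractOutput(entry.getValue()));
        }
        return outputs;
    }

    // Global combine: assign everything to one key, combine per key,
    // then discard the key and return the single output.
    static <I, A, O> O combineGlobally(List<I> values, MiniCombineFn<I, A, O> fn) {
        List<Map.Entry<String, I>> keyed = new ArrayList<>();
        for (I value : values) {
            keyed.add(new AbstractMap.SimpleEntry<>("all", value));
        }
        Map<String, O> result = combinePerKey(keyed, fn);
        // Empty input: fall back to the output of an empty accumulator.
        return result.containsKey("all")
                ? result.get("all")
                : fn.extractOutput(fn.createAccumulator());
    }
}
```

For example, `GlobalAsPerKeySketch.combineGlobally(Arrays.asList(1, 2, 3, 6), new MeanFn())` gives the mean of the four integers as a Double.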

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-28 Thread Lukasz Cwik
For it to be considered a combiner, the function needs to be associative and commutative. The issue is that from an API perspective it would be easy to have a Combine.perKey(SerializableFunction). But many people in the data processing world expect that this

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-26 Thread Lukasz Cwik
Combine.perKey takes a single SerializableFunction which knows how to convert from Iterable<V> to V. It turns out that many runners implement optimizations which allow them to run the combine operation across several machines to parallelize the work and potentially reduce the amount of data they
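
A rough way to see why the function-based form needs matching types is to simulate the optimization described above in plain Java, without Beam. `PartialCombineSketch`, `combineInTwoStages` and `sum` are hypothetical names for this illustration, and the in-memory "bundles" only stand in for values held on different machines. The combiner is applied once per bundle and then again to the partial results, and that second application only type-checks because the function returns the same type it consumes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration class; not part of the Beam SDK.
class PartialCombineSketch {

    // Applies the combiner to each bundle, then re-applies it to the partials.
    // Re-applying is only possible because the result type V equals the element type V.
    static <V> V combineInTwoStages(
            List<List<V>> bundles, Function<Iterable<V>, V> combiner) {
        List<V> partials = new ArrayList<>();
        for (List<V> bundle : bundles) {
            partials.add(combiner.apply(bundle));
        }
        return combiner.apply(partials);
    }

    // An associative and commutative combiner: integer sum.
    static Integer sum(Iterable<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Integer>> bundles =
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));
        // Same answer as summing 1, 2, 3, 4 in one pass.
        System.out.println(combineInTwoStages(bundles, PartialCombineSketch::sum));
    }
}
```

If the function instead mapped Iterable<Integer> to, say, a String, the second `combiner.apply(partials)` call would not compile, which is one way to read the same-type restriction.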

Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-26 Thread Manu Zhang
Hi all, I'm wondering why `Combine.perKey(SerializableFunction)` requires input and output to be of the same type while `Combine.PerKey` doesn't have this restriction. Thanks, Manu