Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Manu Zhang
Thanks, guys. My original confusion was this: if the API allows different input and output types, why not make that easier? It's clear now. Do you think it would be better to hide some interfaces we don't expect users to use? The Combine API has lured me a lot to do more than it

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Robert Bradshaw
On Mon, Oct 31, 2016 at 8:39 PM, Kenneth Knowles wrote:
> Manu, I think your critique about user interface clarity is valid.
> CombineFn conflates a few operations and is not that clear about what it is
> doing or why. You seem to be concerned about CombineFn versus
>

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Kenneth Knowles
Manu, I think your critique about user interface clarity is valid. CombineFn conflates a few operations and is not that clear about what it is doing or why. You seem to be concerned about CombineFn versus SerializableFunction constructors for the Combine family of transforms. I thought I'd respond
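
For readers of the archive, the "few operations" a CombineFn-style combiner bundles together are creating an accumulator, adding an input, merging accumulators, and extracting the output. Below is a minimal plain-Java sketch of that shape for a mean, where the input, accumulator, and output types all differ. It deliberately does not use the Beam classes: `MeanCombinerSketch`, `Accum`, and `meanOverTwoBundles` are names made up for illustration, and the two-bundle split is an assumption standing in for whatever partitioning a runner chooses.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model (not Beam code) of the four steps a CombineFn-style combiner bundles. */
final class MeanCombinerSketch {
    /** Accumulator type: differs from both the input (Integer) and the output (Double). */
    static final class Accum {
        long sum = 0;
        long count = 0;
    }

    /** Step 1: start with an empty accumulator. */
    static Accum createAccumulator() {
        return new Accum();
    }

    /** Step 2: fold one input element into an accumulator. */
    static Accum addInput(Accum acc, int input) {
        acc.sum += input;
        acc.count++;
        return acc;
    }

    /** Step 3: merge partial accumulators; this is where associativity matters. */
    static Accum mergeAccumulators(List<Accum> accums) {
        Accum merged = createAccumulator();
        for (Accum a : accums) {
            merged.sum += a.sum;
            merged.count += a.count;
        }
        return merged;
    }

    /** Step 4: turn the final accumulator into the output value. */
    static double extractOutput(Accum acc) {
        return acc.count == 0 ? Double.NaN : (double) acc.sum / acc.count;
    }

    /** Hypothetical driver: accumulates two bundles separately, then merges them. */
    static double meanOverTwoBundles(List<Integer> bundleA, List<Integer> bundleB) {
        Accum a = createAccumulator();
        for (int v : bundleA) {
            a = addInput(a, v);
        }
        Accum b = createAccumulator();
        for (int v : bundleB) {
            b = addInput(b, v);
        }
        return extractOutput(mergeAccumulators(Arrays.asList(a, b)));
    }
}
```

Because the accumulator carries both the sum and the count, the result does not depend on how the inputs were split into bundles.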

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Manu Zhang
I'm a bit confused here because neither of them requires the same type of input and output. Also, the Javadoc of Globally says "It is common for {@code InputT == OutputT}, but not required". If an associative and commutative function is expected, why don't they have restrictions like

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-31 Thread Lukasz Cwik
GlobalCombineFn and PerKeyCombineFn still expect an associative and commutative function when accumulating. GlobalCombineFn is shorthand for assigning everything to a single key, doing the combine, and then discarding the key and extracting the single output. PerKeyCombineFn is shorthand for doing
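
The "single key" description of a global combine can be sketched in plain Java as follows. This is only an illustration of the idea in the message, not Beam's implementation: `GlobalViaPerKeySketch`, `combinePerKey`, `combineGlobally`, and the key name `"all"` are all hypothetical, and the operator passed in is assumed to be associative and commutative.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Plain-Java model (not Beam code) of a global combine built on a per-key combine. */
final class GlobalViaPerKeySketch {
    /** Per-key combine: folds the values of each key with the given operator. */
    static <K, V> Map<K, V> combinePerKey(List<Map.Entry<K, V>> input, BinaryOperator<V> op) {
        Map<K, V> out = new HashMap<>();
        for (Map.Entry<K, V> e : input) {
            // First value for a key is stored as-is; later values are combined with op.
            out.merge(e.getKey(), e.getValue(), op);
        }
        return out;
    }

    /** Global combine: assign every element to one key, combine per key, drop the key. */
    static <V> V combineGlobally(List<V> input, BinaryOperator<V> op) {
        final String onlyKey = "all"; // hypothetical single key
        List<Map.Entry<String, V>> keyed = new ArrayList<>();
        for (V v : input) {
            keyed.add(new AbstractMap.SimpleEntry<>(onlyKey, v));
        }
        // Returns null when the input is empty, since no entry exists for the key.
        return combinePerKey(keyed, op).get(onlyKey);
    }
}
```

For example, `GlobalViaPerKeySketch.combineGlobally(Arrays.asList(1, 2, 3, 4), Integer::sum)` folds all four values under the one key and returns their sum.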

Re: Why does `Combine.perKey(SerializableFunction)` require same input and output type

2016-10-27 Thread Manu Zhang
Thanks for the thorough explanation. I see the benefits of such a function. My follow-up question is whether this is a hard requirement. There are computations that don't satisfy this (I think it's the monoid rule) but are possible and easier to write with Combine.perKey(SerializableFunction
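
A small plain-Java sketch may help show why the function form ties the output type to the element type, and why associativity matters. It assumes, as I understand the documented behaviour, that a function of shape `Iterable<V> -> V` may be applied to partial results of itself rather than only once to the raw values. Nothing here is Beam code: `PartialCombineSketch` and `combineInTwoStages` are hypothetical names, and the two-stage split is an assumed stand-in for a runner's partitioning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java model (not Beam code) of re-applying a combining function to partial results. */
final class PartialCombineSketch {
    /**
     * Applies fn to each bundle, then applies fn again to the partial results.
     * The second call only type-checks because fn returns the same type V it consumes.
     */
    static <V> V combineInTwoStages(List<List<V>> bundles, Function<Iterable<V>, V> fn) {
        List<V> partials = new ArrayList<>();
        for (List<V> bundle : bundles) {
            partials.add(fn.apply(bundle));
        }
        return fn.apply(partials);
    }

    /** Associative and commutative: the two-stage result equals the one-shot result. */
    static Double sum(Iterable<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /** Not associative: a mean of partial means generally differs from the overall mean. */
    static Double mean(Iterable<Double> values) {
        double total = 0.0;
        long count = 0;
        for (double v : values) {
            total += v;
            count++;
        }
        return total / count;
    }
}
```

With the bundles `[1.0, 2.0, 3.0]` and `[10.0]`, the two-stage `sum` gives 16.0, the same as summing all four values at once, while the two-stage `mean` gives 6.0 even though the mean of all four values is 4.0. A computation like the mean therefore needs a separate accumulator type rather than the function form.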