Thanks guys. My original confusion was this: if the API allows the input and
output to have different types, why not make that easier to use? It's
clear now.
Do you think it would be better to hide some interfaces we don't expect users
to use? The Combine API has lured me a lot to do more than it
On Mon, Oct 31, 2016 at 8:39 PM, Kenneth Knowles
wrote:
Manu, I think your critique about user interface clarity is valid.
CombineFn conflates a few operations and is not that clear about what it is
doing or why. You seem to be concerned about CombineFn versus
SerializableFunction constructors for the Combine family of transforms. I
thought I'd respond
I'm a bit confused here because neither of them requires the input and
output to be of the same type. Also, the Javadoc of Globally says "It is
common for {@code InputT == OutputT}, but not required". If associativity and
commutativity are expected, why don't they have restrictions like
GlobalCombineFn and PerKeyCombineFn still expect an associative and
commutative function when accumulating.
GlobalCombineFn is shorthand for assigning everything to a single key,
doing the combine, and then discarding the key and extracting the single
output.
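The "single key" shorthand described above can be sketched in plain Java. This is only an illustration, not Beam code: `combinePerKey`, `combineGlobally`, the key `"all"`, and the use of `java.util.function.Function` in place of SerializableFunction are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GlobalAsPerKeySketch {
    // Hypothetical per-key combine: apply fn to the values grouped under each key.
    static <K, V> Map<K, V> combinePerKey(Map<K, List<V>> grouped, Function<Iterable<V>, V> fn) {
        Map<K, V> out = new HashMap<>();
        for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
            out.put(entry.getKey(), fn.apply(entry.getValue()));
        }
        return out;
    }

    // Global combine expressed through the per-key version:
    // assign everything to one key, combine, then drop the key.
    static <V> V combineGlobally(List<V> values, Function<Iterable<V>, V> fn) {
        String onlyKey = "all"; // arbitrary placeholder key
        Map<String, List<V>> grouped = Collections.singletonMap(onlyKey, values);
        return combinePerKey(grouped, fn).get(onlyKey);
    }

    // Example combiner: the largest value, or null for an empty input.
    static Integer max(Iterable<Integer> values) {
        Integer best = null;
        for (Integer v : values) {
            if (best == null || v > best) {
                best = v;
            }
        }
        return best;
    }
}
```

Calling `GlobalAsPerKeySketch.combineGlobally(values, GlobalAsPerKeySketch::max)` goes through the same per-key path as any keyed combine, which is the point of the shorthand.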
PerKeyCombineFn is shorthand for doing
For it to be considered a combiner, the function needs to be associative
and commutative.
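To make the associative and commutative requirement concrete, here is a small plain-Java sketch (no Beam dependency; the class and method names are made up for illustration). It folds the same values once left to right and once in two halves that are merged afterwards, roughly what happens when work is split up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class CombinerLaws {
    // Fold the values left to right with the given binary operator.
    // Assumes the list is non-empty.
    static int fold(List<Integer> values, BinaryOperator<Integer> op) {
        int result = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            result = op.apply(result, values.get(i));
        }
        return result;
    }

    // Fold each half on its own, then merge the two partial results.
    // Assumes at least two values so that both halves are non-empty.
    static int foldInHalves(List<Integer> values, BinaryOperator<Integer> op) {
        int mid = values.size() / 2;
        int left = fold(values.subList(0, mid), op);
        int right = fold(values.subList(mid, values.size()), op);
        return op.apply(left, right);
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        BinaryOperator<Integer> sum = Integer::sum;
        BinaryOperator<Integer> subtract = (a, b) -> a - b;
        System.out.println("sum: " + fold(values, sum) + " vs " + foldInHalves(values, sum));
        System.out.println("subtract: " + fold(values, subtract) + " vs " + foldInHalves(values, subtract));
    }
}
```

Addition gives 10 either way, so it is safe to use as a combiner. Subtraction gives -8 when folded left to right and 0 when folded in halves, so the result would depend on how the data happened to be split.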
The issue is that from an API perspective it would be easy to have a
Combine.perKey(SerializableFunction). But many
people in the data processing world expect that this
Combine.perKey takes a single SerializableFunction<Iterable<V>, V>, which
knows how to convert from Iterable<V> to V.
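A rough sketch of why a function of this shape has to return the same type it consumes: if the values are combined in separate chunks first, the partial results are fed back into the same function. The code below is plain Java, not Beam; `combineInShards` is a hypothetical helper and `java.util.function.Function` stands in for SerializableFunction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class IterableCombineSketch {
    // Combine each shard on its own, then combine the per-shard results
    // with the same function. The second call only type-checks because
    // fn returns the same type V that it takes as input.
    static <V> V combineInShards(List<List<V>> shards, Function<Iterable<V>, V> fn) {
        List<V> partials = new ArrayList<>();
        for (List<V> shard : shards) {
            partials.add(fn.apply(shard));
        }
        return fn.apply(partials);
    }

    // Example combiner: integer sum, which is associative and commutative.
    static Integer sum(Iterable<Integer> values) {
        int total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }
}
```

With shards [1, 2] and [3, 4, 5], the partial sums are 3 and 12, and combining those gives 15, the same as summing all five values at once. When the output type needs to differ from the input type, Beam's CombineFn covers that with a separate accumulator type (InputT, AccumT, OutputT), which this sketch does not attempt to model.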
It turns out that many runners implement optimizations which allow them to
run the combine operation across several machines to parallelize the work
and potentially reduce the amount of data they
Hi all,
I'm wondering why `Combine.perKey(SerializableFunction)` requires input and
output to be of the same type while `Combine.PerKey` doesn't have this
restriction.
Thanks,
Manu