GitHub user jkff opened a pull request:
https://github.com/apache/beam/pull/3890
Introduces Reshuffle.viaRandomKey()
This is a commonly used pattern for breaking fusion; see
https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization
viaRandomKey() merely abstracts away this commonly used pattern. It has
the same caveats as using Reshuffle.of() directly: the fusion-breaking
behavior is technically not guaranteed by the Beam model, but it works in
practice, and this is the pattern we keep recommending to users.
The name is deliberately operational rather than semantic, to emphasize
that we don't have the semantics figured out and that the transform
promises only that it expands into exactly the sequence "pair with random
key, reshuffle, drop key". The goal of this change is simply to reduce
copy-paste.
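To make the three-step expansion concrete, here is a minimal, Beam-free sketch that simulates the shuffle with an in-memory group-by over a java.util.TreeMap. The class ViaRandomKeySketch, the method viaRandomKey(List, int, Random) and the numKeys parameter are hypothetical illustrations, not Beam's implementation, and the sketch models neither workers nor fusion. It only shows that "pair with random key, reshuffle, drop key" returns the same elements (as a multiset) while the random key is discarded.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/**
 * Hypothetical, Beam-free illustration of the expansion
 * "pair with random key, reshuffle, drop key". Not part of the Beam SDK.
 */
final class ViaRandomKeySketch {

  private ViaRandomKeySketch() {}

  /**
   * Returns the same elements as {@code input}, regrouped by a random key.
   * {@code numKeys} is an assumption of this sketch that bounds the key space.
   */
  static <T> List<T> viaRandomKey(List<T> input, int numKeys, Random random) {
    if (numKeys <= 0) {
      throw new IllegalArgumentException("numKeys must be positive");
    }

    // Step 1: pair every element with a random key.
    List<Map.Entry<Integer, T>> keyed = new ArrayList<>();
    for (T element : input) {
      keyed.add(new AbstractMap.SimpleEntry<>(random.nextInt(numKeys), element));
    }

    // Step 2: "reshuffle" by grouping the pairs on their key. In a distributed
    // runner, this is where data would be materialized and redistributed.
    Map<Integer, List<Map.Entry<Integer, T>>> groups = new TreeMap<>();
    for (Map.Entry<Integer, T> kv : keyed) {
      groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv);
    }

    // Step 3: drop the keys and emit only the original values.
    List<T> output = new ArrayList<>();
    for (List<Map.Entry<Integer, T>> group : groups.values()) {
      for (Map.Entry<Integer, T> kv : group) {
        output.add(kv.getValue());
      }
    }
    return output;
  }
}
```

The output order depends on the random keys, which is why only the multiset of elements is preserved. In an actual pipeline the transform introduced here is applied directly to a PCollection, as in `input.apply(Reshuffle.viaRandomKey())`, and the runner performs the shuffle.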
See prior discussion at
https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E
This change also converts several existing usages of the pattern to
viaRandomKey() and adds a new one in Match.
R: @bjchambers
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jkff/incubator-beam match-fusion-break
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/beam/pull/3890.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3890
commit 0b4d801e4afb0be1463b196f419e1293265b68c1
Author: Eugene Kirpichov
Date: 2017-09-22T22:24:36Z
Introduces Reshuffle.viaRandomKey()
---