Closing the loop, I went with two URNs and an associated payload in
https://github.com/apache/beam/pull/30545
Kenn
On Wed, Mar 6, 2024 at 10:54 AM Kenneth Knowles wrote:
> OK, of course, hacking this up, there's already a combinatorial 2x2 that
> perhaps people were alluding to but I missed.
>
OK, of course, hacking this up, there's already a combinatorial 2x2 that
perhaps people were alluding to but I missed.
RedistributeByKey (user's choice)
RedistributeArbitrarily (runner's choice! default may be random keys but
that is not required)
RedistributeArbitrarilyAllowingDuplicates (this is
Was that only October? Wow.
Option 2 SGTM, with the adjustment to making the core of the URN
"redistribute_allowing_duplicates" instead of building from the unspecified
Reshuffle semantics.
Transforms that are updated to use the new transform can have their
@RequiresStableInput annotation added
On Wed, Feb 7, 2024 at 5:15 PM Robert Burke wrote:
> OK, so my stance is a configurable Reshuffle might be interesting, so my
> vote is +1, along the following lines.
>
> 1. Use a new URN (beam:transform:reshuffle:v2) and attach a new
> ReshufflePayload to it.
>
Ah, I see there's more than one
OK, so my stance is a configurable Reshuffle might be interesting, so my vote
is +1, along the following lines.
1. Use a new URN (beam:transform:reshuffle:v2) and attach a new
ReshufflePayload to it.
-> Unknown URNs for composite transforms already default to the
subtransform graph
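
To make the fallback point concrete, here is a minimal sketch of how a runner might treat a composite whose URN it does not recognize. This is not Beam's actual API: the `Transform` class, the `plan` function, and the `example:` URNs are hypothetical and exist only for illustration. Only the `beam:transform:reshuffle:v2` URN comes from the proposal above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Transform:
    """Hypothetical stand-in for a pipeline transform node (not a Beam class)."""
    name: str
    urn: str = ""
    subtransforms: List["Transform"] = field(default_factory=list)


def plan(transform: Transform,
         known: Dict[str, Callable[[Transform], str]]) -> List[str]:
    """Return a flat list of execution steps for `transform`."""
    handler = known.get(transform.urn)
    if handler is not None:
        # The runner understands this URN: use its native implementation.
        return [handler(transform)]
    if transform.subtransforms:
        # Unknown composite: fall back to its subtransform graph.
        steps: List[str] = []
        for sub in transform.subtransforms:
            steps.extend(plan(sub, known))
        return steps
    raise ValueError(
        f"no implementation for primitive {transform.name!r} ({transform.urn!r})")


# A v2 reshuffle composite, expanded into primitives (hypothetical URNs).
reshuffle_v2 = Transform(
    name="Reshuffle",
    urn="beam:transform:reshuffle:v2",
    subtransforms=[
        Transform("AddRandomKeys", "example:pardo"),
        Transform("GroupByKey", "example:gbk"),
        Transform("DropKeys", "example:pardo"),
    ],
)

# An older runner knows only the primitives; a newer one also knows the v2 URN.
old_runner = {
    "example:pardo": lambda t: f"pardo:{t.name}",
    "example:gbk": lambda t: f"gbk:{t.name}",
}
new_runner = dict(old_runner)
new_runner["beam:transform:reshuffle:v2"] = lambda t: f"native:{t.name}"
```

With these definitions, `plan(reshuffle_v2, old_runner)` walks the three subtransforms, while `plan(reshuffle_v2, new_runner)` picks the native handler, which is why a new URN is safe for runners that have not been updated.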
On Wed, Jan 31, 2024 at 4:21 AM Jan Lukavský wrote:
> Hi,
>
> if I understand this proposal correctly, the motivation is actually
> reducing latency by bypassing bundle atomicity guarantees: bundles after
> an "at least once" Reshuffle would be reconstructed independently of the
> pre-shuffle
On Tue, Jan 30, 2024 at 5:22 PM Robert Burke wrote:
> Is the benefit of this proposal just the bounded deviation from the
> existing reshuffle?
>
> Reshuffle is already rather dictated by arbitrary runner choice, from
> simply ignoring the node, to forcing a materialization break, to a full
>
Hi,
if I understand this proposal correctly, the motivation is actually
reducing latency by bypassing bundle atomicity guarantees: bundles after
an "at least once" Reshuffle would be reconstructed independently of the
pre-shuffle bundling. Provided this is correct, it seems that the
behavior is
Is the benefit of this proposal just the bounded deviation from the
existing reshuffle?
Reshuffle is already rather dictated by arbitrary runner choice, from
simply ignoring the node, to forcing a materialization break, to a full
shuffle implementation which has additional side effects.
But