Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-03-13 Thread Kenneth Knowles
Closing the loop, I went with two URNs and an associated payload in https://github.com/apache/beam/pull/30545
Kenn
On Wed, Mar 6, 2024 at 10:54 AM Kenneth Knowles wrote:
> OK of course hacking this up there's already combinatorial 2x2 that
> perhaps people were alluding to but I missed.
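As an illustration of how "two URNs and an associated payload" can cover a 2x2 of choices (by key vs. arbitrary, duplicates allowed vs. not), here is a minimal Python sketch. It is not taken from the pull request: the URN strings, the `RedistributePayload` and `RedistributeSpec` classes, and the `allow_duplicates` field name are all hypothetical stand-ins chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical URN strings for illustration only; the real values live in the PR.
REDISTRIBUTE_BY_KEY_URN = "example:transform:redistribute_by_key:v1"
REDISTRIBUTE_ARBITRARILY_URN = "example:transform:redistribute_arbitrarily:v1"


@dataclass(frozen=True)
class RedistributePayload:
    """Hypothetical payload: a single flag shared by both URNs."""
    allow_duplicates: bool = False


@dataclass(frozen=True)
class RedistributeSpec:
    """A URN plus its payload, which together identify one variant."""
    urn: str
    payload: RedistributePayload


def redistribute_spec(by_key: bool, allow_duplicates: bool) -> RedistributeSpec:
    # The URN carries the "who picks the key" axis (user vs. runner);
    # the payload carries the "may duplicate" axis.
    urn = REDISTRIBUTE_BY_KEY_URN if by_key else REDISTRIBUTE_ARBITRARILY_URN
    return RedistributeSpec(urn, RedistributePayload(allow_duplicates))


# Enumerate the full 2x2: two URNs times two payload values.
ALL_VARIANTS = [
    redistribute_spec(by_key, allow_duplicates)
    for by_key in (True, False)
    for allow_duplicates in (False, True)
]
```

Under this reading, a runner that does not recognize the payload flag could still dispatch on the URN alone, since the flag only relaxes a guarantee rather than adding one.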

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-03-06 Thread Kenneth Knowles
OK of course hacking this up there's already combinatorial 2x2 that perhaps people were alluding to but I missed.
RedistributeByKey (user's choice)
RedistributeArbitrarily (runner's choice! default may be random keys but that is not required)
RedistributeArbitrarilyAllowingDuplicates (this is

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-02-08 Thread Robert Burke
Was that only October? Wow. Option 2 SGTM, with the adjustment to making the core of the URN "redistribute_allowing_duplicates" instead of building from the unspecified Reshuffle semantics. Transforms getting updated to use the new transform can have their @RequiresStableInputs annotation added

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-02-08 Thread Kenneth Knowles
On Wed, Feb 7, 2024 at 5:15 PM Robert Burke wrote:
> OK, so my stance is a configurable Reshuffle might be interesting, so my
> vote is +1, along the following lines.
>
> 1. Use a new URN (beam:transform:reshuffle:v2) and attach a new
> ReshufflePayload to it.
>
Ah, I see there's more than one

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-02-07 Thread Robert Burke
OK, so my stance is a configurable Reshuffle might be interesting, so my vote is +1, along the following lines.
1. Use a new URN (beam:transform:reshuffle:v2) and attach a new ReshufflePayload to it.
   -> Unknown "urns for composite transforms" already default to the subtransform graph

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-31 Thread Kenneth Knowles
On Wed, Jan 31, 2024 at 4:21 AM Jan Lukavský wrote:
> Hi,
>
> if I understand this proposal correctly, the motivation is actually
> reducing latency by bypassing bundle atomic guarantees, bundles after "at
> least once" Reshuffle would be reconstructed independently of the
> pre-shuffle

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-31 Thread Kenneth Knowles
On Tue, Jan 30, 2024 at 5:22 PM Robert Burke wrote:
> Is the benefit of this proposal just the bounded deviation from the
> existing reshuffle?
>
> Reshuffle is already rather dictated by arbitrary runner choice, from
> simply ignoring the node, to forcing a materialization break, to a full

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-31 Thread Jan Lukavský
Hi, if I understand this proposal correctly, the motivation is actually reducing latency by bypassing bundle atomic guarantees, bundles after "at least once" Reshuffle would be reconstructed independently of the pre-shuffle bundling. Provided this is correct, it seems that the behavior is

Re: [DESIGN PROPOSAL] Reshuffle Allowing Duplicates

2024-01-30 Thread Robert Burke
Is the benefit of this proposal just the bounded deviation from the existing reshuffle? Reshuffle is already rather dictated by arbitrary runner choice, from simply ignoring the node, to forcing a materialization break, to a full shuffle implementation which has additional side effects. But