Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Valentyn Tymofieiev via dev
On Thu, Apr 11, 2024 at 1:00 PM Joey Tran  wrote:

> Thanks for the feedback!
>
> Sounds like it'll be a while before this is supported. Would it make sense
> to formally _not_ support this functionality by raising an early and clear
> exception if someone does try to use a side input with a combiner? As it
> stands now, the exception a user gets is very difficult to understand. My
> motivation for looking into this was a new user I was introducing to Beam
> ran into this issue and their first impression was that Beam had terrible
> error handling / messages.
>

In general, failing early and with a proper message is a good idea. If
something fails 100% of the time later, it is better to fail early.  If we
can't fail early, try to fail with a better error message. If some
functionality fails 90% of the time but works 10% of the time, we'd have to
be more careful to not accidentally introduce a breaking change for some
niche group of users. For combiners with side inputs, IIRC there was a
reference in BEAM-8400  that
they might work in some cases but not other cases - I haven't verified
that.



>
> Best,
> Joey
>
> On Thu, Apr 11, 2024 at 3:52 PM Valentyn Tymofieiev via dev <
> dev@beam.apache.org> wrote:
>
>> I took a look and mentioned the PR to a few folks. Couple of thoughts:
>> - We should avoid Beam adding a high-level functionality only for Batch.
>> - Supporting Windowing/Triggers would likely be non-trivial and worth
>> considering early in the design.
>> - If you'd like to continue working on this, I would suggest to start a
>> document and gradually cover the following (with help/feedback from others
>> in this list):
>>- the motivational example, available workarounds (if any)
>>- background on aspects of combiner implementation that are relevant.
>>- proposed approach  and considered alternatives
>>- any runner-specific considerations.
>>
>> Thanks,
>> Valentyn
>>
>> On Fri, Mar 29, 2024 at 5:06 AM Joey Tran 
>> wrote:
>>
>>> I posted a PoC PR [1] for fixing deferred side inputs with combiners in
>>> the python SDK. Would someone be willing to take a look at it?
>>>
>>> I have it working but could use some feedback on where to take it next.
>>> It looks like bundle processor combiner operations don't currently support
>>> side inputs [2] so I added a conditional in `CombinePerKey` that checks
>>> whether it was instantiated with a side input and if so, use a ParDo-based
>>> version of the combiner so we can piggyback off of the Do operations
>>> implementation of side inputs rather than reimplementing it for the
>>> combiner operation.
>>>
>>> [1] https://github.com/apache/beam/pull/30743
>>> [2]
>>> https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146
>>>
>>


Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Joey Tran
Thanks for the feedback!

Sounds like it'll be a while before this is supported. Would it make sense
to formally _not_ support this functionality by raising an early and clear
exception if someone does try to use a side input with a combiner? As it
stands now, the exception a user gets is very difficult to understand. My
motivation for looking into this was a new user I was introducing to Beam
ran into this issue and their first impression was that Beam had terrible
error handling / messages.

Best,
Joey

On Thu, Apr 11, 2024 at 3:52 PM Valentyn Tymofieiev via dev <
dev@beam.apache.org> wrote:

> I took a look and mentioned the PR to a few folks. Couple of thoughts:
> - We should avoid Beam adding a high-level functionality only for Batch.
> - Supporting Windowing/Triggers would likely be non-trivial and worth
> considering early in the design.
> - If you'd like to continue working on this, I would suggest to start a
> document and gradually cover the following (with help/feedback from others
> in this list):
>- the motivational example, available workarounds (if any)
>- background on aspects of combiner implementation that are relevant.
>- proposed approach  and considered alternatives
>- any runner-specific considerations.
>
> Thanks,
> Valentyn
>
> On Fri, Mar 29, 2024 at 5:06 AM Joey Tran 
> wrote:
>
>> I posted a PoC PR [1] for fixing deferred side inputs with combiners in
>> the python SDK. Would someone be willing to take a look at it?
>>
>> I have it working but could use some feedback on where to take it next.
>> It looks like bundle processor combiner operations don't currently support
>> side inputs [2] so I added a conditional in `CombinePerKey` that checks
>> whether it was instantiated with a side input and if so, use a ParDo-based
>> version of the combiner so we can piggyback off of the Do operations
>> implementation of side inputs rather than reimplementing it for the
>> combiner operation.
>>
>> [1] https://github.com/apache/beam/pull/30743
>> [2]
>> https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146
>>
>


Re: [Python SDK] Feedback for deferred side inputs + combiners

2024-04-11 Thread Valentyn Tymofieiev via dev
I took a look and mentioned the PR to a few folks. Couple of thoughts:
- We should avoid Beam adding a high-level functionality only for Batch.
- Supporting Windowing/Triggers would likely be non-trivial and worth
considering early in the design.
- If you'd like to continue working on this, I would suggest to start a
document and gradually cover the following (with help/feedback from others
in this list):
   - the motivational example, available workarounds (if any)
   - background on aspects of combiner implementation that are relevant.
   - proposed approach  and considered alternatives
   - any runner-specific considerations.

Thanks,
Valentyn

On Fri, Mar 29, 2024 at 5:06 AM Joey Tran  wrote:

> I posted a PoC PR [1] for fixing deferred side inputs with combiners in
> the python SDK. Would someone be willing to take a look at it?
>
> I have it working but could use some feedback on where to take it next. It
> looks like bundle processor combiner operations don't currently support
> side inputs [2] so I added a conditional in `CombinePerKey` that checks
> whether it was instantiated with a side input and if so, use a ParDo-based
> version of the combiner so we can piggyback off of the Do operations
> implementation of side inputs rather than reimplementing it for the
> combiner operation.
>
> [1] https://github.com/apache/beam/pull/30743
> [2]
> https://github.com/apache/beam/blob/e3fee5156b3515f96dc5ba90ea2fbc6f6be2bd34/sdks/python/apache_beam/runners/worker/operations.py#L1146
>