Re: Runner Bundling Strategies

2023-09-22 Thread Robert Bradshaw via dev
On Fri, Sep 22, 2023 at 10:58 AM Jan Lukavský wrote:
> On 9/22/23 18:07, Robert Bradshaw via dev wrote:
> > On Fri, Sep 22, 2023 at 7:23 AM Byron Ellis via dev wrote:
> >> I've actually wondered about this specifically for streaming... if you're writing a pipeline there it seems like

Re: Runner Bundling Strategies

2023-09-22 Thread Jan Lukavský
On 9/22/23 18:07, Robert Bradshaw via dev wrote:
> On Fri, Sep 22, 2023 at 7:23 AM Byron Ellis via dev wrote:
>> I've actually wondered about this specifically for streaming... if you're writing a pipeline there it seems like you're often going to want to put high fixed cost things

Re: User-facing website vs. contributor-facing website

2023-09-22 Thread Robert Bradshaw via dev
On Fri, Sep 22, 2023 at 8:05 AM Danny McCormick via dev wrote:
> > I do feel strongly that https://beam.apache.org/contribute/ should remain on the main site, as it's aimed at users (who hopefully want to step up and contribute)
>
> To be clear, I don't think anyone is suggesting getting rid

Re: Runner Bundling Strategies

2023-09-22 Thread Robert Bradshaw via dev
On Fri, Sep 22, 2023 at 7:23 AM Byron Ellis via dev wrote:
> I've actually wondered about this specifically for streaming... if you're writing a pipeline there it seems like you're often going to want to put high fixed cost things like database connections even outside of the bundle setup.

Re: Runner Bundling Strategies

2023-09-22 Thread Jan Lukavský
Flink operators are long-running classes with a life-cycle of open() and close(), so any amortization can be done between those methods, see [1]. Essentially, in vanilla Flink the complete (unbounded) input can be viewed as a single "bundle". The crucial point is that state is

Re: User-facing website vs. contributor-facing website

2023-09-22 Thread Danny McCormick via dev
> I do feel strongly that https://beam.apache.org/contribute/ should remain on the main site, as it's aimed at users (who hopefully want to step up and contribute)

To be clear, I don't think anyone is suggesting getting rid of the section; my comments were about replacing the side panel links

Re: User-facing website vs. contributor-facing website

2023-09-22 Thread Byron Ellis via dev
I feel like that's actually pretty easy with GitHub Actions? I think maybe there's even one that exists for GitHub Pages and probably any other static site generator thingy we could care to name. Related, I stumbled across this the other day: https://github.com/apache/beam-site which appears to be

Re: Runner Bundling Strategies

2023-09-22 Thread Joey Tran
Ah! Thanks for that catch. I had subscribed to the user mailing list but forgot to ever sub to the dev list.

On Fri, Sep 22, 2023 at 10:03 AM Kenneth Knowles wrote:
> (I notice that you replied only to yourself, but there has been a whole thread of discussion on this - are you subscribed to

Re: Runner Bundling Strategies

2023-09-22 Thread Byron Ellis via dev
I've actually wondered about this specifically for streaming... if you're writing a pipeline there it seems like you're often going to want to put high fixed cost things like database connections even outside of the bundle setup. You really only want to do that once in the lifetime of the worker
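
Editor's sketch of the idea in this message, in plain Python with no Beam import: an expensive resource is opened once per worker rather than once per bundle. The hook names mirror the Beam Python `DoFn` lifecycle (`setup`, `start_bundle`, `process`, `finish_bundle`, `teardown`), but `FakeConnection`, `WriteFn`, and `run_bundles` are hypothetical stand-ins, and the driver is only an assumption about how a worker might call the hooks, not a description of any runner.

```python
from typing import Iterable, List


class FakeConnection:
    """Hypothetical stand-in for a costly resource such as a database connection."""

    opened = 0  # counts how many connections were ever created

    def __init__(self) -> None:
        FakeConnection.opened += 1

    def write(self, element: object) -> str:
        return f"wrote {element}"

    def close(self) -> None:
        pass


class WriteFn:
    """Mimics the DoFn lifecycle hooks; not an actual Beam DoFn."""

    def setup(self) -> None:
        # High fixed cost: paid once for the lifetime of this instance.
        self.conn = FakeConnection()

    def start_bundle(self) -> None:
        # Cheap per-bundle state only.
        self.buffer: List[object] = []

    def process(self, element: object) -> None:
        self.buffer.append(element)

    def finish_bundle(self) -> List[str]:
        # Flush the buffered elements through the long-lived connection.
        results = [self.conn.write(e) for e in self.buffer]
        self.buffer = []
        return results

    def teardown(self) -> None:
        self.conn.close()


def run_bundles(fn: WriteFn, bundles: Iterable[Iterable[object]]) -> List[str]:
    """Hypothetical worker loop: one setup, many bundles, one teardown."""
    fn.setup()
    out: List[str] = []
    try:
        for bundle in bundles:
            fn.start_bundle()
            for element in bundle:
                fn.process(element)
            out.extend(fn.finish_bundle())
    finally:
        fn.teardown()
    return out
```

With this shape the number of bundles affects only the cheap per-bundle work; the connection count stays at one per instance however the input is split.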

Re: Runner Bundling Strategies

2023-09-22 Thread Kenneth Knowles
(I notice that you replied only to yourself, but there has been a whole thread of discussion on this - are you subscribed to dev@beam? https://lists.apache.org/thread/k81fq301ypwmjowknzyqq2qc63844rbd) It sounds like you want what everyone wants: to have the biggest bundles possible. So for

Re: Runner Bundling Strategies

2023-09-22 Thread Joey Tran
Whoops, I typoed my last email. I meant to write "this isn't the greatest strategy for high *fixed* cost transforms", e.g. a transform that takes 5 minutes to get set up and then maybe a microsecond per input. I suppose one solution is to move the responsibility for handling this kind of situation
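
Editor's worked example of the arithmetic behind this message. It uses the figures given above (5 minutes of setup, about a microsecond per input) and assumes, purely for illustration, that the setup cost is paid once per bundle; the function names are hypothetical.

```python
import math

SETUP_SECONDS = 5 * 60        # fixed cost, assumed here to be paid once per bundle
PER_ELEMENT_SECONDS = 1e-6    # marginal cost per input


def amortized_seconds_per_element(bundle_size: int) -> float:
    """Average cost of one element when setup is spread over a whole bundle."""
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    return SETUP_SECONDS / bundle_size + PER_ELEMENT_SECONDS


def total_seconds(num_elements: int, bundle_size: int) -> float:
    """Total time to process num_elements when split into bundles of bundle_size."""
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    num_bundles = math.ceil(num_elements / bundle_size)
    return num_bundles * SETUP_SECONDS + num_elements * PER_ELEMENT_SECONDS


if __name__ == "__main__":
    for size in (1, 1_000, 1_000_000):
        print(size, amortized_seconds_per_element(size), total_seconds(1_000_000, size))
```

Under these assumptions the fixed term dominates at every realistic bundle size, which is why the bundle count matters far more than the element count for this kind of transform.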

Re: Runner Bundling Strategies

2023-09-22 Thread Kenneth Knowles
What is the best way to amortize heavy operations across elements in Flink? (that is what bundles are for, basically)

On Fri, Sep 22, 2023 at 5:09 AM Jan Lukavský wrote:
> Flink defines bundles in terms of number of elements and processing time, by default 1000 elements or 1000 milliseconds,

Beam High Priority Issue Report (42)

2023-09-22 Thread beamactions
This is your daily summary of Beam's current high priority issues that may need attention. See https://beam.apache.org/contribute/issue-priorities for the meaning and expectations around issue priorities. Unassigned P1 Issues: https://github.com/apache/beam/issues/28383 [Failing Test]:

Re: Runner Bundling Strategies

2023-09-22 Thread Jan Lukavský
Flink defines bundles in terms of number of elements and processing time, by default 1000 elements or 1000 milliseconds, whichever happens first. But bundles are not a "natural" concept in Flink; it uses them merely to comply with the Beam model. By default, checkpoints are unaligned with
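
Editor's sketch of the count-or-time rule described in this message, in plain Python. It is not the Flink runner's implementation: `bundle_by_count_or_time` is a hypothetical helper, arrival times are passed in explicitly so the behavior is deterministic, and a bundle that has exceeded its time limit is closed only when the next element arrives, whereas a real runner would presumably use a timer.

```python
from typing import Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def bundle_by_count_or_time(
    timed_elements: Iterable[Tuple[float, T]],
    max_elements: int = 1000,
    max_millis: float = 1000.0,
) -> Iterator[List[T]]:
    """Group (arrival_ms, element) pairs into bundles.

    A bundle closes when it holds max_elements, or when max_millis have
    passed since its first element, whichever happens first.
    """
    bundle: List[T] = []
    bundle_start: Optional[float] = None
    for arrival_ms, element in timed_elements:
        # Time limit: close the open bundle before accepting this element.
        if bundle and bundle_start is not None and arrival_ms - bundle_start >= max_millis:
            yield bundle
            bundle = []
        if not bundle:
            bundle_start = arrival_ms
        bundle.append(element)
        # Count limit: close the bundle as soon as it is full.
        if len(bundle) >= max_elements:
            yield bundle
            bundle = []
    if bundle:
        yield bundle  # flush whatever is left at end of input


if __name__ == "__main__":
    events = [(0, "a"), (10, "b"), (20, "c"), (1500, "d"), (1600, "e")]
    print(list(bundle_by_count_or_time(events, max_elements=3)))
```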