Hey Marco,
Others more senior can correct me here, but regarding limiting concurrency:
Beam runners split PCollections of <Key, Value> by key, so as long as all
your items share the same key, I think the runner will create only one
executor for that ParDo. Here's what I did recently:
1. Create a PTransform (to hide this logic from the rest of the pipeline).
2. In the PTransform, apply input.apply(WithKeys.of("")), basically making
all the items go to the same executor.
3. In the PTransform, create a private DoFn.
4. The DoFn has a Timer and a State, as in Kenn's blog post linked by Brian.
5. Whenever the timer fires (every minute in your case), grab the first 300
items from your state and make the service calls (rough sketch below).
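
Here's a rough sketch of what that looked like. It's untested and typed from
memory, so treat it as a starting point rather than a finished transform;
RateLimit, ThrottleFn and the hard-coded 300-per-minute are just placeholders
for your own names and quota:

import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.coders.BooleanCoder;
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class RateLimit extends PTransform<PCollection<String>, PCollection<String>> {

  private static final int MAX_PER_MINUTE = 300; // your API quota

  @Override
  public PCollection<String> expand(PCollection<String> input) {
    return input
        // Step 2: give every element the same key so state and timers live on
        // a single key and the work is effectively serialized.
        .apply(WithKeys.of(""))
        // Steps 3-5: buffer elements in state, drain up to 300 each minute.
        .apply(ParDo.of(new ThrottleFn()));
  }

  private static class ThrottleFn extends DoFn<KV<String, String>, String> {

    @StateId("buffer")
    private final StateSpec<BagState<String>> bufferSpec = StateSpecs.bag();

    @StateId("timerArmed")
    private final StateSpec<ValueState<Boolean>> timerArmedSpec =
        StateSpecs.value(BooleanCoder.of());

    @TimerId("flush")
    private final TimerSpec flushSpec = TimerSpecs.timer(TimeDomain.PROCESSING_TIME);

    @ProcessElement
    public void process(
        @Element KV<String, String> element,
        @StateId("buffer") BagState<String> buffer,
        @StateId("timerArmed") ValueState<Boolean> timerArmed,
        @TimerId("flush") Timer flush) {
      buffer.add(element.getValue());
      // Only arm the timer if it isn't already pending.
      if (!Boolean.TRUE.equals(timerArmed.read())) {
        flush.offset(Duration.standardMinutes(1)).setRelative();
        timerArmed.write(true);
      }
    }

    @OnTimer("flush")
    public void onFlush(
        @StateId("buffer") BagState<String> buffer,
        @StateId("timerArmed") ValueState<Boolean> timerArmed,
        @TimerId("flush") Timer flush,
        OutputReceiver<String> out) {
      int sent = 0;
      List<String> leftover = new ArrayList<>();
      for (String item : buffer.read()) {
        if (sent < MAX_PER_MINUTE) {
          // This is where your web client call would go; here we just emit.
          out.output(item);
          sent++;
        } else {
          leftover.add(item);
        }
      }
      // Anything over the quota goes back into state for the next firing.
      buffer.clear();
      leftover.forEach(buffer::add);
      if (leftover.isEmpty()) {
        timerArmed.write(false);
      } else {
        flush.offset(Duration.standardMinutes(1)).setRelative();
      }
    }
  }
}

The leftover handling is there because anything beyond 300 in a given minute
has to wait for the next firing. GroupIntoBatches, which Evan mentioned, is
worth reading for a more battle-tested take on the same buffering pattern.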
Hope it helps,
Cristian
On Fri, Sep 24, 2021 at 12:22 PM Brian Hulette <[email protected]> wrote:
> Kenn's Timely Processing blog post [1] discusses this use case.
>
> Brian
>
> [1] https://beam.apache.org/blog/timely-processing/
>
> On Fri, Sep 24, 2021 at 4:12 AM Evan Galpin <[email protected]> wrote:
>
>> This has been mentioned a few times and seems to me to be a fairly common
>> requirement.
>>
>> I think a rate limit could be accomplished through stateful processing,
>> using a combination of BagState and timers. GroupIntoBatches.java would
>> be a good example to follow.
>>
>> I wonder if this would be a good built-in transform given the number of
>> times it’s come up 🙂
>>
>> Thanks,
>> Evan
>>
>> On Fri, Sep 24, 2021 at 05:29 Sofia’s World <[email protected]> wrote:
>>
>>> Hello
>>> I was wondering if it's somehow possible to limit the concurrency of a
>>> Beam step?
>>> I have a workflow that involves a web client calling an API for which
>>> my account has a maximum of 300 requests per minute...
>>> Alternatively, will I have to go through a Combine and a custom ParDo?
>>>
>>> Has anyone come across this use case?
>>>
>>> kind regards
>>> Marco
>>>
>>