Re: How to process mini-batch events in Flink with Datastream API

2023-02-10 Thread Leon Xu
Thanks, Austin. I will take a look at Async I/O. It looks like a pretty
cool feature.



Re: How to process mini-batch events in Flink with Datastream API

2023-02-10 Thread Austin Cawley-Edwards
It's been a while, but I think I've done something similar before with
Async I/O [1] and batching records with a window.

This was years ago, so I have no idea whether this was or is good
practice, but essentially it was:

-> Window by batch size (with a timeout trigger to maintain some SLA)
-> Process function that just collects all records in the window
-> Send the entire batch to the AsyncFunction

This approach definitely has some downsides: you don't get to take
advantage of some of the nice per-record things Async I/O gives you
(ordering, retries, etc.), but it does greatly reduce the load on external
services.
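
The three steps above can be modelled outside Flink in plain Java. This is
only a rough sketch of the idea, not Flink API code: the class name
CountOrTimeoutBatcher, its methods, and the asyncBatchCall parameter are all
made up for illustration. In a real job the buffering would be done by the
window and its trigger, and the batch call would live in an AsyncFunction
whose input type is the whole list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper: buffers records and hands them to an async batch call
// when the buffer is full or the oldest buffered record has waited too long.
final class CountOrTimeoutBatcher<T, R> {
    private final int maxSize;
    private final long maxWaitMillis;
    // Stand-in for the external service client: one async call per batch.
    private final Function<List<T>, CompletableFuture<List<R>>> asyncBatchCall;
    private final List<T> buffer = new ArrayList<>();
    private final List<CompletableFuture<List<R>>> inFlight = new ArrayList<>();
    private long firstArrivalMillis;

    CountOrTimeoutBatcher(int maxSize, long maxWaitMillis,
                          Function<List<T>, CompletableFuture<List<R>>> asyncBatchCall) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.maxWaitMillis = maxWaitMillis;
        this.asyncBatchCall = asyncBatchCall;
    }

    // "Window by batch size": flush as soon as maxSize records are buffered.
    void add(T record, long nowMillis) {
        if (buffer.isEmpty()) {
            firstArrivalMillis = nowMillis; // start the timeout clock
        }
        buffer.add(record);
        if (buffer.size() >= maxSize) {
            flush();
        }
    }

    // "Timeout trigger": called periodically; flushes a partial batch that
    // has been waiting for at least maxWaitMillis.
    void onTimer(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstArrivalMillis >= maxWaitMillis) {
            flush();
        }
    }

    // "Send the entire batch": one async request for all buffered records.
    private void flush() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        inFlight.add(asyncBatchCall.apply(batch));
    }

    // Futures for the batches sent so far, in the order they were sent.
    List<CompletableFuture<List<R>>> inFlight() {
        return inFlight;
    }
}
```

Time is passed in explicitly so the behaviour is easy to reason about. The
downside mentioned above shows up here too: a failed future fails the whole
batch, so any retry or ordering logic has to be handled per batch rather
than per record.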

Hope that helps,
Austin

[1]:
https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/asyncio/



Re: How to process mini-batch events in Flink with Datastream API

2023-02-10 Thread Leon Xu
I wonder if windows are the solution when it comes to the DataStream API.



How to process mini-batch events in Flink with Datastream API

2023-02-10 Thread Leon Xu
Hi Flink Users,

We want to use Flink to run a decoration pipeline, where we make calls to
an external service to fetch data and alter the event in the Flink
pipeline.

Since an external service call is involved, we want to batch the calls to
reduce the load on the external service (batching multiple Flink events
and making just one external service call).

It looks like mini-batch might be something we can leverage to achieve
that, but that feature seems to exist only in the Table API. We are using
the DataStream API and are wondering if there is any solution or
workaround for this?
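
To make the goal concrete, here is a rough plain-Java sketch of the
decoration step we have in mind, outside of Flink. Everything in it is
hypothetical: the Event class, the BatchDecorator name, and the bulkLookup
function standing in for an external service that accepts many ids in one
request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical event: an id plus the field filled in by the external service.
final class Event {
    final String id;
    final String decoration; // null until decorated

    Event(String id, String decoration) {
        this.id = id;
        this.decoration = decoration;
    }
}

// Hypothetical decorator: one external call per batch instead of per event.
final class BatchDecorator {
    private final int batchSize;
    // Stand-in for the external service: many ids in, id -> decoration out.
    private final Function<List<String>, Map<String, String>> bulkLookup;

    BatchDecorator(int batchSize,
                   Function<List<String>, Map<String, String>> bulkLookup) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.bulkLookup = bulkLookup;
    }

    List<Event> decorate(List<Event> events) {
        List<Event> out = new ArrayList<>(events.size());
        for (int start = 0; start < events.size(); start += batchSize) {
            int end = Math.min(start + batchSize, events.size());
            List<Event> batch = events.subList(start, end);

            List<String> ids = new ArrayList<>(batch.size());
            for (Event e : batch) {
                ids.add(e.id);
            }

            // Single external call covering every event in this batch.
            Map<String, String> found = bulkLookup.apply(ids);

            for (Event e : batch) {
                // Keep the old value if the service returned nothing for this id.
                out.add(new Event(e.id, found.getOrDefault(e.id, e.decoration)));
            }
        }
        return out;
    }
}
```

With a batch size of 2 and three events, the service would be called twice
instead of three times.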


Thanks
Leon