Hello fellow Flink users,

How do you create backpressure with StateFun remote functions? I'm using an
asynchronous web server for the remote function (Python aiohttp on uvicorn),
which accepts more requests than its CPU-bound backend can handle. Requests
can then time out and trigger a restart loop of the whole Flink pipeline.
Of course this only happens when so many records arrive on the Kafka ingress
stream that the service can no longer keep up.

Desired behavior: Flink only consumes as many records from the input stream
as the pipeline can handle instead of overloading the remote functions.

What I tried so far:
1. Set "statefun.async.max-per-task" in flink-conf.yaml to a low number.
This works, but it is one global, static setting for all functions, and it
cannot be changed without restarting the cluster when the remote functions
are scaled up or down.
2. Add concurrency limiting to the remote function service. When the
function service returns failure codes (500, 503, 429), that doesn't create
backpressure; Flink treats it like any other failure and retries until
eventually the whole pipeline is restarted.
3. Try the new "io.statefun.transports.v1/async" transport type for the
endpoints with a low "pool_size" parameter. But hitting the pool size limit
seems to be treated as an error instead of creating backpressure, with the
same effect as option 2.
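For completeness, the settings from options 1 and 3 look roughly like this
(the function namespace, URL, and sizes are placeholders, not my real values):

```yaml
# flink-conf.yaml (option 1): global cap on in-flight async requests per task
statefun.async.max-per-task: 100

# module.yaml (option 3): async transport with a bounded connection pool
kind: io.statefun.endpoints.v2/http
spec:
  functions: example/*
  urlPathTemplate: http://my-service:8000/statefun
  transport:
    type: io.statefun.transports.v1/async
    connect: 10s
    call: 1min
    pool_size: 16
```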
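And the concurrency limiting from option 2 is roughly the following pattern
(aiohttp specifics stripped out; the names and limits here are illustrative):

```python
import asyncio

# Sketch of option 2: bound concurrent work inside the function service and
# shed load with an error status once the limit is reached.
MAX_CONCURRENT = 4
_semaphore = asyncio.Semaphore(MAX_CONCURRENT)

async def handle(payload):
    if _semaphore.locked():
        # All permits taken: reject instead of queueing. StateFun treats the
        # 503 as a failure and retries -- it does NOT become backpressure.
        return 503
    async with _semaphore:
        await asyncio.sleep(0.01)  # stand-in for the CPU-bound invocation
        return 200
```

So the service protects itself, but the rejections just turn into retries on
the Flink side.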

Is there some other option? How should it be done by design?

Thanks,

Christian

