Hello fellow Flink users,

How do you create backpressure with StateFun remote functions? I'm using an asynchronous web server for the remote function (Python aiohttp on uvicorn), which accepts more requests than its CPU-bound backend can handle. Requests then time out, which can trigger a restart loop of the whole Flink pipeline. This only happens, of course, when more requests arrive on the Kafka ingress stream than the service can keep up with.
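To make the service side concrete, the load-shedding behavior I mean can be sketched with a plain asyncio semaphore (illustrative only; the names are not part of the StateFun SDK or aiohttp, and in the real service a rejection would be mapped to an HTTP 429/503 response):

```python
import asyncio


class ConcurrencyLimiter:
    """Shed load once max_inflight requests are already running.

    Illustrative sketch only; not part of the StateFun SDK or aiohttp.
    A rejected call (None) would be turned into HTTP 429/503 by the handler.
    """

    def __init__(self, max_inflight: int):
        self._sem = asyncio.Semaphore(max_inflight)

    async def run(self, coro_factory):
        # locked() is True when no permit is immediately available,
        # i.e. the service is already at capacity.
        if self._sem.locked():
            return None  # reject instead of queueing the request
        async with self._sem:
            return await coro_factory()
```

The problem is exactly that this rejection is seen by Flink as a failure, not as a backpressure signal.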
Desired behavior: Flink only consumes as many records from the input stream as the pipeline can handle, instead of overloading the remote functions.

What I have tried so far:

1. Set "statefun.async.max-per-task" in flink-conf.yaml to a low number. This works, but it is one global, static setting for all functions and cannot be changed without restarting the cluster when the remote functions are scaled up or down.

2. Add concurrency limiting to the remote function service. When the function service returns failure codes (500, 503, 429), that does not seem to create backpressure; Flink treats it like a normal failure and retries until eventually the whole pipeline is restarted.

3. Use the new "io.statefun.transports.v1/async" transport type for the endpoints with a low "pool_size" parameter. But reaching the pool size seems to be treated as an error instead of creating backpressure; same effect as option 2.

Is there some other option? How is this supposed to be done by design?

Thanks,
Christian
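P.S. For reference, the endpoint definition from attempt 3 looks roughly like this (StateFun 3.x module.yaml; the field layout is from memory, and the function namespace, URL, and pool_size value are illustrative):

```yaml
kind: io.statefun.endpoints.v2/http
spec:
  functions: example/*
  urlPathTemplate: http://remote-fn:8000/statefun
  transport:
    type: io.statefun.transports.v1/async
    pool_size: 4
```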