Hi Seed,

It’s a known issue that Flink doesn’t report back pressure properly for 
AsyncFunctions, due to how it monitors the output collector to gather back 
pressure statistics.

But that wouldn’t explain how you get a faster processing with the 
AsyncFunction inserted into your workflow.

I haven’t looked at how the Cassandra sink handles batching, but if the 
AsyncFunction somehow caused fewer, bigger Cassandra writes to happen then 
that’s one (serious hand waving) explanation.

— Ken

> On Mar 18, 2019, at 7:48 PM, Seed Zeng <seed.z...@klaviyo.com> wrote:
> 
> Flink Version - 1.6.1
> 
> In our application, we consume from Kafka and sink to Cassandra in the end. 
> We are trying to introduce a custom async function in front of the Sink to 
> carry out some customized operations. In our testing, it appears that the 
> Async function is not generating backpressure to slow down our Kafka Source 
> when Cassandra becomes unhappy. Essentially compared to an almost identical 
> job where the only difference is the lack of the Async function, Kafka source 
> consumption speed is much higher under the same settings and identical 
> Cassandra cluster. The experiment is like this.
> 
> Job 1 - without async function in front of Cassandra
> Job 2 - with async function in front of Cassandra
> 
> Job 1 is backpressured because Cassandra cannot handle all the writes and 
> eventually slows down the source rate to 6.5k/s. 
> Job 2 is slightly backpressured but was able to run at 14k/s.
> 
> Is the AsyncFunction somehow not reporting the backpressure correctly?
> 
> Thanks,
> Seed

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Reply via email to