Hi Seed,

I was assuming the Cassandra sink was separate from and after your async 
function.

I was trying to come up for an explanation as to why adding the async function 
would improve your performance.

The only very unlikely reason I thought of was that the async function somehow 
caused data arriving at the sink to be more “batchy”, which (if the Cassandra 
sink had an “every x seconds do a write” batch mode) could improve performance.

— Ken

> On Mar 19, 2019, at 11:35 AM, Seed Zeng <[email protected]> wrote:
> 
> Hi Ken and Andrey,
> 
> Thanks for the response. I think there is a confusion that the writes to 
> Cassandra are happening within the Async function. 
> In my test, the async function is just a pass-through without doing any work.
> 
> So any Cassandra related batching or buffering should not be the cause for 
> this.
> 
> Thanks,
> 
> Seed
> 
> On Tue, Mar 19, 2019 at 12:35 PM Ken Krugler <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi Seed,
> 
> It’s a known issue that Flink doesn’t report back pressure properly for 
> AsyncFunctions, due to how it monitors the output collector to gather back 
> pressure statistics.
> 
> But that wouldn’t explain how you get a faster processing with the 
> AsyncFunction inserted into your workflow.
> 
> I haven’t looked at how the Cassandra sink handles batching, but if the 
> AsyncFunction somehow caused fewer, bigger Cassandra writes to happen then 
> that’s one (serious hand waving) explanation.
> 
> — Ken
> 
>> On Mar 18, 2019, at 7:48 PM, Seed Zeng <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Flink Version - 1.6.1
>> 
>> In our application, we consume from Kafka and sink to Cassandra in the end. 
>> We are trying to introduce a custom async function in front of the Sink to 
>> carry out some customized operations. In our testing, it appears that the 
>> Async function is not generating backpressure to slow down our Kafka Source 
>> when Cassandra becomes unhappy. Essentially compared to an almost identical 
>> job where the only difference is the lack of the Async function, Kafka 
>> source consumption speed is much higher under the same settings and 
>> identical Cassandra cluster. The experiment is like this.
>> 
>> Job 1 - without async function in front of Cassandra
>> Job 2 - with async function in front of Cassandra
>> 
>> Job 1 is backpressured because Cassandra cannot handle all the writes and 
>> eventually slows down the source rate to 6.5k/s. 
>> Job 2 is slightly backpressured but was able to run at 14k/s.
>> 
>> Is the AsyncFunction somehow not reporting the backpressure correctly?
>> 
>> Thanks,
>> Seed
> 
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com <http://www.scaleunlimited.com/>
> Custom big data solutions & training
> Flink, Solr, Hadoop, Cascading & Cassandra
> 

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra

Reply via email to