Lei,

What are your MergeRecord settings? Unfortunately, since CaptureChangeMySQL was developed before the Record API was available, the processor emits one flow file per event. That can certainly cause performance issues at PutDatabaseRecord.

This may be the first time I've suggested this :) but you may find you're better off with ConvertJsonToSQL -> PutSQL, setting the "Batch Size" property of PutSQL to a high number. This effectively does a "merge": each execution grabs as many flow files as it can, up to Batch Size, and executes the SQL statement(s).
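To see why a high Batch Size helps: PutSQL can hand the statements to the JDBC driver as a batch, so one processor execution covers many flow files with far fewer round trips to the database. Here is a simplified standalone sketch of that mechanism; the connection URL, credentials, table, and rows are made-up placeholders, and this is not the processor's actual code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class BatchSqlSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical destination database; substitute your own URL/credentials.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/dest", "user", "pass")) {
                conn.setAutoCommit(false);

                // Stand-ins for the contents of up to "Batch Size" flow files
                // pulled in a single execution.
                List<Object[]> rows = List.of(
                        new Object[]{1, "alice"},
                        new Object[]{2, "bob"},
                        new Object[]{3, "carol"});

                String sql = "INSERT INTO users (id, name) VALUES (?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (Object[] row : rows) {
                        ps.setObject(1, row[0]);
                        ps.setObject(2, row[1]);
                        ps.addBatch();   // queue the statement; nothing sent yet
                    }
                    ps.executeBatch();   // one round trip for the whole batch
                }
                conn.commit();
            }
        }
    }

With a single concurrent task the statements still execute in arrival order, so this keeps your ordering guarantee while amortizing the per-statement overhead.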
It seems like a CaptureChangeMySQLRecord processor would be a good idea, or perhaps adding "Max Records Per FlowFile" and an optional "Record Writer" property to the existing CaptureChangeMySQL processor, so it can write multiple records per flow file, sparing the need for a merge processor or an explicit conversion to SQL (there's a rough sketch of what that would look like below the quoted message). I presume the choice between a new processor and augmenting the existing one would depend on whether there is a common schema for all events. Please feel free to write an Improvement Jira to cover this.

Regards,
Matt

On Tue, Oct 15, 2019 at 7:58 AM [email protected] <[email protected]> wrote:
>
> I am using CaptureChangeMySQL to extract the binlog and do some
> transformations, and then write to another database using PutDatabaseRecord.
> Now the PutDatabaseRecord processor is a performance bottleneck.
>
> If I set the PutDatabaseRecord processor's concurrency larger than 1, there
> will be ordering issues: the binlog events will not reach the destination
> database in the same order they come in. But with one concurrent task, the
> TPS is only about 80.
> Even if I add a MergeRecord before PutDatabaseRecord, the TPS is no more
> than 300.
>
> Does anybody have any idea about this?
>
> Thanks,
> Lei
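For illustration, here is a rough standalone sketch of what "multiple records per flow file" amounts to: several change events serialized into one payload that a record-aware processor can handle in a single execution. This is plain Java with no NiFi dependencies; the event shape, the batch of three events, and the record.count attribute are assumptions for the example, not CaptureChangeMySQL's actual output.

    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class MultiRecordPayloadSketch {
        public static void main(String[] args) {
            // Stand-ins for binlog events that would otherwise each become
            // their own flow file.
            List<String> events = List.of(
                    "{\"op\":\"insert\",\"id\":1}",
                    "{\"op\":\"update\",\"id\":1}",
                    "{\"op\":\"delete\",\"id\":1}");

            // One JSON-array payload holding all three events, so a downstream
            // record-aware processor (e.g. PutDatabaseRecord) sees one flow file
            // with three records instead of three flow files.
            byte[] payload = ("[" + String.join(",", events) + "]")
                    .getBytes(StandardCharsets.UTF_8);

            System.out.println(new String(payload, StandardCharsets.UTF_8));
            // Record-oriented processors conventionally report the batch size
            // in a "record.count" flow file attribute.
            System.out.println("record.count=" + events.size());
        }
    }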
