Lei,

What are your MergeRecord settings? Unfortunately, since CaptureChangeMySQL was developed before the Record API was available, the processor emits one flow file per event. That can certainly cause performance issues at PutDatabaseRecord.

This may be the first time I've suggested this :) but you may find you're better off with ConvertJsonToSQL -> PutSQL, setting the "Batch Size" property of PutSQL to a high number. This effectively does a "merge": each execution grabs as many flow files as it can, up to Batch Size, and executes the SQL statement(s).
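To see why a high Batch Size helps: PutSQL can hand the statements to the JDBC driver as a batch, so one processor execution covers many flow files with far fewer round trips to the database. Here is a simplified standalone sketch of that mechanism; the connection URL, credentials, table, and rows are made-up placeholders, and this is not the processor's actual code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.List;

    public class BatchSqlSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical destination database; substitute your own URL/credentials.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/dest", "user", "pass")) {
                conn.setAutoCommit(false);

                // Stand-ins for the contents of up to "Batch Size" flow files
                // pulled in a single execution.
                List<Object[]> rows = List.of(
                        new Object[]{1, "alice"},
                        new Object[]{2, "bob"},
                        new Object[]{3, "carol"});

                String sql = "INSERT INTO users (id, name) VALUES (?, ?)";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    for (Object[] row : rows) {
                        ps.setObject(1, row[0]);
                        ps.setObject(2, row[1]);
                        ps.addBatch();   // queue the statement; nothing sent yet
                    }
                    ps.executeBatch();   // one round trip for the whole batch
                }
                conn.commit();
            }
        }
    }

With a single concurrent task the statements still execute in arrival order, so this keeps your ordering guarantee while amortizing the per-statement overhead.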
It seems like a CaptureChangeMySQLRecord processor would be a good idea, or perhaps adding "Max Records Per FlowFile" and an optional "Record Writer" property to the existing CaptureChangeMySQL processor, so it can write multiple records per flow file, sparing the need for a merge processor or an explicit conversion to SQL (there's a rough sketch of what that would look like below the quoted message). I presume the choice between a new processor and augmenting the existing one would depend on whether there is a common schema for all events. Please feel free to write an Improvement Jira to cover this.

Regards,
Matt

On Tue, Oct 15, 2019 at 7:58 AM [email protected] <[email protected]> wrote:
>
> I am using CaptureChangeMySQL to extract the binlog and do some
> transformations, and then write to another database using PutDatabaseRecord.
> Now the PutDatabaseRecord processor is a performance bottleneck.
>
> If I set the PutDatabaseRecord processor's concurrency larger than 1, there
> will be ordering issues: the binlog events will not reach the destination
> database in the same order they come in. But with one concurrent task, the
> TPS is only about 80.
> Even if I add a MergeRecord before PutDatabaseRecord, the TPS is no more
> than 300.
>
> Does anybody have any idea about this?
>
> Thanks,
> Lei
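For illustration, here is a rough standalone sketch of what "multiple records per flow file" amounts to: several change events serialized into one payload that a record-aware processor can handle in a single execution. This is plain Java with no NiFi dependencies; the event shape, the batch of three events, and the record.count attribute are assumptions for the example, not CaptureChangeMySQL's actual output.

    import java.nio.charset.StandardCharsets;
    import java.util.List;

    public class MultiRecordPayloadSketch {
        public static void main(String[] args) {
            // Stand-ins for binlog events that would otherwise each become
            // their own flow file.
            List<String> events = List.of(
                    "{\"op\":\"insert\",\"id\":1}",
                    "{\"op\":\"update\",\"id\":1}",
                    "{\"op\":\"delete\",\"id\":1}");

            // One JSON-array payload holding all three events, so a downstream
            // record-aware processor (e.g. PutDatabaseRecord) sees one flow file
            // with three records instead of three flow files.
            byte[] payload = ("[" + String.join(",", events) + "]")
                    .getBytes(StandardCharsets.UTF_8);

            System.out.println(new String(payload, StandardCharsets.UTF_8));
            // Record-oriented processors conventionally report the batch size
            // in a "record.count" flow file attribute.
            System.out.println("record.count=" + events.size());
        }
    }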
