It does. Thank you, Boris.

From: Boris Tyukin <[email protected]>
Sent: Friday, January 18, 2019 1:10 PM
To: [email protected]
Subject: Re: CaptureChangeMySQL and Triggers

Hi Vitaly,

I do not work for a CDC vendor, but there are many reasons why CDC tools exist, 
and no NiFi processor will ever beat the benefits of these commercial tools. If 
timestamps are reliable, extracts are fast and you can handle inserts, updates, 
deletes and primary key updates without impacting the performance of your 
source database - great!

Unfortunately, in my experience, this is the exception rather than the rule. 
Vendors design extremely poor databases, and developers and DBAs delete or 
change data without updating timestamps. One of the projects I worked on 
recently did not even have timestamps, and the tables were very large (over 1B 
rows).

CDC tools mine the database log efficiently and reliably. Another use case 
for CDC tools is online replication (for data redundancy, backups or 
offloading reporting queries from the production system).

It just seems to me you have not encountered a project/use case where such a 
tool is a must.

Here is an example for you. We recently purchased GoldenGate, which can stream 
data into Kafka. While we had timestamps in the source database, they were 
unreliable for many reasons, including people. Another reason: our source 
systems are beefy Oracle RAC clusters which are under extreme load 24x7. Lots 
of analysts and developers used them for reporting purposes, which greatly 
impacted performance for the people who need them to perform their duties.

There is also a lot of complexity when it comes to mining database logs. First, 
these things are platform/vendor dependent. Second, serious commercial RDBMSs 
like Oracle have tons of settings and deployment options: log file rotation 
rules, backups, clustering, load balancing, you name it.

A CDC tool was our option to solve that issue with Oracle and that specific 
system. At the same time, I worked with another vendor system where we could 
rely on timestamps and they would never delete data.

Hope this sheds a bit of light on "change data capture logs marketed 
advantages" ;)


On Fri, Jan 18, 2019 at 12:07 PM Vitaly Krivoy <[email protected]> wrote:
This is a follow-on question to Apache/HortonWorks, rather than an answer to 
the question posted by Marcelo. Outside of CaptureChangeMySQL, are there plans 
underway to add similar processors for other databases? I realize that a 
database would have to produce a change data capture log for this feature to be 
implemented inexpensively. One of the objections to NiFi adoption that I 
constantly have to face in my organization is that ExecuteSQL would require 
polling, thus affecting the performance of the source DBMS. I realize that 
this objection is silly: if the modification/creation timestamp column in the 
table is indexed, selecting the rows that have been added/modified after the 
last run date in ExecuteSQL would barely affect the server. But as a consulting 
architect, I have to deal with non-technical clients who make their decisions 
based on buzz words and they have heard of change data capture logs marketed 
advantages. Thanks.
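
For illustration, the timestamp-based polling described above might look like 
the following minimal Python sketch. The `orders` table, its columns, the 
integer timestamps and the `fetch_changes` helper are all hypothetical, and an 
in-memory SQLite database stands in for the real source system. It is only a 
sketch of the idea, not how ExecuteSQL itself is implemented.

```python
import sqlite3


def fetch_changes(conn, last_run):
    """Return rows modified strictly after last_run, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, payload, modified_at FROM orders "
        "WHERE modified_at > ? ORDER BY modified_at",
        (last_run,),
    ).fetchall()
    # Advance the watermark only if something was returned.
    new_watermark = rows[-1][2] if rows else last_run
    return rows, new_watermark


# Hypothetical source table with an indexed modification timestamp.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, modified_at INTEGER)"
)
conn.execute("CREATE INDEX idx_orders_modified_at ON orders (modified_at)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)", [(1, "a", 100), (2, "b", 110)]
)

# First poll: everything is new.
first_batch, watermark = fetch_changes(conn, 0)

# Between polls: one row is updated (timestamp maintained), one is deleted.
conn.execute("UPDATE orders SET payload = 'b2', modified_at = 120 WHERE id = 2")
conn.execute("DELETE FROM orders WHERE id = 1")

# Second poll: only the updated row comes back; the delete leaves no trace.
second_batch, watermark = fetch_changes(conn, watermark)
```

The second poll picks up the update only because its timestamp was maintained, 
and it never sees the deleted row, so this approach depends on reliable 
timestamps and on some other way of detecting deletes.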

From: Marcelo Terres <[email protected]>
Sent: Thursday, January 17, 2019 5:51 AM
To: [email protected]
Subject: CaptureChangeMySQL and Triggers

Hello.

Is anyone here using the CaptureChangeMySQL processor to get data from a table 
whose data is generated and managed by triggers/stored procedures?

I'm having some weird issues, such as data not being processed for specific 
inserts and some weird data being generated for simple operations 
(3 objects in one update operation, for example).

Thanks in advance,

Regards,

Marcelo H. Terres
<[email protected]>
https://www.mundoopensource.com.br
https://twitter.com/mhterres
https://linkedin.com/in/marceloterres

STATEMENT OF CONFIDENTIALITY The information contained in this email message 
and any attachments may be confidential and legally privileged and is intended 
for the use of the addressee(s) only. If you are not an intended recipient, 
please: (1) notify me immediately by replying to this message; (2) do not use, 
disseminate, distribute or reproduce any part of the message or any attachment; 
and (3) destroy all copies of this message and any attachments.