Hi Vitaly,

I do not work for a CDC vendor, but there are many reasons why CDC tools exist, and no NiFi processor will ever match the benefits of these commercial tools. If timestamps are reliable, extracts are fast, and you can handle inserts, updates, deletes and primary key updates without impacting the performance of your source database, great!
Unfortunately, in my experience, this is the exception rather than the rule. Vendors design extremely poor databases, and developers and DBAs delete or change data without updating timestamps. One of the projects I worked on recently did not even have timestamps, and the tables were very large (over 1B rows). CDC tools mine the database log efficiently and reliably. Another use case for CDC tools is online replication (for data redundancy, backups, or offloading reporting queries from the production system). It just seems to me you have not yet encountered a project or use case where such a tool is a must.

Here is an example for you. We recently purchased GoldenGate, which can stream data into Kafka. While we had timestamps in the source database, they were unreliable for many reasons, people included. Another reason: our source systems are beefy Oracle RAC clusters which are under extreme load 24x7. Lots of analysts and developers use them for reporting, which greatly degrades performance for the people who need them to do their jobs.

There is also a lot of complexity when it comes to mining database logs. First, these things are platform/vendor dependent. Second, serious commercial RDBMSs like Oracle have tons of settings and deployment options, log file rotation rules, backups, clustering, load balancing, you name it. A CDC tool was our option to solve that issue with Oracle and that specific system. At the same time, I have worked with another vendor system where we could rely on timestamps and they would never delete data.

Hope this sheds a bit of light on "change data capture logs marketed advantages" ;)

On Fri, Jan 18, 2019 at 12:07 PM Vitaly Krivoy <[email protected]> wrote:

> This is a follow-on question to Apache/HortonWorks, rather than an answer
> to the question posted by Marcelo. Outside of CaptureChangeMySQL, are there
> plans underway to add similar processors for other databases?
> I realize that a database would have to produce a change data capture log
> for this feature to be implemented inexpensively. One of the objections
> to NiFi adoption that I constantly have to face in my organization is that
> ExecuteSQL would require polling, thus affecting performance of the source
> DBMS. I realize that this objection is silly and that, if the
> modification/creation timestamp column in the table is indexed, selecting
> the rows that have been added/modified after the last run date in
> ExecuteSQL would barely affect the server. But as a consulting architect,
> I have to deal with non-technical clients who make their decisions based
> on buzzwords, and they have heard of the marketed advantages of change
> data capture logs.
>
> Thanks.
>
> *From:* Marcelo Terres <[email protected]>
> *Sent:* Thursday, January 17, 2019 5:51 AM
> *To:* [email protected]
> *Subject:* CaptureChangeMySQL and Triggers
>
> Hello.
>
> Is anyone here using the CaptureChangeMySQL processor to get data from a
> table whose data is generated and managed by triggers/stored procedures?
>
> I'm having some weird issues, such as data not being processed for
> specific inserts and some weird data being generated for simple
> operations (3 objects in one update operation, for example).
>
> Thanks in advance,
>
> Regards,
>
> Marcelo H. Terres
> <[email protected]>
> https://www.mundoopensource.com.br
> https://twitter.com/mhterres
> https://linkedin.com/in/marceloterres
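
P.S. To make the timestamp-polling trade-off discussed in this thread concrete, here is a minimal sketch in Python. It is not NiFi or GoldenGate code. It uses an in-memory SQLite database as a stand-in for the source system, and the `orders` table, its columns, and the integer timestamps are all assumptions made up for illustration. It shows the kind of query an ExecuteSQL-style poll would run against an indexed `updated_at` column, and which changes that query cannot see.

```python
import sqlite3


def poll_changes(conn, last_run):
    """Return rows whose updated_at is later than the previous watermark.

    This is the "select rows modified after the last run" pattern; it can
    use the index on updated_at, so it is cheap on the source database.
    """
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (last_run,),
    )
    return cur.fetchall()


def demo():
    # Hypothetical source table; integer timestamps keep the example simple.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount INTEGER, updated_at INTEGER)"
    )
    conn.execute("CREATE INDEX idx_orders_updated_at ON orders (updated_at)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10, 100), (2, 20, 100), (3, 30, 100)],
    )

    # First poll: initial load picks up all three rows.
    first = poll_changes(conn, 0)
    watermark = max(row[2] for row in first)

    # A well-behaved update that bumps the timestamp.
    conn.execute("UPDATE orders SET amount = 11, updated_at = 200 WHERE id = 1")
    # A manual fix that forgets to touch the timestamp.
    conn.execute("UPDATE orders SET amount = 99 WHERE id = 2")
    # A hard delete: the row is simply gone.
    conn.execute("DELETE FROM orders WHERE id = 3")

    # Second poll: only the well-behaved update is visible.
    second = poll_changes(conn, watermark)
    return first, second


if __name__ == "__main__":
    first, second = demo()
    print("initial load:", first)
    print("incremental :", second)
```

In this sketch the second poll returns only the row whose timestamp was bumped. The update to row 2 and the delete of row 3 never show up, which is the gap a log-based CDC tool is meant to close. When timestamps are trustworthy and rows are never deleted, the polling query alone may be all you need.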
