Alberto,

Since it sounds like you have control over the structure of the tables, this should be doable.
If you have a changelog table for each source table, this will probably be easier. In each changelog table, make sure you have a reliable transaction timestamp column and a change type column (I/U/D). Then use QueryDatabaseTable to tail each changelog table, with one copy of QueryDatabaseTable per changelog table. Now your changes are in easy-to-ingest Avro files. For Hive, I'd probably use an external table defined with the Avro schema; this makes it easy to load the files with PutHDFS and make them accessible from Hive. I haven't used Phoenix, sorry.

If you have a single changelog table for all tables, you can still use the pattern above, but you'll need a middle step where you extract and rebuild the changes. If you store the changes as JSON, you could extract them using one of the Record readers and then rebuild the data row. Much harder, though.

Thanks,
Peter

From: Alberto Bengoa [mailto:[email protected]]
Sent: Wednesday, December 06, 2017 06:24
To: [email protected]
Subject: [EXT] CDC like updates on Nifi

Hey folks,

I read about the NiFi CDC processor for MySQL and other CDC "solutions" with NiFi found on Google, like these:

https://community.hortonworks.com/idea/53420/apache-nifi-processor-to-address-cdc-use-cases-for.html
https://community.hortonworks.com/questions/88686/change-data-capture-using-nifi-1.html
https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html

I'm trying a different approach to acquire fresh information from tables, using triggers on the source database's tables to write changes to a "changelog table". This is done, but my questions are:

Would NiFi be capable of reading these tables and transforming the data to generate an equivalent SQL query (insert/update/delete) to send to Hive and/or Phoenix with the currently available processors? Which would be the best / suggested flow?

The objective is to keep tables on the Data Lake as up-to-date as possible for real-time analyses.

Cheers,
Alberto
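As a concrete illustration of the per-table changelog pattern discussed above (a transaction timestamp column plus an I/U/D change type column, populated by triggers), here is a minimal sketch using SQLite via Python. The table and column names (customers, change_ts, change_type) are hypothetical placeholders; your source database's trigger syntax will differ, but the shape of the changelog table is what QueryDatabaseTable would tail by the timestamp column.

```python
# Sketch of a trigger-maintained changelog table (hypothetical schema).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Changelog table: transaction timestamp + I/U/D change type, so a
-- downstream reader can tail it ordered by change_ts / change_id.
CREATE TABLE customers_changelog (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    change_ts   TEXT NOT NULL DEFAULT (strftime('%Y-%m-%d %H:%M:%f', 'now')),
    change_type TEXT NOT NULL CHECK (change_type IN ('I', 'U', 'D')),
    id          INTEGER,
    name        TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers
BEGIN
    INSERT INTO customers_changelog (change_type, id, name)
    VALUES ('I', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_upd AFTER UPDATE ON customers
BEGIN
    INSERT INTO customers_changelog (change_type, id, name)
    VALUES ('U', NEW.id, NEW.name);
END;

CREATE TRIGGER customers_del AFTER DELETE ON customers
BEGIN
    INSERT INTO customers_changelog (change_type, id, name)
    VALUES ('D', OLD.id, OLD.name);
END;
""")

# Simulate source-table activity; the triggers populate the changelog.
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
conn.execute("UPDATE customers SET name = 'alicia' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changes = conn.execute(
    "SELECT change_type, id, name FROM customers_changelog ORDER BY change_id"
).fetchall()
print(changes)  # [('I', 1, 'alice'), ('U', 1, 'alicia'), ('D', 1, 'alicia')]
```

One insert, one update, and one delete on the source table leave three ordered rows in the changelog, which is exactly the stream a tailing processor would pick up incrementally.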
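For the single-shared-changelog case, the "middle step" of extracting and rebuilding a change might look like the sketch below: each changelog row carries the changed record as JSON, and we rebuild an equivalent parameterized SQL statement from it. The function name, the 'id' key convention, and the table/column names are all illustrative assumptions, not anything NiFi provides out of the box.

```python
# Hypothetical rebuild step: JSON change record -> equivalent SQL statement.
import json

def rebuild_sql(table, change_type, payload_json):
    """Turn one changelog row into a parameterized SQL statement.

    Assumes the JSON payload holds the full row and that 'id' is the key.
    """
    row = json.loads(payload_json)
    cols = list(row)
    if change_type == 'I':
        placeholders = ', '.join(['?'] * len(cols))
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        params = [row[c] for c in cols]
    elif change_type == 'U':
        assignments = ', '.join(f"{c} = ?" for c in cols if c != 'id')
        sql = f"UPDATE {table} SET {assignments} WHERE id = ?"
        params = [row[c] for c in cols if c != 'id'] + [row['id']]
    elif change_type == 'D':
        sql = f"DELETE FROM {table} WHERE id = ?"
        params = [row['id']]
    else:
        raise ValueError(f"unknown change type: {change_type}")
    return sql, params

sql, params = rebuild_sql('customers', 'U', '{"id": 1, "name": "alicia"}')
print(sql)     # UPDATE customers SET name = ? WHERE id = ?
print(params)  # ['alicia', 1]
```

In a real flow this logic would live in the middle of the pipeline (after a Record reader extracts the JSON, before a processor such as PutSQL applies the statement), which is why the single-table variant is notably harder than one changelog per table.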
