Alberto,

Since it sounds like you have control over the structure of the tables, this 
should be doable.

If you have a changelog table for each source table, this will probably be easier. In each changelog table, make sure you have a reliable transaction timestamp column and a change-type column (I/U/D). Then use QueryDatabaseTable to tail your changelog tables, one copy of QueryDatabaseTable per changelog table.
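As a rough sketch of the changelog idea, assuming a hypothetical source table named customer and MySQL-style trigger syntax (all table and column names here are made up for illustration):

```sql
-- Hypothetical changelog table, one per source table.
-- tx_ts is the transaction timestamp QueryDatabaseTable will key on;
-- change_type is the I/U/D flag mentioned above.
CREATE TABLE customer_changelog (
    change_id   BIGINT AUTO_INCREMENT PRIMARY KEY,
    tx_ts       TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
    change_type CHAR(1)      NOT NULL,  -- 'I', 'U' or 'D'
    customer_id INT          NOT NULL,
    name        VARCHAR(100),
    email       VARCHAR(100)
);

-- Example trigger that records inserts; you'd add similar
-- AFTER UPDATE and AFTER DELETE triggers.
CREATE TRIGGER customer_after_insert
AFTER INSERT ON customer
FOR EACH ROW
  INSERT INTO customer_changelog (change_type, customer_id, name, email)
  VALUES ('I', NEW.customer_id, NEW.name, NEW.email);
```

In QueryDatabaseTable you'd then point the Maximum-value Columns property at tx_ts (or change_id) so each run only pulls rows it hasn't seen yet.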

Now your changes are in easy-to-ingest Avro files. For Hive, I'd probably use an external table with the Avro schema; this makes it easy to use PutHDFS to load the files and make them accessible from Hive. I haven't used Phoenix, sorry.
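On the Hive side, a minimal sketch of such an external Avro-backed table, assuming PutHDFS drops the files under /data/cdc/customer_changelog and the Avro schema file is already in HDFS (both paths are illustrative):

```sql
-- External table: Hive just reads whatever Avro files
-- PutHDFS lands in the LOCATION directory.
CREATE EXTERNAL TABLE customer_changelog
STORED AS AVRO
LOCATION '/data/cdc/customer_changelog'
TBLPROPERTIES (
  'avro.schema.url' = 'hdfs:///schemas/customer_changelog.avsc'
);
```

Because the table is external, dropping it doesn't delete the data files, which is usually what you want for an ingest landing zone.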

If you have a single change table for all tables, then you can still use the above pattern, but you'll need a middle step where you extract and rebuild the changes. Maybe if you store the changes as JSON, you could extract them using one of the Record parsers and then rebuild the data row. That's much harder, though.
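For example, if the single change table stored each row's data as a JSON payload, the rebuild step might look like this, whether you do it in NiFi with a record reader or downstream with Hive's get_json_object (table and field names are hypothetical):

```sql
-- Hypothetical single change table:
--   all_changes(tx_ts, table_name, change_type, payload_json)
-- Pull the per-table columns back out of the JSON payload.
SELECT
  tx_ts,
  change_type,
  get_json_object(payload_json, '$.customer_id') AS customer_id,
  get_json_object(payload_json, '$.name')        AS name,
  get_json_object(payload_json, '$.email')       AS email
FROM all_changes
WHERE table_name = 'customer';
```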

Thanks,
  Peter

From: Alberto Bengoa [mailto:[email protected]]
Sent: Wednesday, December 06, 2017 06:24
To: [email protected]
Subject: [EXT] CDC like updates on Nifi

Hey folks,

I read about the NiFi CDC processor for MySQL and other CDC "solutions" with NiFi that I found on Google, like these:

https://community.hortonworks.com/idea/53420/apache-nifi-processor-to-address-cdc-use-cases-for.html
https://community.hortonworks.com/questions/88686/change-data-capture-using-nifi-1.html
https://community.hortonworks.com/articles/113941/change-data-capture-cdc-with-apache-nifi-version-1-1.html

I'm trying a different approach to getting fresh information out of tables: using triggers on the source database's tables to write changes to a "changelog table".

This is done, but my questions are:

Would NiFi be capable of reading these tables and transforming the data to generate equivalent SQL statements (insert/update/delete) to send to Hive and/or Phoenix with the currently available processors?

Which would be the best / suggested flow?

The objective is to keep the tables in the Data Lake as up to date as possible for real-time analyses.

Cheers,
Alberto
