Hi,
I'm trying to replicate a number of tables from one database to another. I'd
like the flow to take care of both the DDL ("create table if not exists ...")
and DML ("insert") commands automatically. Ideally, the "create table" should
be executed just once, before any insert for that same table is executed.
I can use a Distributed Map Cache to know whether the "create table" for each
table has already been performed, but the problem is that I don't know how to hold
the "inserts" for that table until the "create table" is done.
I'm using "create table if not exists as select * from ...", so I'm trying to
create the table and populate it at the same time with the data from that first
row. It's not a pure "create table" without data because I couldn't find any
processor that automatically maps the avro.schema to my database's DDL. I could
use ExecuteScript for that, and then use "create table if not exists <table
definition>", but how do I avoid running the create for every single row (or even
for every flowfile containing many rows each)? It would be great if I could run
the "create table" just once per table, with or without data for the first
"batch".
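To make the ExecuteScript idea a bit more concrete, here is a rough sketch of the
kind of Avro-to-DDL mapping I have in mind. It is plain Python with no NiFi API
calls; the type table and the function names (sql_type, create_table_ddl) are my
own assumptions, and the SQL types would need adjusting for the target database.
In the flow, the input would be the JSON text carried in avro.schema.

```python
import json

# Hypothetical Avro-to-SQL type mapping; adjust for the target database.
AVRO_TO_SQL = {
    "string": "VARCHAR(255)",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "bytes": "BLOB",
}
FALLBACK_SQL_TYPE = "VARCHAR(255)"  # assumption: store anything unknown as text


def sql_type(avro_type):
    """Return (sql_type, nullable) for the "type" of one Avro field."""
    nullable = False
    if isinstance(avro_type, list):
        # A union such as ["null", "string"] means the column is nullable.
        non_null = [t for t in avro_type if t != "null"]
        nullable = len(non_null) < len(avro_type)
        avro_type = non_null[0] if non_null else "string"
    if isinstance(avro_type, dict):
        # Complex or logical type: look only at its underlying "type".
        avro_type = avro_type.get("type")
    if not isinstance(avro_type, str):
        return FALLBACK_SQL_TYPE, nullable
    return AVRO_TO_SQL.get(avro_type, FALLBACK_SQL_TYPE), nullable


def create_table_ddl(schema_json):
    """Build a CREATE TABLE IF NOT EXISTS statement from an Avro record schema."""
    schema = json.loads(schema_json)
    columns = []
    for field in schema["fields"]:
        col_type, nullable = sql_type(field["type"])
        null_clause = "" if nullable else " NOT NULL"
        columns.append("%s %s%s" % (field["name"], col_type, null_clause))
    return "CREATE TABLE IF NOT EXISTS %s (%s)" % (
        schema["name"], ", ".join(columns))
```

Even with something like this, the statement would still be generated for every
flowfile, so the "run it only once per table" question remains.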
It looks like a Setup task, if you know what I mean. I'm not sure if that is
something that would fit how NiFi works, though. Wait and Notify don't look
like an answer, either.
Probably I'd be better off considering the creation of the table structures as
a one-time configuration task performed before the flow is first executed, but
it would be cool to have everything automated using the same toolset, especially
considering that new tables could be created at any time. (You may assume mine
is a test database, so I don't actually need or want to enforce stricter
control over what is going to be created. I just want it there, and maybe
be notified when something new comes up.)
Suggestions?
Thank you,
Marcio