Hi,

I'm currently building a flow in NiFi and I'm trying to find the most
reliable way to do it:

The setup is the following:

1) Some files are copied into a folder in HDFS
2) A Hive external table points to this directory
3) The data from this table are then copied into an ORC table
4) The files in the folder are archived and compressed into another folder

My first issue is that I cannot easily trigger an INSERT SQL query from
NiFi. The ExecuteSQL processor only executes SELECT queries, not INSERT
queries. I could of course select all the data, bring it back into NiFi,
and then use PutSQL, but since the data are copied as-is, that round trip
adds no value.
My current solution is to rely on an external Python script (using JDBC
from there) and use ExecuteStreamCommand to trigger the insert from the
external table. It is not very elegant, but it seems to work.
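For what it's worth, here is a minimal sketch of what that external script could look like. Everything specific in it is a placeholder, not from your setup: the HiveServer2 host/port, the table names `staging_ext`/`events_orc`, the `jaydebeapi` package, and the JDBC driver jar path are all assumptions for illustration:

```python
#!/usr/bin/env python
"""Sketch of the external script invoked from ExecuteStreamCommand.

Host/port, table names, credentials, and jar path below are all
hypothetical placeholders."""
import sys


def build_insert_sql(src_table, dst_table):
    # INSERT ... SELECT copies the external table's rows into the ORC
    # table inside Hive, without pulling the data through NiFi.
    return "INSERT INTO TABLE {0} SELECT * FROM {1}".format(dst_table, src_table)


def run_insert():
    import jaydebeapi  # assumed JDBC bridge; any Hive client would do
    conn = jaydebeapi.connect(
        "org.apache.hive.jdbc.HiveDriver",
        "jdbc:hive2://hive-host:10000/default",    # hypothetical host/port
        ["hive", ""],                              # hypothetical credentials
        "/opt/hive/lib/hive-jdbc-standalone.jar")  # hypothetical jar path
    try:
        cur = conn.cursor()
        cur.execute(build_insert_sql("staging_ext", "events_orc"))
        cur.close()
    finally:
        conn.close()


if __name__ == "__main__" and "--run" in sys.argv:
    # Only talk to the cluster when explicitly asked to, e.g. when
    # ExecuteStreamCommand calls `script.py --run`.
    try:
        run_insert()
    except Exception as exc:
        sys.stderr.write(str(exc) + "\n")
        sys.exit(1)  # non-zero exit routes to the failure relationship
```

Exiting non-zero on failure is what lets ExecuteStreamCommand route the flow file to its failure relationship, which matters for the next step.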

Now I have to ensure that the SQL query succeeds before moving the file to
another folder; otherwise I will end up with inconsistent data.
I'm currently using GetHDFS/PutHDFS to move files around; however, GetHDFS
is a source processor, so it cannot accept an incoming connection and be
triggered by an upstream result.

What would be the best strategy to move an HDFS file only if a previous
event succeeded? Any recommendations?
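One pattern that sidesteps GetHDFS entirely is to do the move in the same
script, so it can only ever happen after the insert returns cleanly. A
minimal sketch using the WebHDFS REST API (the NameNode host/port and the
paths are placeholders, and `run_insert` stands for whatever performs the
SQL step and raises on failure):

```python
"""Sketch: archive an HDFS file only after the insert succeeds.
NameNode host/port and paths are hypothetical placeholders."""
from urllib.parse import quote, urlencode
import urllib.request


def build_rename_url(namenode, src, dst):
    # WebHDFS rename: PUT /webhdfs/v1/<src>?op=RENAME&destination=<dst>
    query = urlencode([("op", "RENAME"), ("destination", dst)])
    return "http://{0}/webhdfs/v1{1}?{2}".format(namenode, quote(src), query)


def archive_after_insert(run_insert, namenode, src, dst):
    # run_insert is expected to raise on failure, so the rename below
    # is never reached unless the SQL step actually succeeded.
    run_insert()
    req = urllib.request.Request(
        build_rename_url(namenode, src, dst), method="PUT")
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200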

Thanks for your help!

Regards,
