Hi, I'm currently building a flow in NiFi and I'm trying to find the best way to do it reliably:
The setup is the following:

1) Some files are copied into a folder in HDFS.
2) A Hive external table points to this directory.
3) The data from this table is copied into an ORC table.
4) The files in the folder are archived and compressed into another folder.

My first issue is that I cannot easily trigger an INSERT SQL query from NiFi. The ExecuteSQL processor only executes SELECT queries, not INSERT queries. I could of course SELECT all the data, bring it back into NiFi, and then use PutSQL, but since the data is copied as is, that adds no value. My current solution is to rely on an external Python script (using JDBC from there) and use ExecuteStreamCommand to trigger the insert from the external table. It is not very elegant, but it seems to work.

Now I have to ensure that the SQL query succeeds before moving the files to another folder, otherwise I will end up with inconsistent data. I'm currently using GetHDFS/PutHDFS to move files around, but it is not possible to trigger the GetHDFS processor from an upstream event. What would be the best strategy to move the HDFS files only if a previous event was successful? Any recommendations?

Thanks for your help! Regards,
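For context, the ExecuteStreamCommand workaround could be sketched roughly like this (a minimal sketch, not my exact script; the JDBC URL, table names, and the use of `beeline` as the client are placeholder assumptions, adapt them to your cluster):

```python
#!/usr/bin/env python
"""Sketch: script invoked by NiFi's ExecuteStreamCommand to run the INSERT.

Assumptions (placeholders):
  - beeline is on the PATH of the NiFi node
  - JDBC_URL points at your HiveServer2 instance
  - ext_table / orc_table are the external and ORC table names
"""
import subprocess
import sys

JDBC_URL = "jdbc:hive2://hiveserver:10000/default"  # placeholder

def build_command(query):
    """Assemble the beeline invocation for a single HiveQL statement."""
    return ["beeline", "-u", JDBC_URL, "-e", query]

def run_statement(query):
    """Run the statement; return the client's exit code (0 = success)."""
    return subprocess.call(build_command(query))

if __name__ == "__main__":
    sql = "INSERT INTO TABLE orc_table SELECT * FROM ext_table"
    # Exit with the client's status: NiFi records it in the
    # execution.status flowfile attribute, so a RouteOnAttribute
    # downstream can gate the HDFS move on execution.status = 0.
    sys.exit(run_statement(sql))
```

The point of exiting with the client's status code is that ExecuteStreamCommand exposes it as the `execution.status` attribute, which gives a hook for only moving the files when the INSERT succeeded.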
