Hi,

We just came across NiFi as a possible option for backing up our data lake
periodically into S3. We have pipelines that dump batches of data at some
granularity. For example, our one-minute dumps are of the form “201807210617”,
“201807210618”, “201807210619”, etc. We are looking for a simple,
configuration-based solution that periodically picks up these incoming batches
and backs them up. These batches contain a “success” marker that indicates the
batch is complete and ready to be backed up. We came across the ListHDFS
processor, which appears to handle this without duplication, but we are not
sure whether it can be restricted to batches in a particular state (i.e., those
containing the success marker), or whether it works on “folders” rather than
only on files directly.
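
To make the requirement concrete, here is a minimal sketch of the logic we
would like the flow to express, written as a plain Python script around the
standard hdfs/hadoop CLI tools. The root path, bucket name, and _SUCCESS marker
name below are placeholder assumptions, not our real setup:

#!/usr/bin/env python3
"""Sketch of the intended backup logic (placeholders only)."""
import subprocess

SOURCE_ROOT = "/data/dumps"                     # assumed HDFS root of the minute batches
TARGET_ROOT = "s3a://our-backup-bucket/dumps"   # assumed S3 destination
MARKER = "_SUCCESS"                             # assumed completion marker file name

def list_batches(root):
    # `hdfs dfs -ls -C` prints one bare path per line (drop -C and parse the
    # long listing instead if your Hadoop version does not support it)
    out = subprocess.run(
        ["hdfs", "dfs", "-ls", "-C", root],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]

def is_complete(batch_dir):
    # exit code 0 from `hdfs dfs -test -e` means the marker file exists
    result = subprocess.run(["hdfs", "dfs", "-test", "-e", f"{batch_dir}/{MARKER}"])
    return result.returncode == 0

def back_up(batch_dir):
    # copy the whole batch folder to S3, keeping its name, e.g. "201807210617"
    name = batch_dir.rstrip("/").rsplit("/", 1)[-1]
    subprocess.run(["hadoop", "distcp", batch_dir, f"{TARGET_ROOT}/{name}"], check=True)

if __name__ == "__main__":
    for batch in list_batches(SOURCE_ROOT):
        if is_complete(batch):
            back_up(batch)

Note that this sketch keeps no state, so it would re-copy every batch on each
run; the attraction of ListHDFS for us is precisely that it remembers what has
already been listed.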

Can I get some recommendations on whether NiFi can be used for such an
ingestion use case, or on possible alternatives? Thank you.

Kind Regards,

Sudhindra. 