Hi,
We just came across NiFi as a possible option for periodically backing up our data lake to S3. Our pipelines dump batches of data at a fixed granularity; for example, our one-minute dumps are directories named "201807210617", "201807210618", "201807210619", and so on. We are looking for a simple, configuration-based solution that periodically picks up these incoming batches and runs a workflow to back them up. Each batch also contains a "success" marker file indicating that the batch is complete and ready to be backed up.

We came across the ListHDFS processor, which can list new data without duplication, but we are not sure whether it can be restricted to batches in a particular state (that is, those containing the success marker), or whether it works on directories rather than individual files.

Could we get recommendations on whether NiFi is suitable for such an ingestion use case, or suggestions for alternatives? Thank you.

Kind Regards,
Sudhindra.
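To make the selection criterion concrete, here is roughly the check we would want the flow to perform, sketched in plain Python. The root path, the `yyyyMMddHHmm` directory-name pattern, and the `_SUCCESS` marker name are placeholders for illustration; our actual marker is similar.

```python
import os

def ready_batches(root):
    """Return batch directory names (e.g. '201807210617') that contain
    a success marker file, i.e. batches that are complete and safe to
    back up. '_SUCCESS' is a placeholder for our actual marker name."""
    ready = []
    for name in sorted(os.listdir(root)):
        batch = os.path.join(root, name)
        # Only consider directories whose name looks like a 12-digit
        # yyyyMMddHHmm timestamp (our one-minute batch granularity).
        if os.path.isdir(batch) and name.isdigit() and len(name) == 12:
            if os.path.exists(os.path.join(batch, "_SUCCESS")):
                ready.append(name)
    return ready
```

In other words, we want the listing step to emit whole batch directories, and only those in which the marker file is already present, so that partially written batches are never copied to S3.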