I have a dataflow that reads delimited input using GetFile, extracts some of the fields into attributes, converts the attributes to a JSON object, reformats the JSON using the Jolt transformer, and then does additional processing before using PutFile to move the original file based on the dataflow result. I have to work around NiFi to make that last step happen.
I have AttributesToJSON set to replace the flowfile content because the Jolt transformer requires the JSON object to be in the flowfile content. There is no "original" relationship out of AttributesToJSON, so the original delimited data would be lost at that point. To get around this, I have "Keep Source File" set to true on GetFile and then use PutFile with the filename to grab the file later. This works for the most part, but under heavy data loads we get some errors from trying to process a file more than once. I think we could resolve this by not keeping the source file, sending a duplicate of the content down another path, and merging later.
I want to explore the possibility of either 1) having an "original" relationship whenever the previous flowfile content is being modified or replaced, or 2) maintaining an "original" flowfile content alongside the working content so that it is easily available once the processing is complete.
Am I missing a more direct way to process this data? Other thoughts?
Thanks,
Charlie
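P.S. To make option 2 a bit more concrete, here is a rough sketch in plain Python of the behavior I have in mind. None of this is NiFi code or NiFi's API: the FlowFile class, the "original" field, the "field.N" attribute names, and both functions are hypothetical stand-ins for the flowfile, the attribute extraction step, and AttributesToJSON. The only point is that replacing the working content would not have to discard the first content the flowfile carried.

```python
import json
from dataclasses import dataclass, field, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class FlowFile:
    """Hypothetical stand-in for a flowfile; not the real NiFi API."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)
    # Proposed addition: the content as it was before the first replacement.
    original: Optional[bytes] = None


def extract_attributes(ff: FlowFile, delimiter: str = ",") -> FlowFile:
    """Copy delimited fields into attributes (the field.N names are made up)."""
    values = ff.content.decode("utf-8").strip().split(delimiter)
    attrs = dict(ff.attributes)
    attrs.update({f"field.{i}": value for i, value in enumerate(values)})
    return replace(ff, attributes=attrs)


def attributes_to_json(ff: FlowFile) -> FlowFile:
    """Replace the content with JSON built from the attributes.

    The first content seen is remembered in `original`; later
    replacements leave that remembered value untouched.
    """
    body = json.dumps(ff.attributes, sort_keys=True).encode("utf-8")
    original = ff.original if ff.original is not None else ff.content
    return replace(ff, content=body, original=original)
```

With something like this, the final step could write out ff.original directly instead of going back to the input directory for a file that may already have been picked up again.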
