Thanks Joe,

I'm using the HL7 processor to extract HL7v2 data to attributes, then mapping the attributes to the expected JSON entries. I am using the Record readers/writers elsewhere; they are definitely the best thing that has happened to NiFi since bedtime stories [1]. My current flow is:

GetFile (leave original file)
  -> ExtractHL7Attributes
  -> UpdateAttribute (for light conversions)
  -> AttributesToJSON (as flowfile-content)
  -> JoltTransformJSON (This could probably be replaced by record readers / writers)
  -> InvokeHTTP (call webservice)
  -> FetchFile (using filename attribute)

There are some additional exception paths, but this flow works as intended except when the web service can't keep up with new files. I have a delay built in to GetFile to account for this, which mostly works, but sometimes we pull the same file more than once. I suppose I could also move the file to an interim folder to prevent multiple reads.

Thanks,
Charlie

[1] https://community.hortonworks.com/articles/28380/nifi-ocr-using-apache-nifi-to-read-childrens-books.html

On Tue, Sep 19, 2017 at 11:35 AM, Joe Witt <[email protected]> wrote:
> Charlie
>
> You'll absolutely want to look at the Record reader/writer
> capabilities. It will help you convert from the CSV (or similar) to
> JSON without having to go through attributes at all.
>
> Take a look here
> https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
> and you could see the provenance example for configuration. If you
> want to share a sample line of the delimited data and a sample of the
> output JSON I can share you back a template that would help you get
> started.
>
> Thanks
> Joe
>
> On Tue, Sep 19, 2017 at 11:29 AM, Charlie Frasure
> <[email protected]> wrote:
> > I have a data flow that takes delimited input using GetFile, extracts some
> > of that into attributes, converts the attributes to a JSON object, reformats
> > the JSON using the Jolt transformer, and then does additional processing
> > before using PutFile to move the original file based on the dataflow result.
> > I have to work around NiFi to make the last step happen.
> >
> > I am setting the AttributesToJSON to replace the flowfile content because
> > the Jolt transformer requires the JSON object to be in the flowfile content.
> > There is no "original" relationship out of AttributesToJSON, so this data
> > would be lost. I have the "Keep Source File" set to true on the GetFile,
> > and then use PutFile with the filename to grab it later.
> >
> > This works for the most part, but under heavy data loads we have some errors
> > trying to process a file more than once.
> >
> > I think we could resolve this by not keeping the source file, sending a
> > duplicate of the content down another path and merging later. I want to
> > explore the possibility of either 1) having an "original" relationship
> > whenever the previous flowfile content is being modified or replaced, or 2)
> > maintaining an "original" flowfile content alongside the working content so
> > that it is easily available once the processing is complete.
> >
> > Am I missing a more direct way to process this data? Other thoughts?
> >
> > Thanks,
> > Charlie
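
For what it's worth, the interim-folder idea mentioned in the reply at the top of this thread could be sketched roughly as follows. This is plain Python outside of NiFi, not a NiFi processor or API; the function name `claim_file` and the directory layout are hypothetical, and it assumes the inbox and interim folders sit on the same filesystem so that the rename is a single step. Once a file has been claimed, a second pickup attempt finds nothing at the original path.

```python
import os
from pathlib import Path
from typing import Optional


def claim_file(source: Path, interim_dir: Path) -> Optional[Path]:
    """Claim `source` by renaming it into `interim_dir`.

    Returns the new path on success, or None if the file is no longer
    at `source` (for example, an earlier pass already claimed it).
    Hypothetical helper for illustration only.
    """
    interim_dir.mkdir(parents=True, exist_ok=True)
    target = interim_dir / source.name
    try:
        # Assumption: same filesystem, so the rename is one operation.
        # Note: on POSIX an existing file at `target` is replaced.
        os.rename(source, target)
    except FileNotFoundError:
        # The source is gone, so someone else already picked it up.
        return None
    return target
```

The downstream steps would then read from the interim folder only, and the final move (success or failure folder) would start from there instead of from the inbox.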
