Thanks Joe,

I'm using the ExtractHL7Attributes processor to extract HL7v2 data into
attributes, then mapping the attributes to the expected JSON entries.  I am
using the Record readers/writers elsewhere; definitely the best thing that
has happened to NiFi since bedtime stories [1].
So my current flow is:

GetFile (leave original file) ->
ExtractHL7Attributes ->
UpdateAttribute (for light conversions) ->
AttributesToJSON (as flowfile-content) ->
JoltTransformJSON (this could probably be replaced by Record
readers/writers) ->
InvokeHTTP (call webservice) ->
FetchFile (using filename attribute)
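To make the middle of that flow concrete, here's a rough Python sketch of what the AttributesToJSON + JoltTransformJSON steps accomplish together. The attribute names (PID.3, MSH.9, etc.) and the output shape are made up for illustration; in the real flow the Jolt spec drives the reshaping:

```python
# Illustrative only: maps extracted HL7v2 attributes (hypothetical names)
# into the kind of JSON payload the downstream web service might expect.
import json

def attributes_to_payload(attributes):
    """Build the web-service JSON from flowfile attributes."""
    return {
        "patient": {
            "id": attributes.get("PID.3"),
            "name": attributes.get("PID.5"),
        },
        "message": {
            "type": attributes.get("MSH.9"),
            "timestamp": attributes.get("MSH.7"),
        },
    }

attrs = {
    "PID.3": "12345",
    "PID.5": "DOE^JOHN",
    "MSH.9": "ADT^A01",
    "MSH.7": "20170919113000",
}
print(json.dumps(attributes_to_payload(attrs)))
```

In the flow itself, AttributesToJSON produces the flat attribute JSON and the Jolt shift spec nests it; the sketch just collapses both steps into one function.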

There are some additional exception paths, but this flow works as intended
except when the web service can't keep up with new files.  I have a delay
built into GetFile to account for this, which mostly works, but sometimes
we pull the same file more than once.  I suppose I could also move the file
to an interim folder to prevent multiple reads.
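In case it helps illustrate the interim-folder idea, here's a rough sketch (paths and names are placeholders). A rename within the same filesystem is atomic, so once a file has been claimed into the processing directory, a second pickup pass can't see it in the source folder:

```python
# Sketch: atomically "claim" a picked-up file by renaming it into an
# interim directory, so a later polling pass can't read it twice.
import os

def claim_file(path, interim_dir):
    """Move path into interim_dir; return the new path, or None if the
    file was already claimed (i.e. it no longer exists at path)."""
    os.makedirs(interim_dir, exist_ok=True)
    dest = os.path.join(interim_dir, os.path.basename(path))
    try:
        os.rename(path, dest)  # atomic on the same filesystem
        return dest
    except FileNotFoundError:
        return None  # another pass already moved it
```

The same effect could be had inside NiFi by writing the file out to the interim folder and reading from there, but the key property either way is the atomic move.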

Thanks,
Charlie


[1]
https://community.hortonworks.com/articles/28380/nifi-ocr-using-apache-nifi-to-read-childrens-books.html


On Tue, Sep 19, 2017 at 11:35 AM, Joe Witt <[email protected]> wrote:

> Charlie
>
> You'll absolutely want to look at the Record reader/writer
> capabilities.  It will help you convert from the CSV (or similar) to
> JSON without having to go through attributes at all.
>
> Take a look here
> https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
> and you could see the provenance example for configuration.  If you
> want to share a sample line of the delimited data and a sample of the
> output JSON I can share you back a template that would help you get
> started.
>
> Thanks
> Joe
>
> On Tue, Sep 19, 2017 at 11:29 AM, Charlie Frasure
> <[email protected]> wrote:
> > I have a data flow that takes delimited input using GetFile, extracts
> > some of that into attributes, converts the attributes to a JSON object,
> > reformats the JSON using the Jolt transformer, and then does additional
> > processing before using PutFile to move the original file based on the
> > dataflow result.  I have to work around NiFi to make the last step
> > happen.
> >
> > I am setting the AttributesToJSON to replace the flowfile content
> > because the Jolt transformer requires the JSON object to be in the
> > flowfile content.  There is no "original" relationship out of
> > AttributesToJSON, so this data would be lost.  I have the "Keep Source
> > File" set to true on the GetFile, and then use PutFile with the
> > filename to grab it later.
> >
> > This works for the most part, but under heavy data loads we have some
> > errors trying to process a file more than once.
> >
> > I think we could resolve this by not keeping the source file, sending
> > a duplicate of the content down another path and merging later.  I want
> > to explore the possibility of either 1) having an "original"
> > relationship whenever the previous flowfile content is being modified
> > or replaced, or 2) maintaining an "original" flowfile content alongside
> > the working content so that it is easily available once the processing
> > is complete.
> >
> > Am I missing a more direct way to process this data?  Other thoughts?
> >
> > Thanks,
> > Charlie
> >