Ha! They are nearly as cool as NiFi reading bedtime stories. You have a good point.
I was all happy we were about to make your flow far better/faster/stronger. Then you threw down with HL7. We really need an HL7RecordReader; then the rest of this would be fast/fun. Any volunteers? (A very rough sketch of what one might look like is at the bottom of this mail, below the quoted thread.)

Thanks

On Tue, Sep 19, 2017 at 2:09 PM, Charlie Frasure <[email protected]> wrote:
> Thanks Joe,
>
> I'm using the HL7 processor to extract HL7v2 data to attributes, then mapping the attributes to expected JSON entries. I am using the Record reader/writers elsewhere; definitely the best thing that has happened to NiFi since bedtime stories [1].
>
> So my current flow is:
>
> GetFile (leave original file) ->
> ExtractHL7Attributes ->
> UpdateAttribute (for light conversions) ->
> AttributesToJSON (as flowfile-content) ->
> JoltTransformJSON (this could probably be replaced by record readers/writers) ->
> InvokeHTTP (call web service) ->
> FetchFile (using filename attribute)
>
> There are some additional exception paths, but this flow works as intended except when the web service can't keep up with new files. I have a delay built in to GetFile to account for this, which mostly works, but sometimes we pull the same file more than once. I suppose I could also move the file to an interim folder to prevent multiple reads.
>
> Thanks,
> Charlie
>
> [1] https://community.hortonworks.com/articles/28380/nifi-ocr-using-apache-nifi-to-read-childrens-books.html
>
> On Tue, Sep 19, 2017 at 11:35 AM, Joe Witt <[email protected]> wrote:
>>
>> Charlie
>>
>> You'll absolutely want to look at the Record reader/writer capabilities. They will help you convert from the CSV (or similar) to JSON without having to go through attributes at all.
>>
>> Take a look here:
>>
>> https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
>>
>> The provenance example shows the configuration. If you want to share a sample line of the delimited data and a sample of the output JSON, I can share back a template that would help you get started.
>>
>> Thanks
>> Joe
>>
>> On Tue, Sep 19, 2017 at 11:29 AM, Charlie Frasure <[email protected]> wrote:
>> > I have a data flow that takes delimited input using GetFile, extracts some of that into attributes, converts the attributes to a JSON object, reformats the JSON using the Jolt transformer, and then does additional processing before using PutFile to move the original file based on the dataflow result. I have to work around NiFi to make the last step happen.
>> >
>> > I am setting AttributesToJSON to replace the flowfile content because the Jolt transformer requires the JSON object to be in the flowfile content. There is no "original" relationship out of AttributesToJSON, so this data would be lost. I have "Keep Source File" set to true on the GetFile, and then use PutFile with the filename to grab it later.
>> >
>> > This works for the most part, but under heavy data loads we have some errors trying to process a file more than once.
>> >
>> > I think we could resolve this by not keeping the source file, sending a duplicate of the content down another path and merging later. I want to explore the possibility of either 1) having an "original" relationship whenever the previous flowfile content is being modified or replaced, or 2) maintaining an "original" flowfile content alongside the working content so that it is easily available once the processing is complete.
>> >
>> > Am I missing a more direct way to process this data? Other thoughts?
>> >
>> > Thanks,
>> > Charlie
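
For anyone who wants to pick that up: here is a very rough, untested sketch of what an HL7RecordReader could look like against the RecordReader interface, using the HAPI PipeParser that ExtractHL7Attributes already depends on. The flat three-field schema and the Terser paths (MSH-9-1, MSH-10, PID-3-1) are placeholder choices, and the RecordReaderFactory / controller service plumbing a real reader needs is left out.

// Rough sketch only -- untested. A real reader would derive its schema from
// the message (or a supplied Avro schema) instead of hard-coding fields, and
// would live behind a RecordReaderFactory controller service.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.nifi.serialization.MalformedRecordException;
import org.apache.nifi.serialization.RecordReader;
import org.apache.nifi.serialization.SimpleRecordSchema;
import org.apache.nifi.serialization.record.MapRecord;
import org.apache.nifi.serialization.record.Record;
import org.apache.nifi.serialization.record.RecordField;
import org.apache.nifi.serialization.record.RecordFieldType;
import org.apache.nifi.serialization.record.RecordSchema;

import ca.uhn.hl7v2.HL7Exception;
import ca.uhn.hl7v2.model.Message;
import ca.uhn.hl7v2.parser.PipeParser;
import ca.uhn.hl7v2.util.Terser;

public class HL7RecordReader implements RecordReader {

    // Placeholder flat schema: a handful of commonly used fields, all strings.
    private static final List<RecordField> FIELDS = Arrays.asList(
            new RecordField("messageType", RecordFieldType.STRING.getDataType()),
            new RecordField("messageControlId", RecordFieldType.STRING.getDataType()),
            new RecordField("patientId", RecordFieldType.STRING.getDataType()));
    private static final RecordSchema SCHEMA = new SimpleRecordSchema(FIELDS);

    private final InputStream in;
    private boolean consumed = false;

    public HL7RecordReader(final InputStream in) {
        this.in = in;
    }

    @Override
    public Record nextRecord(final boolean coerceTypes, final boolean dropUnknownFields)
            throws IOException, MalformedRecordException {
        // This sketch assumes one HL7 message per flowfile and ignores the flags.
        if (consumed) {
            return null;
        }
        consumed = true;

        // HL7 v2 segments are delimited by carriage returns; rejoin whatever
        // line endings the file arrived with.
        final String text;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            text = reader.lines().collect(Collectors.joining("\r"));
        }

        try {
            final Message message = new PipeParser().parse(text);
            final Terser terser = new Terser(message);

            final Map<String, Object> values = new LinkedHashMap<>();
            values.put("messageType", terser.get("MSH-9-1"));
            values.put("messageControlId", terser.get("MSH-10"));
            values.put("patientId", terser.get("PID-3-1"));
            return new MapRecord(SCHEMA, values);
        } catch (final HL7Exception e) {
            throw new MalformedRecordException("Could not parse HL7 message", e);
        }
    }

    @Override
    public RecordSchema getSchema() {
        return SCHEMA;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

With something like that registered as a record reader, the AttributesToJSON / Jolt steps above could collapse into a single ConvertRecord (HL7 reader in, JSON writer out), which is the "far better/faster/stronger" part.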
