Thank you, Matt. Understood. Thanks again for taking the time to reply to my
questions. -Jim

On Mon, Aug 20, 2018 at 9:13 AM, Matt Burgess wrote:

> Jim,
>
> If you know all the possible fields that can occur, you can create a
> schema that contains the three mandatory fields and includes all the
> others as "optional"; this is done by setting the type of the field to
> ["null", <data type>]. This can even be done for the lookup field, so
> you can inherit the record schema in the record writer (and don't
> have to add the field by hand in the writer).
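A sketch of the schema described above, written as a Python dict and printed as Avro schema JSON. It assumes "last" and "first" (from the example later in this thread) are two of the mandatory fields. "msgId" (a stand-in for the third mandatory field), "middle" (a stand-in for any field that may or may not appear) and the record name "Message" are all hypothetical. Per the Avro specification, a union's default must match its first branch, which is why "null" is listed first.

```python
import json

def optional_field(name, avro_type="string"):
    # Nullable Avro field: a union with "null" listed first, so that
    # "default": null is a valid default for the field.
    return {"name": name, "type": ["null", avro_type], "default": None}

# Hypothetical schema: "msgId" stands in for the third mandatory field,
# "middle" for any other field that may or may not be present.
message_schema = {
    "type": "record",
    "name": "Message",
    "fields": [
        {"name": "last", "type": "string"},
        {"name": "first", "type": "string"},
        {"name": "msgId", "type": "string"},
        optional_field("middle"),
        optional_field("myKey"),  # the field UpdateRecord will populate
    ],
}

# JSON form of the schema, e.g. for a record reader's schema text.
print(json.dumps(message_schema, indent=2))
```

Because "myKey" is already in the reader's schema, a writer that inherits the record schema will also carry it.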
>
> If you won't know all the fields in advance, note that UpdateRecord
> doesn't currently alter the schema to add the field(s), although there
> is a Jira to cover that improvement [1].
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-5524
>
> On Sat, Aug 18, 2018 at 7:43 AM James McMahon wrote:
> >
> > I do have a follow-up question. In my example I oversimplified the
> > structure. In my production space there are two complicating factors.
> > First, the number of fields can vary: only three fields are mandatory
> > and guaranteed to be present. Second, the field order can vary: the
> > messages posted to the queue we consume from are not required to keep
> > the fields in any particular order. All I know is that my three
> > guaranteed fields will be there. Can UpdateRecord still be used,
> > referencing the three fields explicitly, telling it to put my new
> > field(s) after one of those wherever it may be in the object, and
> > indicating that it should also include all other keys/values in the
> > object?
> >
> > On Fri, Aug 17, 2018 at 4:24 PM, Matt Burgess wrote:
> >>
> >> Jim,
> >>
> >> You can use UpdateRecord for this. Your input schema would have "last"
> >> and "first" in it (and I think you can have an optional "myKey" field
> >> so you can use the same schema for the writer), and the output schema
> >> would have all three fields in it. Then you'd set the Replacement
> >> Value Strategy to "Literal Value" and add a user-defined property in
> >> UpdateRecord called "/myKey" set to "${myKey}". This will take the
> >> value from the attribute myKey and put it at the root of each record
> >> in a field called myKey. Since this is JSON, you could do the same
> >> with JoltTransformJSON, using a Default spec that sets "myKey" to
> >> "${myKey}". I'm not sure which is faster in this case, since there
> >> appears to be a single record.
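A plain-Python model (not NiFi code) of what the UpdateRecord configuration above does to the records, so the effect of "/myKey" = "${myKey}" with the "Literal Value" strategy is easy to see. The function name `set_literal_field` is hypothetical; the attribute and payload values come from the example in this thread.

```python
import json

def set_literal_field(records, field, value):
    """Model of UpdateRecord with Replacement Value Strategy = "Literal Value":
    one property ("/myKey") sets the same value at the root of every record."""
    for record in records:
        record[field] = value
    return records

# Flow file attribute and content, taken from the example in this thread.
attributes = {"myKey": "123abc"}
content = '[{"last":"manson","first":"marilyn"}]'

records = json.loads(content)
# "${myKey}" refers to a flow file attribute, so every record in the
# flow file receives the same value.
set_literal_field(records, "myKey", attributes["myKey"])
print(json.dumps(records, separators=(",", ":")))
# [{"last":"manson","first":"marilyn","myKey":"123abc"}]
```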
> >>
> >> This also works if there are multiple records in the flow file, as
> >> long as the myKey field is to have the same value for all records
> >> (since there is only one myKey attribute value for the whole flow
> >> file). If there are multiple records and each needs its own value, you
> >> have a "lookup" use case on your hands: you'd match some field value
> >> against a lookup service, which would fill in that field with the
> >> value it returns (you'd use a LookupService for this). Or, if all else
> >> fails, there is the Split pattern, if you truly do want/need to
> >> process one JSON object at a time.
> >>
> >> Regards,
> >> Matt
> >>
> >> On Fri, Aug 17, 2018 at 4:06 PM James McMahon wrote:
> >> >
> >> > I do appreciate your point, Tim and Lee. What if I do this instead:
> >> > append selected attributes to my data payload? Would that minimize
> >> > the impact on RAM? Can I do that?
> >> >
> >> > More specifically, my data payload is a string representation of a
> >> > JSON object, like so:
> >> > {"last":"manson","first":"marilyn"}
> >> > and I have an attribute named myKey that contains the value "123abc"
> >> >
> >> > Is there a processor that allows me to wind up with this string
> >> > representation of JSON:
> >> > {"last":"manson","first":"marilyn", "myKey":"123abc"}
> >> >
> >> > If I could do that, I could avoid loading the entire data payload
> >> > into an attribute and manipulating it in a Python script called by
> >> > ExecuteScript. I know how to do that, but I don't know how to do the
> >> > above with native processors.
> >> > Thanks in advance for your help.
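For reference, the transformation being asked for, sketched in plain Python with the thread's example values. This shows only the JSON manipulation; it is not NiFi's ExecuteScript API, and `append_attribute` is a hypothetical name.

```python
import json

def append_attribute(payload: str, name: str, value: str) -> str:
    """Return the JSON object in `payload` with one extra top-level key.
    Existing keys keep their order (dicts preserve insertion order), so
    the new key lands at the end however the incoming fields are ordered."""
    obj = json.loads(payload)
    obj[name] = value
    # Compact separators match the style of the incoming payload.
    return json.dumps(obj, separators=(",", ":"))

result = append_attribute('{"last":"manson","first":"marilyn"}', "myKey", "123abc")
print(result)  # {"last":"manson","first":"marilyn","myKey":"123abc"}
```

Parsing the object rather than concatenating strings also keeps the result valid JSON if the payload has extra whitespace or a different field order.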
> >> >
> >> > On Fri, Aug 17, 2018 at 2:02 PM, Lee Laim wrote:
> >> >>
> >> >> Jim,
> >> >> I think the ExtractText processor with a large enough
> >> >> Maximum Capture Group Length (default: 1024) will do that. Though I
> >> >> share Tim's concerns when you scale up.
> >> >> Thanks,
> >> >> Lee
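A plain-Python model of the ExtractText approach, showing why the capture-group length matters. It assumes a single user-defined property with the hypothetical name "payload" and the regex `(?s)(^.*$)`, a common choice for matching the entire content. Only the one attribute is modeled; ExtractText's actual attribute naming for capture groups is not reproduced.

```python
import re

# Default capture-group length limit, as quoted in the thread.
MAX_CAPTURE_GROUP_LENGTH = 1024

# DOTALL lets ".*" span newlines, so group 1 is the entire payload.
WHOLE_CONTENT = re.compile(r"(?s)(^.*$)")

def extract_whole_payload(content, attr_name="payload", max_len=MAX_CAPTURE_GROUP_LENGTH):
    """Simplified model of ExtractText: returns the attribute it would add.
    "payload" is a hypothetical property name; characters beyond max_len
    are truncated."""
    match = WHOLE_CONTENT.search(content)
    captured = match.group(1) if match else ""
    return {attr_name: captured[:max_len]}

small = extract_whole_payload('{"last":"manson","first":"marilyn"}')
large = extract_whole_payload("x" * 5000)
print(small["payload"])       # {"last":"manson","first":"marilyn"}
print(len(large["payload"]))  # 1024
```

Every such attribute is held in memory alongside the flow file, which is the scaling concern raised in this thread.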
> >> >>
> >> >>
> >> >> > On Aug 17, 2018, at 11:52 AM, Timothy Tschampel <tim.tschampel@vivacehealthsolutions.com> wrote:
> >> >> >
> >> >> >
> >> >> > This may not be applicable to your use case, depending on message
> >> >> > volume and the number of attributes, but I would avoid putting
> >> >> > payloads into attributes for scalability reasons (especially RAM
> >> >> > usage).
> >> >> >
> >> >> >
> >> >> >> On Aug 17, 2018, at 10:47 AM, James McMahon wrote:
> >> >> >>
> >> >> >> I have flowfiles with data payloads that represent small strings
> >> >> >> of text (messages consumed from AMQP queues). I want to create an
> >> >> >> attribute that holds the entire payload for downstream use. How
> >> >> >> can I capture the entire data payload of a flowfile in a new
> >> >> >> attribute on the flowfile? Thank you in advance for your help. -Jim
> >> >> >
> >> >
> >> >
> >
> >
>
