Thank you, Matt. Understood. Thanks again for taking the time to reply to my questions. -Jim
On Mon, Aug 20, 2018 at 9:13 AM, Matt Burgess <[email protected]> wrote:
> Jim,
>
> If you know all the possible fields that can occur, you can create a
> schema that contains the three mandatory fields and includes all the
> others as "optional". This is done by setting the type of the field to
> ["null", <data type>]. This can even be done for the lookup field so
> you can inherit the record schema in the record writer (so you don't
> have to add it by hand in the writer).
>
> If you won't know all the fields, UpdateRecord doesn't currently alter
> the schema to add the field(s), although there is a Jira to cover the
> improvement [1].
>
> Regards,
> Matt
>
> [1] https://issues.apache.org/jira/browse/NIFI-5524
>
> On Sat, Aug 18, 2018 at 7:43 AM James McMahon <[email protected]> wrote:
> >
> > I do have a follow-up question. In my example I have oversimplified
> > the structure. In my production space I have two complicating
> > factors. First, the number of fields can vary; only three fields are
> > mandatory and so must be there. Second, the field order can vary:
> > the messages posted to the queue that we consume from have no
> > requirement to enforce the order of the fields. All I know is that I
> > will have my three guaranteed fields. Can UpdateRecord still be used,
> > referencing the three fields explicitly, telling it to put my new
> > field(s) after one of those wherever it may be in the object, and
> > indicating it should then include all other keys/values in the
> > object?
> >
> > On Fri, Aug 17, 2018 at 4:24 PM, Matt Burgess <[email protected]> wrote:
> >>
> >> Jim,
> >>
> >> You can use UpdateRecord for this. Your input schema would have
> >> "last" and "first" in it (and I think you can have an optional
> >> "myKey" field so you can use the same schema for the writer), and
> >> the output schema would have all three fields in it. Then you'd set
> >> the Replacement Value Strategy to "Literal Value" and add a
> >> user-defined property in UpdateRecord called "/myKey" set to
> >> "${myKey}". This will take the value from the attribute myKey and
> >> put it at the root of each record in a field called myKey. Since
> >> this is JSON, you could do the same with JoltTransformJSON, with a
> >> Default spec setting "myKey": "${myKey}". Not sure which is faster
> >> in this case, since there appears to be a single record.
> >>
> >> This also works if there are multiple records in the flow file, as
> >> long as the myKey field is to have the same value for all records
> >> (since there is only one myKey attribute value for the whole flow
> >> file). If there are multiple records and they each need a different
> >> value, you have a "lookup" use case on your hands, where you'd want
> >> to match some value against some lookup service, and it would fill
> >> in that field from the value supplied by the lookup service (you'd
> >> use LookupRecord with a LookupService for this). Or if all else
> >> fails, there is the Split pattern if you truly do want/need to
> >> process one JSON object at a time.
> >>
> >> Regards,
> >> Matt
> >>
> >> On Fri, Aug 17, 2018 at 4:06 PM James McMahon <[email protected]> wrote:
> >> >
> >> > I do appreciate your point, Tim and Lee. What if I do this
> >> > instead: append select attributes to my data payload. Would that
> >> > minimize the impact on RAM? Can I do that?
> >> >
> >> > More specifically, my data payload is a string representation of
> >> > a JSON object, like so:
> >> > {"last":"manson","first":"marilyn"}
> >> > and I have an attribute named myKey that contains the value
> >> > "123abc"
> >> >
> >> > Is there a processor that allows me to wind up with this string
> >> > representation of JSON:
> >> > {"last":"manson","first":"marilyn", "myKey":"123abc"}
> >> >
> >> > If I could do that, I could avoid loading the entire data payload
> >> > into an attribute and manipulating it in a Python script called
> >> > by ExecuteScript. I know how to do that; I don't know how to do
> >> > the above with native processors.
> >> > Thanks in advance for your help.
> >> >
> >> > On Fri, Aug 17, 2018 at 2:02 PM, Lee Laim <[email protected]> wrote:
> >> >>
> >> >> Jim,
> >> >> I think the ExtractText processor with a large enough Maximum
> >> >> Capture Group Length (default: 1024) will do that. Though, I
> >> >> share Tim's concerns when you scale up.
> >> >> Thanks,
> >> >> Lee
> >> >>
> >> >> > On Aug 17, 2018, at 11:52 AM, Timothy Tschampel <tim.tschampel@vivacehealthsolutions.com> wrote:
> >> >> >
> >> >> > This may not be applicable to your use case depending on
> >> >> > message volume / # of attributes, but I would avoid putting
> >> >> > payloads into attributes for scalability reasons (especially
> >> >> > RAM usage).
> >> >> >
> >> >> >> On Aug 17, 2018, at 10:47 AM, James McMahon <[email protected]> wrote:
> >> >> >>
> >> >> >> I have flowfiles with data payloads that represent small
> >> >> >> strings of text (messages consumed from AMQP queues). I want
> >> >> >> to create an attribute that holds the entire payload for
> >> >> >> downstream use. How can I capture the entire data payload of
> >> >> >> a flowfile in a new attribute on the flowfile? Thank you in
> >> >> >> advance for your help. -Jim
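
For anyone reading this thread later, here is a rough sketch of the kind of schema Matt describes on Aug 20: mandatory fields declared with a plain type, and every other known field declared as a ["null", <data type>] union. It is plain Python that only builds and prints the schema JSON; it does not call any NiFi API. The helper name build_record_schema, the record name "message", and the field names "id" and "middle" are made up for illustration; only "last", "first", and "myKey" come from the thread. The "default": null entries are an assumption about what a reader or writer would want for absent fields.

```python
import json


def build_record_schema(name, mandatory, optional):
    """Build an Avro-style record schema as a dict.

    mandatory and optional map field names to Avro type names.
    Optional fields become ["null", <type>] unions, as described in
    the thread. "null" is listed first so that a null default is valid.
    """
    fields = [{"name": field, "type": avro_type}
              for field, avro_type in mandatory.items()]
    fields += [{"name": field, "type": ["null", avro_type], "default": None}
               for field, avro_type in optional.items()]
    return {"type": "record", "name": name, "fields": fields}


# Hypothetical field names: "id" and "middle" are placeholders, not
# taken from the original messages.
schema = build_record_schema(
    "message",
    mandatory={"last": "string", "first": "string", "id": "string"},
    optional={"myKey": "string", "middle": "string"},
)

# Text that could be pasted into a schema text property.
schema_text = json.dumps(schema, indent=2)
print(schema_text)
```

Because the optional fields are nullable, the same schema text can in principle serve both the reader and the writer, which is the reuse Matt mentions. Field order in the incoming JSON does not need to match the order in the schema.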

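
To make the expected result of the Aug 17 suggestion concrete, the sketch below mimics in plain Python what setting "/myKey" to "${myKey}" with the "Literal Value" strategy is described as doing: the attribute value is written at the root of every record in the payload. This is only an illustration of the outcome, not NiFi code; the function name add_attribute_field is hypothetical. The jolt_default_spec value shows one way the Jolt "default" operation Matt mentions might be written, assuming the ${myKey} expression is resolved by NiFi before the spec is applied.

```python
import json


def add_attribute_field(payload, field_name, value):
    """Return payload with field_name set to value on every record.

    payload is a JSON string holding either one object or an array of
    objects. The same value is applied to all records, mirroring the
    single-attribute-per-flowfile limitation noted in the thread.
    """
    parsed = json.loads(payload)
    single = isinstance(parsed, dict)
    records = [parsed] if single else parsed
    for record in records:
        record[field_name] = value
    result = records[0] if single else records
    return json.dumps(result, separators=(",", ":"))


# Values taken from the example in the thread.
result = add_attribute_field(
    '{"last":"manson","first":"marilyn"}', "myKey", "123abc")
print(result)  # {"last":"manson","first":"marilyn","myKey":"123abc"}

# Assumed shape of a Jolt chain spec using the "default" operation.
# Note that "default" only fills the field when it is absent.
jolt_default_spec = [
    {"operation": "default", "spec": {"myKey": "${myKey}"}}
]
print(json.dumps(jolt_default_spec))
```

If each record needed its own value instead, this loop would have to look the value up per record, which is the "lookup" case described above.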