Matt,

Thank you for those links; they should give me a good starting point.

Michael

On Wed, Aug 10, 2016 at 4:21 PM, Matt Burgess <[email protected]> wrote:
> Michael,
>
> There are a handful of examples of ExecuteScript using Javascript
> and/or Jython, on my blog (http://funnifi.blogspot.com) and other
> locations:
>
> Javascript:
> http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited.html
> https://mail-archives.apache.org/mod_mbox/nifi-users/201603.mbox/%3CCALhfc-WwqmZ7RMkRt2qxfgDnH1feHdM0o_[email protected]%3E
>
> Jython:
> http://funnifi.blogspot.com/2016/03/executescript-json-to-json-revisited_14.html
> https://community.hortonworks.com/articles/35568/python-script-in-nifi.html
> https://mail-archives.apache.org/mod_mbox/nifi-users/201602.mbox/%3CCAEV8zdWm7_E-qC1KKHV8eW8CP0HZaEkwjC=[email protected]%3E
>
> I'm happy to help get you going with a scripted solution if you like.
>
> Regards,
> Matt
>
> On Wed, Aug 10, 2016 at 4:12 PM, Michael Xu <[email protected]> wrote:
> > Mark,
> > The expression you suggested seems to be working. I don't think the file
> > names I'm working with will have a comma, so this should be a good
> > solution.
> >
> > Joe,
> > Are you referring to the ExecuteScript processor? That looks like a good
> > alternative. However, I couldn't find much information for it in the
> > documentation (https://nifi.apache.org/docs.html). Is there anywhere I can
> > find simple examples, especially in Javascript or Python?
> >
> > Thank you,
> > Michael
> >
> > On Wed, Aug 10, 2016 at 3:44 PM, Joe Witt <[email protected]> wrote:
> >>
> >> Probably a good idea to use a script in a script processor to extract
> >> the details needed about the splits then feed those results into merge
> >> attribute as you suggested. This would be the safest/cleanest.
> >>
> >> On Wed, Aug 10, 2016 at 3:42 PM, Mark Payne <[email protected]> wrote:
> >> > Michael,
> >> >
> >> > Well, sort of...
> >> >
> >> > You could use:
> >> > ${allDelineatedValues('${fileArray}', ','):count()}
> >> >
> >> > So that will split up the fileArray attribute by commas and then count
> >> > them. The only issue is that if you were to have a filename with a comma
> >> > in it, you'd get the wrong value. Given that you're not likely to have a
> >> > filename with a comma, you may be all right, but it's not really the
> >> > "cleanest" solution...
> >> >
> >> > The Expression Language does allow you to evaluate JSONPath against an
> >> > attribute, but JSONPath doesn't allow for the nice functions that you can
> >> > get in XPath and similar.
> >> >
> >> > Anyone else have any better ideas?
> >> >
> >> >
> >> > On Aug 10, 2016, at 3:32 PM, Michael Xu <[email protected]> wrote:
> >> >
> >> > Hi Mark,
> >> >
> >> > Thanks for your response earlier. While trying to implement what you
> >> > suggested in your email, I came across the issue of updating
> >> > fragment.count on a per-flowfile basis. I have another attribute called
> >> > fileArray, which is a json-compatible array that contains all the files
> >> > for a particular groupId. As an example taken from the Bulletin:
> >> >
> >> > Key: 'fileArray'
> >> > Value: '["file1.txt","file2.txt","file3.txt"]'
> >> >
> >> > Is it possible in UpdateAttribute to use the Expression Language to
> >> > return the length of this array?
> >> >
> >> > Thanks for your help,
> >> > Michael
> >> >
> >> > On Wed, Aug 10, 2016 at 11:00 AM, Mark Payne <[email protected]> wrote:
> >> >>
> >> >> Michael,
> >> >>
> >> >> In the MergeContent processor, you can set the "Merge Strategy" to
> >> >> "Defragment." This will tell MergeContent to determine its bin
> >> >> thresholds based on the following FlowFile attributes:
> >> >>
> >> >> fragment.identifier
> >> >> fragment.index
> >> >> fragment.count
> >> >>
> >> >> So you'd need to set those 3 attributes on each of the FlowFiles.
> >> >> Rather than using the Correlation Attribute Name, you'd set the
> >> >> "fragment.identifier" attribute (you can use UpdateAttribute to copy
> >> >> the value from the groupId attribute to the 'fragment.identifier'
> >> >> attribute if you need to).
> >> >>
> >> >> The "fragment.index" attribute tells MergeContent how to order the
> >> >> different FlowFiles in the merged bin.
> >> >>
> >> >> The "fragment.count" attribute tells MergeContent how many FlowFiles go
> >> >> in this bin.
> >> >>
> >> >> Does that all make sense?
> >> >>
> >> >> Thanks
> >> >> -Mark
> >> >>
> >> >>
> >> >> On Aug 10, 2016, at 10:54 AM, Michael Xu <[email protected]> wrote:
> >> >>
> >> >> I am sending payloads into the MergeContent processor, each of which
> >> >> belongs to a certain group of files in some data I'm working with. Each
> >> >> payload has an attribute called "groupId" which is an identification
> >> >> number for a particular group of files. This is the attribute I'm using
> >> >> to bin each incoming flowfile, and I have set the Correlation Attribute
> >> >> Name to groupId.
> >> >>
> >> >> The problem I'm dealing with right now is that each groupId has a
> >> >> varying number of files associated with it. As such, I'm not sure how in
> >> >> NiFi to detect when the MergeContent processor has received all files
> >> >> for a particular groupId, and once done, release the bin.
> >> >>
> >> >> Any help with this problem is appreciated, thanks!
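
The array-length question in this thread (deriving fragment.count from the fileArray attribute) can be sketched in plain Python. This is only an illustration of the logic, not a NiFi script: the function names count_fragments and count_by_commas are hypothetical, and the ExecuteScript wiring (reading the attribute from the FlowFile and writing fragment.count back) is left out. It assumes fileArray always holds a valid JSON array.

```python
import json


def count_fragments(file_array_json):
    """Count the entries of a JSON array string such as the fileArray attribute."""
    files = json.loads(file_array_json)
    if not isinstance(files, list):
        raise ValueError("expected a JSON array")
    return len(files)


def count_by_commas(file_array_json):
    """Mimic the comma-splitting approach: count the comma-separated pieces."""
    return len(file_array_json.split(","))


# Example value taken from the thread.
file_array = '["file1.txt","file2.txt","file3.txt"]'
# Hypothetical value with a comma inside one filename.
tricky = '["a,b.txt","c.txt"]'

print(count_fragments(file_array), count_by_commas(file_array))
print(count_fragments(tricky), count_by_commas(tricky))
```

Both approaches agree on the example from the thread, but only the JSON-parsing version still counts correctly when a filename contains a comma.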
