Mark,

The expression you suggested seems to be working. I don't think the file names I'm working with will have a comma, so this should be a good solution.
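For reference, here is a minimal Python sketch of why the comma caveat matters. It compares a plain comma split (which is how I understand the expression to behave) with parsing the attribute as JSON. The function names and the second sample value are made up for illustration and are not part of NiFi.

```python
import json


def count_by_comma(value: str) -> int:
    """Count entries by splitting the raw attribute text on commas."""
    return len(value.split(","))


def count_by_json(value: str) -> int:
    """Parse the attribute text as a JSON array and count its elements."""
    files = json.loads(value)
    if not isinstance(files, list):
        raise ValueError("fileArray is not a JSON array")
    return len(files)


# Sample value from the thread, plus a hypothetical one with a comma in a name.
plain = '["file1.txt","file2.txt","file3.txt"]'
tricky = '["a,b.txt","file2.txt"]'

print(count_by_comma(plain), count_by_json(plain))    # 3 3
print(count_by_comma(tricky), count_by_json(tricky))  # 3 2
```

With the sample value both approaches agree. They only diverge once a file name contains a comma.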
Joe,

Are you referring to the ExecuteScript processor? That looks like a good alternative. However, I couldn't find much information on it in the documentation (https://nifi.apache.org/docs.html). Is there anywhere I can find simple examples, especially in JavaScript or Python?

Thank you,
Michael

On Wed, Aug 10, 2016 at 3:44 PM, Joe Witt <[email protected]> wrote:
> Probably a good idea to use a script in a script processor to extract
> the details needed about the splits, then feed those results into merge
> attribute as you suggested. This would be the safest/cleanest.
>
> On Wed, Aug 10, 2016 at 3:42 PM, Mark Payne <[email protected]> wrote:
> > Michael,
> >
> > Well, sort of...
> >
> > You could use:
> > ${allDelineatedValues('${fileArray}', ','):count()}
> >
> > So that will split up the fileArray attribute by commas and then count them.
> > The only issue is that if you were to have a filename with a comma in it,
> > you'd get the wrong value. Given that you're not likely to have a filename
> > with a comma, you may be all right, but it's not really the "cleanest"
> > solution...
> >
> > The Expression Language does allow you to evaluate JSONPath against an
> > attribute, but JSONPath doesn't allow for the nice functions that you can get
> > in XPath and similar.
> >
> > Anyone else have any better ideas?
> >
> >
> > On Aug 10, 2016, at 3:32 PM, Michael Xu <[email protected]> wrote:
> >
> > Hi Mark,
> >
> > Thanks for your response earlier. While trying to implement what you
> > suggested in your email, I came across the issue of updating fragment.count
> > on a per-flowfile basis. I have another attribute called fileArray, which is
> > a JSON-compatible array that contains all the files for a particular
> > groupId. As an example taken from the Bulletin:
> >
> > Key: 'fileArray'
> > Value: '["file1.txt","file2.txt","file3.txt"]'
> >
> > Is it possible in UpdateAttribute to use the Expression Language to return
> > the length of this array?
> >
> > Thanks for your help,
> > Michael
> >
> > On Wed, Aug 10, 2016 at 11:00 AM, Mark Payne <[email protected]> wrote:
> >>
> >> Michael,
> >>
> >> In the MergeContent processor, you can set the "Merge Strategy" to
> >> "Defragment." This will tell MergeContent to
> >> determine its bin thresholds based on the following FlowFile attributes:
> >>
> >> fragment.identifier
> >> fragment.index
> >> fragment.count
> >>
> >> So you'd need to set those 3 attributes on each of the FlowFiles. Rather
> >> than using the Correlation Attribute Name,
> >> you'd set the "fragment.identifier" attribute (you can use UpdateAttribute
> >> to copy the value from the groupId attribute
> >> to the 'fragment.identifier' attribute if you need to).
> >>
> >> The "fragment.index" attribute tells MergeContent how to order the
> >> different FlowFiles in the merged bin.
> >>
> >> The "fragment.count" attribute tells MergeContent how many FlowFiles go
> >> in this bin.
> >>
> >> Does that all make sense?
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >> On Aug 10, 2016, at 10:54 AM, Michael Xu <[email protected]> wrote:
> >>
> >> I am sending into the MergeContent processor payloads that each belong in
> >> a certain group of files in some data I'm working with. Each payload has an
> >> attribute called "groupId", which is an identification number for a
> >> particular group of files. This is the attribute I'm using to bin each
> >> incoming flowfile, and I have set the Correlation Attribute Name to groupId.
> >>
> >> The problem I'm dealing with right now is that each groupId has a varying
> >> number of files associated with it. As such, I'm not sure how in NiFi to
> >> detect when the MergeContent processor has received all files for a
> >> particular groupId, and once done, release the bin.
> >>
> >> Any help with this problem is appreciated, thanks!
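
For reference, the three Defragment attributes described in the quoted thread could be derived per FlowFile roughly as follows. This is only a sketch in plain Python, not NiFi code. The function name, the `filename` parameter, and the choice of the file's zero-based position in fileArray as the index are assumptions made for illustration.

```python
import json


def fragment_attributes(group_id: str, file_array: str, filename: str) -> dict:
    """Build the three fragment.* attributes for one file of a group.

    Assumes file_array is a JSON array of file names that includes
    filename, and that the position in that array is an acceptable
    ordering for fragment.index (zero-based here, as an assumption).
    """
    files = json.loads(file_array)
    return {
        "fragment.identifier": group_id,               # copied from groupId
        "fragment.index": str(files.index(filename)),  # order within the bin
        "fragment.count": str(len(files)),             # files expected in the bin
    }


# Sample fileArray value from the thread; the groupId "42" is made up.
attrs = fragment_attributes(
    "42", '["file1.txt","file2.txt","file3.txt"]', "file2.txt"
)
print(attrs)
```

The values are kept as strings because FlowFile attributes are string key/value pairs.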
