Mark,
The expression you suggested seems to be working. I don't think the file
names I'm working with will have a comma, so this should be a good
solution.
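
For my own reference, here is a minimal Python sketch of how I understand
the comma-splitting approach to behave compared with parsing the attribute
as JSON. The function names and the "tricky" sample value are made up for
illustration; this only mimics the idea behind the expression, not NiFi's
actual implementation.

```python
import json


def count_by_comma(value):
    # Mimics the idea of splitting the attribute on ',' and counting the pieces.
    return len(value.split(','))


def count_by_json(value):
    # Parses the attribute as a JSON array and counts its elements.
    return len(json.loads(value))


# Value taken from the Bulletin example further down the thread.
sample = '["file1.txt","file2.txt","file3.txt"]'
# Hypothetical value with a comma inside a file name.
tricky = '["a,b.txt","c.txt"]'

print(count_by_comma(sample), count_by_json(sample))
print(count_by_comma(tricky), count_by_json(tricky))
```

Both approaches agree on the sample value; they only diverge when a file
name itself contains a comma, which is the case Mark warned about.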

Joe,
Are you referring to the ExecuteScript processor? That looks like a good
alternative. However, I couldn't find much information for it in the
documentation (https://nifi.apache.org/docs.html). Is there anywhere I can
find simple examples, especially in JavaScript or Python?
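
To make the question concrete, this is roughly what I imagine such a script
would look like in Python. It is only a sketch: I am assuming the processor
exposes `session` and `REL_SUCCESS` variables to the script, and the
`fragment_count` helper is my own name, so please correct me if this is not
how it works.

```python
import json


def fragment_count(file_array_json):
    # Parse the JSON array stored in the attribute and return its length.
    return len(json.loads(file_array_json))


# Assumption: ExecuteScript binds `session` and `REL_SUCCESS` as global
# variables. The guard lets the helper be run outside NiFi as well.
if 'session' in globals():
    flowFile = session.get()
    if flowFile is not None:
        count = fragment_count(flowFile.getAttribute('fileArray'))
        # Attribute values are strings, so convert the count before storing it.
        flowFile = session.putAttribute(flowFile, 'fragment.count', str(count))
        session.transfer(flowFile, REL_SUCCESS)
```

Parsing the array as JSON rather than splitting on commas would avoid the
wrong count for file names that contain a comma.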

Thank you,
Michael

On Wed, Aug 10, 2016 at 3:44 PM, Joe Witt <[email protected]> wrote:

> Probably a good idea to use a script in a script processor to extract
> the details needed about the splits then feed those results into merge
> attribute as you suggested.  This would be the safest/cleanest.
>
> On Wed, Aug 10, 2016 at 3:42 PM, Mark Payne <[email protected]> wrote:
> > Michael,
> >
> > Well, sort of...
> >
> > You could use:
> > ${allDelineatedValues('${fileArray}', ','):count()}
> >
> > So that will split up the fileArray attribute by commas and then count
> > them. The only issue is that if you were to have a filename with a comma
> > in it, you'd get the wrong value. Given that you're not likely to have a
> > filename with a comma, you may be all right, but it's not really the
> > "cleanest" solution...
> >
> > The Expression Language does allow you to evaluate JSONPath against an
> > attribute, but JSONPath doesn't allow for the nice functions that you can
> > get in XPath and similar.
> >
> > Anyone else have any better ideas?
> >
> >
> > On Aug 10, 2016, at 3:32 PM, Michael Xu <[email protected]> wrote:
> >
> > Hi Mark,
> >
> > Thanks for your response earlier. While trying to implement what you
> > suggested in your email, I came across the issue of updating
> > fragment.count on a per-flowfile basis. I have another attribute called
> > fileArray, which is a JSON-compatible array that contains all the files
> > for a particular groupId. As an example taken from the Bulletin:
> >
> > Key: 'fileArray'
> >         Value: '["file1.txt","file2.txt","file3.txt"]'
> >
> > Is it possible in UpdateAttribute to use the Expression Language to
> > return the length of this array?
> >
> > Thanks for your help,
> > Michael
> >
> > On Wed, Aug 10, 2016 at 11:00 AM, Mark Payne <[email protected]> wrote:
> >>
> >> Michael,
> >>
> >> In the MergeContent processor, you can set the "Merge Strategy" to
> >> "Defragment." This will tell MergeContent to
> >> determine its bin thresholds based on the following FlowFile attributes:
> >>
> >> fragment.identifier
> >> fragment.index
> >> fragment.count
> >>
> >> So you'd need to set those 3 attributes on each of the FlowFiles. Rather
> >> than using the Correlation Attribute Name,
> >> you'd set the "fragment.identifier" attribute (you can use
> >> UpdateAttribute to copy the value from the groupId attribute
> >> to the 'fragment.identifier' attribute if you need to).
> >>
> >> The "fragment.index" attribute tells MergeContent how to order the
> >> different FlowFiles in the merged bin.
> >>
> >> The "fragment.count" attribute tells MergeContent how many FlowFiles go
> >> in this bin.
> >>
> >> Does that all make sense?
> >>
> >> Thanks
> >> -Mark
> >>
> >>
> >> On Aug 10, 2016, at 10:54 AM, Michael Xu <[email protected]> wrote:
> >>
> >> I am sending payloads into the MergeContent processor, each of which
> >> belongs to a certain group of files in some data I'm working with. Each
> >> payload has an attribute called "groupId", which is an identification
> >> number for a particular group of files. This is the attribute I'm using
> >> to bin each incoming flowfile, and I have set the Correlation Attribute
> >> Name to groupId.
> >>
> >>
> >>
> >> The problem I'm dealing with right now is that each groupId has a
> >> varying number of files associated with it. As such, I'm not sure how in
> >> NiFi to detect when the MergeContent processor has received all files
> >> for a particular groupId, and once done, release the bin.
> >>
> >>
> >>
> >> Any help with this problem is appreciated, thanks!
> >>
> >>
> >
> >
>
