Charlie,

As you mentioned, several others have asked how MergeContent determines that a bin is full. I created a ticket [1] to add this information to the Provenance Event generated by MergeContent. This way, it should be much more obvious exactly why each bin is being merged.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1232

> On Nov 30, 2015, at 3:11 PM, Mark Payne <[email protected]> wrote:
>
> Charlie,
>
> One thing that you should note, specifically when using the Correlation
> Attribute is the <Maximum number of Bins> property. If the value that
> you are using for the Correlation Attribute varies quite a bit, you could
> quickly fill up the default number of bins (100). In this case, it won't be
> able to add a FlowFile to any of the bins until the timeout occurs and as a
> result it will immediately evict the oldest bin.
>
> Thanks
> -Mark
>
>> On Nov 30, 2015, at 3:05 PM, Charlie Frasure <[email protected]> wrote:
>>
>> Joe,
>>
>> Thanks for checking in. I tried it again and noticed that the correlation
>> attribute in MergeContent doesn't accept expressions. I was attempting to
>> combine multiple attributes to define a bin, so I moved that expression to
>> an earlier UpdateAttribute process which seemed to resolve my issue.
>>
>> Now I'm dealing with bins being released before I think they should, but it
>> seems that there's been other people with the same problem that must've been
>> resolved, so I'll poke on that a bit more before posting.
>>
>> Thanks,
>> Charlie
>>
>> On Mon, Nov 30, 2015 at 1:21 PM, Joe Percivall <[email protected]> wrote:
>> Hello Charlie,
>>
>> Sorry no one has gotten back to you yet, everyone is busy getting 0.4.0
>> finished up and of course Thanksgiving. Have you made any more progress?
>>
>> Since it is a continuous task it is well within NiFi's wheelhouse. In your
>> original message you mentioned that you already had them merged into a single
>> flowfile but just had trouble creating the path to do a PutFile. Have you
>> tried using expression language [1] to create the path? Assuming you have
>> attributes for the category and date you should be able to create an
>> expression language expression which properly evaluates to what you need.
>>
>> If you need help with creating the proper expression, just reply with the
>> attribute names for the category and dates and I'd be happy to help.
>>
>> [1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>>
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: [email protected]
>>
>> On Monday, November 23, 2015 11:37 AM, Charlie Frasure
>> <[email protected]> wrote:
>>
>> Joe,
>>
>> This is a continuous task. The main intent is to keep a version of the file
>> prior to conversions etc. Ideally, it would be highly compressed, and easy
>> to locate. Best case scenario, the archive files are the contents of highly
>> structured nested directories. File sizes range from a few bytes to < 1GB.
>> It wouldn't have to run real time (updating archives seems to be a fairly
>> intensive task), but would probably run at least every few days.
>>
>> Thanks,
>> Charlie
>>
>> On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <[email protected]> wrote:
>>
>> Charlie,
>> >
>> >Can give some pointers on how to get in the ballpark with this but
>> >want to make sure we have a good alignment of purpose here. NiFi has
>> >from time to time come up as an intuitive way to build an archive
>> >management tool and it is always "not quite right" because of the
>> >subtle differences between continuous streams of information and
>> >ad-hoc sort of one-time tasks.
>> >
>> >Would this be a continuous task (always running) even if it is slow
>> >(every few minutes, hours, days) or would it be a one-time thing to
>> >move a bunch of data from one place to another?
>> >
>> >The difference sounds very minor but it will help me to understand how
>> >best to respond.
>> >
>> >Thanks
>> >Joe
>> >
>> >On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure
>> ><[email protected]> wrote:
>> >> Use case: Archive and compress files by category and month, store like
>> >> files in a common directory.
>> >>
>> >> I'm already processing the files, and have extracted the interesting
>> >> attributes from each. I ran them through MergeContent, but have not been
>> >> able to produce a logical directory structure to store the results. I
>> >> would prefer something like archive/categoryA/201511/somefilename.tar.gz
>> >> where somefilename is made up of all the categoryA files received in
>> >> November 2015.
>> >>
>> >> I switched gears, and used PutFile to store the files in the preferred
>> >> directory structure, but at a loss of how to archive them within their
>> >> folders given hundreds of dynamic categories, and date additions every
>> >> month.
>> >>
>> >> I'm playing with MergeContent's Correlation Attribute Name, but am also
>> >> considering trying the "Defragment" merge strategy by correlating the
>> >> files earlier in the process.
>> >>
>> >> Any suggestions would be appreciated.
>> >
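
The <Maximum number of Bins> behavior described in this thread (many distinct Correlation Attribute values using up every bin, so the oldest bin gets evicted early) can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not MergeContent's actual code: the BinModel class, its add method, and the sample values are hypothetical, and the size, count, and age thresholds are left out entirely.

```python
from collections import OrderedDict


class BinModel:
    """Toy model (hypothetical): bins keyed by correlation value, capped in number."""

    def __init__(self, max_bins=100):
        self.max_bins = max_bins
        self.bins = OrderedDict()  # correlation value -> items, oldest bin first
        self.evicted = []          # (correlation value, items) for bins released early

    def add(self, correlation_value, item):
        if correlation_value not in self.bins:
            if len(self.bins) >= self.max_bins:
                # Every bin is in use: release the oldest one,
                # whether or not it has reached any "full" threshold.
                self.evicted.append(self.bins.popitem(last=False))
            self.bins[correlation_value] = []
        self.bins[correlation_value].append(item)


# Three distinct correlation values, but room for only two bins.
model = BinModel(max_bins=2)
for value, item in [("catA", "f1"), ("catB", "f2"), ("catA", "f3"), ("catC", "f4")]:
    model.add(value, item)

print(model.evicted)     # bins that were released early
print(list(model.bins))  # correlation values still collecting
```

In this run the catA bin is released as soon as catC arrives, even though nothing marked it as full. That matches the "bins being released before I think they should" symptom when the correlation value varies more than the number of available bins.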
