Charlie,

One thing that you should note, specifically when using the Correlation Attribute, is the <Maximum number of Bins> property. If the value that you are using for the Correlation Attribute varies quite a bit, you could quickly fill up the default number of bins (100). In this case, it won't be able to add a FlowFile to any of the existing bins, and as a result it will immediately evict the oldest bin, which may be why you are seeing bins released earlier than expected.
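To make the eviction behavior concrete, here is a rough Python sketch. It is only a toy model of the idea, not NiFi's actual code: the `BinManager` class, its fields, and the sample correlation values are all made up for illustration, and it ignores the other conditions that can complete a bin.

```python
from collections import OrderedDict


class BinManager:
    """Toy model (hypothetical, not NiFi code) of binning with a cap on open bins."""

    def __init__(self, max_bins=100):
        self.max_bins = max_bins
        self.bins = OrderedDict()  # correlation value -> items, oldest bin first
        self.evicted = []          # (correlation value, items) released early

    def add(self, correlation_value, item):
        if correlation_value not in self.bins:
            if len(self.bins) >= self.max_bins:
                # No free slot for a new bin: release the oldest bin right away.
                oldest_key, oldest_items = self.bins.popitem(last=False)
                self.evicted.append((oldest_key, oldest_items))
            self.bins[correlation_value] = []
        self.bins[correlation_value].append(item)


# With only 3 bins allowed, a 4th distinct correlation value forces an eviction.
manager = BinManager(max_bins=3)
for value in ["catA_201511", "catB_201511", "catC_201511",
              "catD_201511", "catA_201511"]:
    manager.add(value, f"file-for-{value}")

print([key for key, _ in manager.evicted])  # bins that were released early
print(list(manager.bins))                   # bins still open
```

Note that in this model the second catA file ends up in a brand new bin, because the first catA bin had already been pushed out. Raising the maximum number of bins, or reducing how many distinct correlation values are in flight at once, avoids that.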
Thanks
-Mark

> On Nov 30, 2015, at 3:05 PM, Charlie Frasure <[email protected]> wrote:
>
> Joe,
>
> Thanks for checking in. I tried it again and noticed that the correlation
> attribute in MergeContent doesn't accept expressions. I was attempting to
> combine multiple attributes to define a bin, so I moved that expression to an
> earlier UpdateAttribute process which seemed to resolve my issue.
>
> Now I'm dealing with bins being released before I think they should, but it
> seems that there's been other people with the same problem that must've been
> resolved, so I'll poke on that a bit more before posting.
>
> Thanks,
> Charlie
>
> On Mon, Nov 30, 2015 at 1:21 PM, Joe Percivall <[email protected]> wrote:
> Hello Charlie,
>
> Sorry no one has gotten back to you yet, everyone is busy getting 0.4.0
> finished up and of course Thanksgiving. Have you made any more progress?
>
> Since it is a continuous task it is well within NiFi's wheelhouse. In your
> original message you mentioned that you already had them merged into a single
> flowfile but just had trouble creating the path to do a PutFile. Have you
> tried using expression language [1] to create the path? Assuming you have
> attributes for the category and date you should be able to create an
> expression language expression which properly evaluates to what you need.
>
> If you need help with creating the proper expression, just reply with the
> attribute names for the category and dates and I'd be happy to help.
>
> [1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
> On Monday, November 23, 2015 11:37 AM, Charlie Frasure <[email protected]> wrote:
>
> Joe,
>
> This is a continuous task. The main intent is to keep a version of the file
> prior to conversions etc. Ideally, it would be highly compressed, and easy
> to locate. Best case scenario, the archive files are the contents of highly
> structured nested directories. File sizes range from a few bytes to < 1GB.
> It wouldn't have to run real time (updating archives seems to be a fairly
> intensive task), but would probably run at least every few days.
>
> Thanks,
> Charlie
>
> On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <[email protected]> wrote:
> > Charlie,
> >
> > Can give some pointers on how to get in the ballpark with this but
> > want to make sure we have a good alignment of purpose here. NiFi has
> > from time to time come up as an intuitive way to build an archive
> > management tool and it is always "not quite right" because of the
> > subtle differences between continuous streams of information and
> > ad-hoc sort of one-time tasks.
> >
> > Would this be a continuous task (always running) even if it is slow
> > (every few minutes, hours, days) or would it be a one-time thing to
> > move a bunch of data from one place to another?
> >
> > The difference sounds very minor but it will help me to understand how
> > best to respond.
> >
> > Thanks
> > Joe
> >
> > On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure <[email protected]> wrote:
> >> Use case: Archive and compress files by category and month, store like
> >> files in a common directory.
> >>
> >> I'm already processing the files, and have extracted the interesting
> >> attributes from each. I ran them through MergeContent, but have not been
> >> able to produce a logical directory structure to store the results. I
> >> would prefer something like archive/categoryA/201511/somefilename.tar.gz
> >> where somefilename is made up of all the categoryA files received in
> >> November 2015.
> >>
> >> I switched gears, and used PutFile to store the files in the preferred
> >> directory structure, but am at a loss as to how to archive them within
> >> their folders given hundreds of dynamic categories, and date additions
> >> every month.
> >>
> >> I'm playing with MergeContent's Correlation Attribute Name, but am also
> >> considering trying the "Defragment" merge strategy by correlating the
> >> files earlier in the process.
> >>
> >> Any suggestions would be appreciated.
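
For reference, the layout requested in the original message (one bin per category and month, stored under archive/<category>/<yyyymm>/) can be sketched in Python as below. This only illustrates the string logic, which in NiFi would be written in expression language in UpdateAttribute and in PutFile's directory setting. The attribute names `category` and `received.date`, the date format, and the archive file naming are assumptions made up for the example.

```python
from datetime import datetime


def month_stamp(attributes):
    """Return yyyymm for the (hypothetical) 'received.date' attribute."""
    received = datetime.strptime(attributes["received.date"], "%Y-%m-%d")
    return f"{received:%Y%m}"


def correlation_key(attributes):
    """Single combined value to bin on, e.g. categoryA_201511."""
    return f"{attributes['category']}_{month_stamp(attributes)}"


def archive_path(attributes, root="archive"):
    """Target path for the merged archive of one category and month."""
    key = correlation_key(attributes)
    return f"{root}/{attributes['category']}/{month_stamp(attributes)}/{key}.tar.gz"


# Example attributes, invented for illustration.
attrs = {"category": "categoryA", "received.date": "2015-11-23"}
print(correlation_key(attrs))
print(archive_path(attrs))
```

Computing the combined key once, before the merge step, mirrors the fix described above of moving the expression into an earlier UpdateAttribute so that MergeContent only needs a plain attribute name.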
