Thanks Mark, that would be especially useful during development of a new flow, I believe. I decreased the timeouts and increased the max number of bins, and some of the files that were previously being binned individually are now merging.
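
In case it helps anyone else searching the archives, the relevant pieces of
the flow now look roughly like this. The attribute names (category,
merge.key) and the specific values are just illustrative of my setup, not
exact settings or a recommendation:

    UpdateAttribute   (runs before MergeContent; builds the correlation key,
                       since the Correlation Attribute Name property itself
                       doesn't accept expressions)
        merge.key = ${category}:${now():format('yyyyMM')}

    MergeContent
        Merge Strategy             = Bin-Packing Algorithm
        Correlation Attribute Name = merge.key
        Maximum number of Bins     = 1000    (raised well above the 100 default,
                                              since the correlation values vary a lot)
        Max Bin Age                = 5 min   (illustrative; adjusted so bins don't
                                              sit open indefinitely)
        Merge Format               = TAR     (compressed afterwards with CompressContent)
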
On Mon, Nov 30, 2015 at 3:22 PM, Mark Payne <[email protected]> wrote:

> Charlie,
>
> As you mentioned, there have been several others asking about how Merge
> Content is making the determination that a bin is full. I created a
> ticket [1] to add this information to the Provenance Event generated by
> Merge Content. This way, it should be much more obvious exactly why each
> bin is being merged.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-1232
>
>
> On Nov 30, 2015, at 3:11 PM, Mark Payne <[email protected]> wrote:
>
> Charlie,
>
> One thing that you should note, specifically when using the Correlation
> Attribute, is the <Maximum number of Bins> property. If the value that
> you are using for the Correlation Attribute varies quite a bit, you could
> quickly fill up the default number of bins (100). In this case, it won't
> be able to add a FlowFile to any of the bins until the timeout occurs,
> and as a result it will immediately evict the oldest bin.
>
> Thanks
> -Mark
>
>
> On Nov 30, 2015, at 3:05 PM, Charlie Frasure <[email protected]> wrote:
>
> Joe,
>
> Thanks for checking in. I tried it again and noticed that the correlation
> attribute in MergeContent doesn't accept expressions. I was attempting to
> combine multiple attributes to define a bin, so I moved that expression
> to an earlier UpdateAttribute processor, which seemed to resolve my issue.
>
> Now I'm dealing with bins being released before I think they should, but
> it seems other people have had the same problem and it must have been
> resolved, so I'll poke at that a bit more before posting.
>
> Thanks,
> Charlie
>
>
> On Mon, Nov 30, 2015 at 1:21 PM, Joe Percivall <[email protected]> wrote:
>
>> Hello Charlie,
>>
>> Sorry no one has gotten back to you yet; everyone is busy getting 0.4.0
>> finished up, and of course Thanksgiving. Have you made any more progress?
>>
>> Since it is a continuous task, it is well within NiFi's wheelhouse. In
>> your original message you mentioned that you already had them merged into
>> a single flowfile but just had trouble creating the path to do a PutFile.
>> Have you tried using expression language [1] to create the path? Assuming
>> you have attributes for the category and date, you should be able to
>> create an expression language expression which properly evaluates to what
>> you need.
>>
>> If you need help with creating the proper expression, just reply with the
>> attribute names for the category and dates and I'd be happy to help.
>>
>> [1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>>
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: [email protected]
>>
>>
>> On Monday, November 23, 2015 11:37 AM, Charlie Frasure <[email protected]> wrote:
>>
>> Joe,
>>
>> This is a continuous task. The main intent is to keep a version of the
>> file prior to conversions, etc. Ideally, it would be highly compressed
>> and easy to locate. Best case scenario, the archive files are the
>> contents of highly structured nested directories. File sizes range from
>> a few bytes to < 1 GB. It wouldn't have to run in real time (updating
>> archives seems to be a fairly intensive task), but would probably run at
>> least every few days.
>>
>> Thanks,
>> Charlie
>>
>>
>> On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <[email protected]> wrote:
>>
>> > Charlie,
>> >
>> > I can give some pointers on how to get in the ballpark with this, but
>> > I want to make sure we have a good alignment of purpose here. NiFi has
>> > from time to time come up as an intuitive way to build an archive
>> > management tool, and it is always "not quite right" because of the
>> > subtle differences between continuous streams of information and
>> > ad-hoc, one-time tasks.
>> >
>> > Would this be a continuous task (always running) even if it is slow
>> > (every few minutes, hours, days), or would it be a one-time thing to
>> > move a bunch of data from one place to another?
>> >
>> > The difference sounds very minor, but it will help me to understand
>> > how best to respond.
>> >
>> > Thanks
>> > Joe
>> >
>> >
>> > On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure <[email protected]> wrote:
>> >
>> >> Use case: Archive and compress files by category and month, storing
>> >> like files in a common directory.
>> >>
>> >> I'm already processing the files and have extracted the interesting
>> >> attributes from each. I ran them through MergeContent, but have not
>> >> been able to produce a logical directory structure to store the
>> >> results. I would prefer something like
>> >> archive/categoryA/201511/somefilename.tar.gz, where somefilename is
>> >> made up of all the categoryA files received in November 2015.
>> >>
>> >> I switched gears and used PutFile to store the files in the preferred
>> >> directory structure, but I am at a loss as to how to archive them
>> >> within their folders given hundreds of dynamic categories, and date
>> >> additions every month.
>> >>
>> >> I'm playing with MergeContent's Correlation Attribute Name, but am
>> >> also considering trying the "Defragment" merge strategy by
>> >> correlating the files earlier in the process.
>> >>
>> >> Any suggestions would be appreciated.
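
For completeness, since my original question above was about the directory
layout: the expression-language approach Joe suggested ended up looking
roughly like this for the PutFile directory. The category attribute name is
specific to my flow, and using now() assumes files are archived in the same
month they arrive:

    PutFile
        Directory = /archive/${category}/${now():format('yyyyMM')}

For a categoryA file arriving in November 2015, that evaluates to
/archive/categoryA/201511.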
