Charlie,

As you mentioned, several others have asked how MergeContent determines
that a bin is full. I created a ticket [1] to add this information to the
Provenance Event generated by MergeContent.
That way, it should be much more obvious exactly why each bin is being merged.

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-1232
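
To illustrate the kind of information the ticket is asking for, here is a minimal Python sketch of a bin that reports why it should be merged. It is a simplified model and not NiFi's implementation: the Bin class, its field names, and the three thresholds (entry count, total size, age) are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bin:
    """Hypothetical, simplified bin; not NiFi's actual data structure."""
    created_at: float                 # seconds, from any consistent clock
    max_entries: int = 1000           # assumed threshold
    max_bytes: int = 10_000_000       # assumed threshold
    max_age_seconds: float = 300.0    # assumed threshold
    sizes: List[int] = field(default_factory=list)

    def add(self, size_in_bytes: int) -> None:
        """Record one entry by its size in bytes."""
        self.sizes.append(size_in_bytes)

    def merge_reason(self, now: float) -> Optional[str]:
        """Return a readable reason if the bin should be merged, else None."""
        if len(self.sizes) >= self.max_entries:
            return f"entry count {len(self.sizes)} reached maximum {self.max_entries}"
        total = sum(self.sizes)
        if total >= self.max_bytes:
            return f"total size {total} bytes reached maximum {self.max_bytes}"
        age = now - self.created_at
        if age >= self.max_age_seconds:
            return f"bin age {age:.0f}s reached maximum {self.max_age_seconds:.0f}s"
        return None
```

Attaching a string like the one returned by merge_reason to each merge event is the sort of detail that would make early merges easier to diagnose.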




> On Nov 30, 2015, at 3:11 PM, Mark Payne <[email protected]> wrote:
> 
> Charlie,
> 
> One thing that you should note, specifically when using the Correlation 
> Attribute, is the <Maximum number of Bins> property. If the value that
> you are using for the Correlation Attribute varies quite a bit, you could 
> quickly fill up the default number of bins (100). In this case, a FlowFile
> with a new value can't be added to any of the existing bins, so instead of
> waiting for the timeout to occur, it will immediately evict the oldest bin.
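
The eviction behavior described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not NiFi's code: the BinManager class and its offer method are hypothetical names, and "oldest" is taken to mean the bin that was created first.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple


class BinManager:
    """Hypothetical model of a fixed pool of bins keyed by correlation value."""

    def __init__(self, max_bins: int = 100) -> None:
        self.max_bins = max_bins
        # Insertion order doubles as age order: the first key is the oldest bin.
        self.bins: "OrderedDict[str, List[str]]" = OrderedDict()

    def offer(self, correlation_value: str, flowfile: str) -> Optional[Tuple[str, List[str]]]:
        """Add a FlowFile; return (key, contents) of an evicted bin, if any."""
        evicted = None
        if correlation_value not in self.bins and len(self.bins) >= self.max_bins:
            # Every bin is taken and this value needs a new one:
            # evict the oldest bin immediately, even if it is not "full".
            evicted = self.bins.popitem(last=False)
        self.bins.setdefault(correlation_value, []).append(flowfile)
        return evicted
```

With many distinct correlation values, almost every call to offer evicts a bin, which would look like bins being released earlier than expected.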
> 
> Thanks
> -Mark
> 
> 
> 
>> On Nov 30, 2015, at 3:05 PM, Charlie Frasure <[email protected]> wrote:
>> 
>> Joe,
>> 
>> Thanks for checking in.  I tried it again and noticed that the Correlation 
>> Attribute Name property in MergeContent doesn't accept expressions.  I was 
>> attempting to combine multiple attributes to define a bin, so I moved that 
>> expression to an earlier UpdateAttribute processor, which seemed to resolve 
>> my issue.
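
As a rough illustration of this workaround, the Python sketch below combines several attribute values into one correlation value before binning. The function name and the attribute names (category, month) are hypothetical; in NiFi itself the same effect would come from setting a new attribute in UpdateAttribute to an Expression Language value along the lines of ${category}_${month}.

```python
from typing import Mapping, Sequence


def composite_key(attributes: Mapping[str, str], names: Sequence[str], sep: str = "_") -> str:
    """Join the named attribute values into one correlation value."""
    return sep.join(attributes[name] for name in names)
```

For example, composite_key({"category": "categoryA", "month": "201511"}, ["category", "month"]) returns "categoryA_201511", which can then serve as the single attribute that defines the bin.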
>> 
>> Now I'm dealing with bins being released before I think they should be, but 
>> it seems that other people have had the same problem and that it must have 
>> been resolved, so I'll poke at that a bit more before posting.
>> 
>> Thanks,
>> Charlie
>>  
>> 
>> 
>> 
>> On Mon, Nov 30, 2015 at 1:21 PM, Joe Percivall <[email protected]> wrote:
>> Hello Charlie,
>> 
>> Sorry no one has gotten back to you yet; everyone is busy getting 0.4.0 
>> finished up, and of course there was Thanksgiving. Have you made any more 
>> progress?
>> 
>> 
>> Since it is a continuous task, it is well within NiFi's wheelhouse. In your 
>> original message you mentioned that you already had them merged into a single 
>> FlowFile but just had trouble creating the path to do a PutFile. Have you 
>> tried using Expression Language [1] to create the path? Assuming you have 
>> attributes for the category and date, you should be able to write an 
>> expression that evaluates to the path you need.
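
The path logic itself is small. As a sketch only, here is the equivalent computation in Python for the archive/categoryA/201511 layout requested in the original message; the function name, its parameters, and the idea that the date is available as a parsed value are assumptions, and in NiFi this would be expressed with Expression Language on the PutFile directory property instead.

```python
from datetime import date
from pathlib import PurePosixPath


def archive_directory(category: str, received: date, root: str = "archive") -> PurePosixPath:
    """Build root/<category>/<yyyyMM>, e.g. archive/categoryA/201511."""
    return PurePosixPath(root) / category / received.strftime("%Y%m")
```

Every file with the same category and month maps to the same directory, which is the grouping the archive needs.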
>> 
>> If you need help with creating the proper expression, just reply with the 
>> attribute names for the category and dates and I'd be happy to help.
>> 
>> [1] https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>> 
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: [email protected]
>> 
>> 
>> 
>> 
>> On Monday, November 23, 2015 11:37 AM, Charlie Frasure <[email protected]> wrote:
>> 
>> 
>> 
>> Joe,
>> 
>> This is a continuous task.  The main intent is to keep a version of the file 
>> prior to conversions etc.  Ideally, it would be highly compressed, and easy 
>> to locate.  Best case scenario, the archive files are the contents of highly 
>> structured nested directories.  File sizes range from a few bytes to < 1GB.  
>> It wouldn't have to run real time (updating archives seems to be a fairly 
>> intensive task), but would probably run at least every few days.
>> 
>> Thanks,
>> Charlie
>> 
>> 
>> 
>> 
>> 
>> On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <[email protected]> wrote:
>> 
>> Charlie,
>> >
>> >Can give some pointers on how to get in the ballpark with this but
>> >want to make sure we have a good alignment of purpose here.  NiFi has
>> >from time to time come up as an intuitive way to build an archive
>> >management tool and it is always "not quite right" because of the
>> >subtle differences between continuous streams of information and
>> >ad-hoc sort of one-time tasks.
>> >
>> >Would this be a continuous task (always running) even if it is slow
>> >(every few minutes, hours, days) or would it be a one-time thing to
>> >move a bunch of data from one place to another?
>> >
>> >The difference sounds very minor but it will help me to understand how
>> >best to respond.
>> >
>> >Thanks
>> >Joe
>> >
>> >
>> >On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure
>> ><[email protected]> wrote:
>> >> Use case: Archive and compress files by category and month, and store
>> >> like files in a common directory.
>> >>
>> >> I'm already processing the files, and have extracted the interesting
>> >> attributes from each.  I ran them through MergeContent, but have not been
>> >> able to produce a logical directory structure to store the results.  I
>> >> would prefer something like archive/categoryA/201511/somefilename.tar.gz
>> >> where somefilename is made up of all the categoryA files received in
>> >> November 2015.
>> >>
>> >> I switched gears, and used PutFile to store the files in the preferred
>> >> directory structure, but am at a loss as to how to archive them within
>> >> their folders given hundreds of dynamic categories, and date additions
>> >> every month.
>> >>
>> >> I'm playing with MergeContent's Correlation Attribute Name, but am also
>> >> considering trying the "Defragment" merge strategy by correlating the
>> >> files earlier in the process.
>> >>
>> >> Any suggestions would be appreciated.
>> >
>> 
> 
