Hi Elli,

The best way to frame the current implementation is that it is primarily
time-window driven.  As for the unsuccessful attempts, these are documented
as failed attempts to put an item into a bin, the idea being that the bin
may be too close to its max size to accommodate additional flowfiles.

For your example, a merged file could be shipped at one minute when there
were 100 files (there could be fewer if the remaining flowfiles were too
large to fit under the max size and nothing more could be packed over those
successive tries).  However, MergeContent operates in a relaxed mode:
depending on when it is invoked, it could also ship a merged file as soon as
both minimums have been met.  This puts the emphasis largely on the minimum
size and count and the maximum age.  All of this is taken into account each
time the processor executes, and as many flowfiles as possible are consumed
from the input without exceeding the maximum number of bins set in the
processor properties.
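
To make the ordering of those checks concrete, here is a rough sketch in
Python. It is not the NiFi source; the names BinLimits and bin_status are
made up for illustration, and the exact comparisons (for example, whether
the age test is strict) are my assumption. It only mirrors the rules as
described in this thread: max age first, then the optional maximums, then
the relaxed "both minimums met" case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinLimits:
    """Hypothetical stand-in for the MergeContent properties discussed here."""
    min_entries: int = 1
    max_entries: Optional[int] = None        # optional maximum
    min_size: int = 0                        # bytes
    max_size: Optional[int] = None           # optional maximum, bytes
    max_age_seconds: Optional[float] = None  # optional max bin age


def bin_status(entries: int, size: int, age_seconds: float, limits: BinLimits) -> str:
    """Classify a bin as 'expired', 'full', 'ready', or 'waiting'."""
    # Failsafe: a bin older than the max bin age is closed regardless of minimums.
    if limits.max_age_seconds is not None and age_seconds > limits.max_age_seconds:
        return "expired"
    # A bin that has reached either optional maximum is closed.
    if limits.max_entries is not None and entries >= limits.max_entries:
        return "full"
    if limits.max_size is not None and size >= limits.max_size:
        return "full"
    # Relaxed mode: once both minimums are met, the bin may be shipped.
    if entries >= limits.min_entries and size >= limits.min_size:
        return "ready"
    return "waiting"


# Settings from the example: min 25 entries, max 100, max age 5 minutes.
limits = BinLimits(min_entries=25, max_entries=100, max_age_seconds=300)
print(bin_status(40, 0, 60, limits))
```

With those settings, a bin holding 40 flowfiles after one minute is already
eligible to ship, which is why a merge of exactly 100 only happens if the
processor does not get to the bin before it fills.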

I think the processor's approach aligns with your concerns, with the
failsafe being that no flowfiles, regardless of minimums, will be stranded
beyond the max bin age. You could additionally prioritize older flowfiles
on the incoming connection to ensure they are pushed through first.

Let us know if you have additional questions or if I was off the mark from
what you were looking to tackle.

On Mon, Nov 23, 2015 at 12:09 PM, Elli Schwarz <eliezer_schw...@yahoo.com>
wrote:

> Thank you for your help. I'm still a bit confused: if a bin's min entries
> has been reached, but not the max age, wouldn't it immediately merge the
> flowfiles? You mention 5 successive unsuccessful attempts - but what would
> cause unsuccessful attempts after the min entries has been reached? I don't
> understand how the max entries property would ever come in to play.
>
> For example, if I have a min bin entries of 25, max of 100, and max age of
> 5 minutes. What scenario would cause 100 files to be included in a single
> merge content if I get 100 flowfiles hitting the processor within 1 minute?
> (If it matters, I'm using bin-packing algorithms and bin concatenation).
>
> I want to have a better understanding of the processor to adjust these
> settings to obtain the optimal configuration. What I would like the
> processor to do is, even after min entries has been reached, wait a certain
> period of time to see if there are more files. This is not max bin size - I
> view that as a failsafe mechanism so that something doesn't stay in the
> queue forever if the minimums are not reached.
>
> Thanks!
>
>
> On Thursday, November 19, 2015 8:44 AM, Aldrin Piri <aldrinp...@gmail.com>
> wrote:
>
>
>
> Elli,
>
> Your understanding of the functionality is correct. There are a couple of
> criteria that drive when a bin is "done." In this case, if you set the
> optional maximum properties, they cause bins to be closed out sooner.  That
> is, if a max age is specified and any of the bins have gone beyond that
> time, they will be closed and transferred out.
>
> Alternatively, a bin is also considered ready if the max age has not yet
> elapsed and:
>
> * both minimum size and minimum number of files have been reached and a few
> successive attempts to add to the bin (specifically, five) have been
> unsuccessful, signaling that it is nearly full or the objects are ill
> suited for tighter packing.
>
> * size or number of entries is greater than or equal to its respective,
> optionally specified, maximum
>
> Let us know if you have any other questions!
>
> On Thu, Nov 19, 2015 at 8:09 AM, Elli Schwarz <eliezer_schw...@yahoo.com>
> wrote:
>
> Hello,
>
> I'm a bit confused about the relationship of certain properties of the
> MergeContent processor. Specifically, how do the properties min entries,
> max entries, max bin age, max number of bins interact? If the MergeContent
> processor receives the min number of entries, does it merge without waiting
> for max bin age? Or does max bin age trump the other properties? If max bin
> age is hit before the min number of entries, does the processor wait until
> it gets the min number? Does it merge once it gets to the max bin age,
> regardless of whether or not the max entries has been received? What about
> min/max group size vs. min/max number of entries?
>
> I want to make sure that the processor isn't waiting forever (i.e., will
> send after 10 minutes no matter what) if there's only 1 flowfile in the
> queue. If I set max bin age to 10 minutes and min entries to 10, what does
> that mean? It seems to work the way I expect, which makes me wonder what
> the min entries property means if it doesn't seem to be used.
>
> Thank you for any clarifications possible. I looked through the
> documentation for this processor, but it doesn't seem to explain these
> crucial details, which greatly impact my strategy for using this processor
> properly.
>
> -Elli
