Thanks Mark, I believe that would be especially useful during development of a
new flow.  I decreased the timeouts and increased the max number of bins to get
some of the files that were being binned individually to merge together.
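
In case it helps anyone who finds this thread later, the MergeContent
properties I was adjusting were roughly these (the values are only
illustrative):

    Maximum number of Bins : 1000
    Max Bin Age            : 2 min

Raising Maximum number of Bins leaves room for more distinct correlation
values before the oldest bin gets evicted, and Max Bin Age controls how long
a partially filled bin waits before it is merged anyway.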

On Mon, Nov 30, 2015 at 3:22 PM, Mark Payne <[email protected]> wrote:

> Charlie,
>
> As you mentioned, there have been several others asking about how Merge
> Content is making the determination
> that a bin is full. I created a ticket [1] to add this information to the
> Provenance Event generated by Merge Content.
> This way, it should be much more obvious exactly why each bin is being
> merged.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-1232
>
>
>
>
> On Nov 30, 2015, at 3:11 PM, Mark Payne <[email protected]> wrote:
>
> Charlie,
>
> One thing that you should note, specifically when using the Correlation
> Attribute, is the <Maximum number of Bins> property. If the value that
> you are using for the Correlation Attribute varies quite a bit, you could
> quickly fill up the default number of bins (100). In that case, the processor
> won't be able to add a FlowFile to any of the bins, so rather than waiting
> for the timeout it will immediately evict the oldest bin to make room.
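>
> As a rough illustration (the attribute name here is just an example), with a
> configuration like
>
>     Correlation Attribute Name : merge.key
>     Maximum number of Bins     : 100
>     Max Bin Age                : 5 min
>
> once 100 distinct merge.key values are in flight, a FlowFile carrying a new
> value cannot be placed into any existing bin, and the oldest bin gets merged
> out to make room for it.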
>
> Thanks
> -Mark
>
>
>
> On Nov 30, 2015, at 3:05 PM, Charlie Frasure <[email protected]>
> wrote:
>
> Joe,
>
> Thanks for checking in.  I tried it again and noticed that the Correlation
> Attribute Name in MergeContent doesn't accept expressions.  I was attempting
> to combine multiple attributes to define a bin, so I moved that expression
> to an earlier UpdateAttribute processor, which seemed to resolve my issue.
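>
> For reference, the workaround looks roughly like this (the attribute and
> expression below are just examples): in UpdateAttribute, add a property
> such as
>
>     merge.key = ${category}:${now():format('yyyyMM')}
>
> and then point MergeContent's Correlation Attribute Name at the plain
> attribute name, merge.key, rather than at an expression.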
>
> Now I'm dealing with bins being released before I think they should be, but
> it seems other people have had the same problem and had it resolved, so I'll
> poke at that a bit more before posting.
>
> Thanks,
> Charlie
>
>
>
>
> On Mon, Nov 30, 2015 at 1:21 PM, Joe Percivall <[email protected]>
> wrote:
>
>> Hello Charlie,
>>
>> Sorry no one has gotten back to you yet; everyone has been busy getting
>> 0.4.0 finished up, and of course there was Thanksgiving. Have you made any
>> more progress?
>>
>>
>> Since it is a continuous task, it is well within NiFi's wheelhouse. In
>> your original message you mentioned that you already had them merged into a
>> single FlowFile but just had trouble creating the path for PutFile.
>> Have you tried using expression language [1] to create the path? Assuming
>> you have attributes for the category and date, you should be able to create
>> an expression that evaluates to exactly what you need.
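>>
>> For example (just a sketch; the attribute names here are made up), if each
>> FlowFile carries a "category" attribute and a "file.date" attribute in
>> yyyy-MM-dd form, the Directory property of PutFile could be set to:
>>
>>     archive/${category}/${file.date:toDate('yyyy-MM-dd'):format('yyyyMM')}
>>
>> which would drop each merged file into a per-category, per-month folder.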
>>
>> If you need help with creating the proper expression, just reply with the
>> attribute names for the category and dates and I'd be happy to help.
>>
>> [1]
>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
>>
>> Joe
>> - - - - - -
>> Joseph Percivall
>> linkedin.com/in/Percivall
>> e: [email protected]
>>
>>
>>
>>
>> On Monday, November 23, 2015 11:37 AM, Charlie Frasure <
>> [email protected]> wrote:
>>
>>
>>
>> Joe,
>>
>> This is a continuous task.  The main intent is to keep a version of each
>> file prior to conversions, etc.  Ideally, it would be highly compressed and
>> easy to locate.  In the best case, the archive files hold the contents of
>> highly structured nested directories.  File sizes range from a few bytes to
>> under 1 GB.  It wouldn't have to run in real time (updating archives seems
>> to be a fairly intensive task), but it would probably run at least every
>> few days.
>>
>> Thanks,
>> Charlie
>>
>>
>>
>>
>>
>> On Mon, Nov 23, 2015 at 11:08 AM, Joe Witt <[email protected]> wrote:
>>
>> >Charlie,
>> >
>> >I can give some pointers on how to get in the ballpark with this, but I
>> >want to make sure we have a good alignment of purpose here.  NiFi has
>> >from time to time come up as an intuitive way to build an archive
>> >management tool, and it is always "not quite right" because of the
>> >subtle differences between continuous streams of information and
>> >ad-hoc, one-time tasks.
>> >
>> >Would this be a continuous task (always running) even if it is slow
>> >(every few minutes, hours, days) or would it be a one-time thing to
>> >move a bunch of data from one place to another?
>> >
>> >The difference sounds very minor but it will help me to understand how
>> >best to respond.
>> >
>> >Thanks
>> >Joe
>> >
>> >
>> >On Mon, Nov 23, 2015 at 10:54 AM, Charlie Frasure
>> ><[email protected]> wrote:
>> >> Use case: Archive and compress files by category and month, and store
>> >> like files in a common directory.
>> >>
>> >> I'm already processing the files, and have extracted the interesting
>> >> attributes from each.  I ran them through MergeContent, but have not
>> >> been able to produce a logical directory structure to store the
>> >> results.  I would prefer something like
>> >> archive/categoryA/201511/somefilename.tar.gz, where somefilename is
>> >> made up of all the categoryA files received in November 2015.
>> >>
>> >> I switched gears and used PutFile to store the files in the preferred
>> >> directory structure, but I'm at a loss for how to archive them within
>> >> their folders given hundreds of dynamic categories, with new dates
>> >> added every month.
>> >>
>> >> I'm playing with MergeContent's Correlation Attribute Name, but am also
>> >> considering trying the "Defragment" merge strategy by correlating the
>> >> files earlier in the process.
>> >>
>> >> Any suggestions would be appreciated.
>> >
>>
>
>
>
>
