Huagen,

Sorry, I am a little confused. My understanding is that you want to combine n 
individual logs (each in its own flowfile) from a specific hour into a 
single file. What confuses me is your statement “Even with that [a 5x 
confirmation loop], I occasionally still get more than one merged flowfile.” Do 
you mean that content you expected to be combined into a single flowfile is 
output as two distinct, incomplete flowfiles?

Without seeing a template of your workflow, I can make a couple of suggestions.

First, as mentioned last night by James Wing, I would encourage you to look at 
the MergeContent [1] processor properties to set a higher threshold for 
merging flowfiles. If you know the number of log files per hour a priori, you 
can set that as the “Minimum Number of Entries” so that MergeContent waits 
until that many flowfiles have accumulated before it emits a merged flowfile.
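
As a rough sketch, the relevant MergeContent properties might look like the 
following. The values are assumptions based on your description (up to 200 
logs per hour, grouped by segment.original.filename) and are not tested 
settings, so please adjust them to your flow.

```
Merge Strategy               Bin-Packing Algorithm
Merge Format                 Binary Concatenation
Correlation Attribute Name   segment.original.filename
Minimum Number of Entries    200        (assumed: expected logs per hour)
Maximum Number of Entries    200        (assumed)
Max Bin Age                  10 min     (assumed timeout)
```

If the number of logs varies from hour to hour, Max Bin Age is what releases a 
bin that never reaches the minimum, so it should be longer than the time it 
takes for all of an hour's logs to arrive.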

Also, given that you have described a “loop”, I would imagine you may have 
multiple connections feeding into MergeContent. MergeContent can have 
unexpected behavior with multiple incoming connections, and so I would 
recommend adding a Funnel to aggregate all incoming connections and provide a 
single incoming connection to MergeContent.

Please let us know if this helps, and if not, please share a template and some 
sample input if possible. Thanks.

[1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/index.html


Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jun 1, 2016, at 11:52 AM, Huagen peng <[email protected]> wrote:
> 
> Hi,
> 
> In the data flow I am dealing with now, there are multiple (up to 200) logs 
> associated with a given hour.  I need to process these fragment hourly logs 
> and then concatenate them into a single file.  The approach I am using now 
> has an UpdateAttribute processor to set an arbitrary 
> segment.original.filename attribute on all the flowfiles I want to merge.  
> Then I use a MergeContent processor, with an UpdateAttribute and 
> RouteOnAttribute processor to form a loop to confirm five times that the 
> merge is complete.  Even with that, I occasionally still get more than one 
> merged flowfile.
> 
> Is there a better way to do this?  Or should I increase the loop count, say 
> 10?
> 
> Thanks.
> 
> Huagen
