Thanks for the reply, Andy.

I ended up abandoning my previous approach and using ExecuteStreamCommand to 
output (with the zcat command on the GZ files) all the files I want to 
concatenate. I then perform some data manipulation and save the file.
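
For anyone who finds this thread later, the zcat step amounts to roughly the 
following. This is only a Python sketch of the same idea, not the actual 
ExecuteStreamCommand configuration; the function name, the directory argument 
and the glob pattern are hypothetical, and it assumes that sorting by file 
name gives an acceptable order.

```python
import gzip
import shutil
from pathlib import Path


def concat_gz_logs(src_dir, pattern, dest_path):
    """Decompress each matching gzip file and append it to dest_path.

    Files are processed in sorted name order (an assumption; adjust the
    sort key if the fragments need a different ordering).
    Returns the number of source files that were concatenated.
    """
    sources = sorted(Path(src_dir).glob(pattern))
    with open(dest_path, "wb") as out:
        for src in sources:
            # gzip.open yields the decompressed byte stream, like zcat
            with gzip.open(src, "rb") as fragment:
                shutil.copyfileobj(fragment, out)
    return len(sources)
```

Because everything for the hour is read in one pass, there is no need to 
guess when a merge is "complete", which is the part that was unreliable in 
the loop-based flow.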

Huagen

> On Jun 3, 2016, at 12:29 AM, Andy LoPresto <[email protected]> wrote:
> 
> Huagen, 
> 
> Sorry, I am a little confused. My understanding is that you want to combine n 
> individual logs (each with a respective flowfile) from a specific hour into a 
> single file. What is confusing is when you say “Even with that [a 5* 
> confirmation loop], I occasionally still get more than one merged flowfile.” 
> Do you mean that what you expected to be combined into a single flowfile is 
> output as two distinct and incomplete flowfiles? 
> 
> Without seeing a template of your work flow, I can make a couple of 
> suggestions. 
> 
> First, as mentioned last night by James Wing, I would encourage you to look 
> at the MergeContent [1] processor properties to provide a high threshold for 
> merging flowfiles. If you know the number of log files per hour a priori, you 
> can set that as the “Minimum Number of Entries” and ensure that output will 
> wait until that many flowfiles have been accumulated. 
> 
> Also, given that you have described a “loop”, I would imagine you may have 
> multiple connections feeding into MergeContent. MergeContent can have 
> unexpected behavior with multiple incoming connections, and so I would 
> recommend adding a Funnel to aggregate all incoming connections and provide a 
> single incoming connection to MergeContent. 
> 
> Please let us know if this helps, and if not, please share a template and 
> some sample input if possible. Thanks. 
> 
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/index.html
> 
> 
> Andy LoPresto
> [email protected] <mailto:[email protected]>
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On Jun 1, 2016, at 11:52 AM, Huagen peng <[email protected]> wrote:
>> 
>> Hi,
>> 
>> In the data flow I am dealing with now, there are multiple (up to 200) logs 
>> associated with a given hour.  I need to process these hourly log fragments 
>> and then concatenate them into a single file.  The approach I am using now 
>> has an UpdateAttribute processor to set an arbitrary 
>> segment.original.filename attribute on all the flowfiles I want to merge.  
>> Then I use a MergeContent processor, with an UpdateAttribute and 
>> RouteOnAttribute processor to form a loop to confirm five times that the 
>> merge is complete.  Even with that, I occasionally still get more than one 
>> merged flowfile.
>> 
>> Is there a better way to do this?  Or should I increase the loop count, say 
>> to 10?
>> 
>> Thanks.
>> 
>> Huagen  
> 
