Thanks for the reply, Andy. I ended up abandoning my previous approach. I now use ExecuteStreamCommand to output all the files I want to concatenate (by running zcat on the .gz files), then perform some data manipulation and save the result.
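For reference, the zcat step decompresses each .gz file in turn and writes the outputs back to back. The Python sketch below does the same concatenation outside NiFi. It is not a description of how ExecuteStreamCommand works internally. It assumes the inputs are standard gzip files on local disk and that the caller passes them in the desired order. The function name `concat_gz` is hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable


def concat_gz(inputs: Iterable[Path], output: Path) -> int:
    """Decompress each .gz file in the given order and append it to `output`.

    Roughly equivalent to `zcat a.gz b.gz ... > output`.
    Returns the total number of decompressed bytes written.
    """
    written = 0
    with open(output, "wb") as out:
        for path in inputs:
            with gzip.open(path, "rb") as src:
                # Read in 64 KiB chunks so a large hourly log is never held fully in memory.
                while chunk := src.read(64 * 1024):
                    out.write(chunk)
                    written += len(chunk)
    return written
```

For example, `concat_gz(sorted(Path("/data/hourly").glob("*.gz")), Path("merged.log"))` would write one decompressed file. The directory is hypothetical, and sorting by filename is only correct if the fragment names sort in the intended order.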
Huagen

> On Jun 3, 2016, at 12:29 AM, Andy LoPresto <[email protected]> wrote:
>
> Huagen,
>
> Sorry, I am a little confused. My understanding is that you want to combine n
> individual logs (each with a respective flowfile) from a specific hour into a
> single file. What is confusing is when you say “Even with that [a 5x
> confirmation loop], I occasionally still get more than one merged flowfile.”
> Do you mean that what you expected to be combined into a single flowfile is
> output as two distinct and incomplete flowfiles?
>
> Without seeing a template of your workflow, I can make a couple of
> suggestions.
>
> First, as mentioned last night by James Wing, I would encourage you to look
> at the MergeContent [1] processor properties to provide a high threshold for
> merging flowfiles. If you know the number of log files per hour a priori, you
> can set that as the “Minimum Number of Entries” and ensure that output will
> wait until that many flowfiles have been accumulated.
>
> Also, given that you have described a “loop”, I would imagine you may have
> multiple connections feeding into MergeContent. MergeContent can have
> unexpected behavior with multiple incoming connections, so I would
> recommend adding a Funnel to aggregate all incoming connections and provide a
> single incoming connection to MergeContent.
>
> Please let us know if this helps, and if not, please share a template and
> some sample input if possible. Thanks.
>
> [1] https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/index.html
>
> Andy LoPresto
> [email protected]
> [email protected]
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69
>
>> On Jun 1, 2016, at 11:52 AM, Huagen peng <[email protected]> wrote:
>>
>> Hi,
>>
>> In the data flow I am dealing with now, there are multiple (up to 200) logs
>> associated with a given hour. I need to process these hourly log fragments
>> and then concatenate them into a single file. My current approach uses an
>> UpdateAttribute processor to set an arbitrary segment.original.filename
>> attribute on all the flowfiles I want to merge. Then I use a MergeContent
>> processor, with an UpdateAttribute and a RouteOnAttribute processor forming
>> a loop that confirms five times that the merge is complete. Even with that,
>> I occasionally still get more than one merged flowfile.
>>
>> Is there a better way to do this? Or should I increase the loop count to,
>> say, 10?
>>
>> Thanks.
>>
>> Huagen
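Andy's suggestion about “Minimum Number of Entries” can be shown with a toy model of bin-packing merge. The Python sketch below is a simplified model under stated assumptions and is not NiFi's implementation. Fragments that share a correlation key (here a hypothetical hour key) go into one bin. A bin is released only when it holds `min_entries` items or is older than `max_age` seconds. `max_age` loosely mirrors MergeContent's “Max Bin Age” property. Real MergeContent has other release triggers (for example, the maximum number of bins), which this model ignores. The class `ToyBinPacker`, the function `run`, the key, and the fragment names are all hypothetical. The model shows that if a bin may flush as soon as it holds anything, fragments arriving across several scheduling passes come out as separate merged outputs. Setting the minimum to the known fragment count yields a single output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bin:
    created_at: float
    items: List[str] = field(default_factory=list)


class ToyBinPacker:
    """Simplified bin-packing merge model (NOT NiFi's actual code).

    Items with the same correlation key share a bin. A bin is emitted once it
    holds `min_entries` items, or once it is at least `max_age` seconds old
    (a safety valve so an incomplete bin does not wait forever).
    """

    def __init__(self, min_entries: int, max_age: float) -> None:
        self.min_entries = min_entries
        self.max_age = max_age
        self.bins: Dict[str, Bin] = {}

    def offer(self, key: str, item: str, now: float) -> None:
        # Create the bin on first arrival for this key, then add the item.
        self.bins.setdefault(key, Bin(created_at=now)).items.append(item)

    def poll(self, now: float) -> List[List[str]]:
        """Remove and return the contents of every bin that is ready to merge."""
        ready = []
        for key in list(self.bins):
            bin_ = self.bins[key]
            full = len(bin_.items) >= self.min_entries
            expired = now - bin_.created_at >= self.max_age
            if full or expired:
                ready.append(self.bins.pop(key).items)
        return ready


def run(min_entries: int) -> List[List[str]]:
    """Three fragments for one hour arrive at t=0, 1, 2; the merger polls after each."""
    packer = ToyBinPacker(min_entries=min_entries, max_age=3600.0)
    merged: List[List[str]] = []
    for t, frag in enumerate(["log-a", "log-b", "log-c"]):
        packer.offer("2016-06-01T11", frag, now=float(t))
        merged.extend(packer.poll(now=float(t)))
    return merged


# run(1) -> [['log-a'], ['log-b'], ['log-c']]  (three partial merges)
# run(3) -> [['log-a', 'log-b', 'log-c']]      (one complete merge)
```

In this model, adding more confirmation passes only delays the problem. A release condition tied to the expected fragment count removes it.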
