Mika,

Are you receiving the log messages using the ListenTCP processor?

If so, just wanted to mention that there is a property "Max Batch Size" that
defaults to 1 and controls how many logical TCP messages can be written to a
single flow file. If you increase that to, say, 1000, then you can send a flow
file with 1000 log messages to the next record-based processor with the
GrokReader.

-Bryan

On Mon, Jun 12, 2017 at 3:51 PM, Mark Payne <[email protected]> wrote:
> Mika,
>
> Understood. The JIRA for this is NIFI-4060 [1]. MergeContent is likely the
> best option for the short term, merging with a demarcator of \n (you can
> press Shift + Enter/Return to insert a new-line in the UI), if that works
> for your format.
>
> Thanks
> -Mark
>
> [1] https://issues.apache.org/jira/browse/NIFI-4060
>
> On Jun 12, 2017, at 3:36 PM, Mika Borner <[email protected]> wrote:
>
> Hi Mark
>
> Yes, this makes sense.
>
> In my case, I'm receiving single log events from a TCP input which I would
> like to process further with record processors. This is probably an edge
> case where a record merger would make sense to make the post-processing
> more efficient.
>
> Good to hear it's already on the radar :-)
>
> Mika
>
> On 06/12/2017 09:23 PM, Mark Payne wrote:
>
> Hi Mika,
>
> You're correct that there is not yet a MergeRecord processor. It is on my
> personal radar, but I've not yet gotten to it. One of the main reasons that
> I've not prioritized this yet is that typically in this record-oriented
> paradigm, you'll see data coming in, in groups, and being processed in
> groups. MergeContent largely has been useful in cases where we split data
> apart (using processors like SplitText, for example) and then merge it back
> together later. I don't see this as being quite as prominent when using
> record readers and writers, as the readers are designed to handle streams
> of data instead of individual records as FlowFiles.
>
> That being said, there are certainly cases where MergeRecord still makes
> sense. For example, when you're ingesting small payloads or want to batch
> up data to send to something like HDFS, which prefers larger files. So I'll
> hopefully have a chance to start working on that this week or next.
>
> In the meantime, the best path forward for you may be to use MergeContent
> to concatenate a bunch of data before the processor that is using the Grok
> Reader. Or, if you are splitting the data up into individual records
> yourself, I would recommend not splitting them up at all.
>
> Does this make sense?
>
> Thanks
> -Mark
>
> On Jun 12, 2017, at 3:12 PM, Mika Borner <[email protected]> wrote:
>
> Hi,
>
> What is the best way to merge records? I'm using a GrokReader that spits
> out single JSON records. For efficiency I would like to merge a few hundred
> records into one flowfile. It seems there's no MergeRecord processor yet...
>
> Thanks!
>
> Mika
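
[Editorial sketch] The suggestion above (MergeContent with a \n demarcator,
or ListenTCP with a larger Max Batch Size) boils down to concatenating many
small single-event payloads into one larger payload before it reaches the
GrokReader. This is not NiFi code; it is a minimal Python sketch of that
concatenation, where the "flow files" are plain byte strings and the function
name is illustrative only:

```python
def merge_with_demarcator(flowfiles, demarcator=b"\n", max_batch_size=1000):
    """Join up to max_batch_size flow-file contents with the demarcator,
    mimicking MergeContent's binary concatenation with a \n demarcator.
    Returns the merged payload and any flow files left for the next bin."""
    batch = flowfiles[:max_batch_size]
    merged = demarcator.join(batch)
    return merged, flowfiles[max_batch_size:]

if __name__ == "__main__":
    # Three single-event flow files, e.g. as emitted one message at a time
    # by ListenTCP when Max Batch Size is left at its default of 1.
    events = [b"ERROR foo failed", b"INFO bar ok", b"WARN baz slow"]
    merged, remaining = merge_with_demarcator(events)
    print(merged.decode())
```

A record reader such as GrokReader can then parse the merged payload as a
stream of log lines rather than one flow file per event.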
