Hi Mika,

You're correct that there is not yet a MergeRecord processor. It is on my personal radar, but I've not yet gotten to it. One of the main reasons I've not prioritized it is that in this record-oriented paradigm, data typically arrives in groups and is processed in groups. MergeContent has largely been useful in cases where we split data apart (using processors like SplitText, for example) and then merge it back together later. I don't see that pattern being quite as prominent when using record readers and writers, since the readers are designed to handle streams of data rather than individual records as FlowFiles.

That being said, there are certainly cases where MergeRecord still makes sense. For example, when you're ingesting small payloads, or when you want to batch data up before sending it to something like HDFS, which prefers larger files. So I'll hopefully have a chance to start working on that this week or next.

In the meantime, the best path forward for you may be to use MergeContent to concatenate a bunch of data before the processor that is using the Grok Reader. Or, if you are splitting the data up into individual records yourself, I would recommend not splitting it up at all.
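As a rough sketch only (the thresholds below are placeholders you'd tune to your own volumes, and it assumes your log data is newline-delimited text so the GrokReader still sees one record per line), a MergeContent placed in front of the record processor might be configured along these lines:

    Merge Strategy: Bin-Packing Algorithm
    Merge Format: Binary Concatenation
    Minimum Number of Entries: 500
    Maximum Number of Entries: 1000
    Max Bin Age: 30 sec
    Delimiter Strategy: Text
    Demarcator: (a single newline)

The Demarcator keeps the concatenated records separated, and Max Bin Age makes sure a partially filled bin still gets flushed after 30 seconds instead of waiting indefinitely for more data.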

Does this make sense?

Thanks
-Mark


> On Jun 12, 2017, at 3:12 PM, Mika Borner <[email protected]> wrote:
> 
> Hi,
> 
> What is the best way to merge records? I'm using a GrokReader that spits out 
> single JSON records. For efficiency I would like to merge a few hundred 
> records into one FlowFile. It seems there's no MergeRecord processor yet...
> 
> Thanks!
> 
> Mika
> 
