Re: RouteText questions (regex, grouping, performance)

Mark Payne Fri, 01 Jul 2016 05:49:51 -0700

Hi Stephane,

For #1, when you say that you get as many outputs as lines of text, are you
sending in FlowFiles that are only one line of text each? The Processor does
not aggregate multiple FlowFiles together, so if you are sending in 1-line
FlowFiles, it can only route each FlowFile to a 1-line output.

Re #2: The regular expression is compiled every time. This is done, though,
because the Regex allows the Expression Language to be used, so the Regex
could actually be different for each FlowFile. That being said, it could
certainly be improved by (a) pre-compiling in the case that no Expression
Language is used and/or (b) caching up to, say, 10 Regexes once they are
compiled. Do you mind filing a JIRA to improve the efficiency of this
processor?
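
To make option (b) concrete, here is a rough sketch of what a small cache of
compiled patterns could look like. This is only an illustration and not the
processor's actual code: the class name PatternCache and its methods are
hypothetical, and it assumes a simple least-recently-used eviction policy is
acceptable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi: a small LRU cache of compiled patterns.
final class PatternCache {
    private final Map<String, Pattern> cache;

    PatternCache(final int maxSize) {
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so the "eldest" entry is the one that has gone unused the longest.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, Pattern> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Returns a cached Pattern for the regex, compiling it only on a cache miss.
    // Synchronized because several concurrent tasks may share one cache.
    synchronized Pattern get(final String regex) {
        Pattern pattern = cache.get(regex);
        if (pattern == null) {
            pattern = Pattern.compile(regex);
            cache.put(regex, pattern);
        }
        return pattern;
    }

    // Number of patterns currently held.
    synchronized int size() {
        return cache.size();
    }
}
```

With something like this, the regex string that results from evaluating the
Expression Language for a FlowFile would be the cache key, so a flow that only
ever produces a handful of distinct regexes would compile each of them once.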

Also, when you say that the processor is having trouble keeping up with a batch 
size of 1, there are a few thoughts that
come to mind:

* How many concurrent tasks do you have assigned to the processor? Have you 
tried increasing it?
* When processing text in NiFi it is generally going to be much more
efficient to process a single FlowFile with many lines,
instead of many small FlowFiles, due to the expense of the Data Provenance that
has to be generated. There are some things
that we can do to improve efficiency of the data provenance as well, but those
improvements have generally been made
'high' priority rather than 'extremely high priority' :) so I would expect to
see them coming out possibly toward the end of this year,
after 1.0 and a few other major features come out.
* Rather than using a Regular Expression, the "Satisfies Expression" Matching 
Strategy is likely to be more efficient in many cases
if it is able to provide the routing logic that you need. It also tends to be
easier to read than regular expressions, which is nice when
you (or someone else) go back later to modify the flow.

Please let me know if anything here doesn't make sense or if you have any more 
questions.

Thanks!
-Mark

> On Jun 30, 2016, at 9:04 PM, Stéphane Maarek <[email protected]> 
> wrote:
> 
> Hi,
> 
> I have a question regarding RouteText. The processor works just fine for me 
> but maybe I'm missing a couple subtleties:
> 
> 1) I have a regex to group data by (a pair of IDs), but what do I use the 
> grouping attribute for? I still get as many outputs as lines 
> 2) My data is coming from a ListenUDP. If my batch size is 1, RouteText is
> having a lot of trouble processing all the data. I would guess that it
> compiles the regex every time it is executed, is that correct? When I increase
> the batch size to 100, RouteText processes everything well. I was wondering
> if there could be some sort of optimization on the RouteText to keep the
> regex compiled regardless of the state of the processor?
> 
> 
> Thanks a lot!
> Stephane
