Hi Stephane,

For #1, when you say that you get as many outputs as lines of text, are you sending in FlowFiles that contain only one line of text each? The Processor does not aggregate multiple FlowFiles together, so if you are sending in 1-line FlowFiles, it can only route each FlowFile as a 1-line output.

Re #2: The regular expression is compiled every time. This is done because the Regex allows the Expression Language to be used, so the Regex could actually be different for each FlowFile. That being said, it could certainly be improved by (a) pre-compiling in the case that no Expression Language is used and/or (b) caching up to, say, 10 regexes once they are compiled. Do you mind filing a JIRA to improve the efficiency of this processor?

Also, when you say that the processor is having trouble keeping up with a batch size of 1, a few thoughts come to mind:

* How many concurrent tasks do you have assigned to the processor? Have you tried increasing it?

* When processing text in NiFi, it is generally going to be much more efficient to process a single FlowFile with many lines instead of many small FlowFiles, due to the expense of the Data Provenance that has to be generated for each FlowFile. There are some things that we can do to improve the efficiency of the data provenance as well, but those improvements have generally been made 'high' priority rather than 'extremely high priority' :) so I would expect to see them coming out possibly toward the end of this year, after 1.0 and a few other major features come out.

* Rather than using a Regular Expression, the "Satisfies Expression" Matching Strategy is likely to be more efficient in many cases, if it is able to provide the routing logic that you need. It also tends to be easier to read than regular expressions, which is nice when you (or someone else) go back later to modify the flow.

Please let me know if anything here doesn't make sense or if you have any more questions.

Thanks!
-Mark

> On Jun 30, 2016, at 9:04 PM, Stéphane Maarek <[email protected]> wrote:
>
> Hi,
>
> I have a question regarding RouteText. The processor works just fine for me
> but maybe I'm missing a couple subtleties:
>
> 1) I have a regex to group data by (a pair of IDs), but what do I use the
> grouping attribute for? I still get as many outputs as lines
> 2) My data is coming from a listenUDP. If my batch size is 1, RouteText is
> having a lot of trouble processing all the data. I would guess that it
> compiles the regex everytime it is executed, is it correct? When I increase
> the batch size to 100, RouteText processes everything well. I was wondering
> if there could be some sort of optimization on the RouteText to keep the
> regex compile nonetheless of the state of the processor?
>
>
> Thanks a lot!
> Stephane
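
For anyone picking up the JIRA, option (b) from the reply above (caching a small number of compiled regexes) could look roughly like the following. This is only a sketch under stated assumptions: `PatternCache` is a hypothetical helper name and is not part of NiFi or of RouteText's actual code, and the limit of 10 entries simply mirrors the number mentioned in the reply. It uses only `java.util.LinkedHashMap` in access order and `java.util.regex.Pattern`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of NiFi): a small LRU cache of compiled
 * regular expressions, keyed by the regex string after any Expression
 * Language evaluation has already been applied.
 */
final class PatternCache {
    private final Map<String, Pattern> cache;

    PatternCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache grows too large.
        this.cache = new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(final Map.Entry<String, Pattern> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns a cached Pattern for the regex, compiling it only on a miss. */
    synchronized Pattern get(final String regex) {
        Pattern pattern = cache.get(regex);
        if (pattern == null) {
            pattern = Pattern.compile(regex);
            cache.put(regex, pattern);
        }
        return pattern;
    }

    /** Number of patterns currently held in the cache. */
    synchronized int size() {
        return cache.size();
    }
}
```

With something like `new PatternCache(10)`, a regex that contains no Expression Language would be compiled once and reused, while a regex that varies per FlowFile would still benefit whenever the same evaluated value repeats. The methods are synchronized on the assumption that several concurrent tasks may share one cache.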
