Re: BufferedReader best option to search through large flowfiles?

Lars Winderling Mon, 05 Jun 2023 06:16:03 -0700

Hi Jim,

RouteText processes content line by line, so it shouldn't exhaust memory (except with /very/ long lines). Other processors, such as ReplaceText, let you choose whether to stream the content line by line or slurp the whole file at once.


Best,
Lars

On 23-06-05 14:49, James McMahon wrote:

Thank you very much, Mark and Lars. Ideally I prefer to use standard "out of the box" processors. In this case my requirement is to identify the bounding dates across all content in the flowfile. As I match my DT patterns, I'll add the tokens to a Groovy list that I can later sort and use to identify the extreme values. (I may actually throw out the extremes to make sure I'm not working with an outlier that is an error.) I know how to do those manipulations in a Groovy script. I don't know how to accomplish them using standard processors.
Mark, for future reference, is there a risk when using RouteText that a huge flowfile might exhaust JVM or repo resources? Is there such a risk for the ExtractText, ReplaceText, and RouteOnContent processors mentioned by Lars?
Jim
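
For readers following along, the approach Jim describes (match date tokens line by line, collect them, sort, then discard the extremes) might be sketched roughly as below. This is plain Java, which should carry over to a Groovy script with little change. It is only a sketch under assumptions: the class and method names are hypothetical, the ISO-style yyyy-MM-dd pattern stands in for whatever DT patterns are actually needed, and the NiFi wiring (obtaining a Reader from the flowfile content) is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi.
final class DateBounds {
    // Assumption: dates appear as ISO-style tokens such as 2023-06-05.
    private static final Pattern ISO_DATE =
            Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    // Reads one line at a time and returns every valid date found, sorted ascending.
    static List<LocalDate> collectDates(Reader source) throws IOException {
        List<LocalDate> dates = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = ISO_DATE.matcher(line);
                while (m.find()) {
                    try {
                        dates.add(LocalDate.parse(m.group()));
                    } catch (DateTimeParseException e) {
                        // Token looked like a date but is not one (e.g. month 13); skip it.
                    }
                }
            }
        }
        Collections.sort(dates);
        return dates;
    }

    // Returns {earliest, latest} after dropping `trim` values from each end,
    // or null when too few dates remain.
    static LocalDate[] bounds(List<LocalDate> sorted, int trim) {
        if (sorted.size() <= 2 * trim) {
            return null;
        }
        return new LocalDate[] {
            sorted.get(trim), sorted.get(sorted.size() - 1 - trim)
        };
    }

    public static void main(String[] args) throws IOException {
        String sample = "start 2023-06-05\nbad 2023-13-45 old 1999-01-01\n"
                + "2021-03-04 and 2030-12-31\n";
        List<LocalDate> dates = collectDates(new StringReader(sample));
        LocalDate[] b = bounds(dates, 1);
        System.out.println(b[0] + " .. " + b[1]);
    }
}
```

Only one line of content is held at a time, but the list of matched dates still grows with the number of matches, so a file with a very large number of dates would need a bounded structure instead of a full list.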

On Mon, Jun 5, 2023 at 8:25 AM Mark Payne <[email protected]> wrote:

    Jim,

    Take a look at RouteText.

    Thanks
    -Mark


    > On Jun 5, 2023, at 8:09 AM, James McMahon <[email protected]> wrote:
    >
    > Hello. I have a requirement to scan for multiple regex patterns
    > in very large flowfiles. Given that my flowfiles can be very
    > large, I think my best approach is to employ an
    > ExecuteGroovyScript processor and a script using a BufferedReader
    > to scan the file one line at a time.
    >
    > I am concerned that I might exhaust jvm resources trying to
    > otherwise process large content if I try to handle it all at once.
    > Is a BufferedReader the right call? Does anyone recommend a better
    > approach?
    >
    > Thanks in advance,
    > Jim
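
As an illustration of the line-at-a-time scan described in the original question, here is a minimal Java sketch that checks several regex patterns against each line and counts the hits per pattern. The class name, method name, and pattern labels are hypothetical, and how the Reader is obtained from the flowfile inside a script is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of NiFi.
final class MultiPatternScan {
    // Streams the source line by line and counts matches for each named pattern.
    static Map<String, Integer> countMatches(Reader source, Map<String, Pattern> patterns)
            throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : patterns.keySet()) {
            counts.put(name, 0);
        }
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
                    Matcher m = entry.getValue().matcher(line);
                    while (m.find()) {
                        counts.merge(entry.getKey(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Example patterns; these are assumptions, not taken from the thread.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("date", Pattern.compile("\\d{4}-\\d{2}-\\d{2}"));
        patterns.put("ipv4", Pattern.compile("\\b\\d{1,3}(\\.\\d{1,3}){3}\\b"));
        String sample = "2023-06-05 host 10.0.0.1\nno match here\n2023-06-06 2023-06-07\n";
        System.out.println(countMatches(new StringReader(sample), patterns));
    }
}
```

Memory use here is bounded by the longest single line plus the small map of counters, which is the property that makes a BufferedReader attractive for very large content. A file with no line breaks would still be read as one enormous line.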
