Re: BufferedReader best option to search through large flowfiles?
Hi Jim,

RouteText works in a line-by-line fashion, so it shouldn't exhaust memory (except in the case of /very/ long lines). Other processors, such as ReplaceText, let you choose whether to stream the content line by line or slurp the whole file at once.

Best,
Lars

On 23-06-05 14:49, James McMahon wrote:
> Thank you very much Mark and Lars. Ideally I do prefer to employ standard "out of the box" processors. In this case my requirement is to identify bounding dates across all content in the flowfile. As I match my DT patterns, I'll add the tokens to a Groovy list that I can later sort and use to identify the extreme values. (I may actually throw out the extremes to ensure I'm not working with an outlier that is an error.)
>
> I know how to make those manipulations in a Groovy script. I don't know how to accomplish them using standard processors.
>
> Mark, for future reference, is there a risk when using RouteText that a huge flowfile might exhaust JVM or repo resources? Is there such a risk for the ExtractText, ReplaceText, and RouteOnContent processors mentioned by Lars?
>
> Jim
Re: BufferedReader best option to search through large flowfiles?
Thank you very much Mark and Lars. Ideally I do prefer to employ standard "out of the box" processors. In this case my requirement is to identify bounding dates across all content in the flowfile. As I match my DT patterns, I'll add the tokens to a Groovy list that I can later sort and use to identify the extreme values. (I may actually throw out the extremes to ensure I'm not working with an outlier that is an error.)

I know how to make those manipulations in a Groovy script. I don't know how to accomplish them using standard processors.

Mark, for future reference, is there a risk when using RouteText that a huge flowfile might exhaust JVM or repo resources? Is there such a risk for the ExtractText, ReplaceText, and RouteOnContent processors mentioned by Lars?

Jim

On Mon, Jun 5, 2023 at 8:25 AM Mark Payne wrote:
> Jim,
>
> Take a look at RouteText.
>
> Thanks
> -Mark
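As a rough illustration of the plan Jim describes above (match date tokens, collect them in a list, sort, and optionally discard the extremes), here is a minimal sketch in plain Java, since BufferedReader is a Java class and the same calls should carry over to a Groovy script with little change. The ISO-style date pattern, the class name BoundingDates, and the helper names are assumptions made for the example, not anything from the thread, and how the Reader is obtained from the flowfile is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundingDates {
    // Assumed pattern: ISO dates such as 2023-06-05. Real content may need several patterns.
    private static final Pattern ISO_DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");

    /** Reads one line at a time and returns every parseable date, sorted ascending. */
    static List<LocalDate> sortedDates(Reader source) throws IOException {
        List<LocalDate> dates = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = ISO_DATE.matcher(line);
                while (m.find()) {
                    try {
                        dates.add(LocalDate.parse(m.group()));
                    } catch (DateTimeParseException e) {
                        // Token has the right shape but is not a real date (e.g. 2023-13-40); skip it.
                    }
                }
            }
        }
        Collections.sort(dates);
        return dates;
    }

    /** Drops `trim` values from each end of a sorted list, if enough values remain. */
    static List<LocalDate> trimExtremes(List<LocalDate> sorted, int trim) {
        if (sorted.size() <= 2 * trim) {
            return sorted;
        }
        return sorted.subList(trim, sorted.size() - trim);
    }

    public static void main(String[] args) throws IOException {
        String sample = "start 2023-06-05\nnoise 1900-01-01 and 2023-06-01\nend 2023-06-07 bad 2023-13-40\n";
        List<LocalDate> all = sortedDates(new StringReader(sample));
        List<LocalDate> kept = trimExtremes(all, 1);
        // prints "earliest=2023-06-01 latest=2023-06-05"
        System.out.println("earliest=" + kept.get(0) + " latest=" + kept.get(kept.size() - 1));
    }
}
```

Only the matched dates are held in memory, not the content itself. If the outlier trimming is not needed, tracking a running minimum and maximum would avoid the list entirely.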
Re: BufferedReader best option to search through large flowfiles?
Jim,

Take a look at RouteText.

Thanks
-Mark

> On Jun 5, 2023, at 8:09 AM, James McMahon wrote:
>
> Hello. I have a requirement to scan for multiple regex patterns in very large flowfiles. Given that my flowfiles can be very large, I think my best approach is to employ an ExecuteGroovyScript processor and a script using a BufferedReader to scan the file one line at a time.
>
> I am concerned that I might exhaust JVM resources trying to otherwise process large content if I try to handle it all at once. Is a BufferedReader the right call? Does anyone recommend a better approach?
>
> Thanks in advance,
> Jim
Re: BufferedReader best option to search through large flowfiles?
Hi James,

in case the NiFi processors such as ExtractText, ReplaceText and RouteOnContent (maybe multiple in a row or in parallel) do not match your use case, I'd definitely go with a buffered reader and line-wise processing. AFAIK you can get it as easily as

    new File("/path/to/my/file").eachLine { line -> ... }

Enjoy your day and take care!

Best,
Lars

On 23-06-05 14:09, James McMahon wrote:
> Hello. I have a requirement to scan for multiple regex patterns in very large flowfiles. Given that my flowfiles can be very large, I think my best approach is to employ an ExecuteGroovyScript processor and a script using a BufferedReader to scan the file one line at a time.
>
> I am concerned that I might exhaust JVM resources trying to otherwise process large content if I try to handle it all at once. Is a BufferedReader the right call? Does anyone recommend a better approach?
>
> Thanks in advance,
> Jim
BufferedReader best option to search through large flowfiles?
Hello. I have a requirement to scan for multiple regex patterns in very large flowfiles. Given that my flowfiles can be very large, I think my best approach is to employ an ExecuteGroovyScript processor and a script using a BufferedReader to scan the file one line at a time.

I am concerned that I might exhaust JVM resources trying to otherwise process large content if I try to handle it all at once. Is a BufferedReader the right call? Does anyone recommend a better approach?

Thanks in advance,
Jim
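For readers who want to see what the line-at-a-time scan asked about here might look like, the following is a minimal sketch in plain Java, which should translate to a Groovy script with little change. The class name LineScanner, the method countMatches, and the two sample patterns are hypothetical, and the step of obtaining a Reader over the flowfile content is deliberately omitted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineScanner {
    /** Counts matches of each named pattern while holding only the current line in memory. */
    static Map<String, Integer> countMatches(Reader source, Map<String, Pattern> patterns)
            throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : patterns.keySet()) {
            counts.put(name, 0);
        }
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Apply every pattern to the current line before moving on to the next one.
                for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
                    Matcher m = entry.getValue().matcher(line);
                    while (m.find()) {
                        counts.merge(entry.getKey(), 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical patterns, chosen only for the demonstration.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("isoDate", Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b"));
        patterns.put("usDate", Pattern.compile("\\b\\d{1,2}/\\d{1,2}/\\d{4}\\b"));

        String sample = "2023-06-05 and 06/05/2023\nnothing here\n2023-06-07\n";
        // prints "{isoDate=2, usDate=1}"
        System.out.println(countMatches(new StringReader(sample), patterns));
    }
}
```

Note that readLine() still loads each line in full, so the caveat raised elsewhere in this thread about very long lines applies to this approach as well.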