Re: BufferedReader best option to search through large flowfiles?

2023-06-05 Thread Lars Winderling

Hi Jim,

RouteText works line by line, so it shouldn't exhaust memory (except 
with /very/ long lines). Other processors such as ReplaceText let you 
choose whether to stream the content line by line or slurp the whole 
file at once.


Best,
Lars

On 23-06-05 14:49, James McMahon wrote:
Thank you very much Mark and Lars. Ideally I do prefer to employ 
standard "out of the box" processors. In this case my requirement is 
to identify bounding dates across all content in the flowfile. As I 
match my DT patterns, I'll add the tokens to a groovy list that I can 
later sort and use to identify the extreme values. (I may actually 
throw out the extremes to ensure I'm not working with an outlier that 
is an error). I know how to make those manipulations in a groovy 
script. I don't know how to accomplish them using standard processors.


Mark, for future reference is there a risk when using RouteText that a 
huge flowfile might exhaust jvm or repo resources? Is there such a 
risk for the ExtractText, ReplaceText, and RouteOnContent processors 
mentioned by Lars?


Jim

On Mon, Jun 5, 2023 at 8:25 AM Mark Payne wrote:

Jim,

Take a look at RouteText.

Thanks
-Mark


> On Jun 5, 2023, at 8:09 AM, James McMahon wrote:
>
> Hello. I have a requirement to scan for multiple regex patterns
> in very large flowfiles. Given that my flowfiles can be very
> large, I think my best approach is to employ an
> ExecuteGroovyScript processor and a script using a BufferedReader
> to scan the file one line at a time.
>
> I am concerned that I might exhaust jvm resources trying to
> otherwise process large content if I try to handle it all at once.
> Is a BufferedReader the right call? Does anyone recommend a better
> approach?
>
> Thanks in advance,
> Jim







Re: BufferedReader best option to search through large flowfiles?

2023-06-05 Thread James McMahon
Thank you very much Mark and Lars. Ideally I do prefer to employ standard
"out of the box" processors. In this case my requirement is to identify
bounding dates across all content in the flowfile. As I match my DT
patterns, I'll add the tokens to a groovy list that I can later sort and
use to identify the extreme values. (I may actually throw out the extremes
to ensure I'm not working with an outlier that is an error). I know how to
make those manipulations in a groovy script. I don't know how to accomplish
them using standard processors.
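
A minimal sketch of the sort-and-trim step described above, written in plain Java rather than Groovy. It assumes the matched tokens are ISO-style dates (yyyy-MM-dd); the class name DateBounds, the method bounds, and the trim parameter are hypothetical names chosen for illustration, not anything from NiFi.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class DateBounds {

    /**
     * Parses the tokens as ISO dates, sorts them, drops `trim` values from
     * each end as suspected outliers, and returns [earliest, latest].
     * Returns an empty list when too few valid dates remain.
     */
    static List<LocalDate> bounds(List<String> tokens, int trim) {
        List<LocalDate> dates = new ArrayList<>();
        for (String token : tokens) {
            try {
                dates.add(LocalDate.parse(token));
            } catch (DateTimeParseException e) {
                // The token matched a regex but is not a real date
                // (for example 2023-13-45), so it is skipped.
            }
        }
        Collections.sort(dates);
        if (dates.size() <= 2 * trim) {
            return List.of();
        }
        return List.of(dates.get(trim), dates.get(dates.size() - 1 - trim));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of(
                "2023-06-05", "1900-01-01", "2023-01-02",
                "2023-13-45", "2023-06-07", "2099-12-31");
        // Drop one value from each end before taking the bounds.
        System.out.println(bounds(tokens, 1)); // prints [2023-01-02, 2023-06-07]
    }
}
```

Note that this keeps every matched date in memory. That is usually far smaller than the flowfile itself, but it still grows with the number of matches.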

Mark, for future reference is there a risk when using RouteText that a huge
flowfile might exhaust jvm or repo resources? Is there such a risk for the
ExtractText, ReplaceText, and RouteOnContent processors mentioned by Lars?

Jim

On Mon, Jun 5, 2023 at 8:25 AM Mark Payne wrote:

> Jim,
>
> Take a look at RouteText.
>
> Thanks
> -Mark
>
>
> > On Jun 5, 2023, at 8:09 AM, James McMahon wrote:
> >
> > Hello. I have a requirement to scan for multiple regex patterns in very
> > large flowfiles. Given that my flowfiles can be very large, I think my best
> > approach is to employ an ExecuteGroovyScript processor and a script using a
> > BufferedReader to scan the file one line at a time.
> >
> > I am concerned that I might exhaust jvm resources trying to otherwise
> > process large content if I try to handle it all at once. Is a
> > BufferedReader the right call? Does anyone recommend a better approach?
> >
> > Thanks in advance,
> > Jim
>
>


Re: BufferedReader best option to search through large flowfiles?

2023-06-05 Thread Mark Payne
Jim,

Take a look at RouteText.

Thanks
-Mark


> On Jun 5, 2023, at 8:09 AM, James McMahon wrote:
> 
> Hello. I have a requirement to scan for multiple regex patterns in very large 
> flowfiles. Given that my flowfiles can be very large, I think my best 
> approach is to employ an ExecuteGroovyScript processor and a script using a 
> BufferedReader to scan the file one line at a time. 
> 
> I am concerned that I might exhaust jvm resources trying to otherwise process 
> large content if I try to handle it all at once. Is a BufferedReader the 
> right call? Does anyone recommend a better approach?
> 
> Thanks in advance,
> Jim



Re: BufferedReader best option to search through large flowfiles?

2023-06-05 Thread Lars Winderling

Hi James,

In case NiFi processors such as ExtractText, ReplaceText and 
RouteOnContent (maybe multiple in a row/in parallel) do not match your 
use case, I'd definitely go with a buffered reader and line-wise 
processing. AFAIK you can get it as easily as

    new File("/path/to/my/file").eachLine { line -> ... }
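
When the content arrives as a stream rather than a file on disk, the same line-wise idea can be sketched with a BufferedReader. This is an illustration in plain Java under stated assumptions, not NiFi code: the class name LineScanner, the method scan, and the two date patterns are hypothetical, and in a NiFi script the InputStream would presumably come from the session's read call for the flowfile.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineScanner {

    /**
     * Reads the stream one line at a time and collects every token matched
     * by any of the patterns. Only the current line and the collected
     * matches are held in memory, never the whole content.
     */
    static List<String> scan(InputStream in, List<Pattern> patterns) throws IOException {
        List<String> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (Pattern pattern : patterns) {
                    Matcher matcher = pattern.matcher(line);
                    while (matcher.find()) {
                        hits.add(matcher.group());
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String sample = "created 2023-06-05\nno date here\nshipped 06/07/2023 and 2023-01-02\n";
        // Hypothetical patterns; compile them once, outside the read loop.
        List<Pattern> patterns = List.of(
                Pattern.compile("\\d{4}-\\d{2}-\\d{2}"),
                Pattern.compile("\\d{2}/\\d{2}/\\d{4}"));
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(scan(in, patterns)); // prints [2023-06-05, 2023-01-02, 06/07/2023]
    }
}
```

Matches are collected per line in pattern order, which is why 2023-01-02 precedes 06/07/2023 in the output. A single very long line is still read into memory in full.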

Enjoy your day and take care!
Best,
Lars

On 23-06-05 14:09, James McMahon wrote:
Hello. I have a requirement to scan for multiple regex patterns in 
very large flowfiles. Given that my flowfiles can be very large, I 
think my best approach is to employ an ExecuteGroovyScript processor 
and a script using a BufferedReader to scan the file one line at a time.


I am concerned that I might exhaust jvm resources trying to otherwise 
process large content if I try to handle it all at once. Is a 
BufferedReader the right call? Does anyone recommend a better approach?


Thanks in advance,
Jim



