Juan,

I think RouteText is the right answer. It does need to check every line to determine whether the condition is satisfied, and removing the first line means writing out all of the remaining lines. If the majority of the input files do not have this problematic header, I'd put RouteOnContent in front of it with a small buffer (however many bytes the header line of an erroneous file would take) and check for the presence of "ERROR". On a hit, I'd route to RouteText to do the more expensive piece. On a miss, the file can move on without paying the RouteText cost.
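To make the two-stage idea concrete, here is a minimal sketch in plain Python. It is not NiFi code and does not use any NiFi API; it only illustrates the logic of a cheap bounded check followed by the expensive rewrite. The names has_error_header, drop_first_line, handle and the 256-byte PEEK_BYTES window are all hypothetical, and the sketch assumes the bad header line (or at least the word "ERROR" in it) falls within that window and that the input stream is seekable.

```python
import io
import shutil

MARKER = b"ERROR"
PEEK_BYTES = 256  # assumption: the marker appears within this many bytes of the header


def has_error_header(head):
    """Cheap check: look for the marker in the first line of the peeked bytes only."""
    first_line = head.split(b"\n", 1)[0]
    return MARKER in first_line


def drop_first_line(src, dst):
    """Expensive step: skip the first line, then stream the rest in chunks."""
    src.readline()
    shutil.copyfileobj(src, dst)


def handle(src, dst):
    """Return True if the header was removed into dst, False if src can pass through as-is."""
    head = src.read(PEEK_BYTES)  # bounded read, analogous to a small content buffer
    src.seek(0)
    if not has_error_header(head):
        return False  # nothing is written; the original content is untouched
    drop_first_line(src, dst)
    return True


if __name__ == "__main__":
    bad = io.BytesIO(b"ERROR bad header\nrow1\nrow2\n")
    out = io.BytesIO()
    print(handle(bad, out), out.getvalue())
```

Only files that fail the cheap check pay for the full copy, which mirrors routing just the hits on to the more expensive processor.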
It would be useful to consider adding a processor, or updating an existing one, to handle removal of lines from the beginning or end of a file based on some condition. I'm not sure what that would look like, as the requirements can get pretty specific. In general, though, I think ExecuteScript offers an excellent tradeoff for such cases: one can build a very small, focused, and fast script to do precisely what is needed.

Thanks
Joe

On Mon, Mar 6, 2017 at 4:50 PM, Lee Laim wrote:
> Juan,
>
> If you're in the linux environment, you can use the Execute Stream command
> (ESC) processor to run "head -n 1" the contents of the incoming large
> flowfile to quickly extract the first line. ESC has an option to put the
> output of the command directly into a new attribute, and pass the "original
> contents" to the next processor. The value of the new attribute contains
> the first line, while the entire file remains in the flowfile contents. You
> can use the new attribute for quick(er) routing decision.
>
> Thanks,
> Lee
>
> On Mon, Mar 6, 2017 at 1:46 PM, Juan Sequeiros wrote:
>>
>> Good afternoon all,
>>
>> I am trying to remove the first line of a file if it has a certain word in
>> it "ERROR"
>> I know it will exist only in the first line ( I can not fix the reason why
>> it gets put there )
>>
>> These files are big and lots of them.
>>
>> and I can not find a "fast" fix to pop the first line of a file,
>> everything I can think of within NIFI ends up at least running through the
>> whole file.
>>
>> I am using RouteText suggested at one time on separate thread.
>>
>> Routing Strategy: Route to "matched" if the line matches any condition.
>> Matching Strategy: Satisfies Expression
>> My expression: ${lineNo:lt(2):and(${line:find('ERROR')})}
>>
>> I then route "matched" to auto-terminate and unmatched as my "new" file
>> without the first line.
>>
>> This seems to be working but it is slow since I believe it still at least
>> runs through the whole file line by line.
>>
>> Is there any other suggestions? I've read the "ExecuteGroovy" solutions
>> but they seem excessive if all I want is to remove first line of file.
>>
>> I've also looked at ReplaceText and thought that would give me a clean
>> solution since I thought I could control input stream with the "Maximum
>> Buffer Size" but that is a conditional setting and if "evaluation Mode" is
>> Line-by-Line then I later learned "Maximum Buffer" is only for the buffer
>> size of the line.
>>
>> Thanks
