Juan,

I think RouteText is the right answer.  It does need to check every
line to determine whether the condition is satisfied, and to remove
the first line it has to write out all the remaining lines.  If most
of the input files do not have this problematic header, I'd put
RouteOnContent in front with a small buffer (however many bytes the
header line of an erroneous file would take) and check for the
presence of "ERROR".  On a hit, route to RouteText for the more
expensive step.  On a miss, pass the file along without paying the
RouteText cost.
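
To make the idea of the cheap pre-check concrete, here is a rough
sketch in plain Python.  It is not NiFi code and not how RouteOnContent
is implemented; the function name, the 256-byte limit, and the marker
are assumptions for illustration only.  It reads a bounded number of
bytes and looks for "ERROR" in the first line.

```python
import io


def header_has_error(stream, max_header_bytes=256, marker=b"ERROR"):
    """Return True if `marker` appears in the first line of `stream`.

    Only the first `max_header_bytes` bytes are read, so the cost does
    not depend on the size of the file. (Hypothetical helper; the
    default of 256 bytes is an assumed header length.)
    """
    head = stream.read(max_header_bytes)
    # Keep only the bytes before the first newline, if there is one.
    first_line = head.split(b"\n", 1)[0]
    return marker in first_line


if __name__ == "__main__":
    sample = io.BytesIO(b"ERROR: bad header\nrow1\nrow2\n")
    print(header_has_error(sample))
```

Only files where this returns True would need the expensive
full-file rewrite.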

It is worth considering whether we should add a processor, or update
an existing one, to handle removing lines from the beginning or end
of a file based on some condition.  I'm not sure what that would look
like, since the requirements can get pretty specific.  For such cases
I do think the ExecuteScript processors generally offer an excellent
tradeoff: you can build a very small, focused, fast script that does
precisely what you need.
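
As an illustration of how small such a script can be, here is a
minimal sketch in plain Python.  It works on ordinary byte streams and
deliberately leaves out the NiFi session and flowfile handling that an
actual ExecuteScript script would need; the function name and the
chunk size are assumptions.  It reads just the first line, drops it if
it contains "ERROR", and copies the rest through in large chunks
rather than line by line.  The remaining bytes still have to be
written out once, but nothing past the first line is inspected.

```python
import io
import shutil


def strip_error_header(src, dst, marker=b"ERROR", chunk_size=1 << 20):
    """Copy `src` to `dst`, dropping the first line if it contains `marker`.

    Returns True if the first line was dropped. (Hypothetical helper;
    assumes the first line is short enough to hold in memory.)
    """
    first_line = src.readline()
    dropped = marker in first_line
    if not dropped:
        # Header is fine, so keep it.
        dst.write(first_line)
    # Bulk-copy everything after the first line without parsing it.
    shutil.copyfileobj(src, dst, chunk_size)
    return dropped


if __name__ == "__main__":
    source = io.BytesIO(b"ERROR: bad header\nrow1\nrow2\n")
    target = io.BytesIO()
    strip_error_header(source, target)
    print(target.getvalue())
```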

Thanks
Joe

On Mon, Mar 6, 2017 at 4:50 PM, Lee Laim <[email protected]> wrote:
> Juan,
>
> If you're in a Linux environment, you can use the ExecuteStreamCommand
> (ESC) processor to run "head -n 1" on the contents of the incoming large
> flowfile to quickly extract the first line.  ESC has an option to put the
> output of the command directly into a new attribute and pass the "original
> contents" to the next processor.  The value of the new attribute contains
> the first line, while the entire file remains in the flowfile contents.  You
> can use the new attribute for a quick(er) routing decision.
>
> Thanks,
> Lee
>
>
>
> On Mon, Mar 6, 2017 at 1:46 PM, Juan Sequeiros <[email protected]> wrote:
>>
>> Good afternoon all,
>>
>> I am trying to remove the first line of a file if it contains a certain
>> word, "ERROR".
>> I know it will exist only in the first line (I cannot fix the reason why
>> it gets put there).
>>
>> These files are big, and there are lots of them.
>>
>> I cannot find a "fast" way to pop the first line of a file;
>> everything I can think of within NiFi ends up at least running through the
>> whole file.
>>
>> I am using RouteText, which was suggested at one point on a separate thread.
>>
>> Routing Strategy: Route to "matched" if the line matches any condition.
>> Matching Strategy: Satisfies Expression
>> My expression: ${lineNo:lt(2):and(${line:find('ERROR')})}
>>
>> I then route "matched" to auto-terminate and unmatched as my "new" file
>> without the first line.
>>
>> This seems to be working but it is slow since I believe it still at least
>> runs through the whole file line by line.
>>
>> Are there any other suggestions?  I've read the "ExecuteGroovy" solutions,
>> but they seem excessive if all I want is to remove the first line of a file.
>>
>> I've also looked at ReplaceText and thought it would give me a clean
>> solution, since I thought I could limit how much of the input stream is
>> read with "Maximum Buffer Size".  That setting is conditional, though: I
>> later learned that when "Evaluation Mode" is Line-by-Line, "Maximum Buffer
>> Size" only applies to the buffer for each line.
>>
>> Thanks
>
>
