Hi Mark,

I haven't had a chance to test your script yet, but I generally write
one-liners to process text like yours.
The question is how you want to take input: do you want to read it from the
command line?

1. In the first example below, the file is slurped in, then split into records
using the appropriate lookahead/lookbehind.
2. In the second example below, a "sliding window" approach is used: lines
accumulate in an array. You can modify this code to empty the array when the
next <log-record-herald> (header) line arrives.

https://unix.stackexchange.com/questions/703741/how-to-retrieve-data-from-a-logfile-where-timestamp-can-be-followed-by-multiline/704242#704242
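
For your data, a minimal sketch of the slurp-and-split idea might look like
the following. It is not taken from the linked answer. The sample data is
abbreviated and hypothetical, the regex name `herald` is my own placeholder,
and I'm assuming every record starts with leading blanks, an N_N name, and an
ISO-style date. With a real file, `$log` would come from slurping it instead
of a heredoc.

```raku
# Hypothetical, abbreviated sample: two records, the first spanning two lines.
my $log = q:to/END/;
    3_1    2025-08-30T03:06:44-04:00    info        First line of message
                                                    Added entries : 1
    1_2    2025-08-14T17:36:40-04:00    clear       File system / is 58% full
END

# Assumed shape of a record header: blanks, name field, start of the date.
my regex herald { ^^ \h+ \d+ '_' \d+ \h+ \d\d\d\d '-' \d\d '-' \d\d 'T' }

# Split at every zero-width position that is followed by a header line.
# :skip-empty drops the empty string before the very first header.
my @records = $log.split(/ <?before <herald> > /, :skip-empty);

say @records.elems;
.print for @records;
```

Each element of `@records` keeps its header line and any continuation lines,
so it can be parsed further on its own.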

https://unix.stackexchange.com/questions/29906/delete-range-of-lines-above-pattern-with-sed-or-awk/774599#774599
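
The accumulate-and-flush variant could be sketched roughly as below. Again
this is only an illustration under my own assumptions: the sample data is
hypothetical, and `herald`, `@buffer` and `@records` are names I made up.

```raku
# Hypothetical, abbreviated sample: two records, the first spanning two lines.
my $log = q:to/END/;
    3_1    2025-08-30T03:06:44-04:00    info        First line of message
                                                    Added entries : 1
    1_2    2025-08-14T17:36:40-04:00    clear       File system / is 58% full
END

# Assumed shape of a header line, tested against one line at a time.
my regex herald { ^ \h+ \d+ '_' \d+ \h+ \d\d\d\d '-' \d\d '-' \d\d 'T' }

my @records;   # completed records, one string each
my @buffer;    # lines of the record currently being accumulated

for $log.lines -> $line {
    if $line ~~ / <herald> / {
        # A new header arrived: emit the previous record (if any), then reset.
        @records.push: @buffer.join("\n") if @buffer;
        @buffer = ();
    }
    @buffer.push: $line;
}
# Flush whatever is left after the last line.
@records.push: @buffer.join("\n") if @buffer;

say @records.elems;
```

Reading with `lines` instead of slurping means the whole log never has to be
held in memory at once, which may matter for large files.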

Finally, if you're looking for a (possibly difficult) "negated regex" answer,
you could look at the Stack Overflow link below:

https://stackoverflow.com/questions/47396166/how-to-negate-subtract-regexes-not-only-character-classes-in-perl-6?rq=1
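
Staying with your grammar, one common way to say "anything that is not a
herald" is to pair the negative lookahead with something that consumes a
character, because `<!log-record-herald>` on its own is zero-width and never
advances. Also note that `^` only matches at the start of the string, while
`^^` matches at the start of any line. A sketch, untested against your full
data: the grammar name `LogSketch` and the abbreviated sample are mine, and
using `\h` instead of `\s` is my own choice so the header stays on one line.

```raku
# Hypothetical, abbreviated sample: two records, the first spanning two lines.
my $log = q:to/END/;
    3_1    2025-08-30T03:06:44-04:00    info        First line of message
                                                    Added entries : 1
    1_2    2025-08-14T17:36:40-04:00    clear       File system / is 58% full
END

grammar LogSketch {
    token TOP               { <log-record>+ }
    token log-record        { <log-record-herald> \h+ <message> }
    # ^^ anchors at the start of each line, not just the start of the string.
    token log-record-herald { ^^ \h+ <name-field> \h+ <datetime-field> \h+ <status-field> }
    token name-field        { \d+ '_' \d+ }
    token datetime-field    { \d\d\d\d '-' \d\d '-' \d\d 'T' \d\d ':' \d\d ':' \d\d '-' \d\d ':' \d\d }
    token status-field      { \w+ }
    # Consume one character at a time, as long as no header starts here.
    token message           { [ <!log-record-herald> . ]+ }
}

my $match = LogSketch.parse($log);
say $match ?? "parsed {$match<log-record>.elems} records" !! 'no match';
```

Here `.` also matches newlines, so `<message>` runs across continuation lines
and stops only where the next header line begins.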

HTH, Bill. 


> On Oct 25, 2025, at 10:16, Mark Devine <[email protected]> wrote:
> 
> RE Gurus,
>  
> I have a “match anything that is not this thing” pattern that I haven’t 
> worked out yet.
>  
> Colorized below is the (log) data to parse, sometimes multi-line, sometimes 
> single line, in a repeated pattern.  Here’s my test script:
>  
> #!/usr/bin/env raku
>  
> use Data::Dump::Tree;
> use Grammar::Debugger;
>  
> my $data = q:to/END/;
>     3_1    2025-08-30T03:06:44-04:00    info        Advanced Intrusion Detection Environment (AIDE) detected potential changes to software on this system. The changes are listed in /var/log/aide/aide.log and also at the end of this alert message.
>                                                     Summary : :
>                                                     Total number of entries : 54096
>                                                     Added entries : 1
>                                                     Removed entries : 0
>                                                     Changed entries : 0
>     1_1    2025-08-14T07:18:41-04:00    critical    After initial accelerated space reclamation, file system / is 80% full, which is equal to or above the 80% threshold. Accelerated space reclamation will continue.
>                                                     This alert will be cleared when file system / becomes less than 75% full.
>                                                     Top three directories ordered by total space usage are as follows:
>                                                     /opt        : 2.69G
>                                                     /root        : 2.15G
>                                                     /usr        : 1.76G
>     1_2    2025-08-14T17:36:40-04:00    clear       File system / is 58% full, which is below the 75% threshold. Normal space reclamation will resume.
> END
>  
> my grammar EXADATALOG-grammar {
>     token TOP                   { <log-record>+ }
>     token log-record            { <log-record-herald> \s+ <message> }
>     token log-record-herald     { ^ \s+ <name-field> \s+ <datetime-field> \s+ <status-field> }
>     token name-field            { \d+ '_' \d+ }
>     token datetime-field        { \d\d\d\d '-' \d\d '-' \d\d 'T' \d\d ':' \d\d ':' \d\d '-' \d\d ':' \d\d }
>     token status-field          { \w+ }
>     token not-log-record-herald { <!log-record-herald> }
>     token message               { <not-log-record-herald>+ }
> }
>  
> ddt EXADATALOG-grammar.parse($data);
>  
> =finish
>  
> My strategy is to characterize the start of each record with  
> <log-record-herald> as the anchor for the logic.  Match a <log-record-herald> 
> and match a potentially multi-line <message>, with <message> being anything 
> that IS NOT a <log-record-herald>.
>  
> Is this a viable approach?  Anyone know what I’m missing here?
>  
> Thanks,
>  
> Mark
