The logic of the idea seems sound, but the trace below shows that 
<not-log-record-herald> matches "" (the empty string).  I was hoping that 
<not-log-record-herald> would treat the message text as a match and pack it 
into the Match object.  I'd use an action to accumulate if necessary.  But 
since it discards the string, I think I might be out of luck entirely.  Thoughts?

TOP
|  log-record
|  |  log-record-herald
|  |  |  name-field
|  |  |  * MATCH "3_1"
|  |  |  datetime-field
|  |  |  * MATCH "2025-08-30T03:06:44-04:00"
|  |  |  status-field
|  |  |  * MATCH "info"
|  |  * MATCH "    3_1    2025-08-30T03:06:44-04:00    info"
|  |  message
|  |  |  not-log-record-herald
|  |  |  |  log-record-herald
|  |  |  |  * FAIL
|  |  |  * MATCH ""
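
One possible reading of the trace, offered as a sketch and not a confirmed fix: 
<!log-record-herald> is a zero-width negative lookahead, so when it succeeds it 
consumes nothing and the match is "".  Pairing the assertion with something that 
does consume input (here a single '.') is one way to get "anything that is not 
a herald".  The grammar name LogSketch, the shortened token names, the 
simplified date pattern (\S+), and the sample data are all hypothetical 
stand-ins; the switch from ^ to ^^ (start of any line, not only start of 
string) is also an assumption about what the herald anchor was meant to do.

```raku
# Hypothetical, trimmed-down stand-in for the real log data.
my $sample = q:to/END/;
    3_1    2025-08-30T03:06:44-04:00    info        first line of message
                                                    continuation line
    1_2    2025-08-14T17:36:40-04:00    clear       single-line message
END

grammar LogSketch {
    token TOP     { <record>+ }
    token record  { <herald> \h+ <message> }
    # ^^ anchors at the start of any line; ^ only matches at the start of the string.
    token herald  { ^^ \h+ \d+ '_' \d+ \h+ \S+ \h+ \w+ }
    # <!herald> is zero-width, so it is paired with '.' to consume one character
    # at a time until the next herald (or the end of the input) is reached.
    token message { [ <!herald> . ]+ }
}

my $m = LogSketch.parse($sample);
say $m<record>.elems;                        # 2
say $m<record>[0]<message>.Str.lines.elems;  # 2
```

With this shape, each <message> holds the raw text up to the next herald, 
including the continuation lines, so an action could still post-process it.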

Thanks,

Mark Devine
(202) 878-1500

From: Mark Devine <[email protected]>
Sent: Saturday, October 25, 2025 1:16 PM
To: [email protected]
Subject: Grammar: "match anything that is not this thing"

RE Gurus,

I have a "match anything that is not this thing" pattern that I haven't worked 
out yet.

Below is the (log) data to parse, sometimes multi-line, sometimes 
single-line, in a repeated pattern.  Here's my test script:

#!/usr/bin/env raku

use Data::Dump::Tree;
use Grammar::Debugger;

my $data = q:to/END/;
    3_1    2025-08-30T03:06:44-04:00    info        Advanced Intrusion Detection Environment (AIDE) detected potential changes to software on this system. The changes are listed in /var/log/aide/aide.log and also at the end of this alert message.
                                                    Summary : :
                                                    Total number of entries : 54096
                                                    Added entries : 1
                                                    Removed entries : 0
                                                    Changed entries : 0
    1_1    2025-08-14T07:18:41-04:00    critical    After initial accelerated space reclamation, file system / is 80% full, which is equal to or above the 80% threshold. Accelerated space reclamation will continue.
                                                    This alert will be cleared when file system / becomes less than 75% full.
                                                    Top three directories ordered by total space usage are as follows:
                                                    /opt        : 2.69G
                                                    /root        : 2.15G
                                                    /usr        : 1.76G
    1_2    2025-08-14T17:36:40-04:00    clear       File system / is 58% full, which is below the 75% threshold. Normal space reclamation will resume.
END

my grammar EXADATALOG-grammar {
    token TOP                   { <log-record>+ }
    token log-record            { <log-record-herald> \s+ <message> }
    token log-record-herald     { ^ \s+ <name-field> \s+ <datetime-field> \s+ <status-field> }
    token name-field            { \d+ '_' \d+ }
    token datetime-field        { \d\d\d\d '-' \d\d '-' \d\d 'T' \d\d ':' \d\d ':' \d\d '-' \d\d ':' \d\d }
    token status-field          { \w+ }
    token not-log-record-herald { <!log-record-herald> }
    token message               { <not-log-record-herald>+ }
}

ddt EXADATALOG-grammar.parse($data);

=finish

My strategy is to use <log-record-herald>, which characterizes the start of 
each record, as the anchor for the logic: match a <log-record-herald>, then 
match a potentially multi-line <message>, where <message> is anything that 
IS NOT a <log-record-herald>.

Is this a viable approach?  Anyone know what I'm missing here?

Thanks,

Mark
