RE Gurus,
I have a "match anything that is not this thing" pattern that I haven't worked
out yet.
Colorized below is the (log) data to parse, sometimes multi-line, sometimes
single line, in a repeated pattern. Here's my test script:
#!/usr/bin/env raku
use Data::Dump::Tree;
use Grammar::Debugger;
my $data = q:to/END/;
3_1 2025-08-30T03:06:44-04:00 info Advanced Intrusion
Detection Environment (AIDE) detected potential changes to software on this
system. The changes are listed in /var/log/aide/aide.log and also at the end of
this alert message.
Summary : :
Total number of entries :
54096
Added entries : 1
Removed entries : 0
Changed entries : 0
1_1 2025-08-14T07:18:41-04:00 critical After initial accelerated
space reclamation, file system / is 80% full, which is equal to or above the
80% threshold. Accelerated space reclamation will continue.
This alert will be cleared
when file system / becomes less than 75% full.
Top three directories
ordered by total space usage are as follows:
/opt : 2.69G
/root : 2.15G
/usr : 1.76G
1_2 2025-08-14T17:36:40-04:00 clear File system / is 58% full,
which is below the 75% threshold. Normal space reclamation will resume.
END
my grammar EXADATALOG-grammar {
token TOP { <log-record>+
}
token log-record { <log-record-herald> \s+ <message>
}
token log-record-herald { ^ \s+ <name-field> \s+ <datetime-field> \s+
<status-field> }
token name-field { \d+ '_' \d+
}
token datetime-field { \d\d\d\d '-' \d\d '-' \d\d 'T' \d\d ':' \d\d
':' \d\d '-' \d\d ':' \d\d }
token status-field { \w+
}
token not-log-record-herald { <!log-record-herald>
}
token message { <not-log-record-herald>+
}
}
ddt EXADATALOG-grammar.parse($data);
=finish
My strategy is to characterize the start of each record with
<log-record-herald> as the anchor for the logic. Match a <log-record-herald>
and match a potentially multi-line <message>, with <message> being anything
that IS NOT a <log-record-herald>.
Is this a viable approach? Anyone know what I'm missing here?
Thanks,
Mark