Hello again, rsyslog community!
I did not see any responses to my earlier question (2024-02-05) about imfile
timeouts, so I have created an example configuration and some shell scripting
that demonstrates the problem.
Here is an abbreviated version of the configuration file I created for this
experiment:
=====================================================
module(load="imfile" mode="inotify")
input(type="imfile" File="/var/log/imfile-issue/input/proc00.log"
Severity="info" Tag="pfx:proc00"
readTimeout="2" reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)
input(type="imfile" File="/var/log/imfile-issue/input/proc01.log"
Severity="info" Tag="pfx:proc01"
readTimeout="2" reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)
input(type="imfile" File="/var/log/imfile-issue/input/proc02.log"
Severity="info" Tag="pfx:proc02"
readTimeout="2" reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)
=== I think you can see the pattern here, so I'm going to skip to the bottom,
because there are 50 of them in total ===
input(type="imfile" File="/var/log/imfile-issue/input/proc49.log"
Severity="info" Tag="pfx:proc49"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)
input(type="imfile" File="/var/log/imfile-issue/input/slowLogger.log"
Severity="info" Tag="pfx:slowLogger"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)
# Fowarding to outlog
if ( $syslogtag startswith 'pfx:' ) then {
*.info action(type="omfile"
dirCreateMode="0755"
FileCreateMode="0644"
File="/var/log/imfile-issue/output/output.log")
stop
}
=====================================================
I then created the input and output directories:
$ sudo mkdir -p /var/log/imfile-issue/input /var/log/imfile-issue/output
Made the input directory writable so I could generate lots of sample entries
(do not try this at home! I wouldn't suggest anyone leave a directory in
/var/log open for writing by everyone):
$ sudo chmod o+w /var/log/imfile-issue/input
I then ran the following bash snippet to produce non-stop output into the 50
"procXX.log" files named /var/log/imfile-issue/input/proc00.log through
/var/log/imfile-issue/input/proc49.log:
$ while true; do for x in $(seq 0 49); do procId=$(printf "proc%02d" "${x}");
date +"%F %T - Test record (${procId})" | tee -a
"/var/log/imfile-issue/input/${procId}.log"; done; done
And finally, from another shell, I ran this this snippet to write into the
"slowLogger.log" file every 15 seconds:
$ while true; do sleep 15; date +"%F %T - slow log file" | tee -a
/var/log/imfile-issue/input/slowLogger.log; done
The resulting output in /var/log/imfile-issue/output/output.log looks like this:
...
Feb 23 13:19:15 carme5 pfx:proc18 2024-02-23 13:19:15 - Test record (proc18)
Feb 23 13:19:15 carme5 pfx:proc19 2024-02-23 13:19:15 - Test record (proc19)
Feb 23 13:19:15 carme5 pfx:proc20 2024-02-23 13:19:15 - Test record (proc20)
Feb 23 13:19:15 carme5 pfx:proc21 2024-02-23 13:19:15 - Test record (proc21)
...
The timestamp on the left (MON DD HH:MM:SS) is when each record was written
into output.log. The timestamp in the middle is the timestamp that was in the
record when it was written into the input file.
For the rapidly updated logs, the timestamps are near-identical (written into
output file within a second of being generated).
However, the records written into the "slow logger", show a delay:
...
Feb 23 13:19:07 carme5 pfx:proc33 2024-02-23 13:19:06 - Test record (proc33)
Feb 23 13:19:07 carme5 pfx:slowLogger 2024-02-23 13:18:52 - slow log file
Feb 23 13:19:07 carme5 pfx:proc34 2024-02-23 13:19:06 - Test record (proc34)
...
Note that the record from the slow log file took 15 seconds to get into the
output log - in fact, it was put into the output stream exactly when the next
record was written into "slowLogger.log".
The readTimeout, which I expected to occur at 13:18:54 (two seconds after
13:18:52 when the record was written) never occurred.
It wasn't until the startMsg.regex matched the beginning of the next record at
13:19:07 that the previous record (from 13:18:52) was processed.
I then stopped the "slowLogger" shell loop and just wrote individual log
records into "slowLogger.log" at longer intervals, to see if the timer would
timeout. Here's an example of 1m4s between when a record was put into the input
file, and when it hit the output log:
Feb 23 13:31:00 carme5 pfx:proc02 2024-02-23 13:30:59 - Test record (proc02)
Feb 23 13:31:00 carme5 pfx:slowLogger 2024-02-23 13:29:56 - slow log file
Feb 23 13:31:00 carme5 pfx:proc03 2024-02-23 13:30:59 - Test record (proc03)
As I was seeing in our production system, the only way to get the final log
record from "slowLogger.log" to show up in the output.log was to either write
another log entry into slowLogger.log, or to stop the script that was writing
into the procXX.log files.
If I stopped the loop writing into the procXX.log files, the last log entry
from slowLogger.txt would pop out, followed 2 seconds (readTimeout) later by
the last records that were written into the procXX.log files.
(Sorry for how long this message is; my thanks if you've made it this far!)
The result of this behavior is that important log records written into slow
growing log files are sometimes not showing up in our Graylog stream for hours,
or even days after they've been written to the local files, because rsyslog
imfile sits on the records, waiting for another record to be written, or for a
timeout that will not come unless / until all our application software is shut
down and writing ceases to the other log files.
My question is:
Is there any way to make that read timeout to occur on the slow log source
inputs while data is being written regularly into the other log files that are
being monitored?
A million thanks for any help you can provide.
Sincerely,
Russ Miranda
From: Miranda, Russel (US)
Sent: Monday, February 5, 2024 11:50 PM
To: [email protected]
Subject: imfile readTimeout behavior question
Hello rsyslog community,
I am working on a Centos 7 system that has rsyslog 8.2202.0 configured to
monitor and process perhaps 2 dozen log files with the imfile module.
The module is loaded with the module directive like this:
module(load="imfile" mode="inotify")
And the file references are set up using several config files in the
/etc/rsyslog.d directory containing entries similar to the following:
input(type="imfile"
File="/var/log/activity*.log"
Severity="info"
Tag="pfx:activity"
readTimeout="2"
reopenOnTruncate="on"
startmsg.regex="^[0-9]{4}-[0-9]{2}-[0-9]{2}"
)
Some of the referenced files normally receive one or more entries every second.
Other referenced files grow very slowly, receiving maybe one or two entries a
day.
We must be able to handle that some subset of the entries will be multi-line
entries.
Every entry begins with a date stamp. We use the startmsg.regex capability to
detect the date stamp at the start of a line so that imfile can correctly
detect the start of a new record.
We expect that if no new line is added to any specific file within 2 seconds,
then any existing line(s) of unprocessed data in that file can be considered a
complete record, so we have configured the "readTimeout" to be "2".
What we expect is that after 2 seconds of no additional content being added to
a file, the readTimeout timer will expire and the last line of that file will
be considered a complete entry and be processed.
What we are actually seeing is that as long as ANY monitored file has new data
added to it within 2 seconds, then no timer expires.
Because some files receive data every second, the last line in each slow
growing files is not processed until hours, maybe days after it is written,
when another entry is added and the date stamp on the new line matches the
startmsg.regex.
When the software on the host is running normally, the continuous stream of log
records being added to the frequently updated files prevents the readTimeout
from ever occurring. This makes it appear as if the readTimeout never expires.
I originally thought this was a recurrence of #1445
(https://github.com/rsyslog/rsyslog/issues/1445), which was fixed in 8.28 by
#1598 (https://github.com/rsyslog/rsyslog/pull/1598). Our version (8.2202.0)
should contain that fix (I do see it in the release note history). However, the
behavior is not exactly the same as described in #1445, because I can cause the
final lines to be processed by stopping the processes that are generating the
frequent log records. Once there are no new lines to process in any monitored
file, then the readTimeout occurs and the final lines of the slow growing files
are processed.
Is this the intended behavior? Am I misunderstanding the configuration of the
imfile module? Is there any way to get the timeouts to occur on a file-by-file
basis instead of requiring all of the monitored files to be unchanged for the
readTimeout?
I really wish I could disable the multi-line handling entirely, but as you
might guess, the reason we had to configure it this way is that some of the
files are generated by software that is not under our control.
I would prefer not to try to write "dummy" records into the files from another
process to push out the "straggler" final lines from the slow growing files.
Even though they aren't written very often, I just know that someday the
external process and the software we're trying to collect the data from will
both try to write at nearly the same time and the result will be a mangled mix
of the two records.
Thank you for any suggestions,
Russ Miranda
_______________________________________________
rsyslog mailing list
https://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
THAT.