Hi David:

In message <alpine.deb.2.02.1601041043310.9...@nftneq.ynat.uz>,
David Lang writes:
>has anyone put together the code that would be needed to detect if sec or log
>delivery is falling behind? something along the order of 'if the timestamp in
>the logs is > X min behind current, alert'?

To detect a slowdown in the SEC process, I didn't look at the timestamps.
Instead, I compared the last lines in the input file to the last processed
lines in SEC's buffer. The basic idea was discussed at
http://sourceforge.net/p/simple-evcorr/mailman/message/30277509/

This is split for readability:

  compare=50
  sudo sh -c "tail -n $compare /data/log/system/messages > /tmp/a;
    /etc/init.d/sec_rsyslog_master_mon dump |
    sed -ne '/Content of input buffer/,+101p' |
    tail -n $compare | comm -13 /tmp/a - | wc -l"

where /data/log/system/messages is the input file that syslog writes and
SEC follows. Grab the last 50 lines of that file, then dump SEC's internal
buffer state. Use sed to extract the lines in the buffer, and tail to cut
the 100 lines in the buffer down to 50. Then use comm -13 to count the lines
that appear only in SEC's buffer and not in the tail of the input file.
That count is the number of lines SEC is behind the current input.

To detect a delay in the transport, I have a job on each host that sends a
heartbeat every 10 minutes, and I include the time_t value in the heartbeat.
Then process it with:

#****f* 10timestamp.sr/detect_process_delay
# SYNOPSIS
# Compare the time_t timestamp in the messages to current time to detect delays
# DESCRIPTION
# The current heartbeat messages include a timestamp in time_t
# (seconds since Jan 1 1970). Use that time and compare to %u the
# current process time for this rule. If > 10 seconds add it to a
# context that is reported every 10 minutes. This rule passes the
# event through to any additional rules in this ruleset for
# consumption.
# INPUTS
# Sample input:
# May 8 16:50:01 example02 heartbeat: 2010/05/08 16:50:01 \
# (1273337401) -- HEARTBEAT --
# NOTES
# None.
#******
type= single
desc= Detect delays in syslog event transport/process
continue= takenext
ptype= regexp
rem= $1 is hostname $2 is time_t on host where heartbeat was generated
pattern= ([A-Za-z0-9._-]+) heartbeat:.*\((\d+)\) -- HEARTBEAT --$
action= add delayed_heartbeat_events $0 %u
rem= use perl function here to get time. %u doesn't work.
context= !seeding_timestamps && !ignore_delay_$1 && =( $2 + 10 < time() )

to detect when transport is delayed. This just accumulates issues and
reports them every 10 minutes.

--
				-- rouilj
John Rouillard
===========================================================================
My employers don't acknowledge my existence much less my opinions.
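
For anyone who wants to try the two checks above outside the shell pipeline
and SEC, here is a minimal Python sketch. It is not part of the original
setup: the function names lines_behind and heartbeat_delay and the threshold
parameter are made up for illustration. The first function counts lines from
the tail of the input file that are missing from SEC's buffer, using a
multiset difference (unlike comm, it does not need sorted input). The second
applies the same test as the rule's context, $2 + 10 < time(), to a single
heartbeat line.

```python
import re
from collections import Counter

# Same shape as the rule's pattern: hostname, then time_t in parentheses.
HEARTBEAT_RE = re.compile(
    r"([A-Za-z0-9._-]+) heartbeat:.*\((\d+)\) -- HEARTBEAT --$"
)


def lines_behind(file_tail, buffer_tail):
    """Count lines in the tail of the input file that SEC's buffer lacks.

    Both arguments are lists of lines (hypothetical inputs, e.g. the last
    50 lines of the log file and the last 50 lines of SEC's dumped buffer).
    A multiset difference is used, so repeated identical lines are counted
    and the inputs do not need to be sorted.
    """
    missing = Counter(file_tail) - Counter(buffer_tail)
    return sum(missing.values())


def heartbeat_delay(line, now, threshold=10):
    """Return (host, delay_seconds) if the heartbeat is late, else None.

    'now' is the current time as time_t; 'threshold' is an assumed
    parameter that defaults to the 10 seconds used in the rule.
    """
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    host, sent = match.group(1), int(match.group(2))
    # Mirrors the rule's context test: $2 + 10 < time()
    if sent + threshold < now:
        return (host, now - sent)
    return None
```

With the sample heartbeat line from the rule header and now set to 25 seconds
after its time_t, heartbeat_delay returns ("example02", 25). In real use, now
would come from time.time().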