While working on draft-ietf-behave-syslog-nat-logging-04, I added a
Management Considerations section intended to highlight requirements for
management of the logging system. I included the following general
requirements. These are my best guesses, but undoubtedly people with real
operational experience will correct me. Please look this over and comment.
Note that I have cross-posted to Behave and OPSAWG.
Tom Taylor
7.1. General Requirements For Control Of Logging
This document assumes that any implementation provides the following
capabilities, discussed in more detail below:
o ability to configure the PRI value of each event report type at
the granularity of (APP-ID, MSGID) combination;
o ability at each collector to determine that event reports that it
should have received have been lost. The required granularity is
at least at the level of PRI and may be finer for some event
types.
o ability to configure criteria to automatically suppress the
generation of event reports while the criteria are met, at the
granularity of (APP-ID, MSGID) combination.
7.1.1. Configuration of PRI Value
The PRI value is composed of two numbers, the Facility value and the
Severity. It may be used at the origin to assign logs to the streams
dispatched to different collectors, and in applications downstream of
the collectors to prioritize the display of logs to operators. The event
reports in this document have been structured such that the Severity
level varies between event types as represented by (APP-ID, MSGID)
combination. As an extreme example, the address pool high-water-mark
threshold event (APP-ID="NATMTC", MSGID="POOLHT") is obviously
more urgent than the low-water-mark threshold event
(APP-ID="NATMTC", MSGID="POOLLT").
To some extent, this document tries to simplify message routing by
making a general distinction between event types recording the
allocation of resources to hosts (with APP-ID="NAT") and events of
interest to operations and maintenance (with APP-ID="NATMTC"). The
need to provide different Severity levels for different event types
remains.
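
A minimal sketch of per-event-type PRI configuration follows. It relies on
the syslog rule that PRI = Facility * 8 + Severity [RFC5424]. The table
contents, the default, and the function names are assumptions made for
illustration; the draft does not assign these particular Facility or
Severity values.

```python
# Numeric codes from the syslog protocol: Facility 16 is local0, and a
# lower Severity number means a more urgent message.
LOCAL0 = 16
SEVERITY_WARNING = 4
SEVERITY_NOTICE = 5
SEVERITY_INFO = 6

# Hypothetical operator configuration keyed by (APP-ID, MSGID).
# The values chosen here are illustrative only.
PRI_CONFIG = {
    ("NATMTC", "POOLHT"): (LOCAL0, SEVERITY_WARNING),
    ("NATMTC", "POOLLT"): (LOCAL0, SEVERITY_NOTICE),
}
DEFAULT_FACILITY_SEVERITY = (LOCAL0, SEVERITY_INFO)


def pri_value(facility, severity):
    """Combine Facility and Severity into a single PRI value."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return facility * 8 + severity


def pri_for_event(app_id, msgid):
    """Look up the configured PRI for one (APP-ID, MSGID) combination."""
    facility, severity = PRI_CONFIG.get(
        (app_id, msgid), DEFAULT_FACILITY_SEVERITY
    )
    return pri_value(facility, severity)
```

With this table the high-water-mark event is sent with a numerically lower
(more urgent) Severity than the low-water-mark event, while any event type
not listed falls back to the default.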
7.1.2. Ability For Each Collector To Detect Lost Event Reports
Operators have a need to know when a given collector has not received
all of the event reports it should have. It probably does not matter
whether less-important events are tracked at the granularity of event
type (APP-ID, MSGID combination), by APP-ID, or just by PRI value.
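
To make the choice of granularity concrete, here is a rough sketch of loss
detection at a collector. It assumes the origin keeps one sequence counter
per tracking key and that each report carries it in a "seq" field; both the
field names and the LossDetector class are hypothetical and are not taken
from the draft or from [RFC5848].

```python
from collections import defaultdict


class LossDetector:
    """Track gaps in per-key sequence numbers seen by one collector."""

    def __init__(self, key_fn):
        # key_fn selects the tracking granularity, for example by
        # (APP-ID, MSGID), by APP-ID alone, or by PRI value.
        self.key_fn = key_fn
        self.next_expected = {}
        self.missing = defaultdict(set)

    def receive(self, report):
        key = self.key_fn(report)
        seq = report["seq"]
        # The first report seen for a key sets the baseline.
        expected = self.next_expected.get(key, seq)
        if seq >= expected:
            # Everything between the expected and received number is missing.
            self.missing[key].update(range(expected, seq))
            self.next_expected[key] = seq + 1
        else:
            # A late or reordered report fills an earlier gap.
            self.missing[key].discard(seq)

    def lost(self, key):
        """Return the sequence numbers still missing for this key."""
        return sorted(self.missing.get(key, set()))


# Usage: track at the granularity of event type.
detector = LossDetector(lambda r: (r["app_id"], r["msgid"]))
for n in (1, 2, 5):
    detector.receive({"app_id": "NATMTC", "msgid": "POOLHT", "seq": n})
gaps = detector.lost(("NATMTC", "POOLHT"))
```

Switching granularity only means passing a different key function, such as
one that returns the APP-ID or the PRI value, provided the origin numbers
its reports at that same granularity.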
The event types defined in this document relating to allocation of
resources to hosts are a special case. Regulatory requirements or
the possibility that such reports might be introduced into court in
cases such as abuse impose a requirement that the record of
allocations to a particular host be complete. This requirement is
important enough to be stated in the Security Considerations section
(Section 8), where the implementation of signed SYSLOG messages
[RFC5848], which also provides message sequencing, is mandated as
part of this specification.
In deploying [RFC5848], the operator needs to decide the level of
granularity of tracking, whether it should be over the whole set of
reports covered by APP-ID="NAT" or at a finer level. This judgement
has to be tempered by local circumstances. One point to note is that
since both creations/allocations and deletions/deallocations are
recorded, a certain amount of redundancy is available in the reports
being generated. However, without both the creation and deletion
timestamps, there is no definitive evidence of the specific period of
time during which the resources concerned were allocated to a
specific host.
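
The point about needing both timestamps can be illustrated with a small
sketch that pairs creation and deletion records into allocation periods.
The record fields ("kind", "binding", "time") and the function name are
assumptions for illustration, not the event formats defined in the draft.
A period with None at either end is one for which the evidence is
incomplete.

```python
def allocation_periods(records):
    """Pair create/delete records into (binding, start, end) tuples.

    start or end is None when the matching record was never received.
    """
    open_since = {}
    periods = []
    for rec in sorted(records, key=lambda r: r["time"]):
        binding = rec["binding"]
        if rec["kind"] == "create":
            if binding in open_since:
                # An earlier creation was never closed: its deletion is missing.
                periods.append((binding, open_since[binding], None))
            open_since[binding] = rec["time"]
        elif rec["kind"] == "delete":
            # pop() yields None when the creation record is missing.
            start = open_since.pop(binding, None)
            periods.append((binding, start, rec["time"]))
    # Bindings still open at the end have no deletion record yet.
    for binding, start in open_since.items():
        periods.append((binding, start, None))
    return periods


# Usage: binding B lacks its creation record, binding C its deletion record.
sample = [
    {"kind": "create", "binding": "A", "time": 10},
    {"kind": "delete", "binding": "A", "time": 20},
    {"kind": "delete", "binding": "B", "time": 30},
    {"kind": "create", "binding": "C", "time": 40},
]
result = allocation_periods(sample)
```

Only binding A yields a fully bounded period here; for B and C the
surviving record shows that an allocation existed but not when it began or
ended.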
7.1.3. Ability To Suppress Event Reports
The event report types specified with APP-ID="NATMTC" all relate to
limits or thresholds. By their nature, events of this sort will come
in bursts. The limit or threshold will be hit, the resource
concerned will remain busy for a period, then pressure on the
resource will ease. Depending on the resource, possibly hundreds of
instances of the event concerned will be detected during a single
busy period.
Where repeated events involve the same resource, it makes little
sense to report all of them, since the NAT MIB counters provide the
necessary information more succinctly. On the other hand, it can be
useful to know that the fragmentation limit, for instance, is being
hit by successive packets from the same source address.
As a result of these considerations, implementations MUST provide
means to configure limits on the rate at which event reports of a
given type (APP-ID, MSGID combination) are generated. It is
RECOMMENDED that it be possible to specify two values per
(APP-ID, MSGID) combination:
o minimum time between initial instances of a given event report
type;
o maximum number of instances of the event report to generate per
busy period.
The ability to suppress event reports MUST NOT interfere with the
requirement to detect lost messages. This has implications for any
sequence numbering used for that purpose. It is RECOMMENDED in any
event that the implementation provide counters of the number of
suppressed messages per event type.
Just to state the obvious, given the need for a full record, an
operator will not wish to enable suppression of the APP-ID="NAT"
event reports.
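
One possible reading of the two suppression values can be sketched as
follows. It assumes that a busy period starts with an initial instance and
that no later instance counts as a new initial instance until the minimum
time has elapsed; within the period at most the configured number of
reports is generated. The class, its method names, and the limits table
are hypothetical.

```python
from collections import defaultdict


class ReportSuppressor:
    """Decide per (APP-ID, MSGID) whether an event report is generated."""

    def __init__(self, limits):
        # limits maps (app_id, msgid) to
        # (min_seconds_between_initial_instances, max_reports_per_period).
        self.limits = limits
        self.period_start = {}
        self.emitted = defaultdict(int)
        # Counters of suppressed reports, kept per event type.
        self.suppressed = defaultdict(int)

    def should_emit(self, app_id, msgid, now):
        key = (app_id, msgid)
        if key not in self.limits:
            # No suppression configured, as expected for APP-ID="NAT".
            return True
        min_interval, max_per_period = self.limits[key]
        start = self.period_start.get(key)
        if start is None or now - start >= min_interval:
            # This instance is treated as the start of a new busy period.
            self.period_start[key] = now
            self.emitted[key] = 0
        if self.emitted[key] < max_per_period:
            self.emitted[key] += 1
            return True
        self.suppressed[key] += 1
        return False


# Usage: at most 2 POOLHT reports per period, periods at least 60 s apart.
suppressor = ReportSuppressor({("NATMTC", "POOLHT"): (60, 2)})
decisions = [
    suppressor.should_emit("NATMTC", "POOLHT", t) for t in (0, 1, 2, 3, 61)
]
```

In a design along these lines, any sequence number used for loss detection
would be assigned only after should_emit() returns True, so that suppressed
instances do not appear to the collector as lost reports.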