[OPSAWG] Management of logging

Tom Taylor Tue, 01 Oct 2013 04:52:39 -0700

While working on draft-ietf-behave-syslog-nat-logging-04, I added a Management Considerations section intended to highlight requirements for management of the logging system. I included the following general requirements. These are my best guess, but undoubtedly people with real operational experience will correct me. Please look this over and comment.


Note that I have cross-posted to Behave and OPSAWG.


Tom Taylor

7.1.  General Requirements For Control Of Logging

   This document assumes that any implementation provides the following
   capabilities, discussed in more detail below:

   o  ability to configure the PRI value of each event report type at
      the granularity of (APP-ID, MSGID) combination;

   o  ability at each collector to determine that event reports that it
      should have received have been lost.  The required granularity is
      at least at the level of PRI and may be finer for some event
      types.

   o  ability to configure criteria to automatically suppress the
      generation of event reports while the criteria are met, at the
      granularity of (APP-ID, MSGID) combination.

7.1.1.  Configuration of PRI Value

   The PRI value is composed of two numbers, the Facility value and the
   Severity.  It may be used at the origin for selecting logs to streams
   being dispatched to different collectors, and in applications beyond
   the collectors to prioritize display of logs to operators.  The event
   reports in this document have been structured such that the Severity
   level varies between event types as represented by (APP-ID, MSGID)
   combination.  As an extreme example, the address pool high-water-mark
   threshold event (APP-ID="NATMTC", MSGID="POOLHT") is obviously
   more urgent than the low-water-mark threshold event
   (APP-ID="NATMTC", MSGID="POOLLT").

   To some extent, this document tries to simplify message routing by
   making a general distinction between event types recording the
   allocation of resources to hosts (with APP-ID="NAT") and events of
   interest to operations and maintenance (with APP-ID="NATMTC").  The
   need to provide different Severity levels for different event types
   remains.
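
   As a rough illustration of per-event-type PRI configuration, the
   sketch below computes PRI as Facility * 8 + Severity (the RFC 5424
   encoding) and looks it up by (APP-ID, MSGID).  The table contents,
   the choice of the local0 facility, the severities assigned, and the
   names make_pri, split_pri, PRI_CONFIG, and pri_for are assumptions
   made for the example, not values taken from this document.

```python
# Hypothetical sketch of PRI configuration per (APP-ID, MSGID).
# Facility and Severity numeric codes follow RFC 5424.

FACILITY_LOCAL0 = 16   # assumed facility for this example
SEV_WARNING = 4
SEV_NOTICE = 5
SEV_INFO = 6


def make_pri(facility: int, severity: int) -> int:
    """Combine Facility and Severity into a PRI value (Facility * 8 + Severity)."""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    return facility * 8 + severity


def split_pri(pri: int) -> tuple:
    """Return (facility, severity) for a PRI value."""
    return divmod(pri, 8)


# Illustrative configuration only: the severities chosen here are assumptions.
PRI_CONFIG = {
    ("NATMTC", "POOLHT"): make_pri(FACILITY_LOCAL0, SEV_WARNING),
    ("NATMTC", "POOLLT"): make_pri(FACILITY_LOCAL0, SEV_NOTICE),
}
DEFAULT_PRI = make_pri(FACILITY_LOCAL0, SEV_INFO)


def pri_for(app_id: str, msgid: str) -> int:
    """Look up the configured PRI for an event type, falling back to a default."""
    return PRI_CONFIG.get((app_id, msgid), DEFAULT_PRI)
```

   With this table, the high-water-mark event is reported at a more
   urgent Severity than the low-water-mark event, and any event type
   without an explicit entry falls back to the default.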

7.1.2.  Ability For Each Collector To Detect Lost Event Reports

   Operators have a need to know when a given collector has not received
   all of the event reports it should have.  It probably does not matter
   if less-important events are tracked at the granularity of event type
   (APP-ID, MSGID combination), by APP-ID, or just by PRI value.

   The event types defined in this document relating to allocation of
   resources to hosts are a special case.  Regulatory requirements or
   the possibility that such reports might be introduced into court in
   cases such as abuse impose a requirement that the record of
   allocations to a particular host be complete.  This requirement is
   important enough to be stated in the Security Considerations
   (Section 8), where the implementation of signed SYSLOG messages
   [RFC5848], which also provides message sequencing, is mandated as
   part of this specification.

   In deploying [RFC5848], the operator needs to decide the level of
   granularity of tracking, whether it should be over the whole set of
   reports covered by APP-ID="NAT" or at a finer level.  This judgement
   has to be tempered by local circumstances.  One point to note is that
   since both creations/allocations and deletions/deallocations are
   recorded, a certain amount of redundancy is available in the reports
   being generated.  However, without both the creation and deletion
   timestamps, there is no definitive evidence of the specific period of
   time during which the resources concerned were allocated to a
   specific host.
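
   The effect of tracking granularity on loss detection at a collector
   can be sketched as follows.  This is a simplified illustration, not
   the [RFC5848] mechanism itself: it assumes each report carries a
   sequence number that increases by one within its tracking group,
   where the group might be a PRI value, an APP-ID, or an (APP-ID,
   MSGID) combination.  The class and method names are hypothetical.

```python
from collections import defaultdict


class GapDetector:
    """Count missing reports per tracking group from sequence numbers.

    Assumption: the origin numbers reports 1, 2, 3, ... within each group.
    """

    def __init__(self):
        self._next_expected = {}        # group -> next sequence number expected
        self.lost = defaultdict(int)    # group -> reports detected as missing

    def observe(self, group, seq: int) -> int:
        """Record a received report; return how many reports it reveals as missing."""
        expected = self._next_expected.get(group)
        if expected is None:
            # First report seen for this group: nothing to compare against.
            self._next_expected[group] = seq + 1
            return 0
        if seq < expected:
            # Duplicate or late arrival; this sketch ignores it and does not
            # reduce the loss count.
            return 0
        missing = seq - expected
        self.lost[group] += missing
        self._next_expected[group] = seq + 1
        return missing
```

   A collector that keys the detector on "NAT" alone learns only that
   some allocation report was lost; keying on (APP-ID, MSGID) narrows
   the loss to a particular event type at the cost of more state.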

7.1.3.  Ability To Suppress Event Reports

   The event report types specified with APP-ID="NATMTC" all relate to
   limits or thresholds.  By their nature, events of this sort will come
   in bursts.  The limit or threshold will be hit, the resource
   concerned will remain busy for a period, then pressure on the
   resource will ease.  Depending on the resource, possibly hundreds of
   instances of the event concerned will be detected during a single
   busy period.

   Where repeated events involve the same resource, it makes little
   sense to report all of them, since the NAT MIB counters provide the
   necessary information more succinctly.  On the other hand, it can be
   useful to know that the fragmentation limit, for instance, is being
   hit by successive packets from the same source address.

   As a result of these considerations, this document requires that
   implementations MUST provide means to configure limits on the rate at
   which event reports of a given type (APP-ID, MSGID combination) are
   generated.  This document RECOMMENDS that it be possible to specify
   two values per (APP-ID, MSGID) combination:

   o  minimum time between initial instances of a given event report
      type;

   o  maximum number of instances of the event report to generate per
      busy period.

   The ability to suppress event reports MUST NOT interfere with the
   requirement to detect lost messages.  This has implications for any
   sequence numbering used for that purpose.  It is RECOMMENDED in any
   event that the implementation provide counters of numbers of
   suppressed messages by event type.
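
   One possible reading of the two recommended values, together with
   the suppressed-message counters, is sketched below.  The sketch
   assumes that a busy period begins with an initial instance and lasts
   until the minimum time between initial instances has elapsed; the
   document does not define a busy period this precisely, so that
   interpretation and the names Suppressor, min_interval,
   max_per_period, and allow are assumptions.

```python
from collections import defaultdict


class Suppressor:
    """Rate-limit event reports per (APP-ID, MSGID) combination.

    min_interval:   minimum seconds between initial instances of a report type
    max_per_period: maximum reports generated per busy period (at least 1)
    """

    def __init__(self, min_interval: float, max_per_period: int):
        if max_per_period < 1:
            raise ValueError("max_per_period must be at least 1")
        self.min_interval = min_interval
        self.max_per_period = max_per_period
        self._period_start = {}             # key -> time of the initial instance
        self._emitted = {}                  # key -> reports generated this period
        self.suppressed = defaultdict(int)  # key -> count of suppressed reports

    def allow(self, app_id: str, msgid: str, now: float) -> bool:
        """Return True if a report should be generated for this event at time now."""
        key = (app_id, msgid)
        start = self._period_start.get(key)
        if start is None or now - start >= self.min_interval:
            # Treat this event as the initial instance of a new busy period.
            self._period_start[key] = now
            self._emitted[key] = 1
            return True
        if self._emitted[key] < self.max_per_period:
            self._emitted[key] += 1
            return True
        # Over the per-period cap: suppress and count.
        self.suppressed[key] += 1
        return False
```

   If allow() is consulted before a sequence number is assigned to the
   report, suppressed events leave no gap in the numbering seen by the
   collector, which is one way to keep suppression from interfering
   with loss detection.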

   Just to state the obvious, given the need for a full record, an
   operator will not wish to enable suppression of the APP-ID="NAT"
   event reports.

