OK, enough bickering about the fine points of how timestamps
should go over the wire.  Here's a specific straw-man
proposal for a fully cooked protocol.  The basic idea is to
use the XML-style structure of unalog, but with the more
compact name=value representation of ULM.

Data elements are taken pretty much straight from ULM, with
some extensions.  Required data elements are:

    * LVL=0..99 - ULM does specific strings, but theirs are
      not really levels, but sort of a combination of level
      and facility.  This is a pure level, with mappings to
      the existing syslog and SNMP levels spelled out in the
      spec.
    * HOST=string - unique identifier for the host creating
      the log message
    * PROG=string - name of the program generating the
      message.  Probably need some standard "special" names
      like "kern".  Can also add the facility concept here by
      doing things like MAIL/MTA/sendmail,
      MAIL/ACCESS/popper, KERN/VM, KERN/FS, though we need to
      define some of the base "facilities".
    * DATE=YYYYMMDDhhmmss.fraction (unlike ULM, this must be in
      UTC).
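
To make the DATE element concrete, here is a rough Python
sketch of a parser.  It assumes the 14-digit, 4-digit-year
form used in the examples later in this message, with an
optional ".fraction" suffix; parse_date is just a
hypothetical helper name, not part of ULM or any spec.

```python
from datetime import datetime, timezone

def parse_date(value):
    """Parse a DATE value such as 19991025105522 or 19991025105522.25.

    Assumption: the value is always UTC, as the proposal requires.
    """
    whole, _, fraction = value.partition(".")
    stamp = datetime.strptime(whole, "%Y%m%d%H%M%S")
    stamp = stamp.replace(tzinfo=timezone.utc)
    if fraction:
        # Keep at most microsecond precision; pad short fractions.
        stamp = stamp.replace(microsecond=int(fraction[:6].ljust(6, "0")))
    return stamp
```

The optional TZ element would then only be needed to show
the originator's local time; it never changes the instant
that DATE names.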

Optional data elements are:

    * TYPE=string - event type such as AUTH.FAIL,
      AUTH.SUCCESS, PROG.FAIL, PROG.START, etc. -
      extends/subsumes ULM's STAT
    * TZ=[+-]hhmm - extension to ULM to show timezone of
      originating process
    * LANG=<ISO 639 2-letter language code>
    * DUR=seconds.decimal - duration of event in seconds
    * SRC.ADDR=(IPv4 dotted notation, IPv6 16HEX, others?) -
      ULM uses .IP, which seems restrictive to IP-only.
    * SRC.FQDN=string
    * SRC.NAME=string - some identifier of the source system
      other than address or FQDN.
    * SRC.USR=string (user name or similar)
    * SRC.MAIL=string (e-mail address)
    * DST.ADDR, DST.FQDN, DST.NAME, DST.USR, DST.MAIL - dest
      instead of source
    * REL.ADDR, REL.FQDN, REL.NAME, REL.USR, REL.MAIL -
      relay/proxy instead of source/dest
    * VOL, VOL.SENT, VOL.RCVD, CNT, CNT.SENT, CNT.RCVD -
      volume in bytes and count (articles, files, events,
      etc.)
    * PROG.FILE, PROG.LINE - name (and line number) of the
      program source file from which the message was
      generated (useful for messages like "out of memory",
      "can't fork", "assert failed", etc.).
    * TTY=string - tty or other description of user's
      physical connection to the host
    * DOC=string - name of an accessed document, such as an
      FTP file, a newsgroup, or a URL.
    * PROT=string - protocol used such as ESMTP, SSH2, etc.
    * CMD=string - an issued command
    * MSG=string - free form message text
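
As a rough illustration of how a receiver might read these
elements, here is a minimal Python sketch that splits a flat
name=value record and checks the four required elements and
the LVL range.  The quoting rule (double quotes around values
containing spaces) is my assumption based on the MSG examples
in this message, and parse_pairs and problems are
hypothetical names.

```python
import re

REQUIRED = ("LVL", "HOST", "PROG", "DATE")

# Assumption: names are upper-case with dots; values are either
# double-quoted or run up to the next whitespace.
PAIR = re.compile(r'([A-Z][A-Z0-9.]*)=("[^"]*"|\S+)')

def parse_pairs(text):
    """Return a dict of name -> value, with surrounding quotes removed."""
    fields = {}
    for name, value in PAIR.findall(text):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[name] = value
    return fields

def problems(fields):
    """List what is wrong with a record; an empty list means it looks valid."""
    found = [f"missing {name}" for name in REQUIRED if name not in fields]
    lvl = fields.get("LVL")
    if lvl is not None and not re.fullmatch(r"[0-9]{1,2}", lvl):
        found.append("LVL out of range 0..99")
    return found
```

Unknown names are kept rather than rejected, so extensions
beyond the list above pass through untouched.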

Structure:

Since we're talking variables here, it would be nice to be
able to provide context in the message stream.  This is done
by putting end-messages in <M name=value ...> and then
wrapping contexts within <CNTXT name=value ...>name=value,
...</CNTXT>:

      <CNTXT HOST=myhost.somewhere.com>
        <M LVL=22 PROG=sendmail DATE=19991025105522
           FAC=MAIL TYPE=PROG.START DOC=/var/log/whatever
           MSG="processing queue">
        <CNTXT LVL=80 PROG=tripwire DATE=19991025105519
               FAC=SEC TYPE=AUTH.FAIL
               MSG="Expected mode 0444, saw mode 0644">
          <M DOC=/a/b/c/d>
          <M DOC=/a/b/c/e>
          <M DOC=/a/b/c/f>
          <M DOC=/a/b/c/g>
          </CNTXT>
        </CNTXT>

As you can see, adding this bit of syntactic sugar can
decrease the overall byte count quite a bit and makes up for
the extra bytes needed to support the flexibility of this
scheme.  Additionally, by assuming that message streams not
beginning with < are <M>, we can whittle down the number of
bytes needed to xmit simple messages over UDP (though
obviously still much higher than existing syslog).  The
reference implementation can also be made to process
existing syslog format and convert to this format.

One drawback to such structure is that it makes the logs
kind of hard for humans to follow, but that problem can be
solved by having an "expansion" script/program.  The example
above would expand to:

      HOST=myhost.somewhere.com LVL=22 PROG=sendmail
      DATE=19991025105522 FAC=MAIL TYPE=PROG.START
      DOC=/var/log/whatever MSG="processing queue"

      HOST=myhost.somewhere.com LVL=80 PROG=tripwire
      DATE=19991025105519 FAC=SEC TYPE=AUTH.FAIL
      MSG="Expected mode 0444, saw mode 0644"
      DOC=/a/b/c/d

      HOST=myhost.somewhere.com LVL=80 PROG=tripwire
      DATE=19991025105519 FAC=SEC TYPE=AUTH.FAIL
      MSG="Expected mode 0444, saw mode 0644"
      DOC=/a/b/c/e

      HOST=myhost.somewhere.com LVL=80 PROG=tripwire
      DATE=19991025105519 FAC=SEC TYPE=AUTH.FAIL
      MSG="Expected mode 0444, saw mode 0644"
      DOC=/a/b/c/f

      HOST=myhost.somewhere.com LVL=80 PROG=tripwire
      DATE=19991025105519 FAC=SEC TYPE=AUTH.FAIL
      MSG="Expected mode 0444, saw mode 0644"
      DOC=/a/b/c/g
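
The "expansion" program could be quite small.  Here is a
minimal Python sketch, assuming that values never contain an
unquoted ">" and that <M> has no closing tag (as in the
example).  It ignores bare name=value text between tags, and
expand, render, and SAMPLE are hypothetical names.

```python
import re

TAG = re.compile(r'<(/?)(CNTXT|M)\b([^>]*)>')
PAIR = re.compile(r'([A-Z][A-Z0-9.]*)=("[^"]*"|\S+)')

def parse_pairs(text):
    """Return name -> value in order; line wraps inside values are collapsed."""
    return {name: " ".join(value.split()) for name, value in PAIR.findall(text)}

def expand(stream):
    """Yield one flat dict per <M>, with enclosing <CNTXT> values merged in."""
    stack = [{}]                          # stack[-1] is the current context
    for closing, tag, attrs in TAG.findall(stream):
        if closing:
            if tag == "CNTXT" and len(stack) > 1:
                stack.pop()               # </CNTXT> ends the innermost context
        elif tag == "CNTXT":
            # An inner context inherits the outer one and may override it.
            stack.append({**stack[-1], **parse_pairs(attrs)})
        else:
            # <M ...> is an end-message: context first, then its own values.
            yield {**stack[-1], **parse_pairs(attrs)}

def render(record):
    """Format a flat record as a single name=value line."""
    return " ".join(f"{name}={value}" for name, value in record.items())

# A shortened version of the example above.
SAMPLE = """
<CNTXT HOST=myhost.somewhere.com>
  <M LVL=22 PROG=sendmail DATE=19991025105522 MSG="processing queue">
  <CNTXT LVL=80 PROG=tripwire DATE=19991025105519
         MSG="Expected mode 0444,
         saw mode 0644">
    <M DOC=/a/b/c/d>
    <M DOC=/a/b/c/e>
    </CNTXT>
  </CNTXT>
"""
```

Running "for record in expand(SAMPLE): print(render(record))"
prints one line per <M>, with the context values ahead of the
message's own values.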

Authentication and encryption:

Most existing secure syslog implementations have extended
the protocol with network-level hop-by-hop authentication
and encryption.  What I'm after, however, is something in
the data stream itself so that the data can pass through
un-trusted third parties and so that third parties can
independently verify that logs stored on disk/tape/whatever
have not been modified.

To do this, I introduce <CRYPT> and <AUTH>.  Auth works
something like this:

      <AUTH SIGNER>stuff to be signed</AUTH
      SIG.TYPE=md5, SIG.VALUE=j43kj3248>

Where the signature value is encoded in base-64.
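
For illustration only, here is a small Python sketch that
builds such a record.  It assumes SIG.TYPE=md5 means a plain
MD5 digest over the bytes between the tags, which the text
above does not actually pin down; auth_record is a
hypothetical name.  Note that a bare digest only detects
accidental changes, since anyone can recompute it, so a real
signature type would also need a key.

```python
import base64
import hashlib

def auth_record(signer, payload):
    """Wrap payload in an <AUTH> record with a base-64 MD5 value.

    Assumption: the digest covers exactly the UTF-8 bytes of payload.
    """
    digest = hashlib.md5(payload.encode("utf-8")).digest()
    value = base64.b64encode(digest).decode("ascii")
    return f"<AUTH {signer}>{payload}</AUTH SIG.TYPE=md5, SIG.VALUE={value}>"
```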

Crypt with static previously exchanged keys looks like:

      <CRYPT ENCRYPTOR=[EMAIL PROTECTED]
      RECIPIENT=[EMAIL PROTECTED] KEY=STATIC
      CIPHER=static>

           base-64 representation of stream
           encrypted using previously exchanged
           key in previously exchanged cipher
           </CRYPT>

Crypt with public-key (which also provides sender and
recipient authentication) looks like this:

      <CRYPT ENCRYPTOR=[EMAIL PROTECTED]
      RECIPIENT=[EMAIL PROTECTED] TYPE=RSA
      CIPHER=blowfish KEY=jfkadj43098kjer>

           base-64 representation of stream
           encrypted in blowfish using key
           specified above, itself encrypted using
           [EMAIL PROTECTED]'s private RSA key and
           [EMAIL PROTECTED]'s public RSA key
           </CRYPT>

You could also leave out the ENCRYPTOR to get the "anyone
could have written it but only this guy can read it"
situation (which can be fixed with an extra <AUTH>).  Or
leave out the RECIPIENT to get "any of a small number of
people I've sent my public key to can read it but only I
could have sent it".

The reference implementation will also need code to process
the authentication and encryption records.  Perhaps the
"expand" function mentioned above can automagically decrypt
and stick in AUTH=NONE, AUTH=<signer>, and AUTH=!<signer>
(for failed authentication).
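
The AUTH annotation step might look roughly like the Python
sketch below.  It assumes the <AUTH> layout shown earlier
and, as a stand-in for real signature checking, treats
SIG.TYPE=md5 as a plain base-64 MD5 digest of the wrapped
text; auth_annotation is a hypothetical name.

```python
import base64
import hashlib
import re

# Matches <AUTH signer>payload</AUTH SIG.TYPE=..., SIG.VALUE=...>
AUTH = re.compile(
    r'<AUTH ([^>]*)>(.*?)</AUTH\s+SIG\.TYPE=(\S+?),?\s+SIG\.VALUE=([^>\s]+)>',
    re.S,
)

def auth_annotation(record):
    """Return AUTH=NONE, AUTH=<signer>, or AUTH=!<signer> for a record."""
    match = AUTH.search(record)
    if match is None:
        return "AUTH=NONE"                # no <AUTH> wrapper at all
    signer, payload, sig_type, sig_value = match.groups()
    if sig_type != "md5":
        return f"AUTH=!{signer}"          # unknown type counts as a failure
    digest = hashlib.md5(payload.encode("utf-8")).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return f"AUTH={signer}" if sig_value == expected else f"AUTH=!{signer}"
```

The expander would append the returned pair to each record
it emits from inside the <AUTH> wrapper.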

--
Chris Calabrese
Internet Infrastructure and Security
Merck-Medco Managed Care, L.L.C.
[EMAIL PROTECTED]
