Re: syslog Message Character Set

Alfonso De Gregorio Sun, 15 Oct 2000 05:44:15 -0700
Hi Everyone,

A premise: although I don't want to exclude support for international
characters in syslog messages, I am not arguing for support of every
code set and/or Unicode either. :-)
Support for Unicode and uncommon code sets should be present only if you
believe it is useful.

Continuing the syslog Message Character Set thread...

In my opinion, it would be better to include an indication of the code
set in use, particularly if we decide to support code sets with characters
longer than eight bits.

The sending syslogds need not worry about consistency between the indicated
code set and the content of the message field, since an attacker may
maliciously modify syslog packets in transit (including, obviously, the code
set indication). The receiving syslogds, instead, should be able to handle
such inconsistent messages.

What is the preferable way to indicate the code set in use?

Adding a code set field to the syslog packet format is one useful solution.
True, this can lead to incompatibility with some syslog message
interpreters; however, with some Unicode encodings there will be
incompatibilities with such packages in any case.

It's not necessary to place the code set field at the beginning of messages.
In fact, the code set used in the priority field MUST be seven-bit ASCII
in an eight-bit field, so we can place the code set field between the first
field (Priority) and the message field. It can be structured in this way
(very alpha version):

==========================================================================
   The code set field starts with a leading "[" ('open-square-bracket'
   character), followed by a character, which is followed by a "]"
   ('close-square-bracket' character). This is OPTIONALLY followed by a
   single space character.
   The code set used in this field MUST be seven-bit ASCII in an
   eight-bit field.  These are the ASCII codes as defined in "USA
   Standard Code for Information Interchange" [2].  In this, the "["
   character has code 91, and the "]" character has code 93. The
   character is known as the Code Set and represents the character set 
   used in the message field. The Code Set character uses codes 97
   (for "a") through 122 (for "z"). The OPTIONAL space character at
   the end of this field is code 32.
   All Character-Sets are shown in the following table along with their
   alphanumeric code values.

   Alphanumeric                 Character-Sets 
       Code

        a       7-bit-ASCII   seven-bit ASCII in an eight-bit field
        b       ISO 8859-1    west European languages (Latin-1)
        c       ISO 8859-2    east European languages (Latin-2)
        d       ISO 8859-3    southeast European and miscellaneous languages
                              (Latin-3)
        e       ISO 8859-4    Scandinavian/Baltic languages (Latin-4)
        f       ISO 8859-5    Latin/Cyrillic
        g       ISO 8859-6    Latin/Arabic
        h       ISO 8859-7    Latin/Greek
        i       ISO 8859-8    Latin/Hebrew
        j       ISO 8859-9    Latin-1 modification for Turkish (Latin-5)
        k       ISO 8859-10   Lappish/Nordic/Eskimo languages (Latin-6)
        l       UTF-8         Unicode in UTF-8
        m       UTF-16BE      Unicode in UTF-16, Big-endian encoding
        n       UTF-16LE      Unicode in UTF-16, Little-endian encoding
        o       UTF-32BE      Unicode in UTF-32, Big-endian encoding
        p       UTF-32LE      Unicode in UTF-32, Little-endian encoding
        q       [Reserved]
        r       [Reserved]
        s       [Reserved]
        t       [Reserved]
        u       [Reserved]
        v       [Reserved]
        w       [Reserved]
        x       [Reserved]
        y       [Reserved]
        z       [Reserved]
============================================================================
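
As a rough illustration of how a receiving syslogd might handle the field
described above, here is a minimal Python sketch. The names CODE_SETS,
split_code_set and decode_message are hypothetical, and so is the mapping
from Code Set letters to Python codec names. Falling back to ASCII for a
missing or reserved letter, and replacing undecodable bytes rather than
rejecting the message, are assumptions made for the example, not part of
the proposal.

```python
# Hypothetical mapping from the draft's Code Set letters to Python codecs.
CODE_SETS = {
    "a": "ascii",
    "b": "latin-1",
    "c": "iso8859_2",
    "d": "iso8859_3",
    "e": "iso8859_4",
    "f": "iso8859_5",
    "g": "iso8859_6",
    "h": "iso8859_7",
    "i": "iso8859_8",
    "j": "iso8859_9",
    "k": "iso8859_10",
    "l": "utf-8",
    "m": "utf-16-be",
    "n": "utf-16-le",
    "o": "utf-32-be",
    "p": "utf-32-le",
}


def split_code_set(data: bytes):
    """Split b'[x] rest' into ('x', b'rest'); return (None, data) if absent."""
    if (len(data) >= 3 and data[0] == 91 and data[2] == 93
            and 97 <= data[1] <= 122):
        rest = data[3:]
        # The OPTIONAL space (code 32) is stripped unconditionally here.
        if rest[:1] == b" ":
            rest = rest[1:]
        return chr(data[1]), rest
    return None, data


def decode_message(data: bytes) -> str:
    """Decode the message field; never raise on inconsistent content."""
    letter, body = split_code_set(data)
    # Assumption: missing or reserved letters are treated as 7-bit ASCII.
    codec = CODE_SETS.get(letter, "ascii")
    # A packet altered in transit may not match its indicated code set,
    # so undecodable bytes become U+FFFD instead of causing an error.
    return body.decode(codec, errors="replace")
```

For example, decode_message(b"[b] caf\xe9") gives "café", while the same
bytes labelled [a] decode to "caf" followed by a replacement character.
Note that stripping the OPTIONAL space is ambiguous for the multi-byte
encodings: a UTF-16LE message that begins with a space also starts with
byte 32, so a real specification would have to settle that point.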

The code sets in the table are simply examples of those that could be
supported; in this version I've left out many of them (e.g. JIS, EBCDIC,
etc.). And obviously the Code Set field would need to be enlarged if we
choose to support many code sets.

<doubtfully>
Alternatively, if the code set used is part of Unicode, maybe we can decide
to indicate just which encoding we are dealing with, and omit this
information if the characters used are no longer than eight bits.
</doubtfully>
For this purpose we can use the following Unicode BOMs:

   00 00 FE FF   UTF-32, big-endian
   FF FE 00 00   UTF-32, little-endian
   FE FF         UTF-16, big-endian
   FF FE         UTF-16, little-endian
   EF BB BF      UTF-8
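
To make the BOM idea concrete, here is a small Python sketch of how a
receiver could sniff the encoding. The function name sniff_bom and the
Latin-1 default for messages without a BOM are assumptions made for the
example; the BOM constants come from Python's standard codecs module. The
one subtle point is ordering: the UTF-32 little-endian BOM begins with the
UTF-16 little-endian one, so the four-byte marks must be tested first.

```python
import codecs

# Four-byte BOMs come first: FF FE 00 00 (UTF-32LE) starts with FF FE,
# which on its own is the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes, default: str = "latin-1"):
    """Return (codec, payload) with any leading BOM removed.

    The default codec for BOM-less data is an assumption of this sketch.
    """
    for bom, codec in BOMS:
        if data.startswith(bom):
            return codec, data[len(bom):]
    return default, data
```

A UTF-16LE message whose first character is U+0000 would be mistaken for
UTF-32LE by this test; that seems unlikely in a log message, but it is a
limit of the BOM approach worth keeping in mind.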


In case we choose to adopt one of these notations, I've partially rewritten
Chris Lonvick's section 3 (Packet Format and Contents). It is attached below.

I'd appreciate all your criticism (harsh criticism too), feedback and
thoughts. With this e-mail I only want to continue the discussion, not to
insist on any particular point. The important thing is not to introduce
vulnerabilities into the new syslog.

Sorry for my ugly English :-)
Thanks, ciao
alfonso 

===========================================================================
3  Packet Format and Contents

   The syslog packet has three parts. The first part is the priority 
   field, the second part is the code set field, and the third part 
   is the message field. The priority field has three, four, five, or six 
   characters. The code set field has three or four characters.
   The message field may fill the remainder of the syslog packet.  
   There is no ending delimiter but the total length of the packet MUST be
   1024 bytes or less.  There is no minimum length of the message field 
   although sending a syslog packet with no message is worthless and 
   SHOULD NOT be done.

   The priority field starts with a "<" ('less-than' character),
   followed by a number, which is followed by a ">" ('greater-than'
   character).  This is OPTIONALLY followed by a single space character.
   The code set used in this field MUST be seven-bit ASCII in an
   eight-bit field.  These are the ASCII codes as defined in "USA
   Standard Code for Information Interchange" [2].  In this, the "<"
   character has code 60, and the ">" character has code 62.  The number
   is known as the Priority and represents both the Facility and
   Severity as described below.  The Priority number consists of one,
   two, or three decimal integers using codes 48 (for "0") through 57
   (for "9").  The OPTIONAL space character at the end of this field is
   code 32.

   The code set field starts with a leading "[" ('open-square-bracket'
   character), followed by a character, which is followed by a "]"
   ('close-square-bracket' character). This is OPTIONALLY followed by a
   single space character.
   The code set used in this field MUST be seven-bit ASCII in an
   eight-bit field.  These are the ASCII codes as defined in "USA
   Standard Code for Information Interchange" [2].  In this, the "["
   character has code 91, and the "]" character has code 93. The
   character is known as the Code Set and represents the character set
   used in the message field. The Code Set character uses codes 97
   (for "a") through 122 (for "z"). The OPTIONAL space character at
   the end of this field is code 32.
   All Character-Sets are shown in the following table along with their
   alphanumeric code values.

   Alphanumeric                 Character-Sets
       Code

        a       7-bit-ASCII   seven-bit ASCII in an eight-bit field
        b       ISO 8859-1    west European languages (Latin-1)
        c       ISO 8859-2    east European languages (Latin-2)
        d       ISO 8859-3    southeast European and miscellaneous languages
                              (Latin-3)
        e       ISO 8859-4    Scandinavian/Baltic languages (Latin-4)
        f       ISO 8859-5    Latin/Cyrillic
        g       ISO 8859-6    Latin/Arabic
        h       ISO 8859-7    Latin/Greek
        i       ISO 8859-8    Latin/Hebrew
        j       ISO 8859-9    Latin-1 modification for Turkish (Latin-5)
        k       ISO 8859-10   Lappish/Nordic/Eskimo languages (Latin-6)
        l       UTF-8         Unicode in UTF-8
        m       UTF-16BE      Unicode in UTF-16, Big-endian encoding
        n       UTF-16LE      Unicode in UTF-16, Little-endian encoding
        o       UTF-32BE      Unicode in UTF-32, Big-endian encoding
        p       UTF-32LE      Unicode in UTF-32, Little-endian encoding
        q       [Reserved]
        r       [Reserved]
        s       [Reserved]
        t       [Reserved]
        u       [Reserved]
        v       [Reserved]
        w       [Reserved]
        x       [Reserved]
        y       [Reserved]
        z       [Reserved]

   The message field MUST contain only characters belonging to the
   code set indicated in the Code-Set field. The code set traditionally
   and most often used has been seven-bit ASCII in an eight-bit field,
   like that used in the priority field (Code-Set value: "a"). When
   that code set is used, the alphanumerics and symbols are codes 32
   through 126.  Indication of the code set used within the message
   is required and MUST be specified in the Code-Set field. The selection
   of a code set and codes used in a message SHOULD be made with the
   intended receiver in mind.  A message containing characters in a code set
   that cannot be viewed by a receiver will yield no information of
   value to an operator or administrator looking at it.

   As examples, these are valid messages as they may be observed on the
   wire between two devices.  Each message starts with the less-than
   character but has been indented, with line breaks inserted for
   readability.

     <37> [b] Oct 11 16:00:15 mymachine su: 'su root' failed for adg
     on /dev/pts/8

     <160> [a] Aug 24 1987 03:24:00 AM CST mymachine.&.process_manager %%
     It's time to make the do-nuts.  %%  Ingredients: Mix=OK, Jelly=OK
     # Devices: Mixer=OK, Jelly_Injector=OK, Frier=OK # Transport:
     Conveyer1=OK, Conveyer2=OK # %%

============================================================================
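
For anyone who wants to experiment with the layout in the attached section,
the following Python sketch splits a packet into its three parts. The names
parse_packet, HEADER and MAX_PACKET are hypothetical. Splitting the Priority
into Facility and Severity as priority = facility * 8 + severity follows the
conventional syslog encoding, which is not spelled out in this message, so
treat that part as an assumption.

```python
import re

MAX_PACKET = 1024  # total packet length limit from the draft text

# "<" 1-3 digits ">" [space] "[" a-z "]" [space], all seven-bit ASCII.
HEADER = re.compile(rb"<(\d{1,3})> ?\[([a-z])\] ?")


def parse_packet(packet: bytes) -> dict:
    """Split a packet into priority, code set and raw message bytes."""
    if len(packet) > MAX_PACKET:
        raise ValueError("packet exceeds 1024 bytes")
    m = HEADER.match(packet)
    if m is None:
        raise ValueError("missing priority or code set field")
    priority = int(m.group(1))
    # Assumption: conventional encoding, priority = facility * 8 + severity.
    facility, severity = divmod(priority, 8)
    return {
        "priority": priority,
        "facility": facility,
        "severity": severity,
        "code_set": m.group(2).decode("ascii"),
        "message": packet[m.end():],  # still undecoded bytes
    }
```

Applied to the first example above (joined back onto one line), this
yields priority 37, code set "b", and a message beginning with "Oct 11".
The message is left as bytes on purpose, so that decoding according to the
Code Set stays a separate step.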

--
Alfonso De Gregorio,  [EMAIL PROTECTED]