Hi everyone,
A premise: although I don't want to exclude support for international
characters in syslog messages, neither do I want to plead for support of
all code sets and/or Unicode. :-)
Support for Unicode and uncommon code sets should be present only if you
believe it is useful.
Continuing the syslog Message Character Set thread...
In my opinion, it would be better to include an indication of the code set
used, particularly if we decide to support code sets with characters
longer than eight bits.
The sending syslogds need not verify that the indicated code set is
consistent with the content of the message field, since an attacker may
maliciously modify syslog packets in transit (including, obviously, the code
set indication). The receiving syslogds, instead, should be prepared to
handle such inconsistent messages.
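One way a receiver might cope with a message whose content does not match
the claimed code set is to decode it leniently rather than reject it. The
following is only a minimal sketch: the function name decode_untrusted is
hypothetical, the codec names are Python's own, and substituting U+FFFD for
undecodable bytes is an assumption, not something proposed in this thread.

```python
def decode_untrusted(payload: bytes, claimed_codec: str) -> str:
    """Decode a message field without trusting the claimed code set.

    Bytes that are invalid in the claimed codec become U+FFFD instead of
    raising, so a tampered packet cannot crash the receiver.
    """
    try:
        return payload.decode(claimed_codec, errors="replace")
    except LookupError:
        # The claimed codec name is unknown to this receiver: fall back
        # to seven-bit ASCII, again replacing anything out of range.
        return payload.decode("ascii", errors="replace")
```

For example, decode_untrusted(b"abc\xff", "ascii") keeps "abc" and replaces
the stray byte, so the operator still sees the readable part of the message.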
Which way is preferable to indicate the code set used?
Adding a code set field to the syslog packet format is a useful solution.
True, this can lead to incompatibility with some syslog message
interpreters; however, with some Unicode encodings there will be
incompatibilities with such packages in any case.
It's not necessary to place the code set field at the beginning of messages.
In fact, the code set used in the priority field MUST be seven-bit ASCII
in an eight-bit field, so we can place it between the first field (Priority)
and the message field. It can be structured in this way (very alpha version):
==========================================================================
The code set field starts with a leading "[" ('open-square-bracket'
character), followed by a character, which is followed by a "]"
('close-square-bracket' character). This is OPTIONALLY followed by a
single space character.
The code set used in this field MUST be seven-bit ASCII in an
eight-bit field. These are the ASCII codes as defined in "USA
Standard Code for Information Interchange" [2]. In this, the "["
character has code 91, and the "]" character has code 93. The
character is known as the Code Set and represents the character set
used in the message field. The Code Set character uses codes 97
(for "a") through 122 (for "z"). The OPTIONAL space character at
the end of this field is code 32.
All Character-Sets are shown in the following table along with their
alphanumeric code values.
   Code   Character-Set   Description
    a     7-bit-ASCII     seven-bit ASCII in an eight-bit field
    b     ISO 8859-1      west European languages (Latin-1)
    c     ISO 8859-2      east European languages (Latin-2)
    d     ISO 8859-3      southeast European and miscellaneous languages
                          (Latin-3)
    e     ISO 8859-4      Scandinavian/Baltic languages (Latin-4)
    f     ISO 8859-5      Latin/Cyrillic
    g     ISO 8859-6      Latin/Arabic
    h     ISO 8859-7      Latin/Greek
    i     ISO 8859-8      Latin/Hebrew
    j     ISO 8859-9      Latin-1 modification for Turkish (Latin-5)
    k     ISO 8859-10     Lappish/Nordic/Eskimo languages (Latin-6)
    l     UTF-8           Unicode in UTF-8
    m     UTF-16BE        Unicode in UTF-16, Big-endian encoding
    n     UTF-16LE        Unicode in UTF-16, Little-endian encoding
    o     UTF-32BE        Unicode in UTF-32, Big-endian encoding
    p     UTF-32LE        Unicode in UTF-32, Little-endian encoding
    q     [Reserved]
    r     [Reserved]
    s     [Reserved]
    t     [Reserved]
    u     [Reserved]
    v     [Reserved]
    w     [Reserved]
    x     [Reserved]
    y     [Reserved]
    z     [Reserved]
============================================================================
The code sets in the table are simply examples of those that could be
supported; in this version I've left out many of them (e.g. JIS, EBCDIC,
etc.). And obviously the Code Set field should be resized if we choose to
support many code sets.
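To make the proposed field concrete, here is a minimal sketch of how a
receiver might recognise it. The names CODE_SETS and parse_code_set_field
are hypothetical, and the right-hand codec names are Python's, chosen here
only for illustration; the letters follow the table above.

```python
import re

# Hypothetical mapping from the proposed Code Set letters to Python codec
# names. Letters "q" through "z" are reserved and deliberately absent.
CODE_SETS = {
    "a": "ascii",
    "b": "latin_1",
    "c": "iso8859_2",
    "d": "iso8859_3",
    "e": "iso8859_4",
    "f": "iso8859_5",
    "g": "iso8859_6",
    "h": "iso8859_7",
    "i": "iso8859_8",
    "j": "iso8859_9",
    "k": "iso8859_10",
    "l": "utf_8",
    "m": "utf_16_be",
    "n": "utf_16_le",
    "o": "utf_32_be",
    "p": "utf_32_le",
}

# "[" (code 91), one letter "a".."z" (codes 97..122), "]" (code 93),
# then an OPTIONAL single space (code 32).
_FIELD = re.compile(rb"\[([a-z])\] ?")


def parse_code_set_field(data: bytes):
    """Return (codec_name, remaining_bytes) or raise ValueError."""
    m = _FIELD.match(data)
    if m is None:
        raise ValueError("no code set field at start of data")
    letter = m.group(1).decode("ascii")
    codec = CODE_SETS.get(letter)
    if codec is None:
        raise ValueError(f"code set {letter!r} is reserved")
    return codec, data[m.end():]
```

Rejecting reserved letters is one possible policy; a receiver could equally
fall back to seven-bit ASCII for them.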
<doubtfully>
Alternatively, if the code set used is part of Unicode, maybe we can decide
to indicate just which encoding we are dealing with, and omit this
information if the characters used are no longer than eight bits.
</doubtfully>
For this purpose we can use the following Unicode BOMs:
   00 00 FE FF   UTF-32, big-endian
   FF FE 00 00   UTF-32, little-endian
   FE FF         UTF-16, big-endian
   FF FE         UTF-16, little-endian
   EF BB BF      UTF-8
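BOM detection can be sketched as follows. This is only an illustration: the
function name sniff_bom and the default of seven-bit ASCII when no BOM is
present are assumptions. The BOM constants come from Python's standard
codecs module.

```python
import codecs

# Longer BOMs are tested first: the UTF-32 little-endian BOM (FF FE 00 00)
# begins with the UTF-16 little-endian BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf_32_be"),
    (codecs.BOM_UTF32_LE, "utf_32_le"),
    (codecs.BOM_UTF8, "utf_8"),
    (codecs.BOM_UTF16_BE, "utf_16_be"),
    (codecs.BOM_UTF16_LE, "utf_16_le"),
]


def sniff_bom(message: bytes, default: str = "ascii"):
    """Return (codec_name, message_without_bom)."""
    for bom, codec in _BOMS:
        if message.startswith(bom):
            return codec, message[len(bom):]
    # No BOM: assume characters no longer than eight bits.
    return default, message
```

Note that a UTF-16 little-endian message whose first character is U+0000
would be misread as UTF-32 here; that ambiguity is inherent in the BOMs
themselves.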
In case we choose to adopt one of these notations, I've partially rewritten
Chris Lonvick's section 3 (Packet Format and Contents); it is attached below.
I'd appreciate all your criticism (harsh criticism too), feedback and
thoughts. With this e-mail I only want to continue this discussion, not
merely insist on some issue. The important thing is not to introduce
vulnerabilities into the new syslog.
Sorry for my ugly English :-)
Thanks, ciao
alfonso
===========================================================================
3 Packet Format and Contents
The syslog packet has three parts. The first part is the priority
field, the second part is the code set field, and the third part
is the message field. The priority field has three, four, five, or six
characters. The code set field has three or four characters.
The message field may fill the remainder of the syslog packet.
There is no ending delimiter but the total length of the packet MUST be
1024 bytes or less. There is no minimum length of the message field
although sending a syslog packet with no message is worthless and
SHOULD NOT be done.
The priority field starts with a "<" ('less-than' character),
followed by a number, which is followed by a ">" ('greater-than'
character). This is OPTIONALLY followed by a single space character.
The code set used in this field MUST be seven-bit ASCII in an
eight-bit field. These are the ASCII codes as defined in "USA
Standard Code for Information Interchange" [2]. In this, the "<"
character has code 60, and the ">" character has code 62. The number
is known as the Priority and represents both the Facility and
Severity as described below. The Priority number consists of one,
two, or three decimal digits using codes 48 (for "0") through 57
(for "9"). The OPTIONAL space character at the end of this field is
code 32.
The code set field starts with a leading "[" ('open-square-bracket'
character), followed by a character, which is followed by a "]"
('close-square-bracket' character). This is OPTIONALLY followed by a
single space character.
The code set used in this field MUST be seven-bit ASCII in an
eight-bit field. These are the ASCII codes as defined in "USA
Standard Code for Information Interchange" [2]. In this, the "["
character has code 91, and the "]" character has code 93. The
character is known as the Code Set and represents the character set
used in the message field. The Code Set character uses codes 97
(for "a") through 122 (for "z"). The OPTIONAL space character at
the end of this field is code 32.
All Character-Sets are shown in the following table along with their
alphanumeric code values.
   Code   Character-Set   Description
    a     7-bit-ASCII     seven-bit ASCII in an eight-bit field
    b     ISO 8859-1      west European languages (Latin-1)
    c     ISO 8859-2      east European languages (Latin-2)
    d     ISO 8859-3      southeast European and miscellaneous languages
                          (Latin-3)
    e     ISO 8859-4      Scandinavian/Baltic languages (Latin-4)
    f     ISO 8859-5      Latin/Cyrillic
    g     ISO 8859-6      Latin/Arabic
    h     ISO 8859-7      Latin/Greek
    i     ISO 8859-8      Latin/Hebrew
    j     ISO 8859-9      Latin-1 modification for Turkish (Latin-5)
    k     ISO 8859-10     Lappish/Nordic/Eskimo languages (Latin-6)
    l     UTF-8           Unicode in UTF-8
    m     UTF-16BE        Unicode in UTF-16, Big-endian encoding
    n     UTF-16LE        Unicode in UTF-16, Little-endian encoding
    o     UTF-32BE        Unicode in UTF-32, Big-endian encoding
    p     UTF-32LE        Unicode in UTF-32, Little-endian encoding
    q     [Reserved]
    r     [Reserved]
    s     [Reserved]
    t     [Reserved]
    u     [Reserved]
    v     [Reserved]
    w     [Reserved]
    x     [Reserved]
    y     [Reserved]
    z     [Reserved]
The message field MUST contain only characters belonging to the
code set indicated in the Code-Set field. The code set traditionally and
most often used has also been seven-bit ASCII in an eight-bit field
like that used in the priority field (Code-Set value: "a"). When
that code set is used, the alphanumerics and symbols are codes 32
through 126. Indication of the code set used within the message
is required and MUST be specified in the Code-Set field. The selection of
a code set and codes used in a message SHOULD be made with thoughts of
the intended receiver. A message containing characters in a code set
that cannot be viewed by a receiver will yield no information of
value to an operator or administrator looking at it.
As examples, these are valid messages as they may be observed on the
wire between two devices. Each message starts with the less-than
character but has been indented, with line breaks inserted for
readability.
    <37> [b] Oct 11 16:00:15 mymachine su: 'su root' failed for adg
        on /dev/pts/8

    <160> [a] Aug 24 1987 03:24:00 AM CST mymachine.&.process_manager %%
        It's time to make the do-nuts. %% Ingredients: Mix=OK, Jelly=OK
        # Devices: Mixer=OK, Jelly_Injector=OK, Frier=OK # Transport:
        Conveyer1=OK, Conveyer2=OK # %%
============================================================================
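The three-part packet layout described in the section above can be sketched
as a small parser. This is an illustration only: split_packet is a
hypothetical name, and the split of Priority into Facility and Severity
assumes the usual syslog convention Priority = Facility * 8 + Severity,
which is defined outside the text quoted here.

```python
import re

MAX_PACKET = 1024  # total packet length MUST be 1024 bytes or less

# "<" 1-3 decimal digits ">" [space] "[" one letter a-z "]" [space]
_HEADER = re.compile(rb"<([0-9]{1,3})> ?\[([a-z])\] ?")


def split_packet(packet: bytes):
    """Return (facility, severity, code_set_letter, message_bytes)."""
    if len(packet) > MAX_PACKET:
        raise ValueError("packet longer than 1024 bytes")
    m = _HEADER.match(packet)
    if m is None:
        raise ValueError("missing priority or code set field")
    priority = int(m.group(1))
    # Assumed convention: Priority = Facility * 8 + Severity.
    facility, severity = divmod(priority, 8)
    letter = m.group(2).decode("ascii")
    return facility, severity, letter, packet[m.end():]
```

Applied to the first example above, the header "<37> [b] " yields facility 4,
severity 5 and code set "b", with the message field starting at "Oct 11".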
--
Alfonso De Gregorio, [EMAIL PROTECTED]