RE: syslog-int

Rainer Gerhards Thu, 14 Aug 2003 11:21:13 -0700

Hi Anton,

Now the second response ;)


First of all, I will revise the ID in respect to
ftp://ftp.rfc-editor.org/in-notes/rfc3536.txt, which I initially did not
pay proper attention to. This may bring a number of changes. I cannot
do it this week, but will most probably have done so by the end of next week.

> 1. I think it was a good idea to put language headers into the MSG
> part as opposed to other syslog fields. This would allow me, for
> example, storing the message on disk in the same format and/or re-use
> the language header even when bypassing syslog.

This is one of the intentions.

It leaves one problem, though - the hostname may include international
fields. But I think we can solve this with DNS internationalization. I
didn't manage to get this into the -00 ID but it may be worth talking a
little bit about the header. As of now, it is left out, which could
cause some issues. On the other hand, the current format makes
syslog-international highly compatible with, and easy to integrate into,
an existing syslog infrastructure.
>
> 2. The LANGUAGE field.  Is it necessary?  Can it be omitted?  I can
> easily see a situation where an application supporting foreign
> languages does not know the source language.  For example, my
> application takes usernames in any language (in Unicode).  Now, I
> need to produce a syslog message which has the username in it. What am
> I going to put in the LANGUAGE field?

Good point. As you mentioned, it is in because of RFC2277. I think it
must be a user-configurable option. Maybe it should be "-" if the
emitting syslog client does not know...

Any more comments anyone?

>
> Also, there can be situations (bizarre as they are) when a message
> contains multiple languages in it. With Unicode-based encodings, it is
> not a problem.  What happens in this case?
>
> I'd propose removing the LANGUAGE field.

If I think about it, it is actually not bizarre - think of e.g. Japanese
or French names inside an English message (or substrings).

What does the rest of the WG think?

>
> 3. RFC1766 referenced in the LANGUAGE field spec is officially
> obsolete.  Need to address it one way or another if the LANGUAGE field
> stays?

4.3 of RFC2277 says that it should be used - this is why it is in.

>
> 4. I am concerned about attempting to support ALL encodings and ALL
> charsets. Is it necessary?  If this is a new standard, why do we need
> to provide for support of all legacy encodings?  Why can't we just
> standardize on Unicode encoded as straight binary UTF-8 or UTF-7?

I know there are some issues with Unicode acceptance and there may also
be some missing Unicode libs... As such, I would like to see (for Japan)
at least EUC and probably (S)JIS supported. In European languages, it is
the same thing.

> Maybe also provide for support of UTF-16 for more compact
> representation of Asian languages.  Either one will cover all
> languages and would not require getting into the business of
> interpreting locale/language specific charsets and many encodings.

Of course, we can also specify that all should be Unicode based and the
syslog client/collector needs to do the conversion. I just have the
feeling that this puts a lot of burden on the syslog side which again
will hinder implementation. But of course I am more than willing to
change this if there is enough support in this WG for plain Unicode...

If we go Unicode, we can only specify the UTF-7 encoding for the reasons
stated above. As of now, we would have a single encoding allowed.
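
To get a feeling for the size trade-off between the encodings mentioned
in this thread, here is a small Python sketch. It is only an
illustration: the sample text is an arbitrary assumption, the helper
name encoded_sizes is hypothetical, and the codec names are those of
Python's standard library, not anything defined in the ID.

```python
# Compare the encoded size of one short Japanese sample string under the
# encodings discussed in this thread. The sample text is an assumption.
sample = "ログイン失敗"  # "login failure", 6 characters


def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` under each candidate encoding."""
    codecs = ["utf-8", "utf-16-be", "utf-7", "euc_jp", "shift_jis"]
    return {name: len(text.encode(name)) for name in codecs}


for name, size in encoded_sizes(sample).items():
    print(f"{name:10} {size:3} bytes")
```

For this sample, UTF-8 needs three bytes per character, while UTF-16,
EUC-JP and Shift_JIS need two. Of the five, only the UTF-7 output is
guaranteed to consist of 7-bit bytes; the others need an 8-bit clean
transport.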

>
> So, can we do away with the CHARSET field and only use Unicode-based
> and US-ASCII encodings in the ENCODING field?
>
> 5. Even if we support multiple encodings, it would probably help to
> specify a few that  MUST be supported.  Otherwise, how do we ensure
> interoperability if each implementation can choose its own set of
> supported encodings/charsets?
>
> 6. Can we specify that the absence of the header should be interpreted
> as UTF-8 encoded message?  A UTF-8 compliant syslog daemon will be
> backwards compatible as it would be able to receive message sent in
> plain US-ASCII as this range of characters is encoded identically.

See above for UTF-8. I really think a lot of the discussion has focused,
and will continue to focus, on 8-bit character transport.

I would like to not re-specify the syslog frame format, as this removes
a lot of the ease of implementation. I also think that it really does
not belong in here - but I may be wrong. I also see a lot of issues if a
syslog-international enabled syslog client needs to talk to, e.g., an
RFC3195 syslogd not supporting syslog-international. There is no defined
way of downgrading... Mmmhhh... Well, we could specify that it must send
in UTF-7 in this case (should be readable in the old syslogd) but if it
knows it will be talking to a syslog-international enabled server, it
will use UTF-8.
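
A minimal sketch of this downgrade idea, assuming Python and its
built-in "utf-7" and "utf-8" codecs. The function name encode_for_peer
and the peer_is_i18n_aware flag are hypothetical; how a client would
actually learn what the peer supports is not defined anywhere and is
left open here.

```python
def encode_for_peer(msg: str, peer_is_i18n_aware: bool) -> bytes:
    """Encode MSG text as UTF-8 for an i18n-aware peer, UTF-7 otherwise.

    Hypothetical helper: the capability flag is an assumption, since no
    negotiation mechanism is defined.
    """
    return msg.encode("utf-8" if peer_is_i18n_aware else "utf-7")


ascii_msg = "su: login failed for root"
intl_msg = "login failed for J\u00fcrgen"  # contains U+00FC (u with umlaut)

# Pure US-ASCII text is byte-identical under UTF-8, which is why a
# UTF-8 capable collector still accepts traditional messages.
assert ascii_msg.encode("utf-8") == ascii_msg.encode("ascii")

legacy = encode_for_peer(intl_msg, peer_is_i18n_aware=False)
modern = encode_for_peer(intl_msg, peer_is_i18n_aware=True)

# The UTF-7 form stays within 7 bits and round-trips without loss.
assert all(b < 0x80 for b in legacy)
assert legacy.decode("utf-7") == intl_msg

# The UTF-8 form needs an 8-bit clean path for the non-ASCII character.
assert any(b >= 0x80 for b in modern)
```

An old syslogd would show the UTF-7 escape sequences literally, so the
non-ASCII parts are transported intact but are not human-readable there.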

Does this sound like a solution? I still have the difficulty that I am
somehow mixing up the layers to solve this issue, but this may not be a
real concern...

Comments, anyone?
>
> 7. Just a minor thing.  The spec and examples show the special prefix
> as "@#i18n", but there is a note in the spec that the "i" should be
> capital "I".  Why not change it in the examples then?

;) changed it in the -01 ID.

>
> 8. With internationalization of messages, the length of messages will
> inevitably grow.  Should we provide for multi-part messages to
> overcome the 1024 byte limit?  Multi-part messages could have the same
> standard syslog header along with some additional part-seq-number.  I
> guess this also required msg-seq-number.  If we don't deal with
> multi-part messages as part of syslog RFC, applications will invent
> their own mechanisms to identify messages that belong together.  A
> standard mechanism would be nice.

This is what MORE and SEQNO are meant for. I apologize that it is only very
briefly described in the current ID; it went in immediately before
publishing. I preferred to have it in right from the beginning. I just
called it "fragmentation" and not "multi-part messages". Maybe the latter
term is more appropriate; I appreciate feedback from the native speakers
(to me, it looks like the same thing looked at from two different
sides...).

"  MORE and SEQNO provide support for syslog messages larger than the
   allowed syslog packet size. This is introduced to allow transmittal
   of "oversized" messages, which may be the result of some character
   sets and encodings. These messages will be fragmented by the syslog
   client and reconstructed by the collector. Relays will pass them
   through unmodified.

   Message fragmentation MAY be used if the underlying transport
   provides reliable and in-order delivery (for example RFC 3195 [8]).
   If the underlying transport is unreliable or its reliability is not
   known, fragmentation MUST NOT be used.

   MORE specifies whether this is the final fragment of the message or
   not. An asterisk ("*") means that at least one more fragment will
   follow. A period (".") means that this is the final (or only)
   fragment.

   SEQNO specifies the sequence number of fragments. It MUST start at 0
   for the first fragment and MUST be incremented by 1 for each
   following fragment. SEQNO MUST restart at 0 for each new full
   message. A new full message begins after the last message that had
   "." in MORE.
"
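
The MORE/SEQNO rules quoted above can be sketched roughly as follows.
This is only an illustration in Python under stated assumptions: the
tuple layout (MORE, SEQNO, chunk), the function names fragment and
reassemble, and the max_chunk parameter are all hypothetical, and the
on-the-wire header layout of the ID is deliberately not modelled.

```python
from typing import Iterable, List, Tuple

# Hypothetical in-memory form of one fragment: (MORE, SEQNO, payload chunk)
Fragment = Tuple[str, int, bytes]


def fragment(payload: bytes, max_chunk: int) -> List[Fragment]:
    """Split an already-encoded message into fragments.

    MORE is "*" when at least one more fragment follows and "." on the
    final (or only) fragment; SEQNO starts at 0 and increases by 1.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [payload[i:i + max_chunk]
              for i in range(0, len(payload), max_chunk)] or [b""]
    last = len(chunks) - 1
    return [("." if i == last else "*", i, c) for i, c in enumerate(chunks)]


def reassemble(fragments: Iterable[Fragment]) -> List[bytes]:
    """Rebuild full messages from an in-order, loss-free fragment stream."""
    messages: List[bytes] = []
    parts: List[bytes] = []
    expected = 0
    for more, seqno, chunk in fragments:
        if more not in ("*", "."):
            raise ValueError(f"invalid MORE value: {more!r}")
        if seqno != expected:
            raise ValueError(f"expected SEQNO {expected}, got {seqno}")
        parts.append(chunk)
        if more == ".":
            # Final fragment: emit the message, SEQNO restarts at 0.
            messages.append(b"".join(parts))
            parts, expected = [], 0
        else:
            expected += 1
    if parts:
        raise ValueError("stream ended in the middle of a message")
    return messages


msg = "Anmeldung fehlgeschlagen f\u00fcr Benutzer J\u00fcrgen".encode("utf-8")
frags = fragment(msg, max_chunk=10)
assert reassemble(frags) == [msg]
```

The sketch splits on byte boundaries, so a multi-byte character may be
cut between two fragments; the collector must therefore join all chunks
before decoding. It also relies on in-order, loss-free delivery, exactly
as the quoted text requires.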

I have intentionally limited fragmentation to reliable transports - with
UDP, we would need to include reliability to ensure ordered delivery
which I think is far beyond the scope of syslog-international. I also
don't like the idea of sending frames multiple times as is done in
syslog-sign. In -sign, this is just done for some key frames. If we
used a similar approach in -international, we would introduce
dramatic overhead.

Comments?

Looking forward to all replies. Just be warned that I am out of office
from the 14th until the 19th and will probably not be able to reply to
any postings in this period. Will follow-up when I am back.

Rainer
