RE: syslog-int

2003-08-14 Thread Rainer Gerhards
Hi Anton,

Now the second response ;)

First of all, I will revise the ID with respect to
ftp://ftp.rfc-editor.org/in-notes/rfc3536.txt, which I initially did not
pay proper attention to. This may bring a number of changes. I cannot
do it this week, but will most probably have done so by the end of next week.

 1. I think it was a good idea to put language headers into the MSG
 part as opposed to other syslog fields. This would allow me, for
 example, to store the message on disk in the same format and/or re-use
 the language header even when bypassing syslog.

This is one of the intentions.

It leaves one problem, though: the hostname may include international
characters. But I think we can solve this with DNS internationalization. I
didn't manage to get this into the -00 ID, but it may be worth talking a
little bit about the header. As of now, it is left out, which could
cause some issues. On the other hand, the current format makes
syslog-international highly compatible with, and easy to integrate into,
an existing syslog infrastructure.

 2. The LANGUAGE field.  Is it necessary?  Can it be omitted?  I can
 easily see a situation where an application supporting foreign
 languages does not know the source language.  For example, my
 application takes usernames in any language (in Unicode).  Now, I
 need to produce a syslog message which has the username in it. What am
 I going to put in the LANGUAGE field?

Good point. As you mentioned, it is in because of RFC2277. I think it
must be a user-configurable option. Maybe it should be - if the
emitting syslog client does not know...

Any more comments anyone?


 Also, there can be situations (bizarre as they are) when a message
 contains multiple languages in it. With Unicode-based encodings, it is
 not a problem.  What happens in this case?

 I'd propose removing the LANGUAGE field.

If I think about it, it is actually not bizarre - think of e.g. Japanese
or French names inside an English message (or substrings).

What does the rest of the WG think?


 3. RFC1766 referenced in the LANGUAGE field spec is officially
 obsolete.  Need to address it one way or another if the LANGUAGE field
 stays?

4.3 of RFC2277 says that it should be used - this is why it is in.


 4. I am concerned about attempting to support ALL encodings and ALL
 charsets. Is it necessary?  If this is a new standard, why do we need
 to provide for support of all legacy encodings?  Why can't we just
 standardize on Unicode encoded as straight binary UTF-8 or UTF-7?

I know there are some issues with Unicode acceptance and there may also
be some missing Unicode libs... As such, I would like to see (for Japan)
at least EUC and probably (S)JIS supported. In European languages, it is
the same thing.

 Maybe also provide for support of UTF-16 for more compact
 representation of Asian languages.  Either one will cover all
 languages and would not require getting into the business of
 interpreting locale/language specific charsets and many encodings.

Of course, we can also specify that all should be Unicode based and the
syslog client/collector needs to do the conversion. I just have the
feeling that this puts a lot of burden on the syslog side which again
will hinder implementation. But of course I am more than willing to
change this if there is enough support in this WG for plain Unicode...

If we go Unicode, we can only specify the UTF-7 encoding for the reasons
stated above. As of now, we would have a single encoding allowed.


 So, can we do away with the CHARSET field and only use Unicode-based
 and US-ASCII encodings in the ENCODING field?

 5. Even if we support multiple encodings, it would probably help to
 specify a few that  MUST be supported.  Otherwise, how do we ensure
 interoperability if each implementation can choose its own set of
 supported encodings/charsets?

 6. Can we specify that the absence of the header should be interpreted
 as a UTF-8 encoded message?  A UTF-8 compliant syslog daemon will be
 backwards compatible as it would be able to receive messages sent in
 plain US-ASCII as this range of characters is encoded identically.

See above for UTF-8. I really think a lot of the discussion has focused
and will focus on the 8-bit char transport.

I would prefer not to re-specify the syslog frame format, as this removes
a lot of the ease of implementation. I also think that it really does
not belong in here - but I may be wrong. I also see a lot of issues if a
syslog-international enabled syslog client needs to talk to, e.g., an
RFC3195 syslogd not supporting syslog-international. There is no defined
way of downgrading... Mmmhhh... Well, we could specify that it must send
in UTF-7 in this case (should be readable in the old syslogd) but if it
knows it will be talking to a syslog-international enabled server, it
will use UTF-8.

Does this sound like a solution? I still have the difficulty that I am
somehow mixing up the layers to solve this issue, but this may not be a
real concern...
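
To make the downgrade idea concrete, here is a minimal Python sketch. It
assumes the sender somehow knows whether the peer is syslog-international
enabled; the flag peer_supports_i18n and both function names are
hypothetical and not taken from the draft. It only shows that UTF-7 output
stays within 7-bit bytes, while UTF-8 does not for non-ASCII text.

```python
def is_seven_bit_clean(data: bytes) -> bool:
    """Return True if no byte has the high bit set."""
    return all(b < 0x80 for b in data)


def encode_for_peer(text: str, peer_supports_i18n: bool) -> bytes:
    """Hypothetical policy: UTF-8 for enabled peers, UTF-7 as the downgrade."""
    return text.encode("utf-8" if peer_supports_i18n else "utf-7")


sample = "Anmeldung fehlgeschlagen für Benutzer Jürgen"

legacy = encode_for_peer(sample, peer_supports_i18n=False)
modern = encode_for_peer(sample, peer_supports_i18n=True)

print("UTF-7 is 7-bit clean:", is_seven_bit_clean(legacy))
print("UTF-8 is 7-bit clean:", is_seven_bit_clean(modern))

# Both forms decode back to the same text on a receiver that knows the encoding.
print(legacy.decode("utf-7") == modern.decode("utf-8") == sample)
```

How the client would learn the capabilities of its peer is left open here,
as it is in the discussion above.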

Comments, anyone?

 

RE: syslog-int

2003-08-14 Thread Anton Okmianski
Hi!

I assume this is the forum to provide the feedback on this draft.

   Title   : Syslog-international Protocol
   Author(s)   : R. Gerhards
   Filename: draft-ietf-syslog-international-00.txt
   Pages   : 13
   Date: 2003-8-1
http://www.ietf.org/internet-drafts/draft-ietf-syslog-international-00.txt


First of all, thanks for putting the draft together!

1. I think it was a good idea to put language headers into the MSG
part as opposed to other syslog fields. This would allow me, for
example, to store the message on disk in the same format and/or re-use
the language header even when bypassing syslog.

2. The LANGUAGE field.  Is it necessary?  Can it be omitted?  I can
easily see a situation where an application supporting foreign
languages does not know the source language.  For example, my
application takes usernames in any language (in Unicode).  Now, I
need to produce a syslog message which has the username in it. What am
I going to put in the LANGUAGE field?

Also, there can be situations (bizarre as they are) when a message
contains multiple languages in it. With Unicode-based encodings, it is
not a problem.  What happens in this case?

I'd propose removing the LANGUAGE field.

3. RFC1766 referenced in the LANGUAGE field spec is officially
obsolete.  Need to address it one way or another if the LANGUAGE field
stays?

4. I am concerned about attempting to support ALL encodings and ALL
charsets. Is it necessary?  If this is a new standard, why do we need
to provide for support of all legacy encodings?  Why can't we just
standardize on Unicode encoded as straight binary UTF-8 or UTF-7?
Maybe also provide for support of UTF-16 for more compact
representation of Asian languages.  Either one will cover all
languages and would not require getting into the business of
interpreting locale/language specific charsets and many encodings.

So, can we do away with the CHARSET field and only use Unicode-based
and US-ASCII encodings in the ENCODING field?
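
As a rough illustration of the size trade-off between the Unicode
encodings mentioned in point 4, the following Python sketch counts encoded
bytes for two sample strings. The sample texts and the helper name
encoded_sizes are made up for this example, and the numbers depend
entirely on the text chosen.

```python
def encoded_sizes(text: str) -> dict:
    """Map each candidate encoding to the number of bytes it needs for text."""
    return {name: len(text.encode(name)) for name in ("utf-7", "utf-8", "utf-16-be")}


english = "login failed"
japanese = "ログイン失敗"  # six characters, all in the Basic Multilingual Plane

for label, text in (("english", english), ("japanese", japanese)):
    print(label, encoded_sizes(text))
```

For the Japanese sample, UTF-16 needs 2 bytes per character against 3 for
UTF-8; for the English sample the relation is reversed, with UTF-16 taking
twice the space of UTF-8.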

5. Even if we support multiple encodings, it would probably help to
specify a few that  MUST be supported.  Otherwise, how do we ensure
interoperability if each implementation can choose its own set of
supported encodings/charsets?

6. Can we specify that the absence of the header should be interpreted
as a UTF-8 encoded message?  A UTF-8 compliant syslog daemon will be
backwards compatible as it would be able to receive messages sent in
plain US-ASCII as this range of characters is encoded identically.
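
The backwards-compatibility claim in point 6 can be checked with a few
lines of Python. This is only a sketch: the function name decode_msg and
the sample messages are hypothetical, and it assumes a receiver that
treats every headerless message as UTF-8.

```python
def decode_msg(raw: bytes) -> str:
    """Decode a headerless message as UTF-8 (the default proposed in point 6)."""
    return raw.decode("utf-8")


legacy_text = "su: 'su root' failed for user on /dev/pts/8"

# A legacy sender emits plain US-ASCII bytes ...
legacy_bytes = legacy_text.encode("ascii")

# ... which are byte-for-byte what a UTF-8 sender would emit for the same text,
print(legacy_bytes == legacy_text.encode("utf-8"))

# so a UTF-8 receiver reads the legacy message unchanged.
print(decode_msg(legacy_bytes) == legacy_text)

# Non-ASCII text goes through the same code path.
print(decode_msg("Benutzer Jürgen".encode("utf-8")))
```

The compatibility holds in one direction only: a legacy 7-bit receiver is
not guaranteed to handle the multi-byte sequences a UTF-8 sender produces.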

7. Just a minor thing.  The spec and examples show the special prefix
as @#i18n, but there is a note in the spec that the i should be
capital I.  Why not change it in the examples then?

8. With internationalization of messages, the length of messages will
inevitably grow.  Should we provide for multi-part messages to
overcome the 1024 byte limit?  Multi-part messages could have the same
standard syslog header along with some additional part-seq-number.  I
guess this also requires a msg-seq-number.  If we don't deal with
multi-part messages as part of the syslog RFC, applications will invent
their own mechanisms to identify messages that belong together.  A
standard mechanism would be nice.
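
One way the multi-part idea in point 8 might look is sketched below in
Python. Everything about the wire format is an assumption made for
illustration: the "[msg-seq=N part-seq=i/n] " tag, the 64-byte budget
reserved for it, and the function names are hypothetical and not part of
any draft. The sketch treats max_len as the space available for the MSG
part only, and it splits on UTF-8 character boundaries so that no part
ends in the middle of a multi-byte sequence.

```python
from typing import List

TAG_BUDGET = 64  # bytes reserved for the hypothetical sequence tag


def split_utf8(payload: bytes, max_len: int) -> List[bytes]:
    """Split UTF-8 bytes into chunks of at most max_len bytes on character boundaries."""
    chunks = []
    start = 0
    while start < len(payload):
        end = min(start + max_len, len(payload))
        # Back up while the next byte is a continuation byte (10xxxxxx),
        # so the cut never falls inside a multi-byte character.
        while end < len(payload) and (payload[end] & 0xC0) == 0x80:
            end -= 1
        if end <= start:
            raise ValueError("max_len is too small for this payload")
        chunks.append(payload[start:end])
        start = end
    return chunks


def to_parts(text: str, msg_seq: int, max_len: int = 1024) -> List[bytes]:
    """Prefix each chunk with a hypothetical msg-seq / part-seq tag."""
    bodies = split_utf8(text.encode("utf-8"), max_len - TAG_BUDGET)
    parts = []
    for part_seq, body in enumerate(bodies, start=1):
        tag = f"[msg-seq={msg_seq} part-seq={part_seq}/{len(bodies)}] ".encode("ascii")
        if len(tag) > TAG_BUDGET:
            raise ValueError("sequence tag exceeds the reserved budget")
        parts.append(tag + body)
    return parts


def from_parts(parts: List[bytes]) -> str:
    """Reassemble parts that arrive in order and share one msg-seq."""
    return b"".join(p.split(b"] ", 1)[1] for p in parts).decode("utf-8")


long_text = "a" + "日" * 1000  # 3001 bytes in UTF-8
parts = to_parts(long_text, msg_seq=7)
print(len(parts), max(len(p) for p in parts))
print(from_parts(parts) == long_text)
```

A real mechanism would also have to cope with lost and reordered parts,
which this sketch ignores.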

Thanks!

Anton.








RE: syslog-int

2003-08-14 Thread Anton Okmianski
RFC2277, "IETF Policy on Character Sets and Languages", could be of use
for the syslog-international draft:
ftp://ftp.isi.edu/in-notes/rfc2277.txt

Of note:

 - It clearly favors UTF-8.  Does not mention UTF-7.
 - It seems to argue that language must be identified.

Anton.

 -Original Message-
 From: Anton Okmianski [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, August 12, 2003 11:30 AM
 To: [EMAIL PROTECTED]
 Cc: Nathan Sowatskey; David Bainbridge
 Subject: RE: syslog-int


 Hi!

 I assume this is the forum to provide the feedback on this draft.

  Title   : Syslog-international Protocol
  Author(s)   : R. Gerhards
  Filename: draft-ietf-syslog-international-00.txt
  Pages   : 13
  Date: 2003-8-1
 http://www.ietf.org/internet-drafts/draft-ietf-syslog-international-00.txt


 First of all, thanks for putting the draft together!

 1. I think it was a good idea to put language headers into
 the MSG part as opposed to other syslog fields. This would
 allow me, for example, to store the message on disk in the
 same format and/or re-use the language header even when
 bypassing syslog.

 2. The LANGUAGE field.  Is it necessary?  Can it be
 omitted?  I can easily see a situation where an application
 supporting foreign languages does not know the source
 language.  For example, my application takes usernames in
 any language (in Unicode).  Now, I need to produce a syslog
 message which has the username in it. What am I going to
 put in the LANGUAGE field?

 Also, there can be situations (bizarre as they are) when a
 message contains multiple languages in it. With
 Unicode-based encodings, it is not a problem.  What happens
 in this case?

 I'd propose removing the LANGUAGE field.

 3. RFC1766 referenced in the LANGUAGE field spec is
 officially obsolete.  Need to address it one way or another
 if the LANGUAGE field stays?

 4. I am concerned about attempting to support ALL encodings
 and ALL charsets. Is it necessary?  If this is a new
 standard, why do we need to provide for support of all
 legacy encodings?  Why can't we just standardize on Unicode
 encoded as straight binary UTF-8 or UTF-7?  Maybe also
 provide for support of UTF-16 for more compact
 representation of Asian languages.  Either one will cover
 all languages and would not require getting into the
 business of interpreting locale/language specific charsets
 and many encodings.

 So, can we do away with the CHARSET field and only use
 Unicode-based and US-ASCII encodings in the ENCODING field?

 5. Even if we support multiple encodings, it would probably
 help to specify a few that  MUST be supported.  Otherwise,
 how do we ensure interoperability if each implementation
 can choose its own set of supported encodings/charsets?

 6. Can we specify that the absence of the header should be
 interpreted as a UTF-8 encoded message?  A UTF-8 compliant
 syslog daemon will be backwards compatible as it would be
 able to receive messages sent in plain US-ASCII as this
 range of characters is encoded identically.

 7. Just a minor thing.  The spec and examples show the
 special prefix as @#i18n, but there is a note in the spec
 that the i should be capital I.  Why not change it in
 the examples then?

 8. With internationalization of messages, the length of
 messages will inevitably grow.  Should we provide for
 multi-part messages to overcome the 1024 byte limit?
 Multi-part messages could have the same standard syslog
 header along with some additional part-seq-number.  I guess
 this also requires a msg-seq-number.  If we don't deal with
 multi-part messages as part of the syslog RFC, applications
 will invent their own mechanisms to identify messages that
 belong together.  A standard mechanism would be nice.

 Thanks!

 Anton.