I am uncomfortable with the terminology because I do not think it precise enough.
Digression on terminology:

Character Set is a set of characters (letters, numbers, symbols, glyphs ...).
Coded Character Set (CCS) gives each a (numeric) code, as in ISO 10646.
Character Encoding (Scheme/Syntax) (CES) specifies how the codes become octets, as in UTF-8.
Transfer Encoding/Syntax specifies how the octets are put on the wire, as in Base64.

MIME conflates CCS and CES to charset but keeps (Content) Transfer Encoding distinct; they can be different in different parts of an e-mail.

Currently, the ABNF in -15 nails down the character set everywhere except MSG and SD PARAM-VALUE, for which the CES is UTF-8 and so implicitly any of the 97,000 characters in the CCS are permitted (NB characters, not binary).

The only specification of Transfer Encoding is for SD PARAM-VALUE, where the characters '"', '\' and ']' MUST be escaped. Implicitly, the other fields are encoded as is, octet for octet.

If we add an encoding field, then is it Character or Transfer? If the latter, we may want to specify the former, and vice versa. And language is, I think, meaningless unless these are defined. And then there is locale (which may be more important than language :-( ).

If we add a count, what are we counting? Characters as they went into UTF-8? Octets as they went into the transfer syntax? Or what came out of, e.g., Base64?

As I suggested before, I do think MIME has mostly done a good job here, of internationalising and expanding the scope of character messages.

Tom Petch

----- Original Message -----
From: "Anton Okmianski (aokmians)" <[EMAIL PROTECTED]>
To: "Rainer Gerhards" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, November 30, 2005 7:28 PM
Subject: RE: [Syslog] #3 NUL octets, #4 binary data, #8 octet-counting

Specifying the encoding makes sense to me. This way we can state that only certain encoding support is required, but not preclude other options.

We are still ok with always having UTF-8 in SD values, right? We need this for foreign usernames. We have discussed this before.

Thanks,
Anton.

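To make the layers in the terminology digression concrete, here is a minimal Python sketch. The sample string and the function name count_layers are illustrative assumptions, not anything from the draft. It only shows that the same text yields three different counts, depending on whether one counts characters, UTF-8 octets, or the octets that come out of Base64.

```python
import base64


def count_layers(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-8 octets, Base64 octets) for the same text."""
    # Coded Character Set level: one code point per character.
    code_points = [ord(ch) for ch in text]
    # Character Encoding Scheme level: code points become octets (UTF-8).
    octets = text.encode("utf-8")
    # Transfer Encoding level: octets are wrapped for the wire (Base64).
    wire = base64.b64encode(octets)
    return len(code_points), len(octets), len(wire)


# Hypothetical sample: "Zürich" followed by a space and a snowman (U+2603).
sample = "Z\u00fcrich \u2603"
chars, utf8_octets, base64_octets = count_layers(sample)
print(chars, utf8_octets, base64_octets)
```

Any count field would have to say which of these three numbers it carries.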
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Gerhards
> Sent: Wednesday, November 30, 2005 3:26 AM
> To: [EMAIL PROTECTED]
> Subject: [Syslog] #3 NUL octets, #4 binary data, #8 octet-counting
>
> Hi WG,
>
> I have received notes via private mail telling me there seem to be
> some existing (and eventually soon upcoming) valid use cases for
> binary data in syslog. I think there is no point in arguing whether
> that's fortunate or not. It simply looks like that's the way it is.
> I do not like the idea of breaking existing use cases for syslog
> (because that will only lead to implementors ignoring the spec and
> the story of syslog inconsistencies continues...). As such, I think
> we need to provide at least some minimal support for it (aka "not
> outlaw it").
>
> At first, this implies that NUL octets may be present in the message.
>
> I propose that we write text that discourages the use of NUL, but
> allows it if needed. That text should also allow, but discourage, a
> receiver to modify messages containing NUL. With that, we allow the
> use case, but do not make it a "show stopper" for implementing
> compliant software. This would also be pretty much in sync with what
> we currently find in practice, so it is already expected behaviour.
> Finally, such text would caution implementors that when NUL octets
> are present, chances are high that eventually present digital
> signatures will be broken. In my point of view, that's fair and
> efficient.
>
> Chris's proposal for #5 (character encoding) also provides an elegant
> solution for binary data. We can use something like:
>
> [enc="binary"]
>
> or
>
> [enc="base-64"]
>
> I do NOT intend to specify this - I think it should be in the scope
> of a separate document specifying the use of binary data. Then would
> also be the right time to discuss all issues that arise out of it.
> For now, I just would like to keep the door open.
>
> Finally, I propose to extend Chris's format so that the message size
> can be conveyed. This has been brought up several times and I think
> a clean solution is now obvious:
>
> [enc="utf-8" lang="en" size="MSG-size-in-octets"]
>
> MSG-size-in-octets would be the size of the MSG part (just that!) in
> octets. Counting just the MSG part is sufficient, as the rest of the
> message consists of fields properly delimited. The size is probably
> most useful for binary data.
>
> Please comment.
>
> Rainer
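
A rough sketch of how a sender might fill in the proposed size parameter, assuming Python. The element layout simply copies the example in the quoted proposal and is not a finalized syntax; the helper names escape_param_value and describe_msg are hypothetical. The escaping covers the three characters ('"', '\' and ']') that the draft requires to be escaped in SD PARAM-VALUE.

```python
def escape_param_value(value: str) -> str:
    """Backslash-escape '"', '\\' and ']' as required for SD PARAM-VALUE."""
    out = []
    for ch in value:
        if ch in '"\\]':
            out.append("\\")
        out.append(ch)
    return "".join(out)


def describe_msg(msg: str, lang: str = "en") -> str:
    """Build the proposed element, with size counted in UTF-8 octets of MSG only."""
    size = len(msg.encode("utf-8"))  # octets, not characters
    return f'[enc="utf-8" lang="{escape_param_value(lang)}" size="{size}"]'


# Hypothetical MSG: six characters, but seven octets once encoded as UTF-8.
print(describe_msg("Z\u00fcrich"))
```

Note that this counts octets after character encoding and before any transfer encoding; whether that is the right thing to count is exactly the open question raised above.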
