Tom,

I agree there are some issues with truncation, but I think they are
inherent. We have specified that a message is truncated at its end. In
the text I proposed, I wanted to make sure that the truncated message
still ends with a technically complete UTF-8 sequence. Based on Anton's
comment, I have to admit I am unsure whether there is really any benefit
in this. Anyhow, even if there is, I think we should not try to preserve
the proper meaning. If the message is truncated, its end is in doubt.
This also means that a few characters at the end might be wrongly
interpreted because of truncated control characters. I think we should
document this and live with it (but it was important to bring this issue
up so that it can be documented).
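
To make the point concrete, here is a minimal sketch in Python of
truncation that ends on a complete UTF-8 sequence. The function name
truncate_utf8 and the max_octets parameter are my own, purely for
illustration, and the sketch assumes the input is valid UTF-8. It is not
proposed text for the draft.

```python
def truncate_utf8(data: bytes, max_octets: int) -> bytes:
    """Cut data to at most max_octets without splitting a UTF-8 sequence.

    Hypothetical helper for illustration; assumes data is valid UTF-8.
    """
    if len(data) <= max_octets:
        return data
    cut = max_octets
    # data[cut] is the first octet that would be dropped. Continuation
    # octets have the form 10xxxxxx, so if the first dropped octet is one,
    # the cut falls inside a sequence and we step back to its lead octet.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


# 'o' followed by U+0308 (combining diaeresis) is displayed as one symbol,
# but it is two code points: 1 octet for 'o' plus 2 octets for U+0308.
message = "user=o\u0308".encode("utf-8")   # 8 octets in total
shortened = truncate_utf8(message, 7)      # drops the whole combining mark
print(shortened.decode("utf-8"))           # valid UTF-8, but the symbol changed
```

With a limit of 7 octets the result is still well-formed UTF-8, yet the
combining mark is gone and the last symbol is a plain 'o'. This is the
kind of change in meaning that I think we can only document.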

Any comments?

Thanks,
Rainer 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Petch
> Sent: Tuesday, January 17, 2006 2:40 PM
> To: Darren Reed
> Cc: [EMAIL PROTECTED]
> Subject: Re: [Syslog] Sec 6.1: Truncation
> 
> ----- Original Message -----
> From: "Darren Reed" <[EMAIL PROTECTED]>
> To: "Tom Petch" <[EMAIL PROTECTED]>
> Cc: <[EMAIL PROTECTED]>
> Sent: Monday, January 16, 2006 10:51 PM
> Subject: Re: [Syslog] Sec 6.1: Truncation
> 
> 
> > > Truncation of UTF-8 is actually slightly worse than has been described.
> > >
> > > It is possible to determine from the UTF-8 octets where one coded
> > > character ends and another begins.  But because Unicode contains
> > > combining characters, with no limit on how many of these there can
> > > be, and these modify the meaning of previous or later coded characters,
> > > it is not possible to determine where one 'symbol' ends.  So truncation
> > > at a UTF-8 boundary could subtly change the meaning of a message,
> > > even breach security.  Not something we can guard against,
> > > but we should mention it.
> >
> > The above seems a little confused to me.  How can there be a problem
> > if a message is truncated on the boundary of a complex character?
> >
> > Darren
> 
> I lack the precise terminology.  Unicode includes base characters and
> modifying characters, such as diacritic marks, as well as characters that
> combine the two.  Where the combination exists as a single code point, no
> problem.  Where it does not, then what the user would see as a single
> character is actually sent as several code points, each separately encoded
> in UTF-8.  It is fairly easy for a truncating relay to work out the
> boundary of the UTF-8 and so ensure that each UTF-8 encoding is either
> kept whole or removed whole.  It is much harder, probably impossible, to
> work out where any modifying characters belong, whether they should be
> removed or left in.  And the character 'o' with a diacritic mark is not
> the same as that character without that diacritic mark, so removing
> trailing modifying characters changes the meaning, which could be a
> security exposure.
>
> Tom Petch
> 
> 
> _______________________________________________
> Syslog mailing list
> [email protected]
> https://www1.ietf.org/mailman/listinfo/syslog
> 
