Re: [Standards] RTT, take 2

Dave Cridland Fri, 24 Jun 2011 01:08:23 -0700

On Fri Jun 24 02:54:12 2011, Peter Saint-Andre wrote:

On 6/23/11 12:41 AM, Mark Rejhon wrote:
> Opinion?
On the wire there is no such thing as a code point; there are only code
points that are encoded using an encoding form like UTF-8 or UTF-16. For
details, see:

http://tools.ietf.org/html/draft-ietf-appsawg-rfc3536bis-02

Given that XMPP is pure UTF-8, I don't see a compelling reason to count
UTF-16-encoded code points or UTF-32-encoded code points.

I think UTF-16 and UTF-32 encodings would both be a bad idea; XMPP is purely UTF-8, as you say.

However, I don't think that we should refer to UTF-8 octets here either, for a number of reasons:

1) Processing software may already have decoded the UTF-8 into "something", making octet offsets awkward to manage.

2) Referring to UTF-8 octets means we have silly states where we could edit inside characters. It's even possible this may be used intentionally, in some languages.

So I'd say that we should refer to characters in a string, and deal with Unicode code points in the abstract. I'd expect that implementations would convert this internally into whatever made sense for them.
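
To illustrate the difference, here is a minimal Python sketch, not taken from any RTT draft. The function names and the sample string are made up for the example. It assumes positions are counted in Unicode code points, shows how an implementation could map such a position to a UTF-8 octet offset internally, and shows how an octet offset can land inside a character.

```python
def codepoint_to_utf8_offset(text: str, index: int) -> int:
    """Map a code-point index to the matching UTF-8 octet offset.

    Hypothetical helper: an implementation that stores UTF-8 internally
    could use something like this to translate positions.
    """
    if not 0 <= index <= len(text):
        raise IndexError("code-point index out of range")
    return len(text[:index].encode("utf-8"))


def insert_at_codepoint(text: str, index: int, fragment: str) -> str:
    """Insert `fragment` before the code point at `index`."""
    if not 0 <= index <= len(text):
        raise IndexError("code-point index out of range")
    return text[:index] + fragment + text[index:]


def splits_character(octets: bytes, offset: int) -> bool:
    """Return True if `offset` points at a UTF-8 continuation octet,
    i.e. an edit at that offset would land inside a character."""
    if not 0 <= offset <= len(octets):
        raise IndexError("octet offset out of range")
    return offset < len(octets) and (octets[offset] & 0xC0) == 0x80


message = "na\u00efve \u20ac5"      # "naïve €5": 8 code points
octets = message.encode("utf-8")    # 11 octets (ï takes 2, € takes 3)

# Code-point position 3 (just before "v") is octet offset 4.
offset = codepoint_to_utf8_offset(message, 3)

# Octet offset 3 falls between the two octets of "ï".
inside = splits_character(octets, 3)

# Editing by code-point position can never produce that state.
edited = insert_at_codepoint(message, 5, "!")
```

With code-point positions every edit lands on a character boundary, whereas an octet offset such as 3 above would cut "ï" in half and leave invalid UTF-8 on both sides of the edit.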


Dave.
--
Dave Cridland - mailto:[email protected] - xmpp:[email protected]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
