RE: Roundtripping in Unicode

Lars Kristan Mon, 13 Dec 2004 06:21:13 -0800


Philippe Verdy wrote:
> From: "Doug Ewell" <[EMAIL PROTECTED]>
> > Lars Kristan wrote:
> >> I am sure one of the standardizers will find a Unicodally
> >> correct way of putting it.
> >
> > I can't even understand that paragraph, let alone paraphrase it.
>
> My understanding of his question and my response to his problem is
> that you MUST not use VALID Unicode codepoints to represent INVALID
> byte sequences found in some text with alleged UTF encoding.
OK, so should the codepoints used for this purpose be valid or not? If the modified conversion were made standard and replaced the current UTF-16/32 to UTF-8 conversion, their status would be close to that of surrogates, but not identical. Applications that absolutely need bijectivity could consider them invalid. Not every application needs that, so many applications could, and should, consider them valid. That also means nothing changes for current applications, since they already consider them valid.
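
To make the bijectivity concern concrete, here is a minimal Python sketch. It is not the conversion proposed in this thread: the escape range U+F780..U+F7FF and the names ESCAPE_BASE, decode_lossless and encode_lossless are assumptions chosen only for illustration. It shows that every invalid byte survives a round trip, but that a byte string containing the genuine UTF-8 encoding of an escape codepoint does not.

```python
import codecs

# Hypothetical escape range: an invalid byte 0xNN becomes U+F700 + 0xNN.
# This range is an assumption for the example, not part of any standard.
ESCAPE_BASE = 0xF700


def _escape_invalid(err):
    """Error handler: turn each undecodable byte into one escape codepoint."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return "".join(chr(ESCAPE_BASE + b) for b in bad), err.end


codecs.register_error("escape-demo", _escape_invalid)


def decode_lossless(data: bytes) -> str:
    """Decode UTF-8, keeping invalid bytes as escape codepoints."""
    return data.decode("utf-8", "escape-demo")


def encode_lossless(text: str) -> bytes:
    """Encode to UTF-8, turning escape codepoints back into raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if ESCAPE_BASE + 0x80 <= cp <= ESCAPE_BASE + 0xFF:
            out.append(cp - ESCAPE_BASE)  # restore the original invalid byte
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# An invalid byte sequence survives the round trip.
raw = b"abc\x80"
assert encode_lossless(decode_lossless(raw)) == raw

# But valid UTF-8 for the escape codepoint itself collides with it:
# both inputs decode to the same string, so only one can be restored.
genuine = "\uf780".encode("utf-8")
assert decode_lossless(genuine) == decode_lossless(b"\x80")
assert encode_lossless(decode_lossless(genuine)) != genuine
```

An application that treats the escape codepoints as invalid on input never sees the second case. One that treats them as valid accepts the collision in exchange for preserving the invalid bytes.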

What I was talking about in the paragraph in question is what happens if you take unassigned codepoints and give them a new status, which is precisely what happened with surrogates. We can discuss what things should be called in this context, what is valid at which point, and what the consequences are. But please note that I have abandoned this idea and am now pursuing a slightly different approach.

Lars
