Re: [FileAPI, common] UTF-16 to UTF-8 conversion

Simon Pieters Tue, 28 Feb 2012 04:58:21 -0800

On Tue, 28 Feb 2012 13:05:37 +0100, Jonas Sicking <[email protected]> wrote:

>> If we can't U+FFFD unpaired surrogates on paste, I agree it makes sense to
>> U+FFFD them in APIs. If the only way to get them is a JS escape, then an
>> exception seems OK.


> People use JS strings to handle binary data. This is something that
> has worked since the dawn of JS and is something that I believe is
> defined to work in recent ECMAScript specs.
>
> I don't think that we can start restricting that and try to enforce
> that JS-strings always contain valid UTF16.
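To illustrate why JS strings cannot be assumed to be valid UTF-16: a JS string is simply a sequence of 16-bit code units, so packing arbitrary 16-bit values into one can leave surrogates without a partner. The following is a minimal TypeScript sketch; `findLoneSurrogates` and the sample values are hypothetical illustrations, not taken from any spec.

```typescript
// Returns the indices of code units that are surrogates without a partner.
function findLoneSurrogates(s: string): number[] {
  const lone: number[] = [];
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xd800 && cu <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : -1;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed lead/trail pair: skip the trail unit
      } else {
        lone.push(i); // lead unit with no trail after it
      }
    } else if (cu >= 0xdc00 && cu <= 0xdfff) {
      lone.push(i); // trail unit with no lead before it
    }
  }
  return lone;
}

// Hypothetical "binary" payload: arbitrary 16-bit values, one per code unit.
const words = [0x0041, 0xd800, 0x0042, 0xdc00, 0xd83d, 0xde00];
const packed = String.fromCharCode(...words);
console.log(packed.length);              // 6: every value is kept as a code unit
console.log(findLoneSurrogates(packed)); // [ 1, 3 ]
```

The last two values form a valid pair (U+1F600), so only indices 1 and 3 are reported.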


Right.

> So I think our only option is to make all APIs which do UTF16->UTF8
> conversion explicitly define how to deal with invalid surrogates.


Sure, I don't suggest we leave it undefined.

> My preference would be to deal with them by encoding them to U+FFFD for
> the same reason that we let the HTML parser do error recovery rather
> than XML-style draconian error handling.
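The U+FFFD approach can be sketched as a hand-rolled UTF-16 to UTF-8 encoder that substitutes the replacement character for every unpaired surrogate instead of failing. This is a minimal illustration, not the algorithm from any particular spec; the function name `utf16ToUtf8Replacing` is hypothetical.

```typescript
// Sketch: encode a JS string (UTF-16 code units) as UTF-8, replacing each
// unpaired surrogate with U+FFFD (bytes EF BF BD) instead of failing.
function utf16ToUtf8Replacing(s: string): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Combine a lead/trail pair into one supplementary code point.
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }
    if (cp >= 0xd800 && cp <= 0xdfff) {
      cp = 0xfffd; // still a surrogate, so it was unpaired: substitute U+FFFD
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(out);
}
```

With this sketch, `utf16ToUtf8Replacing("a\ud800b")` yields the bytes 61 EF BF BD 62, while a well-formed pair such as "\ud83d\ude00" (U+1F600) yields F0 9F 98 80.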

I'm not really opposed to making APIs use U+FFFD instead of exception, but I'm not entirely convinced, either. If people use binary data in strings and want to use them in these APIs, U+FFFDing lone surrogates is going to "silently" scramble their data. Why is this better than throwing an exception?
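The "silent scrambling" concern can be made concrete: once lone surrogates are replaced, distinct inputs collapse to the same output, whereas a strict policy surfaces the problem. A minimal TypeScript sketch of both policies applied at the UTF-16 level; `replaceLoneSurrogates` and `requireWellFormed` are hypothetical names, not APIs from the thread or any spec.

```typescript
const isLead = (cu: number) => cu >= 0xd800 && cu <= 0xdbff;
const isTrail = (cu: number) => cu >= 0xdc00 && cu <= 0xdfff;

// Policy 1: substitute U+FFFD for every unpaired surrogate.
function replaceLoneSurrogates(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (isLead(cu) && i + 1 < s.length && isTrail(s.charCodeAt(i + 1))) {
      out += s[i] + s[i + 1]; // well-formed pair: keep both units
      i++;
    } else if (isLead(cu) || isTrail(cu)) {
      out += "\ufffd"; // unpaired surrogate: replace it
    } else {
      out += s[i];
    }
  }
  return out;
}

// Policy 2: refuse input that is not well-formed UTF-16. Replacement changes
// the string exactly when it contains an unpaired surrogate.
function requireWellFormed(s: string): string {
  if (replaceLoneSurrogates(s) !== s) {
    throw new RangeError("string contains an unpaired surrogate");
  }
  return s;
}

// Two different "binary" strings...
const a = "x\ud800y";
const b = "x\udbffy";
console.log(a === b); // false
// ...become identical after replacement: the original data is unrecoverable.
console.log(replaceLoneSurrogates(a) === replaceLoneSurrogates(b)); // true
// The strict policy reports the problem instead.
try {
  requireWellFormed(a);
} catch (e) {
  console.log((e as Error).message); // "string contains an unpaired surrogate"
}
```

For comparison, ECMAScript's own `encodeURIComponent`, which also performs UTF-16 to UTF-8 conversion, takes the strict route and throws a `URIError` on an unpaired surrogate.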


--
Simon Pieters
Opera Software
