On Tue, 28 Feb 2012 13:05:37 +0100, Jonas Sicking <[email protected]> wrote:

If we can't U+FFFD unpaired surrogates on paste, I agree it makes sense to
U+FFFD them in APIs. If the only way to get them is a JS escape, then an
exception seems OK.

People use JS strings to handle binary data. This is something that
has worked since the dawn of JS and is something that I believe is
defined to work in recent ECMAScript specs.

I don't think that we can start restricting that and try to enforce
that JS strings always contain valid UTF-16.

Right.

So I think our only option is to make all APIs that do UTF-16 -> UTF-8
conversion explicitly define how to deal with invalid surrogates.

Sure, I don't suggest we leave it undefined.

My preference would be to deal with them by encoding them to U+FFFD for
the same reason that we let the HTML parser do error recovery rather
than XML-style draconian error handling.

I'm not really opposed to making APIs use U+FFFD instead of an exception, but I'm not entirely convinced, either. If people use binary data in strings and want to use them in these APIs, U+FFFDing lone surrogates is going to "silently" scramble their data. Why is this better than throwing an exception?
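
To make the trade-off concrete, here is a rough sketch of the two behaviours under discussion. The function name and the "policy" argument are made up for illustration and do not come from any spec; a real API would do this as part of its UTF-16 -> UTF-8 conversion step.

```javascript
// Hypothetical helper (not from any spec): walks the UTF-16 code units of
// a JS string and handles lone surrogates according to `policy`:
//   "replace" -> substitute U+FFFD (HTML-parser-style recovery)
//   "throw"   -> raise an exception (draconian handling)
function sanitizeSurrogates(str, policy) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;

    if (isHigh && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed pair: copy both code units unchanged.
        out += str[i] + str[i + 1];
        i++;
        continue;
      }
    }

    if (isHigh || isLow) {
      // Lone surrogate: this is where the two proposals differ.
      if (policy === "throw") {
        throw new RangeError("lone surrogate at index " + i);
      }
      out += "\ufffd";
      continue;
    }

    out += str[i];
  }
  return out;
}

// Two different "binary" strings collapse to the same output under
// the "replace" policy, which is the silent scrambling in question.
const a = sanitizeSurrogates("a\ud800b", "replace");
const b = sanitizeSurrogates("a\udc00b", "replace");
console.log(a === b); // true
```

With "replace" the caller gets no signal that the two inputs became indistinguishable; with "throw" the same inputs fail loudly at the point of conversion.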

--
Simon Pieters
Opera Software
