Re: [FileAPI, common] UTF-16 to UTF-8 conversion

Simon Pieters Mon, 27 Feb 2012 22:12:59 -0800

On Tue, 28 Feb 2012 01:05:44 +0100, Glenn Maynard <[email protected]> wrote:

On Mon, Feb 27, 2012 at 5:34 PM, Arun Ranganathan
<[email protected]> wrote:

Simon,

Is the relevant part of HTML sufficient to refer to?
http://dev.w3.org/html5/spec/Overview.html#utf-8

I was thinking of "If the data argument has any unpaired surrogates, then throw a SyntaxError exception."
http://www.whatwg.org/specs/web-apps/current-work/multipage/network.html#dom-websocket-send


That defines decoding UTF-8 to Unicode strings.  You need the reverse.

Using a replacement scheme like UTF-8 decoding, instead of a hard
exception, seems more consistent with how encodings in general are
handled.  Otherwise, you'll end up with bugs in code if, for example,
people paste in unpaired surrogates (Firefox allows this, last I checked),

Maybe unpaired surrogates should be converted to U+FFFD on paste. Are there other cases?

causing unexpected exceptions in code.  Instead, just convert them to
U+FFFD, which gives much more graceful error handling for such a rare case
that most people will never handle explicitly.

If we can't convert unpaired surrogates to U+FFFD on paste, I agree it makes sense to convert them to U+FFFD in APIs. If the only way to get them is a JS escape, then an exception seems OK.
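
For illustration, the U+FFFD replacement discussed above could be sketched in JavaScript roughly as follows. This is a minimal sketch and not text from any spec; the function name replaceUnpairedSurrogates is hypothetical. It walks the UTF-16 code units and replaces every surrogate that is not part of a valid high/low pair with U+FFFD, so the result can be encoded as UTF-8 without raising an error.

```javascript
// Hypothetical helper (name is an assumption, not from any spec):
// replaces each unpaired surrogate code unit in a string with U+FFFD.
function replaceUnpairedSurrogates(input) {
  let output = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: valid only when a low surrogate follows it.
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        output += input[i] + input[i + 1];
        i++; // skip the low surrogate that was just consumed
      } else {
        output += "\uFFFD";
      }
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      output += "\uFFFD";
    } else {
      output += input[i];
    }
  }
  return output;
}
```

With this approach a string such as "a\uD800b" becomes "a\uFFFDb", while a well-formed pair such as "\uD83D\uDE00" passes through unchanged. The exception-based alternative would instead throw on reaching the first unpaired surrogate.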

I think WebSocket should do the same, for the same reason.


Have you filed a bug?

--
Simon Pieters
Opera Software
