On Tue, Feb 28, 2012 at 12:11 AM, Simon Pieters <[email protected]> wrote:
> I think WebSocket should do the same, for the same reason.
>
> Have you filed a bug?

(No, not until this conversation moves along a bit further.)

On Tue, Feb 28, 2012 at 8:26 AM, Jonas Sicking <[email protected]> wrote:
> I agree that it "scrambles" the data. But no more than the HTML parser
> error recovery does. And if an unexpected exception is thrown then the
> result is likely dataloss which is not obviously better than
> scrambling part of the data.
>

I'd say it's weaker than "scrambles", actually, at least with human-readable text. Replacing one character with U+FFFD usually results in an isolated glitch that a reader can recover from. (I do this regularly when reading the HTML spec, which uses characters not widely supported, in particular "Steps in synchronous sections are marked with ⌛.")

Also, even if you're attentive to handling these errors, most of the time you don't want to. In my experience, it's very uncommon to want to explicitly handle very rare errors like "the user pasted in an unpaired surrogate". There's rarely anything useful you can do, except to walk through the string and change the unpaired surrogates to U+FFFD, so you can move on. I'd rather just get U+FFFD to begin with.

--
Glenn Maynard
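
For illustration, the "walk through the string and change the unpaired surrogates to U+FFFD" step might look roughly like the sketch below. It is written in Python only to keep it short, and it assumes the input is a plain list of 16-bit code units (the way a script string is stored); the function name decode_utf16_units is made up for this example and is not part of any API discussed in this thread.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units):
    """Build a str from 16-bit code units, replacing unpaired surrogates.

    `units` is assumed to be a sequence of ints in the range 0..0xFFFF.
    This helper is hypothetical and exists only for this example.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Lone high or lone low surrogate: substitute U+FFFD and move on.
            out.append(REPLACEMENT)
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(u))
            i += 1
    return "".join(out)
```

The point of the sketch is that the only "handling" available is the substitution itself, which is why getting U+FFFD from the API directly saves every caller from writing this loop.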
