Status: New
Owner: ----
New issue 2875 by [email protected]: Can generate (and parse) invalid
UTF-8
http://code.google.com/p/v8/issues/detail?id=2875
If a String contains an unpaired Unicode surrogate (U+D800 through U+DFFF),
encoding it as UTF-8 produces an invalid byte sequence. This is because
UTF-8 is defined (in RFC-3629) not to allow surrogate code points at all.
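
To make the problem concrete, here is a minimal sketch in plain JavaScript. Both function names are hypothetical and are not taken from V8 or from the attached patches. The first helper applies the generic three-byte UTF-8 bit layout with no surrogate check, which is roughly what an encoder does if it forgets this rule; the second detects the strings that would trigger it.

```javascript
// Hypothetical helper: encode one UTF-16 code unit in U+0800..U+FFFF using
// the generic three-byte UTF-8 layout, with NO check for surrogates.
function naiveThreeByteEncode(codeUnit) {
  return [
    0xe0 | (codeUnit >> 12),
    0x80 | ((codeUnit >> 6) & 0x3f),
    0x80 | (codeUnit & 0x3f),
  ];
}

// Hypothetical helper: true if the string contains a surrogate code unit
// that is not part of a well-formed high/low pair.
function hasUnpairedSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip the low half
      } else {
        return true; // high surrogate not followed by a low surrogate
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate not preceded by a high surrogate
    }
  }
  return false;
}

// U+D800 comes out as ED A0 80, which RFC 3629 does not permit.
console.log(naiveThreeByteEncode(0xd800).map((b) => b.toString(16)));
console.log(hasUnpairedSurrogate('a\uD800b')); // true
```

A check like `hasUnpairedSurrogate` could run before encoding, so the caller can decide whether to reject the input or substitute U+FFFD.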
(For context: this caused us problems because we were relying on a Node.js
frontend to output only valid UTF-8 regardless of the validity of user
input. Everything worked fine except in the case of incoming unpaired
surrogates, at which point our backend crashed with an encoding error.)
I've attached a naive fix as `generate-valid-utf8.patch`. (I say naive
because it breaks the tests, and I've not figured out how best to alter
them.)
Relatedly, when parsing UTF-8, surrogates are accepted. This should not be
allowed (according to RFC-3629 or UNICODE-TR26); instead they should be
replaced by U+FFFD in the same way as other invalid byte sequences.
I've attached this approach as `parse-utf8-only.patch`.
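
As an illustration of what a strict parser has to reject (a sketch only, not the contents of the patch): RFC 3629 requires the byte after a 0xED lead byte to be in the range 0x80..0x9F, so a 0xED followed by 0xA0..0xBF can only be an encoded surrogate. The function name below is hypothetical.

```javascript
// Hypothetical helper: return the index of the first encoded surrogate
// (lead byte 0xED followed by 0xA0..0xBF) in an array of bytes, or -1.
// 0xED is never a continuation byte, so a plain linear scan is sufficient.
function findEncodedSurrogate(bytes) {
  for (let i = 0; i + 1 < bytes.length; i++) {
    if (bytes[i] === 0xed && bytes[i + 1] >= 0xa0 && bytes[i + 1] <= 0xbf) {
      return i;
    }
  }
  return -1;
}

console.log(findEncodedSurrogate([0x61, 0xed, 0xa0, 0x80])); // 1 (U+D800)
console.log(findEncodedSurrogate([0xed, 0x9f, 0xbf]));       // -1 (U+D7FF is fine)
```

A strict decoder would emit U+FFFD at such a position rather than a surrogate; this sketch only locates the offending bytes.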
That said, it may be the case that people are relying on this laxness so
that they can use CESU-8 (though I don't have any evidence for this). It
may be more pragmatic to ignore the security recommendations in
UNICODE-TR26 and continue allowing correctly paired surrogates when
decoding UTF-8 so that CESU-8 continues to work. Even in that case, we
should still not parse incorrectly paired surrogates, as they are not
allowed in either CESU-8 or UTF-8.
I've attached this approach as `parse-utf8-or-cesu8.patch`.
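
The lenient behaviour described above could look roughly like the following sketch. It assumes the three-byte sequences have already been decoded into 16-bit units, and the function name is hypothetical rather than taken from the patch. Correctly paired surrogates are combined into one supplementary code point (the CESU-8 case), while any unpaired surrogate becomes U+FFFD.

```javascript
// Hypothetical helper: take 16-bit units decoded from three-byte sequences
// and return code points. Well-formed high/low pairs are combined; any
// surrogate that is not part of such a pair is replaced by U+FFFD.
function combineSurrogates(units) {
  const out = [];
  for (let i = 0; i < units.length; i++) {
    const unit = units[i];
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    const next = i + 1 < units.length ? units[i + 1] : 0;
    if (isHigh && next >= 0xdc00 && next <= 0xdfff) {
      // Standard UTF-16 pair arithmetic.
      out.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
      i++; // consume the low half as well
    } else if (isHigh || isLow) {
      out.push(0xfffd); // unpaired: invalid in both UTF-8 and CESU-8
    } else {
      out.push(unit);
    }
  }
  return out;
}

console.log(combineSurrogates([0xd83d, 0xde00]).map((c) => c.toString(16))); // [ '1f600' ]
console.log(combineSurrogates([0xd800, 0x41]).map((c) => c.toString(16)));   // [ 'fffd', '41' ]
```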
More work will be needed to make any of these patches acceptable, but I'd
like to get an idea of which approach you guys would prefer to take.
See also https://code.google.com/p/v8/issues/detail?id=761#c33
Attachments:
generate-valid-utf8.patch 406 bytes
parse-utf8-only.patch 555 bytes
parse-utf8-or-cesu8.patch 1.8 KB
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev