Please watch your character encodings

Dave Townsend Wed, 10 Jul 2019 13:47:12 -0700

(If you don't know much about character encodings and how they can cause
issues, I've just posted a blog post for you:
https://www.oxymoronical.com/blog/2019/07/Please-watch-your-character-encodings
)


I've run into a few bugs recently where non-English characters were causing
things to break because we were encoding or decoding strings incorrectly.
Please watch out for this; better still, add tests that use non-English
characters where it makes sense.

A couple of specific things worth keeping in mind:

* nsAString is documented as always being encoded in (potentially invalid)
UTF-16.
* nsACString is documented as not having any defined encoding. Look back
at where your string comes from and see if you can infer the encoding
from there. Ideally, document it!
* Use the right IDL type when passing strings through XPCOM between C++ and
JavaScript. Even though you are working with an nsACString in C++, ACString
is not the right IDL type to use unless you know that no international
characters can be involved; any that do appear will be mangled without any
kind of warning. Instead, consider AUTF8String, which encodes/decodes the
nsACString as UTF-8 when converting to or from a JavaScript string.
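To make the ACString/AUTF8String difference concrete, here is a minimal
JavaScript sketch. The helper names are hypothetical, and it assumes the
ACString conversion keeps only the low byte of each UTF-16 code unit (and
widens each byte back into one code unit on the way out). That assumption
matches the netUtils results further down, but it is a model of the
behaviour, not XPConnect's actual code.

```javascript
// Hypothetical helpers modelling two ways a JS string (UTF-16) can become
// the bytes of an nsACString. Illustrations only, not XPConnect internals.

// Assumed ACString behaviour, JS -> C++: keep only the low 8 bits of each
// UTF-16 code unit. Anything above U+00FF is silently mangled.
function toBytesLossy(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    bytes.push(str.charCodeAt(i) & 0xff);
  }
  return bytes;
}

// Assumed ACString behaviour, C++ -> JS: each byte becomes one code unit.
function fromBytesLossy(bytes) {
  return String.fromCharCode(...bytes);
}

// AUTF8String-style conversion: encode/decode as UTF-8.
function toBytesUtf8(str) {
  return Array.from(new TextEncoder().encode(str));
}

function fromBytesUtf8(bytes) {
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

// toBytesLossy("Ć")             -> [0x06]  ("Ć" is U+0106; the 0x01 is lost)
// toBytesUtf8("Ć")              -> [0xC4, 0x86]
// fromBytesLossy([0xC4, 0x86])  -> "Ä\u0086"
// fromBytesUtf8([0xC4, 0x86])   -> "Ć"
```

The lossy pair never reports an error, which is exactly why this kind of
bug goes unnoticed until someone tries a non-English string.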

As a concrete example, I skimmed the IDLs that use ACString today and
came across nsINetUtil.idl. The escape and unescape functions look suspect,
as they take and return ACStrings and it seems likely that someone may want
to escape international characters. Sure enough:

netUtils.escapeString("Ć", netUtils.ESCAPE_XALPHAS) == "%06" (should be "%C4%86")
netUtils.unescapeString("%C4%86", 0) == "Ä\u0086" (should be "Ć")
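For comparison, a UTF-8-aware escape/unescape has to work on the UTF-8
bytes of the string rather than on truncated UTF-16 code units. The
following is a minimal JavaScript sketch with hypothetical function names
(escapeUtf8, unescapeUtf8). It does not try to reproduce nsINetUtil's escape
masks such as ESCAPE_XALPHAS; as a simplifying assumption it escapes every
byte outside the RFC 3986 unreserved set.

```javascript
// Hypothetical UTF-8-aware escaping, NOT the nsINetUtil implementation.
// Assumption: escape every byte outside A-Z a-z 0-9 - . _ ~
const UNRESERVED = /[A-Za-z0-9\-._~]/;

function escapeUtf8(str) {
  let out = "";
  // Encode to UTF-8 first so "Ć" becomes the two bytes 0xC4 0x86.
  for (const byte of new TextEncoder().encode(str)) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && UNRESERVED.test(ch)) {
      out += ch;
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

function unescapeUtf8(str) {
  // Work on UTF-8 bytes so literal non-ASCII input survives too.
  const input = new TextEncoder().encode(str);
  const bytes = [];
  for (let i = 0; i < input.length; i++) {
    // 0x25 is '%'; only treat it as an escape if two hex digits follow.
    if (input[i] === 0x25 && i + 2 < input.length) {
      const hex = String.fromCharCode(input[i + 1], input[i + 2]);
      if (/^[0-9A-Fa-f]{2}$/.test(hex)) {
        bytes.push(parseInt(hex, 16));
        i += 2;
        continue;
      }
    }
    bytes.push(input[i]);
  }
  // Decode the collected bytes as UTF-8 rather than byte-per-code-unit.
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}
```

With this approach escapeUtf8("Ć") gives "%C4%86" and unescapeUtf8("%C4%86")
gives back "Ć"; the standard encodeURIComponent("Ć") also returns "%C4%86".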

Maybe those functions are only meant to work with single-byte characters,
but that isn't clear from the comments. Currently these two functions are
only used in tests, so we can probably just remove them rather than work
out whether to fix them.
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform