On 12-04-19 07:25 AM, Jesse Ruderman wrote:
> My preference is to remove null termination:
>
> * I'm guessing most strings aren't passed to C. (What are the most
>   common C string calls in rustc?)

All the filesystem access stuff, at this point. In the future it's
harder to say.

> * C functions that scan for null are inefficient, so they're even more
>   likely to be replaced with Rust equivalents than other C functions.

Hm, I think this is not a reasonable stance:

  $ find /usr/include/ -name \*.h \
      | xargs cat \
      | grep -c 'char\( *const\)\? *\*'
  10488

There are a lot of C APIs that take strings. "Rewrite the world in rust"
is going to take a long time.

> * Null termination is not sufficient for interop with C. You also have
>   to ensure the strings don't contain null characters. (This is a common
>   source of bugs in Firefox, since JavaScript strings and strings from
>   the network can contain null characters.) And if null characters are
>   present, what do you do?

I can see some cases where that might be a bug, but in general I think
an embedded null just ... makes a string shorter, from C's perspective.
It's the same as passing a short string. Of course if the C code
requires some other kind of well-formedness condition in the prefix,
you'd need to enforce that, but that condition presumably holds over
shorter and longer strings alike. Most C APIs aren't written to take
strings of a fixed size.

> * Each C function has its own expectations about character encoding
>   and allowed characters, so calls to C involve extra state-tracking or
>   checks anyway.

For APIs that take UTF-16, such as the win32 APIs, we already do the
conversion before calling, yes. But for APIs that take "char *" they
tend to be set up so they can accept UTF-8 input: they're either
agnostic to the differences between ASCII and UTF-8 (as UTF-8 was
designed to exploit) or else they can operate in UTF-8 mode via LC_CTYPE
or such. Sure you need to either enforce that and/or re-encode when it's
not true, but again, this is about opportunistic recoding-avoidance by
careful choice of defaults, rather than a guarantee that we never need
to recode.
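
To make the two interop points above concrete, here is a minimal sketch
of what a null-scanning C function would see for a given buffer. It is
written against the present-day Rust standard library (std::ffi), which
is an assumption on my part and not the string implementation under
discussion in this thread; c_view is a hypothetical helper name used
only for illustration.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Returns the bytes a NUL-scanning C function would treat as the string:
/// everything before the first NUL byte.
/// `c_view` is a hypothetical helper, not part of any Rust API.
fn c_view(buf: &[u8]) -> &[u8] {
    // Require a trailing NUL so the scan cannot run past the buffer.
    assert_eq!(buf.last(), Some(&0u8), "buffer must be NUL-terminated");
    // SAFETY: `buf` contains a NUL (checked above), so the scan stops inside it.
    unsafe { CStr::from_ptr(buf.as_ptr() as *const c_char) }.to_bytes()
}

fn main() {
    // An embedded NUL just makes the string shorter from C's perspective.
    assert_eq!(c_view(b"ab\0cd\0"), &b"ab"[..]);

    // UTF-8 only produces a zero byte for U+0000 itself, so non-ASCII
    // text passes through a byte-agnostic scan intact.
    let s = "na\u{ef}ve\0"; // "naive" with U+00EF, plus terminator
    assert_eq!(c_view(s.as_bytes()), "na\u{ef}ve".as_bytes());

    // If silent truncation would be a bug, the check can be made explicit:
    // CString::new rejects interior NULs and reports where the first one is.
    let err = CString::new("ab\0cd").unwrap_err();
    assert_eq!(err.nul_position(), 2);
}
```

Whether truncating or rejecting is right depends on the C API being
called; the sketch only shows that both behaviours are cheap to get once
the buffer carries a terminator.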
Sometimes users want an array of UCS4 as well, but it's not our default
string representation.

-Graydon
_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev
