On 28/05/2013 1:53 PM, Benjamin Striegel wrote:
> A few days ago I submitted a pull request to convert str::as_c_str()
> from a function into a method on strings:
> https://github.com/mozilla/rust/pull/6729
> And today in IRC there was a discussion regarding the fact that Rust's
> strings are null-terminated:
> https://botbot.me/mozilla/rust/msg/3466672/
> The impression that I'm getting from each is that this aspect of Rust's
> C interop appears to have no concrete design, so I figured I'd bring
> this up on the mailing list.
It has a concrete design, it's just (like many things) underdocumented.
My own fault, apologies. Rust ~strs and static strings are
null-terminated, and &strs are defined as (ptr, len+1) such that they reach
one-past-the-end of the data you wish the slice to cover. When they are
taken from a null-terminated "full string", the final null is
detectable. When they are taken as a sub-slice of some other slice, the
final null is missing. When making an as_c_str() sort of call, we check
the final null and only strdup if it's missing.
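The check described above can be sketched roughly as follows. This is an illustration in present-day Rust, not the 2013 library code: today's &str does not carry the extra byte, so the (ptr, len+1) slice is modelled here as a plain byte slice called `window`, and the free function `as_c_str` and its `Option` return type are assumptions made for the example.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};

/// Hypothetical model of the "check the final null" step.
/// `window` stands in for a (ptr, len+1) slice: the string data plus
/// the one byte that follows it.
fn as_c_str(window: &[u8]) -> Option<Cow<'_, CStr>> {
    // Split off the trailing byte; an empty window has no data at all.
    let (&last, data) = window.split_last()?;
    if last == 0 {
        // Taken from a full string: the null is already there, so borrow.
        CStr::from_bytes_with_nul(window).ok().map(Cow::Borrowed)
    } else {
        // Sub-slice of some other slice: the null is missing, so copy the
        // data and append one (the strdup case).
        CString::new(data).ok().map(Cow::Owned)
    }
}
```

With this model, a window over b"hello\0" is borrowed as-is, while the first six bytes of b"hello world\0" end in a space and are therefore copied. Both branches return None if the data contains an interior null, which is a choice made for this sketch.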
This is all intended to minimize the number of strdups we have to do
during calls to C. Most &strs originate in full strings and most of the
time we want our APIs to pass &str down to the last point, then convert
to a C string to pass into an OS or library function. Many C APIs take
null-terminated strings. That's just a fact. You can criticize them all
you like; they won't spontaneously change.
I made this choice a long time ago, around when I was doing the
in-place-append optimization for [] and "". Long time. It's fine to be
revisiting and questioning it.
The main problems with the design are:
- It's asymmetric with ~[] and &[]
- You don't always _have_ len+1 addressable bytes in a string
- It makes performance less predictable
- It's easy to forget since it appears to be redundant to hold
both null and len
I believe Brian has wanted to remove this choice for a long time. I
won't fight it either way, though it will take quite a lot of fiddling
with library code and debugging crashes to get it fixed up. You'll want to
use valgrind.
It _will_ also cost new strdups on many C boundaries, if we fix it.
Maybe that's ok. Potential performance traps are worth measuring rather
than speculating about.
One possibility to mitigate a performance problem, if it arises, is to
make a small, fixed-size buffer on the as_c_str() path so that strings
shorter than N bytes can be copied to the stack. Another would be to keep
an arena around for the purpose; we should probably have a TLS-based arena
for non-escaping dynamic allocations anyway, since it keeps coming up.
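A minimal sketch of the fixed-size stack buffer idea, written in present-day Rust and assuming strings carry no trailing null. The helper name `with_c_str`, the closure-based shape, and the value chosen for N (`STACK_BUF_LEN`) are all hypothetical, picked only for illustration.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical value for N; a real choice would come from measurement.
const STACK_BUF_LEN: usize = 64;

/// Runs `f` with a null-terminated copy of `s`.
/// Strings shorter than STACK_BUF_LEN are copied into a stack buffer;
/// longer ones fall back to a heap allocation.
/// Returns None if `s` contains an interior null byte.
fn with_c_str<R>(s: &str, f: impl FnOnce(&CStr) -> R) -> Option<R> {
    let bytes = s.as_bytes();
    if bytes.len() < STACK_BUF_LEN {
        // The buffer starts zeroed, so the byte after the copy is the null.
        let mut buf = [0u8; STACK_BUF_LEN];
        buf[..bytes.len()].copy_from_slice(bytes);
        let c = CStr::from_bytes_with_nul(&buf[..=bytes.len()]).ok()?;
        Some(f(c))
    } else {
        // Too long for the stack buffer: allocate, as strdup would.
        let c = CString::new(bytes).ok()?;
        Some(f(c.as_c_str()))
    }
}
```

The closure shape keeps the C string from escaping the call, which is what makes the stack copy safe. A caller would pass `c.as_ptr()` to the C function inside the closure.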
-Graydon