On 28/05/2013 1:53 PM, Benjamin Striegel wrote:
> A few days ago I submitted a pull request to convert str::as_c_str()
> from a function into a method on strings:
>
> https://github.com/mozilla/rust/pull/6729
>
> And today in IRC there was a discussion regarding the fact that Rust's
> strings are null-terminated:
>
> https://botbot.me/mozilla/rust/msg/3466672/
>
> The impression that I'm getting from each is that this aspect of Rust's
> C interop appears to have no concrete design, so I figured I'd bring
> this up on the mailing list.

It has a concrete design; it's just (like many things) underdocumented. My own fault, apologies. Rust ~strs and static strings are null-terminated, and &strs are defined as (ptr, len+1), so that a slice reaches one byte past the end of the data you wish it to cover. When a slice is taken from a null-terminated "full string", that extra byte is the final null, and it is detectable. When a slice is taken as a sub-slice of some other slice, the final null is missing. When making an as_c_str() sort of call, we check for the final null and only strdup if it's missing.
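
To make the check concrete, here is a rough model of it in present-day Rust, not the actual library code. The function name as_c_bytes and the use of a byte slice to stand in for the (ptr, len+1) representation are assumptions made for illustration only: the input slice is taken to cover the string data plus the one extra byte past it.

```rust
use std::borrow::Cow;

/// Hypothetical model of the check: `data_plus_one` covers the string
/// data plus the one byte past its end, mirroring (ptr, len+1).
fn as_c_bytes(data_plus_one: &[u8]) -> Cow<'_, [u8]> {
    match data_plus_one.split_last() {
        // The extra byte is the final null: borrow as-is, no copy.
        Some((&0, _)) => Cow::Borrowed(data_plus_one),
        // The extra byte is not a null: copy only the data portion
        // and append a terminator (the "strdup" case).
        Some((_, data)) => {
            let mut owned = data.to_vec();
            owned.push(0);
            Cow::Owned(owned)
        }
        // Degenerate empty input: produce an empty C string.
        None => Cow::Owned(vec![0]),
    }
}
```

In this model, a slice covering all of b"hello\0" comes back borrowed, while the first four bytes of it (data "hel" plus the extra byte 'l') come back as an owned copy ending in a null.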

This is all intended to minimize the number of strdups we have to do during calls to C. Most &strs originate in full strings, and most of the time we want our APIs to pass &str all the way down, converting to a C string only at the last point, just before passing it into an OS or library function. Many C APIs take null-terminated strings. That's just a fact. You can criticize them all you like; they won't spontaneously change.

I made this choice a long time ago, around when I was doing the in-place-append optimization for [] and "". Long time. It's fine to be revisiting and questioning it.

The main problems with the design are:

  - It's asymmetric with ~[] and &[]
  - You don't always _have_ len+1 addressable bytes in a string
  - It makes performance less predictable
  - It's easy to forget since it appears to be redundant to hold
    both null and len

I believe Brian has wanted to remove this choice for a long time. I won't fight it either way, though it will take quite a lot of fiddling with library code and debugging crashes to get it fixed up. You'll want to use valgrind.

It _will_ also cost new strdups on many C boundaries, if we fix it. Maybe that's ok. Potential performance traps are worth measuring rather than speculating about.

One possibility to mitigate a performance problem, if it arises, is to put a small, fixed-size buffer on the as_c_str() path, so that strings shorter than N bytes can be copied to the stack instead of the heap. Another would be to keep an arena around for the purpose; we should probably have a TLS-based arena for non-escaping dynamic allocations anyway, as it keeps coming up.
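
A minimal sketch of the stack-buffer idea in present-day Rust, under some assumptions: the helper name with_c_str, the closure-based shape, and the 64-byte threshold are all hypothetical (the text above only says "small-but-fixed"), and the sketch assumes plain (ptr, len) strings with no trailing null.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical threshold standing in for "N".
const STACK_BUF_LEN: usize = 64;

/// Hypothetical helper: lends `s` to `f` as a null-terminated C string.
/// Returns None if `s` contains an interior null byte.
fn with_c_str<R>(s: &str, f: impl FnOnce(&CStr) -> R) -> Option<R> {
    let bytes = s.as_bytes();
    if bytes.len() < STACK_BUF_LEN {
        // Short string: copy into a zeroed stack buffer. The byte just
        // after the data is already 0 and acts as the terminator.
        let mut buf = [0u8; STACK_BUF_LEN];
        buf[..bytes.len()].copy_from_slice(bytes);
        let c = CStr::from_bytes_with_nul(&buf[..=bytes.len()]).ok()?;
        Some(f(c))
    } else {
        // Long string: fall back to a heap copy (the strdup case).
        let c = CString::new(bytes).ok()?;
        Some(f(c.as_c_str()))
    }
}
```

Inside the closure, c.as_ptr() gives the pointer to hand to a C function. Keeping the C string scoped to the closure is what lets the buffer live on the stack: the pointer cannot escape the call.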

-Graydon

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
