On 12-09-06 12:08 PM, Gareth Smith wrote:
> I was one of those passing them by value too much. I did it because it
> seemed like the idiomatic thing to do. Even rustc did it - that made it
> seem legit. It no longer seems like the idiomatic thing to do because
> the compiler emits a warning about it unless it is done explicitly, so I
> try to avoid it. I think that documentation and compiler warnings will
> determine typical use.

Right. So then, typical API use would be &str, access to the bytes
would be double-indirect, and we'd be unable to do any constant-string
or substring optimizations, correct?

>> The current scheme is a very delicate balance between a large number
>> of pressures; I think it's about the best we're going to get.
>
> The problem with rust's strings is that any rust program I write seems
> to be more complicated because of features that strings have that 99% of
> the time I will not use. I have to pay for safe concurrency even though
> it looks like I will barely be using it. Ditto with fixed size and
> constant memory strings.

Every time you write "foo", it is a constant-memory string; and in the
near-ish future, all slicing (hence substring-extraction) operations in
core::str (from which a great many derived strings originate) will
happen via borrowing, not allocating.
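
A minimal sketch of those two cases, written in present-day Rust syntax (an assumption; the syntax in use when this thread was written differs). `first_word` is a hypothetical helper, not a core::str function. The literal is a pointer to constant memory, and the slice borrows the same bytes instead of allocating:

```rust
/// Hypothetical helper: returns the part of `s` before the first space
/// as a borrowed slice. No new string is allocated.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i],
        None => s,
    }
}

fn main() {
    // A string literal is a borrowed view of constant data in the binary.
    let greeting: &'static str = "hello world";

    // The substring is a (pointer, length) pair into the same bytes.
    let word = first_word(greeting);
    assert_eq!(word, "hello");
    assert_eq!(word.as_ptr(), greeting.as_ptr());
    println!("{}", word);
}
```

The pointer comparison is what shows that the substring shares storage with the original rather than copying it.
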
These are actually really important cases. Important enough that most
other languages dedicate built-in machinery to handle them non-uniformly
as well: substrings often pin the outer string alive in the GC heap (or
refcount it independently), constants often get their own pool and/or
separate representations, and all sorts of optimizations often apply too,
like in-place concat, doubling-growth, inline storage for small strings,
etc. etc.
I'm not trying to be a jerk. In C, a string is just a char* that you can
move around at as-near-as-possible zero cost, like an integer. Better
than just that: since it points to constant memory, the compiler can see
through it and boil off bounds checking or indexing operations
(extracting the element-bytes as sub constants). It's very cheap, and
sets people's expectations for "how fast it can be done", but it's not
safe. We want to be safe, and as close-to-as-fast as we can be while
being safe. So here's what we tried:
  - Implementation #1: all strings were shared, refcounted, there's a
    magic refcount that means "constant". Every time you copy one you
    have to check both the magic refcount and the non-magic one, and
    adjust it. Costly. Also meant you could never send them over
    channels, since that'd require atomic refcounting. We don't want
    that.
  - Implementation #2: all strings were unique. Now you can send them
    over channels, but must double-indirect to share them, and "foo"
    causes a memory allocation, where it _should_ just be a pointer
    to constant memory. No constants or substrings.
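
The trade-off between the two implementations above can be loosely illustrated in present-day Rust. This is an analogy under stated assumptions, not the historical implementation: `Rc<str>` stands in for the shared, non-atomically refcounted string, `String` stands in for the unique string, and `send_owned` is a hypothetical helper.

```rust
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: sends a uniquely owned string to another thread
/// over a channel and returns the length observed on the receiving side.
fn send_owned(s: String) -> usize {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let received: String = rx.recv().unwrap();
        received.len()
    });
    tx.send(s).unwrap();
    handle.join().unwrap()
}

fn main() {
    // Shared and refcounted: every copy has to adjust a counter.
    let shared: Rc<str> = Rc::from("foo");
    let alias = Rc::clone(&shared);
    assert_eq!(Rc::strong_count(&shared), 2);
    // `Rc<str>` is not `Send`: moving `alias` into a spawned thread
    // would be rejected at compile time, since the count is not atomic.
    drop(alias);

    // Unique: can cross a channel, but building it from "foo" allocates.
    let unique = String::from("foo");
    assert_eq!(send_owned(unique), 3);
}
```
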
It's difficult to think of other practical versions that don't involve
either copying or refcounting all the time, even on constant strings,
which always puts us back into the same place you're suggesting: &str
for most APIs, and double-indirect, and losing all the constant-string
and substring optimization opportunities.

> You created a nice language for programs that are mostly non-concurrent
> (regardless of how nice it is for highly concurrent programs), so I and
> others are going to try using it for that :) ... and sometimes wondering
> why the strings are so hard to use.

Yeah, I'm .. sympathetic, I do want them to be "easy", or as easy as
they can be; can you describe _exactly_ what the difficulties you're
having are? Not just that they're "hard" or "weird", but like, a
use-case that you keep doing, that you want to be able to stop-doing?
Also note: many of our APIs (core::str for example) are still far more
~str-centric than they ought to be longer-term; we did a bulk-conversion
from str to ~str, and need to go through and fully convert over to &str
whenever possible.
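
As a rough sketch of why an &str-centric API is the goal, here is a function written against `&str` in present-day Rust syntax (an assumption, with `String` standing in for the owned ~str). `count_spaces` is a hypothetical function, not part of core::str. One signature serves literals, owned strings, and substrings without copying any of them:

```rust
/// Hypothetical API written against a borrowed `&str` rather than an
/// owned string: it only reads the bytes, so it never needs ownership.
fn count_spaces(s: &str) -> usize {
    s.bytes().filter(|&b| b == b' ').count()
}

fn main() {
    let literal = "a b c";            // constant memory
    let owned = String::from("d e");  // heap-allocated, uniquely owned

    assert_eq!(count_spaces(literal), 2);
    assert_eq!(count_spaces(&owned), 1);         // borrowed from the owned string
    assert_eq!(count_spaces(&literal[2..]), 1);  // borrowed substring "b c"
}
```
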
-Graydon
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev