Re: [rust-dev] On the weirdness of strings

Graydon Hoare Thu, 06 Sep 2012 14:42:34 -0700

On 12-09-06 12:08 PM, Gareth Smith wrote:

> I was one of those passing them by value too much. I did it because it
> seemed like the idiomatic thing to do. Even rustc did it - that made it
> seem legit. It no longer seems like the idiomatic thing to do because
> the compiler emits a warning about it unless it is done explicitly, so I
> try to avoid it. I think that documentation and compiler warnings will
> determine typical use.

Right. So then, typical API use would be &str, access to the bytes would be double-indirect, and we'd be unable to do any constant-string or substring optimizations, correct?

> > The current scheme is a very delicate balance between a large number
> > of pressures; I think it's about the best we're going to get.

> The problem with rust's strings is that any rust program I write seems
> to be more complicated because of features that strings have that 99% of
> the time I will not use. I have to pay for safe concurrency even though
> it looks like I will barely be using it. Ditto with fixed size and
> constant memory strings.

Every time you write "foo", it is a constant-memory string; and in the near-ish future, all slicing (hence substring-extraction) operations in core::str (from which a great many derived strings originate) will happen via borrowing, not allocating.
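To make "slicing via borrowing" concrete, here is a minimal sketch in present-day Rust, where `String`/`&str` stand in for this thread's `~str`/`&str` (an assumption: the 2012 syntax no longer compiles). The helper `after` is hypothetical. It returns a slice pointing into the literal's own constant memory, so nothing is allocated or copied.

```rust
/// Hypothetical helper: returns the part of `s` after the first `sep`, as a
/// slice that borrows the same bytes as `s` (no allocation, no copy).
fn after(s: &str, sep: char) -> Option<&str> {
    // `find` yields a byte offset; slicing past the separator borrows from `s`.
    s.find(sep).map(|i| &s[i + sep.len_utf8()..])
}

fn main() {
    // A string literal is a `&'static str` pointing into constant memory.
    let text: &'static str = "hello world";
    let tail = after(text, ' ').unwrap();
    // `tail` points into the literal's own bytes, some distance past its start.
    let offset = tail.as_ptr() as usize - text.as_ptr() as usize;
    println!("{tail:?} starts {offset} bytes into the literal");
}
```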

These are actually really important cases. Important enough that most other languages dedicate built-in machinery to handle them non-uniformly as well: substrings often pin the outer string alive in the GC heap (or refcount it independently), constants often get their own pool and/or separate representations, and all sorts of other optimizations often apply too, like in-place concat, doubling growth, inline storage for small strings, etc. etc.
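The "substrings pin the outer string alive (or refcount it)" strategy can be sketched as a hypothetical `SubStr` type: it holds a shared refcount on the whole parent and records a byte range. This is an illustration of the technique under stated assumptions, not something Rust's standard library provides; `Rc<str>` plays the role of the refcounted parent.

```rust
use std::ops::Range;
use std::rc::Rc;

/// Hypothetical substring type: keeps the entire parent string alive via a
/// shared refcount, and remembers which byte range of it this view covers.
#[derive(Clone)]
struct SubStr {
    parent: Rc<str>,
    range: Range<usize>,
}

impl SubStr {
    /// A view covering the whole parent.
    fn new(parent: Rc<str>) -> Self {
        let len = parent.len();
        SubStr { parent, range: 0..len }
    }

    /// Narrow to `r` (relative to this view) without copying any bytes.
    /// Panics, like ordinary slicing, if `r` is out of bounds or splits a char.
    fn slice(&self, r: Range<usize>) -> SubStr {
        let start = self.range.start + r.start;
        let end = self.range.start + r.end;
        // Perform one real slice purely to get bounds/UTF-8 boundary checks.
        let _ = &self.as_str()[r];
        SubStr { parent: Rc::clone(&self.parent), range: start..end }
    }

    fn as_str(&self) -> &str {
        &self.parent[self.range.clone()]
    }
}

fn main() {
    let text = SubStr::new(Rc::from("hello world"));
    let word = text.slice(6..11);
    drop(text);
    // `word` still pins the whole "hello world" allocation.
    println!("{} (parent len {})", word.as_str(), word.parent.len());
}
```

The tradeoff is the one other languages accept: taking substrings is cheap, but a tiny substring can keep a large parent alive.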

I'm not trying to be a jerk. In C, a string is just a char* that you can move around at as-near-as-possible zero cost, like an integer. Better than just that: since it points to constant memory, the compiler can see through it and boil off bounds checking or indexing operations (extracting the element-bytes as sub-constants). It's very cheap, and sets people's expectations for "how fast it can be done", but it's not safe. We want to be safe, and as close-to-as-fast as we can be while being safe. So here's what we tried:
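One way to see the price of safety relative to a bare `char*`: in present-day Rust a borrowed string is a (pointer, length) pair, and that extra length word is what lets every access be bounds-checked. The sketch below assumes modern Rust; `byte_at` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Hypothetical helper: safe, bounds-checked byte access. It returns `None`
/// instead of reading past the end, which a raw C `char*` would not prevent.
fn byte_at(s: &str, i: usize) -> Option<u8> {
    s.as_bytes().get(i).copied()
}

fn main() {
    // A `&str` is a pointer plus a length: one word more than a C `char*`.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    let s = "foo";
    println!("{:?} {:?}", byte_at(s, 0), byte_at(s, 3));
}
```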


  - Implementation #1: all strings were shared, refcounted, there's a
    magic refcount that means "constant". Every time you copy one you
    have to check both the magic refcount and the non-magic one, and
    adjust it. Costly. Also meant you could never send them over
    channels, since that'd require atomic refcounting. We don't want
    that.

  - Implementation #2: all strings were unique. Now you can send them
    over channels, but must double-indirect to share them, and "foo"
    causes a memory allocation, where it _should_ just be a pointer
    to constant memory. No constants or substrings.
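Implementation #1 can be modeled, very loosely, in present-day Rust. The hypothetical `SharedStr` enum below replaces the "magic constant refcount" with an explicit tag (an assumption, not how the 2012 runtime encoded it). Every copy branches on the tag and, in the heap case, bumps a count. Because `Rc` is not `Send`, such values cannot cross a channel. Switching to `Arc` would allow that, but only at the cost of atomic refcount updates.

```rust
use std::rc::Rc;

/// Hypothetical model of "Implementation #1": a string is either a pointer
/// to constant memory or a refcounted heap string. The derived `clone`
/// matches on the variant: it copies the pointer for `Constant`, and
/// increments the refcount for `Counted`.
#[derive(Clone)]
enum SharedStr {
    Constant(&'static str),
    Counted(Rc<str>),
}

impl SharedStr {
    fn as_str(&self) -> &str {
        match self {
            SharedStr::Constant(s) => *s,
            SharedStr::Counted(s) => &**s,
        }
    }
}

fn main() {
    let lit = SharedStr::Constant("foo");
    let heap = SharedStr::Counted(Rc::from(String::from("bar")));
    // Each clone pays for the tag check; the `Counted` clones also bump the count.
    let copies = vec![lit.clone(), heap.clone(), heap.clone()];
    for c in &copies {
        println!("{}", c.as_str());
    }
}
```

Implementation #2 corresponds roughly to using only `String` (or `Box<str>`): sendable, but even `String::from("foo")` performs a heap allocation.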

It's difficult to think of other practical versions that don't involve either copying or refcounting all the time, even on constant strings, which always puts us back into the same place you're suggesting: &str for most APIs, double indirection, and losing all the constant-string and substring optimization opportunities.
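The "&str for most APIs" position can be illustrated in present-day Rust. Both functions below are hypothetical. A function that takes `&str` accepts a literal, a borrowed owned string, or a borrowed substring without allocating. A function that takes `&String`, by contrast, is double-indirect (reference to the `String` header, which points to the heap bytes), and it forces callers that hold a literal or a substring to allocate first.

```rust
/// Hypothetical `&str`-taking API: works for literals, borrowed `String`s,
/// and borrowed substrings alike.
fn count_vowels(s: &str) -> usize {
    s.chars().filter(|c| "aeiouAEIOU".contains(*c)).count()
}

/// Hypothetical double-indirect alternative: &String -> String -> bytes.
/// A caller holding only a literal must build a `String` to call this.
fn count_vowels_owned(s: &String) -> usize {
    count_vowels(s) // `&String` coerces to `&str`
}

fn main() {
    let owned = String::from("concurrency");
    let n1 = count_vowels("foo"); // literal: no allocation
    let n2 = count_vowels(&owned); // borrowed owned string
    let n3 = count_vowels(&owned[0..4]); // borrowed substring "conc"
    let n4 = count_vowels_owned(&"foo".to_string()); // must allocate first
    println!("{n1} {n2} {n3} {n4}");
}
```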

> You created a nice language for programs that are mostly non-concurrent
> (regardless of how nice it is for highly concurrent programs), so I and
> others are going to try using it for that :) ... and sometimes wondering
> why the strings are so hard to use.

Yeah, I'm... sympathetic. I do want them to be "easy", or as easy as they can be; can you describe _exactly_ what the difficulties you're having are? Not just that they're "hard" or "weird", but, like, a use-case that you keep doing, that you want to be able to stop doing?

Also note: many of our APIs (core::str, for example) are still far more ~str-centric than they ought to be longer-term; we did a bulk conversion from str to ~str, and need to go through and fully convert over to &str wherever possible.
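What converting an API from ~str-centric to &str-centric means can be sketched in present-day Rust, with `String` standing in for `~str`. Both function names are hypothetical. The owned-return version copies the word into a new allocation on every call. The borrowed-return version hands back a slice of the caller's own bytes, and a caller who really needs ownership can still call `.to_string()` on it.

```rust
/// Hypothetical ~str-centric style (today: `String`-returning): every call
/// copies the first word into a fresh heap allocation.
fn first_word_owned(s: &str) -> String {
    s.split_whitespace().next().unwrap_or("").to_string()
}

/// Hypothetical &str-centric style: the result borrows from the input, so
/// nothing is copied.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let line = String::from("  borrow instead of allocate");
    let w = first_word(&line); // borrows from `line`
    let w_owned = first_word_owned(&line); // fresh heap copy
    println!("{w} / {w_owned}");
}
```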


-Graydon

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
