This is a proposal for the removal of string-set! (and consequently string-fill!) from the R7RS small Scheme language. I am publishing this document to invite wide comment. There is nothing official about it. I very gratefully acknowledge the kind help of Alex Shinn, who provided the topic sentences for most of the paragraphs below. However, I retain sole responsibility for this document, including all errors.
I believe that despite the prescription of the draft WG1 charter that no features of IEEE Scheme (a subset of R4RS) should be removed from R7RS small Scheme, an exception should be made for string-set!, for at least the following reasons: 1) Immutable strings are more purely functional, and allow many optimizations, such as being transparently and freely shareable between procedures and between threads without concern for uncontrolled mutation. For this and other reasons, the general trend in new languages/runtimes such as Java and C# is toward immutable strings; unfortunately, this is the kind of argument that Schemers usually don't like, so I won't bother mentioning it. :-) 2) Algorithms where you want to modify strings in the middle are rare, and many of the classic devices (such as string-upcase!, a procedure that mutates a string in place) are awkward or impossible with representations that make use of characters of variable length such as UTF-8. Typical string algorithms want to also be able to do insertions and deletions, which are not directly possible with classical Scheme strings. Better representations such as trees of immutable strings do allow such changes, as well as making string appends O(n) in the number of strings rather than in the sum of their lengths. 3) If strings are immutable, it's possible to have both fast O(1) access to individual characters or substrings, and fairly space-efficient representation of full Unicode strings, by using different representations for strings drawn from diferent character repertoires. For example, an implementation might use 8-bit code units when all characters are less than \#x100, 16-bit code units when all characters are less than \#x10000, and 32-bit code units otherwise. Unfortunately, mutating even a single character in such a representation may require the entire string to be copied, which means that it also requires indirection through a separate header that can be redirected to point to the newly allocated code unit sequence. Immutable strings can just *be* their sequences, with a few extra bits indicating the size of the code units, although this design does prevent easy sharing of substrings. 4) As currently designed, strings are functionally just vectors of characters. In an 8-bit world, using the traditional representation of strings carries a 4:1 storage advantage, making it worthwhile to distinguish them clearly from general vectors But 21-bit Unicode characters are a much better fit, if represented as immediate (unboxed) values, for general vectors using 32-bit pointers. Granted that not all small Scheme systems will provide full Unicode support, general vectors start to look much less expensive than they once were. In short: if you want something that behaves like a vector of characters, simply use a general vector that contains characters. 5) Making strings immutable also permits a design in which all strings are Unicode-normalized. Though this has its own costs (for example, appending two strings may create a new string whose length is different from the lengths of the two source strings), it would be effectively impossible where arbitrary mutation is allowed. As a consequence of removing string-set!, string-fill! (not in IEEE Scheme) becomes impossible and string-copy less useful. I do not propose to remove string-copy, however, because it can eliminate space leaks that are caused by taking a small shared substring of a large existing string: when the larger string should be GC'ed, it is retained as a whole because of the shared substring. Using string-copy judiciously can prevent such leaks. -- John Cowan [email protected] http://ccil.org/~cowan If he has seen farther than others, it is because he is standing on a stack of dwarves. --Mike Champion, describing Tim Berners-Lee (adapted) _______________________________________________ r6rs-discuss mailing list [email protected] http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
