Dr Alan Watson quoting me: > > Suppose each string s is represented by a vector of 2^21 elements, > > where element i consists of a list of numbers, in IEEE double > > precision, that represent the indexes within s at which the > > character c appears, where c is the Unicode scalar value f(i), > > where f is represented by a global association list that maps > > scalar values to indexes (i.e. f-inverse). SRFI-75 allows that > > representation, yet penalizes it. I also consider it to be a > > poor representation. > > You consider this a poor representation because it does not > allow constant-time random access or because it does not allow > linear-time traversal? Or simply because of the space cost?
Yes. And don't forget the implementation complexity, cost of mutation, potential for race conditions in multithreaded systems, and incompatibility with Java, C#, and C++ cultural expectations. > Consider an implementation that internally represents strings > as if they were doubly-linked lists and cached the position of > the last index. This allows linear-time traversal but not > constant-time random access. Would you consider this a poor > implementation? For some purposes, perhaps, but not for others. IMO it is far more reasonable than the representation I described above. BTW, I once used a doubly-linked-list-of-lines representation to represent the text of editor buffers in an Emacs-like editor I wrote in Scheme. Some of the editor's commands treated the text as a single long linear array of characters; other commands treated the text as an array of strings, with double indexing (line+column) to obtain a character. Aided by a couple of cached index translations, this representation ran acceptably fast on an 8 MHz 68000 with 1 Mby of RAM, and ran very nicely on the faster and larger Macintoshes that came later. It was never released as a product, but I used it at home for more than a decade. That experience is one of the reasons I disregard some of the criticisms I have read here concerning SRFI-75's alleged bias against interesting representations. I have expressed my opinion that implementations should provide multiple representations for strings, transparently and adaptively, using the API described in SRFI-75 and to be described in future SRFIs. Not every one of those representations has to do all things well. Indeed, the impossibility of doing all things well with a single representation is the reason I favor a multiplicity. Will