On 16/09/16 14:24, David Matthews wrote: > > The representation of strings has been simplified. Previously, single > character strings were represented by the character itself as a tagged > value. This was largely a relic of SML 90 which didn't have a separate > char type. That has been removed and all strings are now represented as > a vector of bytes. This speeds up string operations since the code no > longer has to consider the special case.
We have actually exploited this singleton string representation systematically since 1998, when we explicitly decided against Unicode (which was 16-bit widechar at that time) and adopted a variable-length character representation of so-called "Isabelle symbols". UTF-8 did not exist yet, but today it fits nicely into the same scheme -- it has room for special characters of arbitrary length. Thus SML90.explode only had to be upgraded to our Symbol.explode, which mostly produces singleton strings and sometimes longer ones. Type char is never used, because it is not sufficient to represent a text character. Note that today, the variable-length nature of Unicode is generally acknowledged (see http://utf8everywhere.org) and arrays of fixed-length characters considered legacy. It should be actually easy for Isabelle/ML to cope with the new representation of Poly/ML, using a table of pre-allocated singleton strings. I have done something similar for the Isabelle/Scala version of Symbol.explode, since the JVM is based on legacy UTF-16 widestring representation. I can do the same for the Isabelle/ML library, but there might be a general benefit if basis/String.sml would do it uniformly in charAsstring and thus in String.str etc. Instead of creating a new singleton string each time, it would refer to a vector of 256 pre-allocated values. Makarius _______________________________________________ polyml mailing list polyml@inf.ed.ac.uk http://lists.inf.ed.ac.uk/mailman/listinfo/polyml