On 16/09/16 14:24, David Matthews wrote:
> The representation of strings has been simplified. Previously, single
> character strings were represented by the character itself as a tagged
> value. This was largely a relic of SML 90 which didn't have a separate
> char type. That has been removed and all strings are now represented as
> a vector of bytes. This speeds up string operations since the code no
> longer has to consider the special case.
We have actually exploited this singleton string representation
systematically since 1998, when we explicitly decided against Unicode
(which was 16-bit widechar at that time) and adopted a variable-length
character representation of so-called "Isabelle symbols". UTF-8 did not
exist yet, but today it fits nicely into the same scheme -- it has room
for special characters of arbitrary length.
Thus SML90.explode only had to be upgraded to our Symbol.explode, which
mostly produces singleton strings and sometimes longer ones. Type char
is never used, because it is not sufficient to represent a text character.
Note that today, the variable-length nature of Unicode is generally
acknowledged (see http://utf8everywhere.org) and arrays of fixed-length
characters considered legacy.
It should be actually easy for Isabelle/ML to cope with the new
representation of Poly/ML, using a table of pre-allocated singleton
strings. I have done something similar for the Isabelle/Scala version of
Symbol.explode, since the JVM is based on legacy UTF-16 widestring
I can do the same for the Isabelle/ML library, but there might be a
general benefit if basis/String.sml would do it uniformly in
charAsstring and thus in String.str etc.
Instead of creating a new singleton string each time, it would refer to
a vector of 256 pre-allocated values.
polyml mailing list