I'm not sure I understand: what is the point of supporting "more" than
UTF-8?
In the original UTF-8 standard the encoding is:
The code is encoded as a string of length 1 + additional length.
The length is given in unary by the leading 1 bits of the first byte,
from '110' up to '1111110', while a leading '0' marks a single byte
(i.e. a total length of 1..6).
The first char
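
To make the length rule concrete, here is a minimal sketch in OCaml. It
assumes the original 1..6 byte scheme described above, not the later
4-byte limit. The names utf8_length_of_lead and decode_at are made up for
illustration; they are not part of Containers, Gen, or the standard
library.

```ocaml
(* Total length in bytes of a sequence, judged from its first byte, under
   the original 1..6 byte scheme. Returns None for a continuation byte
   (10xxxxxx) and for 0xFE / 0xFF, which never start a sequence. *)
let utf8_length_of_lead (c : char) : int option =
  let b = Char.code c in
  if b land 0x80 = 0x00 then Some 1        (* 0xxxxxxx *)
  else if b land 0xE0 = 0xC0 then Some 2   (* 110xxxxx *)
  else if b land 0xF0 = 0xE0 then Some 3   (* 1110xxxx *)
  else if b land 0xF8 = 0xF0 then Some 4   (* 11110xxx *)
  else if b land 0xFC = 0xF8 then Some 5   (* 111110xx *)
  else if b land 0xFE = 0xFC then Some 6   (* 1111110x *)
  else None

(* Decode the code starting at index i of s. Returns the decoded value
   and the number of bytes consumed, or None on a malformed or truncated
   sequence. Hypothetical helper: it does not reject overlong forms. *)
let decode_at (s : string) (i : int) : (int * int) option =
  if i < 0 || i >= String.length s then None
  else
    match utf8_length_of_lead s.[i] with
    | None -> None
    | Some n when i + n > String.length s -> None
    | Some 1 -> Some (Char.code s.[i], 1)
    | Some n ->
      (* The lead byte carries 7 - n payload bits. *)
      let lead = Char.code s.[i] land (0xFF lsr (n + 1)) in
      let rec go acc k =
        if k = n then Some (acc, n)
        else
          let b = Char.code s.[i + k] in
          (* Every following byte must look like 10xxxxxx. *)
          if b land 0xC0 <> 0x80 then None
          else go ((acc lsl 6) lor (b land 0x3F)) (k + 1)
      in
      go lead 1
```

For example, decode_at "\xC3\xA9" 0 yields Some (0xE9, 2). A routine
like this could be wrapped to feed a Gen.t or Sequence.t of codes, but
that wiring is left out here.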
Simon occasionally includes code from some other part of the libraries
to avoid requiring, say, Gen to access Sequence or Containers; I don't
remember offhand. In the case of some tiny piece of code that's
sensible. (And so far that is all I have provided.)
Pervasives now has a type uchar
(*
Reading recent posts on discuss.ocaml.org gives me the impression that
a small number of UTF-related routines should be more easily available.
Containers' Sequence.t and Gen.t, in particular, could benefit from a
couple of simple routines. The code below fits well into that framework.
I