I'm not sure I understand: what is the point of supporting "more" than
utf8?
In the original utf8 standard the encoding is:
A code point is encoded as a string of length 1 + additional length.
The additional length is given in unary by the lead byte, whose prefix runs from '110' up to '1111110' (i.e. 1 to 5 continuation bytes, each of the form 10xxxxxx).
The lead byte supplies 1 to 7 bits; the following bytes supply 6 bits each.
The maximal number of bits is 31 (5 * 6 from the continuation bytes + the low bit of the lead byte).
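For concreteness, a minimal OCaml sketch of such a 31-bit encoder (encode_utf31 is a made-up name, not the code under discussion):

  let encode_utf31 (cp : int) : string =
    if cp < 0 || cp > 0x7FFFFFFF then invalid_arg "encode_utf31";
    let buf = Buffer.create 6 in
    if cp < 0x80 then
      (* 1 byte: 0xxxxxxx, up to 7 bits *)
      Buffer.add_char buf (Char.chr cp)
    else begin
      (* number of continuation bytes needed: 1 .. 5 *)
      let n =
        if cp < 0x800 then 1
        else if cp < 0x10000 then 2
        else if cp < 0x200000 then 3
        else if cp < 0x4000000 then 4
        else 5
      in
      (* lead byte: n+1 leading ones, a zero, then the high bits of cp *)
      let prefix = (0xFF lsl (7 - n)) land 0xFF in
      Buffer.add_char buf (Char.chr (prefix lor (cp lsr (6 * n))));
      (* continuation bytes: 10xxxxxx, 6 bits each, most significant first *)
      for i = n - 1 downto 0 do
        Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr (6 * i)) land 0x3F)))
      done
    end;
    Buffer.contents buf

For scalar values this produces the same bytes as standard utf8 (e.g. encode_utf31 0x20AC gives "\xE2\x82\xAC" for the euro sign); the difference is only that it also accepts code points the current standard rejects.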
I am using this encoding, but it is no longer 'standard'. In the current standard
  the surrogate range 0xD800 .. 0xDFFF is excluded from the TOTAL range 0 .. 0x10FFFF.
Uutf8 accepts only that restricted range.
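The allowed set (the Unicode scalar values) is easy to write down as a small OCaml helper; the name is only illustrative:

  let is_scalar_value (cp : int) : bool =
    (0x0000 <= cp && cp <= 0xD7FF) || (0xE000 <= cp && cp <= 0x10FFFF)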


All I tried to say is: my code does not implement the current standard; in fact it does
very little checking. (It encodes more and checks less.)

Calling it utf31 would be an informal way of signaling this;
we can call it whatever we like.

I will write a filter that does verification. In particular:
a code of length 1 + n must be followed by n bytes of the form 10xxxxxx; if the decoder instead encounters 0xxxxxxx, 11xxxxxx, or the end of the string, that is an error.
Uutf8 replaces such sequences with an error code.
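Something like the following OCaml sketch is what I have in mind; the names (expected_continuations, well_formed) are placeholders, and it checks only the structural constraint above, not the standard's range restrictions:

  let expected_continuations b =
    if b land 0x80 = 0x00 then Some 0        (* 0xxxxxxx : single byte *)
    else if b land 0xE0 = 0xC0 then Some 1   (* 110xxxxx *)
    else if b land 0xF0 = 0xE0 then Some 2   (* 1110xxxx *)
    else if b land 0xF8 = 0xF0 then Some 3   (* 11110xxx *)
    else if b land 0xFC = 0xF8 then Some 4   (* 111110xx *)
    else if b land 0xFE = 0xFC then Some 5   (* 1111110x *)
    else None                                (* stray 10xxxxxx, or 0xFE/0xFF *)

  let is_continuation b = b land 0xC0 = 0x80 (* 10xxxxxx *)

  let well_formed (s : string) : bool =
    let len = String.length s in
    let rec go i =
      if i >= len then true
      else
        match expected_continuations (Char.code s.[i]) with
        | None -> false
        | Some n ->
          if i + n >= len then false         (* string ends too early *)
          else
            let rec conts k =
              k > n || (is_continuation (Char.code s.[i + k]) && conts (k + 1))
            in
            conts 1 && go (i + n + 1)
    in
    go 0

A real filter would presumably replace the offending sequence (as Uutf8 does) rather than just return false, but the checks are the same.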

peter


