I'm not sure I understand: what is the point of supporting "more" than
utf8?
In the original utf8 standard the encoding is:
A code point is encoded as a string of length 1 + additional length.
The additional length is given in unary by the lead byte, whose prefix runs from '110' up to '1111110' (i.e. 1 to 5 continuation bytes, each of the form 10xxxxxx).
The lead byte supplies 1 to 7 bits; the following bytes supply 6 bits each.
The maximal number of bits is 31 (5 * 6 from the continuation bytes + the low bit of the lead byte).
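For concreteness, a minimal OCaml sketch of such a 31-bit encoder (encode_utf31 is a made-up name, not the code under discussion):

  let encode_utf31 (cp : int) : string =
    if cp < 0 || cp > 0x7FFFFFFF then invalid_arg "encode_utf31";
    let buf = Buffer.create 6 in
    if cp < 0x80 then
      (* 1 byte: 0xxxxxxx, up to 7 bits *)
      Buffer.add_char buf (Char.chr cp)
    else begin
      (* number of continuation bytes needed: 1 .. 5 *)
      let n =
        if cp < 0x800 then 1
        else if cp < 0x10000 then 2
        else if cp < 0x200000 then 3
        else if cp < 0x4000000 then 4
        else 5
      in
      (* lead byte: n+1 leading ones, a zero, then the high bits of cp *)
      let prefix = (0xFF lsl (7 - n)) land 0xFF in
      Buffer.add_char buf (Char.chr (prefix lor (cp lsr (6 * n))));
      (* continuation bytes: 10xxxxxx, 6 bits each, most significant first *)
      for i = n - 1 downto 0 do
        Buffer.add_char buf (Char.chr (0x80 lor ((cp lsr (6 * i)) land 0x3F)))
      done
    end;
    Buffer.contents buf

For scalar values this produces the same bytes as standard utf8 (e.g. encode_utf31 0x20AC gives "\xE2\x82\xAC" for the euro sign); the difference is only that it also accepts code points the current standard rejects.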
I am using this encoding, but it is no longer 'standard'. In the current standard
  the surrogate range 0xD800 .. 0xDFFF is excluded from the TOTAL range 0 .. 0x10FFFF.
Uutf8 accepts only that restricted range.
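The allowed set (the Unicode scalar values) is easy to write down as a small OCaml helper; the name is only illustrative:

  let is_scalar_value (cp : int) : bool =
    (0x0000 <= cp && cp <= 0xD7FF) || (0xE000 <= cp && cp <= 0x10FFFF)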


All I tried to say is: my code does not implement the current standard; in fact it does
very little checking. (It encodes more and checks less.)

Calling it utf31 would be an informal way of signaling this;
we can call it whatever we like.

I will write a filter that does verification. In particular:
a code of length 1 + n must be followed by n bytes of the form 10xxxxxx; if the decoder instead encounters 0xxxxxxx, 11xxxxxx, or the end of the string, that is an error.
Uutf8 replaces such sequences with an error code.
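Something like the following OCaml sketch is what I have in mind; the names (expected_continuations, well_formed) are placeholders, and it checks only the structural constraint above, not the standard's range restrictions:

  let expected_continuations b =
    if b land 0x80 = 0x00 then Some 0        (* 0xxxxxxx : single byte *)
    else if b land 0xE0 = 0xC0 then Some 1   (* 110xxxxx *)
    else if b land 0xF0 = 0xE0 then Some 2   (* 1110xxxx *)
    else if b land 0xF8 = 0xF0 then Some 3   (* 11110xxx *)
    else if b land 0xFC = 0xF8 then Some 4   (* 111110xx *)
    else if b land 0xFE = 0xFC then Some 5   (* 1111110x *)
    else None                                (* stray 10xxxxxx, or 0xFE/0xFF *)

  let is_continuation b = b land 0xC0 = 0x80 (* 10xxxxxx *)

  let well_formed (s : string) : bool =
    let len = String.length s in
    let rec go i =
      if i >= len then true
      else
        match expected_continuations (Char.code s.[i]) with
        | None -> false
        | Some n ->
          if i + n >= len then false         (* string ends too early *)
          else
            let rec conts k =
              k > n || (is_continuation (Char.code s.[i + k]) && conts (k + 1))
            in
            conts 1 && go (i + n + 1)
    in
    go 0

A real filter would presumably replace the offending sequence (as Uutf8 does) rather than just return false, but the checks are the same.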

peter


