Speaking too soon is a valid and powerful code verification technique: it
tempts the bugs into making their move.
--
SP
___
Containers-users mailing list
Containers-users@lists.ocaml.org
http://lists.ocaml.org/listinfo/containers-users
Of course I spoke too soon, and missed some validation cases (that would
have been accepted by Peter's code).
In particular, I just learnt about some interesting corner cases of UTF8,
namely overlong encodings.
If anyone is knowledgeable about UTF8, reviewing the code would be
greatly appreciated!
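To make the corner case concrete, here is a minimal sketch of a strict
check. It is not the CCUtf8_string code; the names min_code_point,
decode_at and is_valid_utf8 are made up for this example, and it assumes
the modern rules (at most 4 bytes, no surrogates, nothing above U+10FFFF).
An overlong encoding is a code point written with more bytes than it
needs, e.g. "\xC0\x80" for U+0000.

```ocaml
(* Illustrative sketch only; names are hypothetical. *)

(* Smallest code point that legitimately needs [n] bytes. *)
let min_code_point = function
  | 1 -> 0x0
  | 2 -> 0x80
  | 3 -> 0x800
  | _ -> 0x10000

(* Decode the sequence starting at index [i] (assumed in bounds).
   Returns [Some (code_point, length)], or [None] if it is malformed. *)
let decode_at (s : string) (i : int) : (int * int) option =
  let len = String.length s in
  let b0 = Char.code s.[i] in
  (* Length announced by the lead byte, and the payload bits it carries. *)
  let n, init =
    if b0 land 0x80 = 0 then (1, b0)
    else if b0 land 0xE0 = 0xC0 then (2, b0 land 0x1F)
    else if b0 land 0xF0 = 0xE0 then (3, b0 land 0x0F)
    else if b0 land 0xF8 = 0xF0 then (4, b0 land 0x07)
    else (0, 0) (* stray continuation byte or invalid lead byte *)
  in
  if n = 0 || i + n > len then None
  else begin
    (* Fold in the continuation bytes, each of the form 10xxxxxx. *)
    let rec loop j acc =
      if j = n then Some acc
      else
        let b = Char.code s.[i + j] in
        if b land 0xC0 <> 0x80 then None
        else loop (j + 1) ((acc lsl 6) lor (b land 0x3F))
    in
    match loop 1 init with
    | None -> None
    | Some cp ->
      if cp < min_code_point n then None            (* overlong *)
      else if cp >= 0xD800 && cp <= 0xDFFF then None (* surrogate *)
      else if cp > 0x10FFFF then None                (* out of range *)
      else Some (cp, n)
  end

(* A string is valid if every sequence in it decodes. *)
let is_valid_utf8 (s : string) : bool =
  let len = String.length s in
  let rec go i =
    if i >= len then true
    else
      match decode_at s i with
      | None -> false
      | Some (_, n) -> go (i + n)
  in
  go 0
```

A decoder that skips the min_code_point comparison would accept
"\xC0\x80" and "\xE0\x80\xAF" (an overlong '/'), which is exactly the
kind of case random tests against a lenient decoder will not flag.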
I merged and adapted the code from Peter:
https://github.com/c-cube/ocaml-containers/blob/master/src/core/CCUtf8_string.mli
https://github.com/c-cube/ocaml-containers/blob/master/src/core/CCUtf8_string.ml
it's stricter (only accepts valid UTF8) and the random tests should
ensure that it agrees
I'm not sure I understand: what is the point of supporting "more" than
utf8?
In the original utf8 standard the encoding is:
The code is encoded as a string of length 1 + additional length.
The additional length is a unary encoding of the length '10' to
'110' (i.e.: 1..6)
The first char
Simon occasionally includes code from some other part of the libraries
to avoid requiring, say, Gen to access Sequence or Containers; I don't
remember offhand. In the case of some tiny piece of code that's
sensible. (And so far that is all I have provided.)
Pervasives now has a type uchar
Well, there's the standard uchar type, I think compatibility is achievable :)
On Sat, 24 Feb 2018, Drup wrote:
> Shouldn't we just standardize on bunzli's libraries (including the new
> https://github.com/dbuenzli/utext) instead of trying to re-write code that
> usually ends up being quite subtle in each standard library ?
We could build on uutf, it's relatively small and
> Thanks for the suggestions. I'm no expert in unicode, but I do agree
> that such basic functionalities should be more easily available.
> Maybe a `Ustring` module in containers would make sense (as a private
> alias to `string`); most functionalities below would fit there
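For reference, a private alias could look roughly like the sketch below.
This is only an illustration of the idea, not a proposed interface: the
function names of_string and to_string are assumptions, and the
validation relies on String.is_valid_utf_8 from the standard library,
which needs OCaml 4.14 or later.

```ocaml
(* Sketch of a `Ustring` module as a private alias to `string`.
   Hypothetical interface; assumes OCaml >= 4.14 for String.is_valid_utf_8. *)
module Ustring : sig
  (* Clients can read a [t] as a string, but cannot build one directly. *)
  type t = private string

  (* Returns [None] when the input is not valid UTF-8. *)
  val of_string : string -> t option

  val to_string : t -> string
end = struct
  type t = string

  let of_string s = if String.is_valid_utf_8 s then Some s else None

  let to_string s = s
end
```

Because the alias is private, a value can be coerced back for free with
`(u :> string)`, so existing `String` functions keep working on it, while
the only way to obtain a `Ustring.t` is through the validating
constructor.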
Is this for