Speaking too soon is a valid and powerful code verification technique: it
works by tempting the bugs into making their move.
--
SP
___
Containers-users mailing list
Containers-users@lists.ocaml.org
http://lists.ocaml.org/listinfo/containers-users
Of course I spoke too soon, and missed some validation cases (that would
have been accepted by Peter's code).
In particular, I just learnt about some interesting corner cases of UTF8,
namely overlong encodings.
If anyone is knowledgeable about UTF8, reviewing the code would be
greatly appreciated!
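
For reviewers: an overlong encoding spends more bytes than the code point
needs, e.g. the two bytes C0 AF for '/' (U+002F), and a strict decoder has
to reject it. Below is a minimal sketch of such a check, following the
RFC 3629 rules. It is not the CCUtf8_string code; `is_valid_utf8` and its
local helpers are hypothetical names used only for illustration.

```ocaml
(* Hypothetical helper, NOT the CCUtf8_string implementation.
   Returns true iff [s] is well-formed UTF-8 in the RFC 3629 sense:
   no overlong forms, no surrogates, nothing above U+10FFFF. *)
let is_valid_utf8 (s : string) : bool =
  let n = String.length s in
  let byte i = Char.code s.[i] in
  (* A continuation byte has the shape 10xxxxxx. *)
  let cont i = i < n && byte i land 0xC0 = 0x80 in
  let rec go i =
    if i >= n then true
    else
      let c = byte i in
      if c < 0x80 then go (i + 1) (* ASCII *)
      else if c land 0xE0 = 0xC0 then
        (* 2 bytes. Lead bytes C0 and C1 could only encode code points
           below U+0080, so they are always overlong. *)
        c >= 0xC2 && cont (i + 1) && go (i + 2)
      else if c land 0xF0 = 0xE0 then
        (* 3 bytes. Below U+0800 is overlong; D800..DFFF are surrogates. *)
        cont (i + 1) && cont (i + 2)
        && (let cp =
              ((c land 0x0F) lsl 12)
              lor ((byte (i + 1) land 0x3F) lsl 6)
              lor (byte (i + 2) land 0x3F)
            in
            cp >= 0x800 && not (cp >= 0xD800 && cp <= 0xDFFF))
        && go (i + 3)
      else if c land 0xF8 = 0xF0 then
        (* 4 bytes. Below U+10000 is overlong; above U+10FFFF is out of range. *)
        cont (i + 1) && cont (i + 2) && cont (i + 3)
        && (let cp =
              ((c land 0x07) lsl 18)
              lor ((byte (i + 1) land 0x3F) lsl 12)
              lor ((byte (i + 2) land 0x3F) lsl 6)
              lor (byte (i + 3) land 0x3F)
            in
            cp >= 0x10000 && cp <= 0x10FFFF)
        && go (i + 4)
      else false (* stray continuation byte, or a lead byte F8..FF *)
  in
  go 0
```

The overlong check is done on the decoded code point rather than on byte
ranges, which is a little slower but easier to review against the spec.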
I merged and adapted the code from Peter:
https://github.com/c-cube/ocaml-containers/blob/master/src/core/CCUtf8_string.mli
https://github.com/c-cube/ocaml-containers/blob/master/src/core/CCUtf8_string.ml
it's stricter (only accepts valid UTF8) and the random tests should
ensure that it agrees
I'm not sure I understand, what is the point of supporting "more" than
utf8?
In the original utf8 standard the encoding is:
The code is encoded as a string of length 1 + additional length.
The additional length is a unary encoding of the length '10' to
'110' (i.e. 1..6)
The first char
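
To make the length prefix concrete: in the original scheme (RFC 2279,
sequences of up to 6 bytes) the number of leading 1 bits in the first byte
announces the total sequence length. A rough sketch, with hypothetical
names `leading_ones` and `seq_length`; note that RFC 3629 later restricted
UTF-8 to at most 4 bytes, so a strict modern decoder would reject the 5-
and 6-byte forms accepted here.

```ocaml
(* Hypothetical helpers for illustration only. *)

(* Number of leading 1 bits in the byte [b] (result in 0..8). *)
let leading_ones (b : int) : int =
  let rec go mask acc =
    if mask <> 0 && b land mask <> 0 then go (mask lsr 1) (acc + 1) else acc
  in
  go 0x80 0

(* Total sequence length announced by the lead byte [c] under the original
   6-byte scheme, or None if [c] cannot start a sequence. *)
let seq_length (c : char) : int option =
  match leading_ones (Char.code c) with
  | 0 -> Some 1             (* 0xxxxxxx: plain ASCII *)
  | 1 -> None               (* 10xxxxxx: continuation byte *)
  | n when n <= 6 -> Some n (* 110xxxxx up to 1111110x *)
  | _ -> None               (* FE and FF never start a sequence *)
```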
Simon occasionally includes code from some other part of the libraries
to avoid requiring, say, Gen to access Sequence or Containers; I don't
remember offhand. In the case of some tiny piece of code that's
sensible. (And so far that is all I have provided.)
Pervasives now has a type uchar
Well, there's the standard uchar type, I think compatibility is achievable :)
On Sat, 24 Feb 2018, Drup wrote:
> Shouldn't we just standardize on Bünzli's libraries (including the new
> https://github.com/dbuenzli/utext) instead of trying to re-write code that
> usually ends up being quite subtle in each standard library?
We could build on uutf, it's relatively small and
> Thanks for the suggestions. I'm no expert in unicode, but I do agree
> that such basic functionalities should be more easily available.
> Maybe a `Ustring` module in containers would make sense (as a private
> alias to `string`); most functionalities below would fit there
Is this for
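
For readers unfamiliar with the "private alias" idea quoted above, here is
a rough sketch of what such a module could look like. Everything in it is
hypothetical (the signature, the function names, and the choice of
validator); it is not an existing containers API. For brevity it assumes
OCaml >= 4.14 and relies on the standard library's String.is_valid_utf_8.

```ocaml
(* Hypothetical sketch, not an existing containers module. *)
module Ustring : sig
  (* Values of type [t] can be read as strings, but can only be built
     through the functions below. *)
  type t = private string

  (* Returns None when the input is not valid UTF-8. *)
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string

  (* Assumption: String.is_valid_utf_8 from the OCaml >= 4.14 stdlib. *)
  let of_string s = if String.is_valid_utf_8 s then Some s else None
  let to_string s = s
end

(* Reading is free: a private alias coerces back to string with (:>). *)
let byte_length (u : Ustring.t) : int = String.length (u :> string)
```

The point of `private` is that any plain string function still works on a
Ustring.t after a coercion, while the type checker prevents code outside
the module from producing an unvalidated one.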
(*
Reading recent posts on discuss.ocaml.org gives me the impression that a
small number of UTF-related routines should be more easily available.
Containers' Sequence.t and Gen.t, in particular, could benefit from a
couple of simple routines. The code below fits well into that framework.
I