Re: Unicode String Models

Daniel Bünzli via Unicode Sun, 09 Sep 2018 06:46:59 -0700

Hello, 

I find your notion of "model", and the presentation, a bit confusing since they 
conflate what I would call the internal representation and the API.


The internal representation defines how the Unicode text is stored and should 
not really matter to the end user of the string data structure. The API defines 
how the Unicode text is accessed, which is expressed by what an indexing 
operation on the string returns. The latter is really what matters for the end 
user and what I would call the "model".
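
To make the distinction concrete, here is a minimal Python sketch. The class 
name ScalarString and its layout are hypothetical, chosen only for 
illustration and not taken from any library or from the document under 
discussion. It stores the text as UTF-8 bytes, yet indexing returns Unicode 
scalar values, so the storage choice never shows through the API.

```python
class ScalarString:
    """Hypothetical string type: UTF-8 storage, scalar value indexing."""

    def __init__(self, text: str):
        # Internal representation: UTF-8 bytes. Encoding raises on lone
        # surrogates, so only scalar values can ever be stored.
        self._bytes = text.encode("utf-8")

    def __len__(self) -> int:
        # Length in scalar values: count the bytes that are not UTF-8
        # continuation bytes (those of the form 10xxxxxx).
        return sum(1 for b in self._bytes if b & 0xC0 != 0x80)

    def __getitem__(self, i: int) -> int:
        # API: the i-th Unicode scalar value, as an integer.
        # Decoding the whole buffer is O(n); a real implementation would
        # likely keep an index, but that is beside the point here.
        if not 0 <= i < len(self):
            raise IndexError("scalar value index out of range")
        return ord(self._bytes.decode("utf-8")[i])


s = ScalarString("a\u00e9\U0001F600")
print(len(s._bytes), len(s), hex(s[2]))
```

Swapping the UTF-8 buffer for UTF-16 or UTF-32 would change only the private 
field and the two method bodies; the results of len and indexing would stay 
the same.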

I think the presentation would benefit from making a clear distinction between 
the internal representation and the API; you could then easily lay them out in 
a table, which would make a nice summary of the design space.

I also think you are missing one API which is the one with ECG I would favour: 
indexing returns Unicode scalar values, whatever the internal representation, 
be it UTF-{8,16,32} or a custom encoding. Maybe that's what you intended by the 
"Code Point Model: Internal 8/16/32", but that's not what it says. The 
distinction between code point and scalar value is an important one, and I 
think it would be good to insist on it in such documents to avoid confusion.
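
For reference, the code point versus scalar value distinction can be sketched 
in a few lines of Python. The two function names are hypothetical helpers; the 
ranges are the ones from the Unicode Standard: code points are the integers 
U+0000..U+10FFFF, and scalar values are the code points outside the surrogate 
range U+D800..U+DFFF.

```python
def is_code_point(n: int) -> bool:
    # Any integer in the Unicode codespace.
    return 0x0000 <= n <= 0x10FFFF


def is_scalar_value(n: int) -> bool:
    # A code point that is not a surrogate (U+D800..U+DFFF).
    return is_code_point(n) and not (0xD800 <= n <= 0xDFFF)


# U+D800 is a code point but not a scalar value.
print(is_code_point(0xD800), is_scalar_value(0xD800))
```

An API that indexes by code point can hand the user a lone surrogate, which no 
UTF encoding form can represent; an API that indexes by scalar value cannot.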

Best, 

Daniel
