On 2024-11-04 6:43, A bughunter via Unicode wrote:
No, it does not answer my question.
Yes, 1 byte is 8 bits and UTF-8 is the 8-bit Unicode Transformation
Format. Then you give me a manual page which is clearly for Unicode
version 16. When I say relevant version, whether you call it core or
not, I anticipate you would ask about which implementation of UTF-8:
the answer is that the relevant implementation is Android 13 libbionic
(Bionic C), which uses UTF-8.
Without the source code you could only guess which Unicode version
Bionic C uses. With a slight assumption, since Android 13 is open
source (AOSP), it would be possible to point out the exact Unicode
version used in it; however, this assumes my runtime matches a generic
AOSP Android 13 source. So the way I framed my question probes whether
there is any way to display the compile-time UTF-8 version.
Sometimes there are --version options.
The part you do not seem to understand is the full circle of
authentication of a checksummed text. In order to fully authenticate
it, the codepage of the character-to-glyph map must be known. Anything
further on this checksumming process would not be directly on topic
for this mailing list, and you may ask me on the side, although the
use case is worth mentioning.
You may go to https://android.googlesource.com/platform/bionic/ to check
the source, but it is a C library, so it may not even know about Unicode
(and UTF-8): it may just care that strings are terminated with \0. And
please note: we are not Android, so you are in the wrong place.
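To illustrate why a C library may not need to know any Unicode version: the UTF-8 byte sequence of a code point follows from the code point's number alone, so it is the same under every Unicode version. Below is a minimal sketch in Python (my choice for illustration, not something bionic uses); `utf8_hex` is a hypothetical helper name. Note that `unicodedata.unidata_version` reports the Unicode version of Python's own character database, not that of bionic or Android.

```python
import unicodedata


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 bytes of one code point as spaced hex.

    Hypothetical helper for illustration. Surrogates (U+D800..U+DFFF)
    are not valid in UTF-8 and make encode() raise UnicodeEncodeError.
    """
    return chr(code_point).encode("utf-8").hex(" ")


if __name__ == "__main__":
    # The encoding depends only on the numeric value of the code point.
    for cp in (0x41, 0xE9, 0x20AC, 0x10FFFF):
        print(f"U+{cp:04X} -> {utf8_hex(cp)}")

    # Character *properties* (names, widths, categories) are what change
    # between Unicode versions; this is the version Python was built with.
    print("Unicode data version in this Python:", unicodedata.unidata_version)
```

So a question like "which Unicode version?" only makes sense for code that uses character properties, not for code that merely stores or copies UTF-8 bytes.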
And you will hate me for the next link, but we give you the resources;
you need to do the homework and read and look at the details. Handling
characters is a huge task, done by many libraries. In another mail, it
seemed you care about the position of a character (its column). The
simple case is "fixed-width characters": Unicode has a table of narrow
and wide characters (and also characters that take no space). Wide
characters, two columns each, are used in East Asian scripts (e.g. CJK
ideographs).
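As a rough sketch of the "simple way", here is how a column count could be computed from the East Asian Width property, using Python's standard `unicodedata` module. `display_width` is a hypothetical helper, and the rules are my simplifying assumptions: wide (W) and fullwidth (F) characters take two columns, combining marks and format characters take none, and everything else, including "ambiguous" width characters and control characters, takes one. Real terminals differ in those last cases, and the result depends on the Unicode version of the Python in use.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate the number of terminal columns needed for text.

    Simplified sketch: ignores tabs, control characters, emoji
    sequences, and terminal-specific handling of ambiguous width.
    """
    width = 0
    for ch in text:
        # Combining marks and format characters (e.g. zero-width joiner)
        # are drawn on or between other characters and take no column.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # "W" (wide) and "F" (fullwidth) occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


if __name__ == "__main__":
    for sample in ("abc", "日本語", "e\u0301"):
        print(repr(sample), display_width(sample))
```

This only works for fixed-width output such as a terminal; with proportional fonts, the position depends on the font and the shaper.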
But usually things are much more complex, and a lot is still work in
progress (hence the link on how Google and Android are changing the
stack): there is a text layout library (mixing languages,
left-to-right, right-to-left, justification, italic/bold, paragraphs,
etc.). I think Android uses Minikin. Left-to-right/right-to-left
handling may use a different library. Then you have the text shaper
(HarfBuzz in Android and most browsers): it finds the glyphs to use,
their dimensions, and where to put them. And you have other libraries
to select fonts and to display them: "rendering" (and anti-aliasing)
according to resolution and other factors. And it is probably even
more complicated, with yet other libraries involved (for instance,
they may use ICU from Unicode, with algorithms to split a word at the
end of a line, etc.).
As you see, there is much to it, and a lot of it is work in progress.
So your task is not easy (just because you need a lot of libraries and
have to read sparse documentation). There is no need to be a genius (or
a computer person; in fact, maintainers of such tools have different
backgrounds), but it is a "lonely place". Very few people enter it, so
do not expect help here: we did not dare to enter there, as Unicode is
already complex enough. Good luck!
And the link: State of Text Rendering 2024: https://behdad.org/text2024/
giacomo