Re: get the sourcecode [of UTF-8]

Giacomo Catenazzi via Unicode Tue, 05 Nov 2024 00:41:27 -0800

On 2024-11-04 6:43, A bughunter via Unicode wrote:

> No, it does not answer my question.
>
> Yes, 1 byte is 8 bits, and UTF-8 is Unicode Text Format - 8 bit. Then you give me a manual page which is clearly for Unicode version 16. When I say "relevant version", whether you call it core or not, I anticipate you would ask which implementation of UTF-8: the answer is that the relevant implementation is Android 13 libbionic (bionic C), which uses UTF-8. Without the source code you could only guess which Unicode version bionic C uses. With a slight assumption (Android 13 is open source, AOSP), it would be possible to point out the exact Unicode version used in it; however, this assumes my runtime matches a generic AOSP Android 13 source. So the way in which I framed my question does probe whether there is any way to display the compile-time UTF-8 version. Sometimes there are --version options.
>
> The part you do not seem to understand is the full circle of authentication of a checksummed text. In order to fully authenticate, the codepage of the character-to-glyph map must be known. Anything further on this checksumming process would not be directly on topic for this mailing list, and you may ask me on the side, although the use case is worth mentioning.

You may go to https://android.googlesource.com/platform/bionic/ to check the source, but it is a C library, so it may not even know about Unicode (and UTF-8): it may just care that strings are terminated with \0. And please note: we are not Android, so you are in the wrong place.
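To make the "terminated with \0" point concrete: a byte-oriented C API counts bytes, not characters. The sketch below is Python, not bionic, and the helper names (`utf8_code_points`, `utf8_sha256`) are hypothetical. It counts bytes versus code points and, since the quoted use case is checksumming, shows that the same visible text stored precomposed or decomposed hashes differently unless both sides normalize first. The UTF-8 encoding rules themselves have been stable since RFC 3629 (2003); what changes between Unicode versions is the character data (assigned characters, properties), and that is what a library reports as "its" Unicode version. Python exposes it as `unicodedata.unidata_version`, ICU as `u_getUnicodeVersion()`; whether bionic offers anything similar is not assumed here.

```python
import hashlib
import unicodedata
from typing import Optional


def utf8_code_points(data: bytes) -> int:
    """Count code points in well-formed UTF-8 (no validation is done).

    Continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF);
    every other byte starts a new code point.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf8_sha256(text: str, form: Optional[str] = None) -> str:
    """SHA-256 hex digest of the UTF-8 bytes of text, optionally normalized first.

    form is a normalization form accepted by unicodedata.normalize
    ("NFC", "NFD", "NFKC", "NFKD"), or None to hash the text as-is.
    """
    if form is not None:
        text = unicodedata.normalize(form, text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


precomposed = "caf\u00e9"  # é as a single code point, U+00E9
decomposed = "cafe\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

for s in (precomposed, decomposed):
    raw = s.encode("utf-8")
    # len(raw) is what a byte-oriented API such as strlen() would report.
    print(len(raw), utf8_code_points(raw))  # prints "5 4", then "6 5"

# Same visible text, different bytes, so the raw checksums differ...
print(utf8_sha256(precomposed) == utf8_sha256(decomposed))  # False
# ...but they match once both sides normalize to the same form first.
print(utf8_sha256(precomposed, "NFC") == utf8_sha256(decomposed, "NFC"))  # True

# Version of the Unicode Character Database this Python build ships with.
print(unicodedata.unidata_version)
```

For checksum-based authentication this suggests agreeing on a normalization form (and the Unicode version behind it) before hashing, rather than hashing whatever bytes arrive.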

And you will hate me for the next link, but we give you the resources; you need to do the homework and read and look at the details. Handling characters is a huge task, done by many libraries. In another mail, it seems you care about the position of a character (its column). The simple way is "fixed-width characters": Unicode has data on single- and double-width characters (the East_Asian_Width property, UAX #11), and there are also characters that take no space, such as combining marks. Double-width characters are used mainly by East Asian scripts (Chinese, Japanese, Korean).
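A rough sketch of the "fixed-width characters" approach, in Python because its standard `unicodedata` module exposes both the East_Asian_Width property (the data in EastAsianWidth.txt) and the general category. The `display_width` helper is hypothetical, and its mapping from property values to column counts is an assumption that resembles, but is not identical to, what `wcwidth()` implementations do.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate column width of text on a fixed-width display.

    Simplified rules (an assumption, not a full wcwidth() equivalent):
      * nonspacing/enclosing marks (Mn, Me) and format characters (Cf) -> 0
      * East_Asian_Width Wide (W) or Fullwidth (F) -> 2
      * everything else, including Ambiguous (A) -> 1
    Control characters and emoji sequences get no special treatment.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # takes no column of its own
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


for sample in ["abc", "日本語", "e\u0301", "\uff21\uff22"]:
    print(repr(sample), display_width(sample))
```

Treating Ambiguous (A) as narrow is the usual choice outside East Asian locales; a terminal configured for CJK use may count those characters as 2 instead.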

But usually things are much more complex, and there is a lot of work in progress (hence the link on how Google and Android are changing the stack). There is a text layout library (mixing languages, left-to-right, right-to-left, justification, italic/bold, paragraphs, etc.); I think Android uses Minikin. Left-to-right/right-to-left handling may use a different library. Then you have a text shaper (HarfBuzz in Android and most browsers): it finds the glyphs to use, their dimensions, and where to put them. And you have other libraries to select fonts and to display them: "rendering" (and anti-aliasing) according to resolution and other factors. And it is probably even more difficult, with still other libraries involved (indeed, they may use ICU from Unicode, algorithms to split a word at the end of a line, etc.).
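As a small taste of the left-to-right/right-to-left part: the Unicode Bidirectional Algorithm (UAX #9) starts from each character's Bidi_Class property, which Python's `unicodedata.bidirectional()` returns. The sketch below covers only that first classification step; resolving embedding levels, mirroring, and reordering lines for display is what a dedicated bidi library does, and nothing here reflects how Android or Minikin implement it. The helper names are hypothetical.

```python
import unicodedata
from typing import List


def bidi_classes(text: str) -> List[str]:
    """Return the Bidi_Class of each code point (e.g. L, R, AL, EN, WS).

    This is only the input step of the Unicode Bidirectional Algorithm;
    no levels are resolved and nothing is reordered.
    """
    return [unicodedata.bidirectional(ch) for ch in text]


def has_rtl(text: str) -> bool:
    """True if any character is strongly right-to-left (R or AL)."""
    return any(cls in ("R", "AL") for cls in bidi_classes(text))


mixed = "abc \u05d0\u05d1 123"  # Latin letters, Hebrew alef and bet, digits
print(bidi_classes(mixed))
print(has_rtl(mixed), has_rtl("plain ASCII"))
```

A quick check like `has_rtl` can decide whether a string needs the full bidi machinery at all, which is a common shortcut for mostly left-to-right text.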

As you see, there is a lot, and also a lot of work in progress. So your task is not easy (simply because you need many libraries and must read sparse documentation). There is no need to be a genius (or a computer person; in fact, maintainers of such tools have different backgrounds), but it is a "lonely place". Very few people enter it, so do not expect help here: we didn't dare to enter there, since Unicode is already complex enough. Good luck!


And the link: State of Text Rendering 2024: https://behdad.org/text2024/

giacomo
