Re: get the sourcecode [of UTF-8]

A bughunter via Unicode Tue, 05 Nov 2024 03:53:54 -0800

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

My reply to Jim is interspersed below.
 Originating Question

Where can I get the sourcecode of the relevant version of UTF-8, in order to
checksum text against the specific encoding map (codepage)?

from [email protected]

Sent with Proton Mail secure email.

On Tuesday, November 5th, 2024 at 05:52, Jim DeLaHunt <[email protected]> 
wrote:

> On 2024-11-04 20:04, A bughunter via Unicode wrote:

Hello Jim, pleased to hear back from you again. You have had the best answer so
far. You wrote: "If by "source code" you refer to an implementation of the
UTF-8 format, then is no single answer.", which is about the only correct
paraphrase I have seen in the five or more pages dumped on me by this mailing
list in reply to the single-line question I have pinned at the top of this
reply. However, you did not go on to ask for the required information: which
version is relevant? I preemptively posted that in a previous reply: UTF-8 as
implemented in the Android 13 Bionic libc, but the Unicode standard itself is
also relevant as a reference, and so is a comparison of the two. Generally we
put the standard into a computer language such as C. Therefore the Unicode V.16
standard of UTF-8 should also be the sourcecode of the implementation; the two
converge, which makes them synonymous at the point of convergence. For
instance, the mathematical code of md5sum is encased in C function code: the
formula is both the sourcecode in the RFC and the implementation of it. The
same convergence will happen with a standard such as Unicode. The standard map,
wherever it is shown to the programmer who is to implement the standard, is
also the source which is imported into the C code to "implement" it.
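
A minimal sketch of that convergence, in Python rather than C for brevity: the
bit distribution the Unicode standard gives for UTF-8 (1 to 4 bytes per code
point) can be written down almost directly as code. The function name
utf8_encode_code_point is hypothetical, and this is an illustration only, not
the code of Bionic, glibc, or any other real implementation.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode scalar value using the UTF-8 bit distribution."""
    # Surrogates (D800..DFFF) and values above 10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:
        # 0xxxxxxx: one byte, identical to the 7-bit ASCII value
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check the sketch against Python's built-in UTF-8 encoder.
for ch in ("A", "é", "€", "😀"):
    assert utf8_encode_code_point(ord(ch)) == ch.encode("utf-8")
```

Note that nothing here is a lookup table: UTF-8 is an arithmetic mapping from
code point numbers to bytes, so there is no per-character map to import.
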
 To further keep the definitions within the context of this mailing list
thread: I say bytecode and character also converge to be synonymous. What the C
language would call a character is, on disk and in RAM, bytecode. I can say it
either way, and I gave you that in a previous post: call it either. I do not
like to repeat myself, because it encourages the common habit of ignoring what
I have already said. Yet here again I will say that ASCII is a 7-bit codepage.
When a programmer implements UTF-8, the ASCII part is determined by only 7
bits, so in the C language this would be bit-code: I have already said this.
Because UTF-8 is 8-bit, ALL of the ASCII subset in UTF-8 is then bytecode.
Furthermore, the word glyph is also synonymous with character and with the
bytecode which UTF-8 has for that character. Where they converge on the
computer makes them synonymous: these are just the facts of speaking English.
Do not assimilate what others are saying. Stay focused and kind of ignore the
distractions.
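
A minimal sketch of the 7-bit point, assuming Python's built-in "ascii" and
"utf-8" codecs as stand-ins for an implementation: every code point from 0x00
to 0x7F comes out as a single byte with the high bit clear, and the bytes are
the same under both codecs. The helper name is_ascii_bytes is hypothetical.

```python
def is_ascii_bytes(data: bytes) -> bool:
    """Return True if every byte has its high bit clear (value below 0x80)."""
    return all(b < 0x80 for b in data)


# All 128 code points of the 7-bit range, encoded both ways.
ascii_text = "".join(chr(cp) for cp in range(0x80))
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")

assert utf8_bytes == ascii_bytes          # byte-for-byte identical
assert len(utf8_bytes) == 128             # one byte per code point
assert is_ascii_bytes(utf8_bytes)         # high bit is always 0

# Anything outside the 7-bit range uses bytes with the high bit set.
assert not is_ascii_bytes("é".encode("utf-8"))
```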

I defined the words so that there are no issues about meanings.
> People are trying to help, but the meaning you seem to have for certain
> words are different than the meaning we are used to when discussing
> Unicode and text encoding.
> 
> If you would like to learn how we use the words, consider reading
> Chapter 1 of the Unicode Core Specification:
> https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-1/.
> 
> Particularly, see what it says about code points and encoding forms.
> Also have a look at the terms in the Unicode glossary, especially
> https://unicode.org/glossary/#glyph and
> 
> https://unicode.org/glossary/#glyph_image and
> 
> https://unicode.org/glossary/#character and
> 
> https://unicode.org/glossary/#code_unit.
> 
> 
No, a problem with common understanding is that ye swap and change meanings,
even with webpages, manuals, and dictionaries. I have already given definitions
within the self-contained context of this mailing list thread. My definitions
are better, but if ye wish to propose some from the consortium docs, feel free
to import them into this mailing list thread.
> But another way to find common understanding is for you to give an
> example of what you are looking for. For instance, can you show us the
> sourcecode for the ASCII bytecode to glyph map? Or the sourcecode for
> the bytecode to glyph map of another encoding standard?
That is pretty much why I am here: I am asking you for sourcecode. RFC20 would
be both the standard and the sourcecode to be imported into the machine, which
is what ye claim UTF-8 has. Although RFC20 is 7-bit bit-code, wherever UTF-8 is
implemented in sourcecode (glibc) it is then made into byte-code, or as RFC20
puts it, "stored in 8 bits": that is, 7-bit bit-code stored in 8-bit UTF-8
byte-code.

Sure, I can explain how I use it to checksum text. In fact, I have already
invited ye to ask me outside of this mailing list, because checksumming is not
Unicode-specific.
> Also, given that sourcecode, can you explain how you use it to "checksum
> text"?
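
A minimal sketch of checksumming text, assuming Python's hashlib; the function
name text_checksum and the choice of SHA-256 are illustrative assumptions, not
something stated in this thread. A checksum is computed over bytes, so the text
has to be encoded first, and the result depends on which encoding is used.

```python
import hashlib


def text_checksum(text: str, encoding: str = "utf-8") -> str:
    """Encode the text with the named encoding and return a SHA-256 hex digest."""
    data = text.encode(encoding)
    return hashlib.sha256(data).hexdigest()


# Pure 7-bit text gives the same bytes, and so the same checksum, either way.
assert text_checksum("abc", "ascii") == text_checksum("abc", "utf-8")

# Text outside the 7-bit range encodes differently, so the checksums differ.
assert text_checksum("café", "utf-8") != text_checksum("café", "latin-1")
```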

RFC20 is the same thing as a codepage. You can pull RFC20 here:
https://github.com/freedom-foundation/ASCII-format-for-Network-Interchange .
For whatever reason IBM has taken its codepages down from its website. You may
note that ASCII has had something like five or more versions since 1968, and
while I hear over and over again that ASCII is a subset of UTF-8, UTF-8 cannot
implement five differing versions simultaneously. I would need to see the
sourcecode I have asked for here to identify which version of ASCII UTF-8 is
using in what ye call a "subset". We would probably be better off without
UTF-8; it is more like a shim (or slim-jim) that was added on top of ASCII to
interfere with it.
> If you can show me the sourcecode for the ASCII bytecode to glyph map,
> and explain how to checksum text, maybe we will better understand what
> you seek.
> 
> --
> . --Jim DeLaHunt, [email protected] http://blog.jdlh.com/
> (http://jdlh.com/)
> multilingual websites consultant, Vancouver, B.C., Canada
-----BEGIN PGP SIGNATURE-----
Version: ProtonMail

wnUEARYKACcFgmcqBMQJkKkWZTlQrvKZFiEEZlQIBcAycZ2lO9z2qRZlOVCu
8pkAAPtpAQClaUbaDoEnBBCWo7U1rzTLsbBMvXBF6dqH/k2gdKweYgD+JMMk
/jWqodXNoGhtWxzhbPvJnnl5Y84cqA24IcE75As=
=4Dvg
-----END PGP SIGNATURE-----

