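UTF-8 is compatible with char * because the characters 0 to 127 are
encoded exactly as in ASCII: one byte with the same value. Every byte of a
multi-byte UTF-8 sequence is 128 or above, so a zero byte still marks only
the end of the string. This is different from 16-bit Unicode (UTF-16),
where even plain ASCII text contains zero bytes. So many functions that are
intended for ANSI strings will work with UTF-8. It is now a question of OS
support.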

See the following program:

#include <stdio.h>
#include <windows.h>
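int main(void) {
  // Save this file as UTF-8 so the literal is stored as UTF-8 bytes.
  const char *m = "Конференция u Čačku Οὐχὶ ταὐτὰ παρίσταταί \n";
 // SetConsoleOutputCP(65001);  // 65001 is CP_UTF8
  printf("%s", m);  // never pass a variable string as the format
  return 0;
}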

Compile it and run it under Windows (I tried Windows 10). The text is
displayed incorrectly. Now uncomment the line SetConsoleOutputCP(65001);
(65001 is the UTF-8 code page) and you will see the text correctly.
However, strlen still returns the string length in bytes, not in
characters. Also, m[5] accesses the sixth byte (index 5), not the sixth
character; in the example above it is the second byte of "н", the third
character. So you need your own UTF-8-aware versions of the string
handling functions.

On Thu, Jun 9, 2022 at 6:34 AM Larry Doolittle via Tinycc-devel <
[email protected]> wrote:

> lrd -
>
> On Thu, Jun 09, 2022 at 12:01:09PM +0800, lrt via Tinycc-devel wrote:
> > Who can tell me how to make TCC support utf8.
> > I want to use the Unicode API.
>
> Just .. don't.
>
> ‘Trojan Source’ Bug Threatens the Security of All Code
> November 1, 2021
>
> https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/
> (as seen on slashdot)
>
>   - Larry
>
> _______________________________________________
> Tinycc-devel mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/tinycc-devel
>