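UTF-8 is compatible with char * because the codes 0 through 127 are
encoded exactly as in ASCII: same values, one byte each. Every byte of a
multibyte sequence is 128 or above, so a UTF-8 string never contains a
stray zero byte. 16-bit Unicode (UTF-16) does not have these properties.
So, many functions that are intended for ANSI will work with UTF-8. What
remains is a question of OS support.
See the following program: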
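#include <stdio.h>
#include <windows.h>
int main(void) {
    const char *m = "Конференция u Čačku Οὐχὶ ταὐτὰ παρίσταταί \n";
    /* 65001 is the Windows code page number for UTF-8 */
    // SetConsoleOutputCP(65001);
    /* print through "%s" so the text is never treated as a format string */
    printf("%s", m);
    return 0;
}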
Compile it, with the source file saved as UTF-8, and run it under Windows
(I tried Windows 10). The text is displayed incorrectly. Now uncomment the
line SetConsoleOutputCP(65001); and the text is displayed correctly.
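However, strlen still returns the string length in bytes, not in characters.
Likewise, m[5] accesses the sixth byte, not the sixth character; in the
string above it is the second byte of the letter "н". So, you need to
prepare your own versions of the string handling functions wherever
character positions matter.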
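
As a rough illustration of what such replacement functions could look like,
here is a minimal sketch. The names utf8_length and utf8_offset are
hypothetical (they are not part of TCC or the C library), and the code
assumes the input is valid, NUL-terminated UTF-8. It counts code points by
skipping continuation bytes (those of the form 10xxxxxx), which is not
quite the same as counting user-visible characters when combining marks
are involved.

```c
#include <stddef.h>

/* Hypothetical helper: number of code points in a NUL-terminated UTF-8
   string. Every byte that is not a continuation byte (10xxxxxx) starts
   a new code point. Assumes the input is valid UTF-8. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: byte offset of the code point with the given
   zero-based index. If the string has fewer code points, the offset of
   the terminating NUL is returned instead. */
size_t utf8_offset(const char *s, size_t index)
{
    size_t i = 0;     /* current byte position */
    size_t seen = 0;  /* code points started before position i */
    while (s[i] != '\0') {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (seen == index)
                return i;
            seen++;
        }
        i++;
    }
    return i;
}
```

With these, utf8_length(m) would stand in for strlen(m) when a count of
code points is wanted, and m + utf8_offset(m, 5) would point at the start
of the sixth code point rather than at the sixth byte.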
On Thu, Jun 9, 2022 at 6:34 AM Larry Doolittle via Tinycc-devel <
[email protected]> wrote:
> lrd -
>
> On Thu, Jun 09, 2022 at 12:01:09PM +0800, lrt via Tinycc-devel wrote:
> > Who can tell me how to make TCC support utf8.
> > I want to use the Unicode API.
>
> Just .. don't.
>
> ‘Trojan Source’ Bug Threatens the Security of All Code
> November 1, 2021
>
> https://krebsonsecurity.com/2021/11/trojan-source-bug-threatens-the-security-of-all-code/
> (as seen on slashdot)
>
> - Larry
>
> _______________________________________________
> Tinycc-devel mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/tinycc-devel
>