On Sun, Dec 1, 2024 at 1:08 AM Rob Landley <r...@landley.net> wrote:
> On 11/30/24 11:28, Ray Gardner wrote:
> > Toybox main.c has this code to support UTF-8:
> >
> >   // Try user's locale, but if that isn't UTF-8 merge in a UTF-8 locale's
> >   // character type data. (Fall back to en_US for MacOS.)
> >   setlocale(LC_CTYPE, "");
> >   if (strcmp("UTF-8", nl_langinfo(CODESET)))
> >     uselocale(newlocale(LC_CTYPE_MASK, "C.UTF-8", 0) ? :
> >       newlocale(LC_CTYPE_MASK, "en_US.UTF-8", 0));
>
> Which is basically result of many long arguments trying to get Android,
> MacOS, and various Linux distros (glibc and musl but also differing
> locale installation choices) to play nice with each other.
>
> > For a standalone version of awk, I intend to use this instead:
> >
> >   char *p = setlocale(LC_CTYPE, "");
> >   if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "C.UTF-8");
> >   if (!p || !strstr(p, "UTF-8")) p = setlocale(LC_CTYPE, "en_US.UTF-8");
> >
> > Rationale is that this compiles on older systems that lack up to date
> > locale support.
>
> Good luck?
>
> https://landley.net/toybox/faq.html#support_horizon
>
> > What will be the effective difference between these? I am not familiar
> > with the details of locale support in C and POSIX.
>
> That's probably a question for Elliott. (Or possibly Rich Felker.)

i think it's really a question for someone who knows something about
whatever [presumably ancient] systems you're trying to support. bionic
and musl are both always utf-8, glibc and macOS let you test with
nl_langinfo(3), and i don't think i've used anything that wasn't one of
those since the 1990s... nl_langinfo(3) has been in posix since issue 2,
so i'd assume historical systems without that also aren't going to
understand _anything_ you try to do to convince them to use utf8?
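
For illustration only, here is a minimal sketch of the nl_langinfo(3) test described above, applied to the same three-step fallback as the standalone awk snippet. It is not toybox code: the helper names codeset_is_utf8() and ensure_utf8_ctype() are made up for this example, and it assumes a system that reports the codeset as exactly "UTF-8", as glibc and macOS do. Asking for the codeset after each setlocale() call avoids depending on how the locale name itself is spelled (a name such as "en_US.utf8" would not match a strstr() for "UTF-8").

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

// Hypothetical helper (not from toybox): does this codeset name mean UTF-8?
// Assumes the "UTF-8" spelling; other spellings are not handled here.
static int codeset_is_utf8(const char *codeset)
{
  return codeset && !strcmp(codeset, "UTF-8");
}

// Hypothetical helper: try the user's locale, then C.UTF-8, then
// en_US.UTF-8, checking nl_langinfo(CODESET) after each successful
// setlocale() instead of parsing the string setlocale() returns.
// Returns 1 if LC_CTYPE ends up UTF-8, 0 if no candidate worked.
static int ensure_utf8_ctype(void)
{
  static const char *const candidates[] = {"", "C.UTF-8", "en_US.UTF-8"};
  size_t i;

  for (i = 0; i < sizeof(candidates)/sizeof(*candidates); i++)
    if (setlocale(LC_CTYPE, candidates[i])
        && codeset_is_utf8(nl_langinfo(CODESET)))
      return 1;

  return 0;
}
```

A caller would run ensure_utf8_ctype() once at startup and decide what to do on a 0 return (for example, carry on in whatever locale was left in place). Unlike the uselocale()/newlocale() version in main.c, this sketch changes the process-wide locale through setlocale() and still needs nl_langinfo(), so it only helps on systems that have <langinfo.h>.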

> According to "git annotate main.c" that code is a combination of commits
> b34ed8132, 75b89012c, and bec202875 and the dates on those commits
> incriminate various mailing list threads ala
>
> http://lists.landley.net/pipermail/toybox-landley.net/2020-December/028293.html
> and
> http://lists.landley.net/pipermail/toybox-landley.net/2023-February/029452.html
> and probably more. (And that's before I dig further back to see how we
> got THERE. I remember arguing about C vs C.utf8 locale support in 2013,
> because I remember what office break room I was checking email in and
> that contract only lasted 6 months...)
>
> Rob
> _______________________________________________
> Toybox mailing list
> Toybox@lists.landley.net
> http://lists.landley.net/listinfo.cgi/toybox-landley.net