Re: LC_CTYPE=UTF-8
Hi Alan,

Alan Coopersmith wrote on Thu, Jun 25, 2020 at 12:13:33PM -0700:
> On 6/25/20 8:31 AM, Ingo Schwarze wrote:
>> Whether to standardize only C.UTF-8 or both C.UTF-8 and POSIX.UTF-8
>> as synonyms looks a bit like asking for the best colour of a bikeshed.
>> Given that the standard already contains the redundancy of requiring
>> both "C" and "POSIX", maybe it is more consistent to also require
>> both "C.UTF-8" and "POSIX.UTF-8", but i don't think that matters
>> greatly.
> The only thought I had along those lines was that I thought the "C"
> locale came from the C standard, and might be best left to the C
> committee to standardize, while this group controls the "POSIX"
> locale definition. I suspect those following the POSIX standards
> would end up implementing both, regardless of which specification
> defines each.

That sounds quite reasonable to me.

Yours,
  Ingo
Re: LC_CTYPE=UTF-8
Hi Alan,

Alan Coopersmith wrote on Thu, Jun 25, 2020 at 07:59:39AM -0700:
> On 6/25/20 6:33 AM, Hans Aberg wrote:
>> Perhaps there should be a default UTF-8 locale: It seems that the
>> current construct does not apply so well to it.
> If the goal is to standardize existing behavior the standard could define
> the C.UTF-8 locale (or perhaps a POSIX.UTF-8 locale) that a number of
> systems already have, which is the standard C/POSIX locale with just the
> character set changed to UTF-8 instead.

This idea makes a lot of sense to me.

If the Austin Group decides that it wants to go into that direction,
i would make sure that both OpenBSD and the software i publish use
that name for a locale with these properties and consistently
recommend using that name.

Both already support a locale with these properties and select it
if the user asks for C.UTF-8 or POSIX.UTF-8, but so far, they
recommend that users specify en_US.UTF-8 (for historical reasons),
which is a bit unfortunate because it looks like requesting cultural
conventions for a particular country, which is not the intention.

Whether to standardize only C.UTF-8 or both C.UTF-8 and POSIX.UTF-8
as synonyms looks a bit like asking for the best colour of a bikeshed.
Given that the standard already contains the redundancy of requiring
both "C" and "POSIX", maybe it is more consistent to also require
both "C.UTF-8" and "POSIX.UTF-8", but i don't think that matters
greatly.

Yours,
  Ingo
Re: LC_CTYPE=UTF-8
Hi Hans,

Hans Aberg wrote on Thu, Jun 25, 2020 at 10:15:03AM +0200:
> MacOS sets as default LC_CTYPE=UTF-8, not appearing in the 'locale
> -a' list. Then some software interprets this as though the locale
> is C/POSIX, disregards the UTF-8 encoding, and converts all non-ASCII
> (high bit set) char's into octal escape sequences. What is the
> correct interpretation here?

The correct interpretation of "LC_CTYPE=UTF-8" is whatever the
documentation of the respective operating system says.

All POSIX says is:

  https://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html

  The locale argument is a pointer to a character string containing
  the required setting of category. The contents of this string are
  implementation-defined.

POSIX only specifies the meaning of the strings "C" and "POSIX";
any others are implementation-defined.

For example, the OpenBSD manual page says:

  https://man.openbsd.org/setlocale.3

  The syntax and semantics of the locale argument are not standardized
  and vary among operating systems. On OpenBSD, if the locale string
  ends with ".UTF-8", the UTF-8 locale is selected; otherwise, the
  "C" locale is selected, which uses the ASCII character set. If the
  locale contains a dot but does not end with ".UTF-8", setlocale()
  fails.

Which is indeed true here:

   $ uname -a
  OpenBSD isnote.usta.de 6.7 GENERIC.MP#224 amd64
   $ LC_CTYPE=FOOBAR.UTF-8 locale charmap
  UTF-8
   $ LC_CTYPE=UTF-8 locale charmap
  US-ASCII

To the best of my knowledge, we are POSIX-compliant in this respect.
Other systems are of course free to make different choices.

Even though POSIX says this is implementation-defined, which implies
that operating systems are expected to document their specific rules,
some fail to do so, for example:

  https://man.bsd.lv/FreeBSD-12.0/setlocale.3
  https://man.bsd.lv/NetBSD-8.1/setlocale.3

Some do specify it. According to

  https://man.bsd.lv/Linux-5.06/setlocale.3

the string "UTF-8" would be invalid because it lacks the "language"
part which is mandatory on Linux. For example, on a very old Linux
system i have access to:

   $ uname -a
  Linux donnerwolke.asta.kit.edu 4.9.0-0.bpo.3-686 #1 SMP \
    Debian 4.9.30-2+deb9u5~bpo8+1 (2017-09-28) i686 GNU/Linux
   $ LC_CTYPE=en_US.UTF-8 locale charmap
  UTF-8
   $ LC_CTYPE=UTF-8 locale charmap
  locale: Cannot set LC_CTYPE to default locale: No such file or directory
  locale: Cannot set LC_ALL to default locale: No such file or directory
  ANSI_X3.4-1968

Yours,
  Ingo
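The same experiment can be made from C rather than with locale(1).
The following is only a minimal sketch: the helper name codeset_for()
is invented for this illustration, and the only things taken from
POSIX are setlocale(), nl_langinfo(CODESET), and the guarantee that
the names "C" and "POSIX" are accepted. What happens for any other
string, including "UTF-8", remains implementation-defined, so the
sketch reports the outcome instead of assuming one.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */

#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any standard API:
 * try to select the LC_CTYPE locale "name" and copy the name of the
 * resulting codeset into buf.  The previous LC_CTYPE setting is
 * restored before returning.
 * Returns 0 on success or -1 if setlocale() rejects the name.
 */
int
codeset_for(const char *name, char *buf, size_t len)
{
	char	*cur, *saved;
	int	 rc = -1;

	assert(buf != NULL && len > 0);

	/* Remember the current setting so that it can be restored. */
	cur = setlocale(LC_CTYPE, NULL);
	saved = cur != NULL ? strdup(cur) : NULL;

	if (setlocale(LC_CTYPE, name) != NULL) {
		/* nl_langinfo() may return static storage, so copy it. */
		snprintf(buf, len, "%s", nl_langinfo(CODESET));
		rc = 0;
	}
	if (saved != NULL) {
		setlocale(LC_CTYPE, saved);
		free(saved);
	}
	return rc;
}
```

Calling codeset_for("UTF-8", buf, sizeof(buf)) would be expected to
mirror the transcripts above: success with an ASCII codeset where
the system falls back to the "C" locale, and -1 where the string is
rejected. That expectation is an assumption about how locale(1)
relates to setlocale() on those systems, not something POSIX promises.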
Re: behaviour upon non-matching globs (Was: Arrays)
Hi Steven,

Steven Penny wrote on Sun, Jun 02, 2019 at 10:48:53AM -0500:
> On Sun, Jun 2, 2019 at 10:39 AM Chet Ramey wrote:
>> You might want to reconsider this proposal, given the pervasive use of
>> tools like grep as filters and as components of pipelines.
> Its misleading that you omitted the next paragraph.

I admire Chet's polite understatement; it made my day when i first
saw it.

I would have expressed my reaction more bluntly:

You are welcome to design your own operating system elsewhere,
but what you propose is off-topic on this list, which is about UNIX.
Please just stop talking about totally breaking basic tools like
grep and ls. And no, adding knobs is not an excuse for that.

Back to lurking,
  Ingo
Re: In defence of compound aliases
Hi Martijn,

Martijn Dekker wrote on Mon, Jan 14, 2019 at 06:59:14AM +0100:
> Indeed, it's not as if aliases are some sort of strange anomaly in the
> programming world. In other languages, similar features are usually
> called 'macros'. C library sources are loaded with them.

It depends. Some software is indeed riddled with macros.
Other software does actively avoid them where possible.

> Nobody calls that a bad idea, or tells library programmers that they
> should use functions instead --

We do exactly that in OpenBSD, and we say exactly that, even with
a special emphasis, and we have been saying so for a long time.

Weeding out as much usage of macros as possible has been among the
most important refactoring techniques employed in LibreSSL, but not
only there; it is a general consensus diligently applied throughout
the system.

Using a C preprocessor macro containing unbalanced braces or
parentheses almost guarantees that your patch gets rejected outright
and instantly. Even using normal function-style macros is among
the most frequent reasons for getting patches rejected or tweaking
asked for before commit. I actively avoid macros even for integer
constants where possible and try to use enums instead. About the
only use of macros that is uncontroversial is for named bits in
unsigned integer variables intended for use with "|" and "&"
operators, i.e. to store groups of boolean flags.

In a nutshell, you more easily get away with using "goto" than with
using macros, and you certainly get away more easily with using
"goto" in creative ways than with using macros in creative ways.

> because macros/aliases and functions are good at different things.

Macros are best at making code unreadable, prone to bugs, hostile
to debugging, and at giving away many benefits of compile-time
checks and type safety.

> Creative use of aliases may be a Bad Idea for casual shell scripters,

We definitely consider creative use of macros a Bad Idea even for
the most capable C programmers.
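To make the objection concrete, here is a small C sketch, assuming
that the textbook double-evaluation problem is a fair representative
of the trouble meant. All names in it (SQUARE_MACRO, square,
next_value, MAX_COLUMNS, OPT_VERBOSE, OPT_FORCE, check_examples) are
invented for illustration and are not taken from OpenBSD or LibreSSL.

```c
#include <assert.h>

/* Function-style macro: the argument text is expanded twice. */
#define SQUARE_MACRO(x)	((x) * (x))

/* Function alternative: the argument is evaluated once and type-checked. */
int
square(int x)
{
	return x * x;
}

/* An integer constant as an enum member instead of a #define. */
enum { MAX_COLUMNS = 80 };

/* Named bits for "|" and "&": the use described as uncontroversial. */
#define OPT_VERBOSE	0x01u
#define OPT_FORCE	0x02u

/* Hypothetical function with a side effect: counts its own calls. */
int call_count;

int
next_value(void)
{
	return ++call_count;
}

void
check_examples(void)
{
	unsigned int	 opts;
	int		 m, f;

	call_count = 0;

	/* The macro calls next_value() twice: 1 * 2, not 1 * 1. */
	m = SQUARE_MACRO(next_value());
	assert(m == 2);
	assert(call_count == 2);

	/* The function calls it once: 3 * 3. */
	f = square(next_value());
	assert(f == 9);
	assert(call_count == 3);

	/* Flag bits combine and test as expected. */
	opts = OPT_VERBOSE | OPT_FORCE;
	assert((opts & OPT_FORCE) != 0);
	assert(MAX_COLUMNS == 80);
}
```

The macro call site looks exactly like the function call site, which
is what makes the double evaluation easy to overlook in review.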
Also note that we generally advise against using the shell for any
kind of serious programming since there are few languages that make
safe programming practices as hard as the shell, even when used in
very conventional ways without any special creativity.

Note that i'm intentionally not commenting on the standardization
of shell aliases. I don't really care how they are standardized,
and i don't think i would ever use them in any non-trivial shell
program. Even in interactive shell use, i only use a very small
number of aliases in the most trivial ways - and avoid them for
mostly the same reasons that i frown upon C macros.

I merely thought it might be useful to point out that the statement
of "Nobody calls that a bad idea" grossly mismatches reality.

Now, back to lurking...

Yours,
  Ingo