Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, Feb 07, 2019 at 10:21:49PM +0100, Ansgar wrote:
> (And you get 24-hour time, but very strange Endian in C.UTF-8: WEEKDAY MMM DD HH:MM:SS TZ while en_US.UTF-8 has at least DD MMM ... Having YYYY-MM-DD HH:MM:SS[+] instead would be much nicer if we were to create an arbitrary set of new rules for a new universal "en" locale ;-) )

Exactly: using "C" implies compatibility with the old POSIX rules, "en" implies you can do whatever you want. :)
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
Michael Stone writes:
> On Thu, Feb 07, 2019 at 09:20:07PM +0100, Ondřej Surý wrote:
>>en_DK.UTF-8 is a good default locale?
>
> I think the suggestion of just "en" made the most sense--specify the
> language and an arbitrary set of rules that aren't tied to a specific
> country.

C.UTF-8 has the default of already existing and always being available. Other locales are not guaranteed to be around (well, except "C").

FWIW systemd will set LANG=C.UTF-8 if no other locale is specified since systemd 240:

* When no /etc/locale.conf file exists (and hence no locale settings are in place), systemd will now use the "C.UTF-8" locale by default, and set LANG= to it. This locale is supported by various distributions including Fedora, with clear indications that upstream glibc is going to make it available too. This locale enables UTF-8 mode by default, which appears appropriate for 2018.

That seems a reasonable choice and d-i could just use that by not specifying any locale if the user wishes so. (There is a small problem that getty@.service unsets LANG again.)

(And you get 24-hour time, but very strange Endian in C.UTF-8: WEEKDAY MMM DD HH:MM:SS TZ while en_US.UTF-8 has at least DD MMM ... Having -MM-DD HH:MM:SS[+] instead would be much nicer if we were to create an arbitrary set of new rules for a new universal "en" locale ;-) )

Ansgar
Re: C + POSIX locale question
On Thu, Feb 7, 2019 at 10:35 AM Michael Stone wrote: > Per the standard, the C locale is supposed to be a synonym for the POSIX > locale. Can someone give a quick explanation for why in debian the C > locale definition is 162k and the POSIX locale is 8k? Shouldn't they be > identical? The C/POSIX locales are built *into* glibc, they are not distinct locales. I expect that you are looking at the C.UTF-8 locale which is a distinct locale that supports UTF-8 and is not related to the C/POSIX locales. Cheers, Carlos.
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, Feb 07, 2019 at 04:08:21PM +0100, Ansgar wrote:
> On Thu, 2019-02-07 at 09:59 -0500, Michael Stone wrote:
> > POSIX specifies the output format for various utilities in the C locale,
> > which defeats my understanding of the purpose of this proposal. So, for
> > example, in ls -l:
>
> I don't think the "C.UTF-8" locale is covered by any promises POSIX might
> make for "C". (Nor is what happens when no LC_*, LANG variables are
> set at all.)

Here's the latter:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html

"POSIX"
    Specifies the minimal environment for C-language translation called the POSIX locale. The POSIX locale is the default global locale at entry to main().
"C"
    Equivalent to "POSIX".
""
    Specifies an implementation-defined native environment. [CX] The determination of the name of the new locale for the specified category depends on the value of the associated environment variables, LC_* and LANG; see XBD Locale and Environment Variables.

So a process that doesn't call setlocale() at all must work within the requirements of "C" (which C.UTF-8 almost meets standards-wise, but too many programs misinterpret as raw 8-bit for a switch to be safe) -- but a process that _does_ call setlocale(..., "") when the env vars are unset may get anything reasonable. I argue that C.UTF-8 is more reasonable than "C".

-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, Feb 07, 2019 at 02:40:06PM +, Simon McVittie wrote:
> On Thu, 07 Feb 2019 at 14:05:33 +0100, Adam Borowski wrote:
> > a locale for a silly country with weird customs
>
> Please don't take this tone. Insulting people who disagree with you[1]
> is rarely an effective way to persuade them that you're right and
> they're wrong.

I don't quite see how speech peppered with words like "imperialism" could be taken seriously as insults, aside from bad-old-days soviet propaganda. If I still didn't mark the tone as in jest enough, then apologies.

> > • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
> >   making dpkg-reconfigure locales DTRT, making it the d-i default)
>
> I think this is exactly the "international/culture-neutral English"
> locale that you're looking for.

Yeah.

> (Well, the C/POSIX locale is the formally
> standardized form of that, but breaks text outside the ASCII range;
> C.UTF-8 is the C locale with Unicode support added.)

Not really -- behaviour of C/POSIX for characters above 126 is _undefined_. That locale is defined in a weird convoluted way designed to allow both ASCII and IBM's encryption standards (aka variants of EBCDIC).

The only way I found so far that our current C.UTF-8 fails POSIX's demands for "C" is:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
7.3.1 LC_CTYPE
blank
# In the POSIX locale, only the <space> and <tab> shall be included.

Another point is that the result of setlocale(..., "") when the env vars are unset is implementation-defined. I'd change it to result not in "C" but in C.UTF-8.

> > • inventing a new locale "en" without a country bias
> >   -- good in the long term but problematic a month before freeze
>
> I assume this would be a UTF-8 locale like en_US.utf8 and en_GB.utf8,
> so probably en.utf8, possibly with a simple "en" alias?

Yeah, with a non-US time and date format.
Possibly also collation where a space is not ignored -- ie, dictionary order common to most of the world but not the US -- "foo xxx" < "foobar". C.* does this, en_US.* does not. Even worse, en_US ignores all (or most) non-letters, inconsistently with other operating systems and libcs:

glibc:
0 9
0.9.0
0.9.0-a0-foo-bar
({---=[ 0.9.0-a11 ]=---})
0.9.0-a17-quux
(0.9.0-a2)
0.9.0+a99-1
0.9.0-rc1
0.9.1
0 9 9
({---=[ 0.9-a11 ]=---})
0.9 ab

Windows, musl, ...:
(0.9.0-a2)
({---=[ 0.9.0-a11 ]=---})
({---=[ 0.9-a11 ]=---})
0 9
0 9 9
0.9 ab
0.9.0
0.9.0+a99-1
0.9.0-a0-foo-bar
0.9.0-a17-quux
0.9.0-rc1
0.9.1

> As you say, I don't think a country-neutral specifically-English locale
> is going to happen before buster.

On the other hand, adding it but not using it by default would probably be a very good idea: in the future, it'd avoid situations where ssh-ing from one machine to one running stable would have the default locale fail.

> How would this locale differ from C.UTF-8? Is the only difference
> that C.UTF-8 has strict lexicographical sorting, whereas "en" would have
> case-insensitive sorting like en_GB.utf8 does? (If that's the only
> difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8"
> is enough.)

I can't recall any other difference off the top of my head, yeah. LC_COLLATE=en_US.UTF-8 has that ignoring space nastiness, though.

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄
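To make the space-handling point concrete, here is a rough Python sketch. It is only a crude model: `ignore_punctuation_key` is a hypothetical helper that drops every non-alphanumeric character before comparing, which approximates the en_US behaviour described above but is not glibc's actual collation algorithm. Plain string comparison stands in for the strict code-point order of C.*.

```python
def ignore_punctuation_key(text):
    """Hypothetical sort key: drop spaces and punctuation, fold case.

    A crude stand-in for the "ignores non-letters" behaviour described
    in the message; real locale collation is multi-level and more subtle.
    """
    return "".join(ch for ch in text if ch.isalnum()).lower()


names = ["foobar", "foo xxx"]

# Code-point order: the space (U+0020) sorts before "b", so "foo xxx" is first.
codepoint_order = sorted(names)

# Space ignored: "fooxxx" is compared with "foobar", so "foobar" is first.
space_ignoring_order = sorted(names, key=ignore_punctuation_key)

print(codepoint_order)
print(space_ignoring_order)
```

The two orders differ only because of how the space is treated, which is the difference the message attributes to C.* versus en_US.*.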
Bug#874160: systemd _sometimes_ does this
On Thu, Feb 07, 2019 at 03:36:59PM +0100, Adam Borowski wrote:
> Turns out systemd independently does this, although not in every case.
> If you have unset locale, it changes it to C.UTF-8 for X (gdm3) but not
> for console logins.

Turns out that console logins are the only exception; ansgar found this:
https://github.com/systemd/systemd/issues/11668

> It'd be good to have this consistent both for X vs console, and systemd vs
> other inits/rc systems.

You said:
# Even if the change is small, that might still change the behavior of
# some programs, so I am not sure we want to diverge from upstream and
# other distributions here.

So with systemd forcing this, the result is us diverging from most other distributions only when init/rc is not systemd.

Thus, could you please apply this patch -- or, should I bother sysvinit folks (and perhaps implement this in openrc) instead?

Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄
C + POSIX locale question
Per the standard, the C locale is supposed to be a synonym for the POSIX locale. Can someone give a quick explanation for why in debian the C locale definition is 162k and the POSIX locale is 8k? Shouldn't they be identical?
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, Feb 07, 2019 at 04:08:21PM +0100, Ansgar wrote:
> On Thu, 2019-02-07 at 09:59 -0500, Michael Stone wrote:
> > On Thu, Feb 07, 2019 at 02:40:06PM +, Simon McVittie wrote:
> > > How would this locale differ from C.UTF-8? Is the only difference
> > > that C.UTF-8 has strict lexicographical sorting, whereas "en" would
> > > have case-insensitive sorting like en_GB.utf8 does? (If that's the only
> > > difference, then perhaps something like "LANG=C.utf8
> > > LC_COLLATE=en_US.utf8" is enough.)
> >
> > POSIX specifies the output format for various utilities in the C locale,
> > which defeats my understanding of the purpose of this proposal. So, for
> > example, in ls -l:
>
> I don't think the "C.UTF-8" locale is covered by any promises POSIX might
> make for "C". (Nor is what happens when no LC_*, LANG variables are
> set at all.)

IMO, the principle of least surprise applies here: if C.UTF-8 is meant to be something other than the C locale with UTF-8 semantics added, it should be called something other than C, no?
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, 2019-02-07 at 09:59 -0500, Michael Stone wrote:
> On Thu, Feb 07, 2019 at 02:40:06PM +, Simon McVittie wrote:
> > How would this locale differ from C.UTF-8? Is the only difference
> > that C.UTF-8 has strict lexicographical sorting, whereas "en" would
> > have case-insensitive sorting like en_GB.utf8 does? (If that's the only
> > difference, then perhaps something like "LANG=C.utf8
> > LC_COLLATE=en_US.utf8" is enough.)
>
> POSIX specifies the output format for various utilities in the C locale,
> which defeats my understanding of the purpose of this proposal. So, for
> example, in ls -l:

I don't think the "C.UTF-8" locale is covered by any promises POSIX might make for "C". (Nor is what happens when no LC_*, LANG variables are set at all.)

Ansgar
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, Feb 07, 2019 at 02:40:06PM +, Simon McVittie wrote:
> How would this locale differ from C.UTF-8? Is the only difference
> that C.UTF-8 has strict lexicographical sorting, whereas "en" would
> have case-insensitive sorting like en_GB.utf8 does? (If that's the only
> difference, then perhaps something like "LANG=C.utf8
> LC_COLLATE=en_US.utf8" is enough.)

POSIX specifies the output format for various utilities in the C locale, which defeats my understanding of the purpose of this proposal. So, for example, in ls -l:

(quoting http://pubs.opengroup.org/onlinepubs/009695399/utilities/ls.html)

The <date and time> field shall contain the appropriate date and timestamp of when the file was last modified. In the POSIX locale, the field shall be the equivalent of the output of the following date command:

date "+%b %e %H:%M"

if the file has been modified in the last six months, or:

date "+%b %e  %Y"

(where two <space>s are used between %e and %Y ) if the file has not been modified in the last six months or if the modification date is in the future, except that, in both cases, the final <newline> produced by date shall not be included and the output shall be as if the date command were executed at the time of the last modification date of the file rather than the current time. When the LC_TIME locale category is not set to the POSIX locale, a different format and order of presentation of this field may be used.

Mike Stone
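As an illustration of the rule quoted above, the following Python sketch builds the `ls -l` date field the way the POSIX-locale text describes it. The function name `ls_date_field`, the hard-coded month abbreviations, and the 182-day approximation of "six months" are assumptions made for this example; they are not taken from any real `ls` implementation.

```python
from datetime import datetime, timedelta

# English abbreviations, as %b yields in the POSIX locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Assumption: "six months" approximated as 182 days.
SIX_MONTHS = timedelta(days=182)


def ls_date_field(mtime, now):
    """Format mtime like the POSIX-locale ls -l date field (sketch)."""
    month = MONTHS[mtime.month - 1]   # %b
    day = f"{mtime.day:2d}"           # %e: day of month, space-padded
    recent = now - SIX_MONTHS < mtime <= now
    if recent:
        # Equivalent of: date "+%b %e %H:%M"
        return f"{month} {day} {mtime.hour:02d}:{mtime.minute:02d}"
    # Older than six months, or in the future: date "+%b %e  %Y"
    return f"{month} {day}  {mtime.year}"


now = datetime(2019, 2, 7, 12, 0)
print(ls_date_field(datetime(2019, 2, 7, 9, 53), now))
print(ls_date_field(datetime(2017, 12, 25, 8, 0), now))
```

The point relevant to the thread is that this layout is fixed only for the POSIX locale; with any other LC_TIME value the specification allows a different format.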
Bug#874160: systemd _sometimes_ does this
Turns out systemd independently does this, although not in every case. If you have unset locale, it changes it to C.UTF-8 for X (gdm3) but not for console logins. It'd be good to have this consistent both for X vs console, and systemd vs other inits/rc systems. Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, 07 Feb 2019 at 14:05:33 +0100, Adam Borowski wrote: > a locale for a silly country with weird customs Please don't take this tone. Insulting people who disagree with you[1] is rarely an effective way to persuade them that you're right and they're wrong. > • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i, > making dpkg-reconfigure locales DTRT, making it the d-i default) I think this is exactly the "international/culture-neutral English" locale that you're looking for. (Well, the C/POSIX locale is the formally standardized form of that, but breaks text outside the ASCII range; C.UTF-8 is the C locale with Unicode support added.) > • inventing a new locale "en" without a country bias > -- good in the long term but problematic a month before freeze I assume this would be a UTF-8 locale like en_US.utf8 and en_GB.utf8, so probably en.utf8, possibly with a simple "en" alias? As you say, I don't think a country-neutral specifically-English locale is going to happen before buster. How would this locale differ from C.UTF-8? Is the only difference that C.UTF-8 has strict lexicographical sorting, whereas "en" would have case-insensitive sorting like en_GB.utf8 does? (If that's the only difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8" is enough.) smcv [1] As it happens, I do agree with you that AM/PM time and middle-endian dates are not a good default; but I'm from a different English-speaking country with its own weird customs.
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
Peter Silva writes ("Re: Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?"):
> iso_en ? That sounds smart...
>
> English for most of the world that aren't necessarily native English speakers?
> https://en.wikipedia.org/wiki/International_English
> Use ISO dates and stuff, and pick a random spelling. As a Canadian, I'm pretty
> sure about colour, but unclear about whether we should standardize on disc.
> Dates should be iso, even better if it used UTC as the timezone. This would
> be a default that would include US keyboard bindings (by default.)
> as the easiest thing to default to during installation, etc.. but perhaps I
> should be disqualified, being both a unix greybeard, and a recovering ntp
> admin.

I don't see that this exists as a locale already. It is probably too late for buster to introduce it.

Realistically our sensible choices for the default are
  C.UTF-8
  One of en_{AU,GB,NZ}.UTF-8

All of these would be better than en_US.UTF-8 for the reasons given by Adam (although, Adam, really, could you try to be a little less rude?). The middle-endian dates and 12-hour clock are particularly poor defaults.

Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
iso_en ? That sounds smart... English for most of the world that aren't necessarily native English speakers? https://en.wikipedia.org/wiki/International_English Use ISO dates and stuff, and pick a random spelling. As a Canadian, I'm pretty sure about colour, but unclear about whether we should standardize on disc. Dates should be iso, even better if it used UTC as the timezone. This would be a default that would include US keyboard bindings (by default.) as the easiest thing to default to during installation, etc.. but perhaps I should be disqualified, being both a unix greybeard, and a recovering ntp admin. On Thu, Feb 7, 2019 at 8:06 AM Adam Borowski wrote: > On Thu, Feb 07, 2019 at 02:55:33PM +0500, Roman Mamedov wrote: > > So for those of us (the entire world), who have been relying on this > behavior: > > > > > * en_US (.UTF-8) is used as the default English locale for all places > that > > > don't have a specific variant (and often even then). Generally, > technical > > > users use English as a system locale > > > > How do we roll-back what you have done here, and still get en_US.UTF-8 > while > > retaining the proper 24-hour time? > > > dpkg-reconfigure locales does not list "C.UTF-8" in the main "locales to > > generate" list, but does offer it on the next screen as "Default locale > for the > > system environment". 
After selecting it, we get: > > > > # locale > > LANG=C.UTF-8 > > LANGUAGE= > > LC_TIME="en_US.UTF-8" > > LC_ALL=en_US.UTF-8 > > > > But still: > > > > # date > > Thu 07 Feb 2019 09:53:47 AM UTC > > The root of this issue is worth raising on debian-devel: > > The en_US.UTF-8 locale has two purposes: > • a locale for a silly country with weird customs (such as time going in > four discontinuous segments during the day, writing date in a > middle-endian format, an unit being shorter on land than surveyed but > longer than that in the air, or another unit changing when measuring wet > vs dry vs slightly moist things) > • base locale for the most of the world save for a few places (UK, AU, ...) > that have their specific locale -- and often even they use en_US for > consistency reasons. > > > So I wonder what would be the best solution? I can think of: > • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i, > making dpkg-reconfigure locales DTRT, making it the d-i default) > -- nice for Unix greybeards, but some users might want case-insensitive > sort, etc > • inventing a new locale "en" without a country bias > -- good in the long term but problematic a month before freeze > -- could be good to have it anyway but not use it until after buster > • ask glibc maintainers to revert the cherry-pick in #877900 for buster, > then pick a long-term solution > > > One particular regression caused by this change is sorting no longer > working: "12:01am" "1:01am" "12:01pm" "1:01pm" will be ordered wrong. > > On one hand, leftpondians may be entitled to their own locale. On the > other, let's punish the bastards for imperialism and imposing their own > settings on the rest of the world. :p > > > Meow! > -- > ⢀⣴⠾⠻⢶⣦⠀ > ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands > ⢿⡄⠘⠷⠚⠋⠀ for Privacy. > ⠈⠳⣄ > >
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
On Thu, Feb 07, 2019 at 02:55:33PM +0500, Roman Mamedov wrote: > So for those of us (the entire world), who have been relying on this behavior: > > > * en_US (.UTF-8) is used as the default English locale for all places that > > don't have a specific variant (and often even then). Generally, technical > > users use English as a system locale > > How do we roll-back what you have done here, and still get en_US.UTF-8 while > retaining the proper 24-hour time? > dpkg-reconfigure locales does not list "C.UTF-8" in the main "locales to > generate" list, but does offer it on the next screen as "Default locale for > the > system environment". After selecting it, we get: > > # locale > LANG=C.UTF-8 > LANGUAGE= > LC_TIME="en_US.UTF-8" > LC_ALL=en_US.UTF-8 > > But still: > > # date > Thu 07 Feb 2019 09:53:47 AM UTC The root of this issue is worth raising on debian-devel: The en_US.UTF-8 locale has two purposes: • a locale for a silly country with weird customs (such as time going in four discontinuous segments during the day, writing date in a middle-endian format, an unit being shorter on land than surveyed but longer than that in the air, or another unit changing when measuring wet vs dry vs slightly moist things) • base locale for the most of the world save for a few places (UK, AU, ...) that have their specific locale -- and often even they use en_US for consistency reasons. So I wonder what would be the best solution? 
I can think of: • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i, making dpkg-reconfigure locales DTRT, making it the d-i default) -- nice for Unix greybeards, but some users might want case-insensitive sort, etc • inventing a new locale "en" without a country bias -- good in the long term but problematic a month before freeze -- could be good to have it anyway but not use it until after buster • ask glibc maintainers to revert the cherry-pick in #877900 for buster, then pick a long-term solution One particular regression caused by this change is sorting no longer working: "12:01am" "1:01am" "12:01pm" "1:01pm" will be ordered wrong. On one hand, leftpondians may be entitled to their own locale. On the other, let's punish the bastards for imperialism and imposing their own settings on the rest of the world. :p Meow! -- ⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands ⢿⡄⠘⠷⠚⠋⠀ for Privacy. ⠈⠳⣄
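The sorting regression mentioned above is easy to reproduce with plain string comparison. A small Python sketch, using the four timestamps from the message plus 24-hour equivalents that are added here purely for contrast:

```python
# The four times from the message, listed in chronological order.
twelve_hour = ["12:01am", "1:01am", "12:01pm", "1:01pm"]

# Same instants written as zero-padded 24-hour times (added for comparison).
twenty_four_hour = ["00:01", "01:01", "12:01", "13:01"]

# Plain text sort, as a generic tool would do on log lines or file names.
sorted_12h = sorted(twelve_hour)
sorted_24h = sorted(twenty_four_hour)

print(sorted_12h)                      # no longer chronological
print(sorted_24h == twenty_four_hour)  # 24-hour strings keep their order
```

With the 12-hour strings, both "12:01" entries sort ahead of both "1:01" entries, so textual order and chronological order disagree; the zero-padded 24-hour strings sort chronologically.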
Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?
So for those of us (the entire world), who have been relying on this behavior:

> * en_US (.UTF-8) is used as the default English locale for all places that
> don't have a specific variant (and often even then). Generally, technical
> users use English as a system locale

How do we roll-back what you have done here, and still get en_US.UTF-8 while retaining the proper 24-hour time?

dpkg-reconfigure locales does not list "C.UTF-8" in the main "locales to generate" list, but does offer it on the next screen as "Default locale for the system environment". After selecting it, we get:

# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

But still:

# date
Thu 07 Feb 2019 09:53:47 AM UTC

-- 
With respect,
Roman
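One likely reading of the locale output above is that LC_ALL is still set in that shell: under POSIX, LC_ALL takes precedence over the individual LC_* variables, which in turn take precedence over LANG. The Python sketch below models that lookup order. `effective_locale` is a hypothetical helper written for this illustration, not part of any library, and the fallback value "C" is an assumption, since the result with nothing set is implementation-defined.

```python
def effective_locale(category, env, default="C"):
    """Return the locale name that applies to `category`, e.g. "LC_TIME".

    Models the POSIX lookup order LC_ALL > LC_<category> > LANG.
    Empty values are treated as unset. `default` is an assumption:
    the result with nothing set is implementation-defined.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return default


# Variables explicitly set in the reported session (the unquoted lines).
reported = {"LANG": "C.UTF-8", "LANGUAGE": "", "LC_ALL": "en_US.UTF-8"}
with_lc_all = effective_locale("LC_TIME", reported)

# Same environment with LC_ALL removed: LANG now decides.
trimmed = {k: v for k, v in reported.items() if k != "LC_ALL"}
without_lc_all = effective_locale("LC_TIME", trimmed)

print(with_lc_all)
print(without_lc_all)
```

Under this model the new LANG=C.UTF-8 default only affects `date` once LC_ALL is no longer set in the environment, for example in a fresh login session.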