Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Michael Stone

On Thu, Feb 07, 2019 at 10:21:49PM +0100, Ansgar wrote:
> (And you get 24-hour time, but very strange Endian in C.UTF-8:
>   WEEKDAY MMM DD HH:MM:SS TZ YYYY
> while en_US.UTF-8 has at least DD MMM ...  Having
>   YYYY-MM-DD HH:MM:SS[+]
> instead would be much nicer if we were to create an arbitrary set of new
> rules for a new universal "en" locale ;-) )


Exactly: using "C" implies compatibility with the old POSIX rules, "en" 
implies you can do whatever you want. :)




Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Ansgar
Michael Stone writes:
> On Thu, Feb 07, 2019 at 09:20:07PM +0100, Ondřej Surý wrote:
>>en_DK.UTF-8 is a good default locale?
>
> I think the suggestion of just "en" made the most sense--specify the
> language and an arbitrary set of rules that aren't tied to a specific
> country.

C.UTF-8 has the advantage of already existing and always being available.
Other locales are not guaranteed to be around (well, except "C").

FWIW systemd will set LANG=C.UTF-8 if no other locale is specified since
systemd 240:

 * When no /etc/locale.conf file exists (and hence no locale settings
   are in place), systemd will now use the "C.UTF-8" locale by default,
   and set LANG= to it. This locale is supported by various
   distributions including Fedora, with clear indications that upstream
   glibc is going to make it available too. This locale enables UTF-8
   mode by default, which appears appropriate for 2018.

That seems a reasonable choice and d-i could just use that by not
specifying any locale if the user wishes so.

(There is a small problem that getty@.service unsets LANG again.)

(And you get 24-hour time, but very strange Endian in C.UTF-8:
  WEEKDAY MMM DD HH:MM:SS TZ YYYY
while en_US.UTF-8 has at least DD MMM ...  Having
  YYYY-MM-DD HH:MM:SS[+]
instead would be much nicer if we were to create an arbitrary set of new
rules for a new universal "en" locale ;-) )

Ansgar



Re: C + POSIX locale question

2019-02-07 Thread Carlos O'Donell
On Thu, Feb 7, 2019 at 10:35 AM Michael Stone wrote:
> Per the standard, the C locale is supposed to be a synonym for the POSIX
> locale. Can someone give a quick explanation for why in debian the C
> locale definition is 162k and the POSIX locale is 8k? Shouldn't they be
> identical?

The C/POSIX locales are built *into* glibc, they are not distinct locales.

I expect that you are looking at the C.UTF-8 locale which is a
distinct locale that supports UTF-8 and is not related to the C/POSIX
locales.

Cheers,
Carlos.



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Adam Borowski
On Thu, Feb 07, 2019 at 04:08:21PM +0100, Ansgar wrote:
> On Thu, 2019-02-07 at 09:59 -0500, Michael Stone wrote:
> > POSIX specifies the output format for various utilities in the C locale, 
> > which defeats my understanding of the purpose of this proposal. So, for 
> > example, in ls -l:
> 
> I don't think the "C.UTF-8" locale is covered by any promises POSIX might
> make for "C".  (Nor is what happens when no LC_*, LANG variables are
> set at all.)

Here's the latter:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html

"POSIX"
Specifies the minimal environment for C-language translation called the
POSIX locale.  The POSIX locale is the default global locale at entry to
main().
"C"
Equivalent to "POSIX".
""
Specifies an implementation-defined native environment.  [CX] The
determination of the name of the new locale for the specified category
depends on the value of the associated environment variables, LC_* and
LANG; see XBD Locale and Environment Variables.

So a process that doesn't call setlocale() at all must work within the
requirements of "C" (which C.UTF-8 almost meets standards-wise, but too many
programs misinterpret as raw 8-bit for a switch to be safe) -- but a process
that _does_ call setlocale("") when the env vars are unset may get anything
reasonable.  I argue that C.UTF-8 is more reasonable than "C".
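
The lookup that setlocale(category, "") performs can be sketched as below.
This is only an illustration, assuming the usual POSIX precedence (LC_ALL,
then the category's own variable, then LANG); effective_locale and its
fallback parameter are hypothetical names, with fallback standing in for the
implementation-defined default that applies when nothing is set.

```python
def effective_locale(env, category, fallback="C"):
    """Return the locale name that setlocale(category, "") would look up.

    Assumed precedence: LC_ALL, then the category variable, then LANG.
    An empty string counts as unset. `fallback` models the
    implementation-defined default used when none of them is set.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return fallback


# LC_ALL wins over everything else, including LANG.
print(effective_locale({"LANG": "C.UTF-8", "LC_ALL": "en_US.UTF-8"}, "LC_TIME"))

# A per-category variable wins over LANG.
print(effective_locale({"LANG": "C.utf8", "LC_COLLATE": "en_US.utf8"}, "LC_COLLATE"))

# Nothing set: the implementation picks, here modelled as C.UTF-8.
print(effective_locale({}, "LC_TIME", fallback="C.UTF-8"))
```

With fallback="C" this models the current behaviour; with
fallback="C.UTF-8" it models the change argued for above.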

-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Adam Borowski
On Thu, Feb 07, 2019 at 02:40:06PM +0000, Simon McVittie wrote:
> On Thu, 07 Feb 2019 at 14:05:33 +0100, Adam Borowski wrote:
> > a locale for a silly country with weird customs
> 
> Please don't take this tone. Insulting people who disagree with you[1]
> is rarely an effective way to persuade them that you're right and
> they're wrong.

I don't quite see how speech peppered with words like "imperialism" could be
taken seriously as insults, aside from bad-old-days soviet propaganda.

If I still didn't mark the tone as in jest enough, then apologies.

> > • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
> >   making dpkg-reconfigure locales DTRT, making it the d-i default)
> 
> I think this is exactly the "international/culture-neutral English"
> locale that you're looking for.

Yeah.

> (Well, the C/POSIX locale is the formally
> standardized form of that, but breaks text outside the ASCII range;
> C.UTF-8 is the C locale with Unicode support added.)

Not really -- behaviour of C/POSIX for characters above 126 is _undefined_.

That locale is defined in a weird convoluted way designed to allow both
ASCII and IBM's encryption standards (aka variants of EBCDIC).

The only way I found so far that our current C.UTF-8 fails POSIX's demands
for "C" is:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html
7.3.1 LC_CTYPE
blank
# In the POSIX locale, only the <space> and <tab> shall be included.


Another point is that setlocale(..., "") if the env vars are unset is
implementation-defined.  I'd change it to result not in "C" but in C.UTF-8.

> > • inventing a new locale "en" without a country bias
> >   -- good in the long term but problematic a month before freeze
> 
> I assume this would be a UTF-8 locale like en_US.utf8 and en_GB.utf8,
> so probably en.utf8, possibly with a simple "en" alias?

Yeah, with a non-US time and date format.  Possibly also collation where a
space is not ignored -- ie, dictionary order common to most of the world but
not the US -- "foo xxx" < "foobar".  C.* does this, en_US.* does not.  Even
worse, en_US ignores all (or most) non-letters, inconsistently with other
operating systems and libcs:

glibc:

0 9
0.9.0
0.9.0-a0-foo-bar
({---=[ 0.9.0-a11 ]=---})
0.9.0-a17-quux
(0.9.0-a2)
0.9.0+a99-1
0.9.0-rc1
0.9.1
0 9 9
({---=[ 0.9-a11 ]=---})
0.9 ab

Windows, musl, ...:

(0.9.0-a2)
({---=[ 0.9.0-a11 ]=---})
({---=[ 0.9-a11 ]=---})
0 9
0 9 9
0.9 ab
0.9.0
0.9.0+a99-1
0.9.0-a0-foo-bar
0.9.0-a17-quux
0.9.0-rc1
0.9.1
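
The difference between the two behaviours can be approximated in a few
lines. This is a rough sketch, not glibc's actual algorithm (which uses
multi-level collation weights): codepoint_sort stands in for C-style
ordering, and ignore_punctuation_sort is a hypothetical stand-in for a
collation that skips non-alphanumeric characters on its first pass.

```python
def codepoint_sort(strings):
    """Sort by Unicode code point, as a strcmp-style "C" collation does."""
    return sorted(strings)


def ignore_punctuation_sort(strings):
    """Crude model of a collation that skips spaces and punctuation first."""
    def key(s):
        letters_only = "".join(ch for ch in s if ch.isalnum())
        return (letters_only, s)  # raw string only breaks ties
    return sorted(strings, key=key)


samples = ["foobar", "foo xxx", "0.9.1", "0 9", "0.9.0"]

# Space sorts before letters and before ".", so "foo xxx" < "foobar".
print(codepoint_sort(samples))

# With spaces and dots ignored, "foobar" sorts before "foo xxx".
print(ignore_punctuation_sort(samples))
```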


> As you say, I don't think a country-neutral specifically-English locale
> is going to happen before buster.

On the other hand, adding it but not using it by default would probably be a
very good idea: in the future, it'd avoid situations where ssh-ing from one
machine to one running stable would have the default locale fail.

> How would this locale differ from C.UTF-8? Is the only difference
> that C.UTF-8 has strict lexicographical sorting, whereas "en" would have
> case-insensitive sorting like en_GB.utf8 does? (If that's the only
> difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8"
> is enough.)

I can't recall any other difference off the top of my head, yeah.
LC_COLLATE=en_US.UTF-8 has that ignoring space nastiness, though.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄



Bug#874160: systemd _sometimes_ does this

2019-02-07 Thread Adam Borowski
On Thu, Feb 07, 2019 at 03:36:59PM +0100, Adam Borowski wrote:
> Turns out systemd independently does this, although not in every case.
> If you have unset locale, it changes it to C.UTF-8 for X (gdm3) but not
> for console logins.

Turns out that console logins are the only exception; ansgar found this:
https://github.com/systemd/systemd/issues/11668

> It'd be good to have this consistent both for X vs console, and systemd vs
> other inits/rc systems.

You said:
# Even if the change is small, that might still change the behavior of
# some programs, so I am not sure we want to diverge from upstream and
# other distributions here.

So with systemd forcing this, the result is us diverging from most other
distributions only when init/rc is not systemd.  Thus, could you please
apply this patch -- or, should I bother sysvinit folks (and perhaps
implement this in openrc) instead?


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄



C + POSIX locale question

2019-02-07 Thread Michael Stone
Per the standard, the C locale is supposed to be a synonym for the POSIX 
locale. Can someone give a quick explanation for why in debian the C 
locale definition is 162k and the POSIX locale is 8k? Shouldn't they be 
identical?




Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Michael Stone

On Thu, Feb 07, 2019 at 04:08:21PM +0100, Ansgar wrote:
> On Thu, 2019-02-07 at 09:59 -0500, Michael Stone wrote:
> > On Thu, Feb 07, 2019 at 02:40:06PM +0000, Simon McVittie wrote:
> > > How would this locale differ from C.UTF-8? Is the only difference
> > > that C.UTF-8 has strict lexicographical sorting, whereas "en" would
> > > have case-insensitive sorting like en_GB.utf8 does? (If that's the
> > > only difference, then perhaps something like
> > > "LANG=C.utf8 LC_COLLATE=en_US.utf8" is enough.)
> >
> > POSIX specifies the output format for various utilities in the C locale,
> > which defeats my understanding of the purpose of this proposal. So, for
> > example, in ls -l:
>
> I don't think the "C.UTF-8" locale is covered by any promises POSIX might
> make for "C".  (Nor is what happens when no LC_*, LANG variables are
> set at all.)


IMO, the principle of least surprise applies here: if C.UTF-8 is meant 
to be something other than the C locale with UTF-8 semantics added, it 
should be called something other than C, no?




Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Ansgar
On Thu, 2019-02-07 at 09:59 -0500, Michael Stone wrote:
> On Thu, Feb 07, 2019 at 02:40:06PM +0000, Simon McVittie wrote:
> > How would this locale differ from C.UTF-8? Is the only difference
> > that C.UTF-8 has strict lexicographical sorting, whereas "en" would
> > have case-insensitive sorting like en_GB.utf8 does? (If that's the
> > only difference, then perhaps something like
> > "LANG=C.utf8 LC_COLLATE=en_US.utf8" is enough.)
> 
> POSIX specifies the output format for various utilities in the C locale, 
> which defeats my understanding of the purpose of this proposal. So, for 
> example, in ls -l:

I don't think the "C.UTF-8" locale is covered by any promises POSIX might
make for "C".  (Nor is what happens when no LC_*, LANG variables are
set at all.)

Ansgar



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Michael Stone

On Thu, Feb 07, 2019 at 02:40:06PM +0000, Simon McVittie wrote:
> How would this locale differ from C.UTF-8? Is the only difference
> that C.UTF-8 has strict lexicographical sorting, whereas "en" would have
> case-insensitive sorting like en_GB.utf8 does? (If that's the only
> difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8"
> is enough.)


POSIX specifies the output format for various utilities in the C locale, 
which defeats my understanding of the purpose of this proposal. So, for 
example, in ls -l:


(quoting http://pubs.opengroup.org/onlinepubs/009695399/utilities/ls.html)
The <date and time> field shall contain the appropriate date and 
timestamp of when the file was last modified. In the POSIX locale, the 
field shall be the equivalent of the output of the following date 
command:


date "+%b %e %H:%M"

if the file has been modified in the last six months, or:

date "+%b %e  %Y"

(where two <space>s are used between %e and %Y) if the file has not 
been modified in the last six months or if the modification date is in 
the future, except that, in both cases, the final <newline> produced by 
date shall not be included and the output shall be as if the date 
command were executed at the time of the last modification date of the 
file rather than the current time. When the LC_TIME locale category is 
not set to the POSIX locale, a different format and order of 
presentation of this field may be used.
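
The rule quoted above can be sketched as follows. This is an illustration
only, not how any particular ls is implemented: ls_date_field is a
hypothetical helper, the month names are hard-coded so the result does not
depend on the process locale, and "six months" is approximated as 182 days,
which is an assumption.

```python
from datetime import datetime, timedelta

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
SIX_MONTHS = timedelta(days=182)  # assumption: stand-in for "six months"


def ls_date_field(mtime, now):
    """Format mtime like the POSIX-locale date field of `ls -l`."""
    # Equivalent of "%b %e": abbreviated month, day padded with a space.
    prefix = f"{MONTHS[mtime.month - 1]} {mtime.day:2d}"
    recent = now - SIX_MONTHS < mtime <= now
    if recent:
        # Equivalent of "%b %e %H:%M"
        return f"{prefix} {mtime.hour:02d}:{mtime.minute:02d}"
    # Old or future timestamps: two spaces, then the year.
    return f"{prefix}  {mtime.year}"


now = datetime(2019, 2, 7, 12, 0)
print(ls_date_field(datetime(2019, 2, 7, 9, 53), now))  # recent file
print(ls_date_field(datetime(2018, 1, 1, 0, 0), now))   # older than six months
print(ls_date_field(datetime(2019, 3, 1, 0, 0), now))   # in the future
```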


Mike Stone



Bug#874160: systemd _sometimes_ does this

2019-02-07 Thread Adam Borowski
Turns out systemd independently does this, although not in every case.
If you have unset locale, it changes it to C.UTF-8 for X (gdm3) but not
for console logins.

It'd be good to have this consistent both for X vs console, and systemd vs
other inits/rc systems.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Simon McVittie
On Thu, 07 Feb 2019 at 14:05:33 +0100, Adam Borowski wrote:
> a locale for a silly country with weird customs

Please don't take this tone. Insulting people who disagree with you[1]
is rarely an effective way to persuade them that you're right and
they're wrong.

> • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
>   making dpkg-reconfigure locales DTRT, making it the d-i default)

I think this is exactly the "international/culture-neutral English"
locale that you're looking for. (Well, the C/POSIX locale is the formally
standardized form of that, but breaks text outside the ASCII range;
C.UTF-8 is the C locale with Unicode support added.)

> • inventing a new locale "en" without a country bias
>   -- good in the long term but problematic a month before freeze

I assume this would be a UTF-8 locale like en_US.utf8 and en_GB.utf8,
so probably en.utf8, possibly with a simple "en" alias?

As you say, I don't think a country-neutral specifically-English locale
is going to happen before buster.

How would this locale differ from C.UTF-8? Is the only difference
that C.UTF-8 has strict lexicographical sorting, whereas "en" would have
case-insensitive sorting like en_GB.utf8 does? (If that's the only
difference, then perhaps something like "LANG=C.utf8 LC_COLLATE=en_US.utf8"
is enough.)

smcv

[1] As it happens, I do agree with you that AM/PM time and middle-endian
dates are not a good default; but I'm from a different English-speaking
country with its own weird customs.



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Ian Jackson
Peter Silva writes ("Re: Bug#877900: How to get 24-hour time on en_US.UTF-8 
locale now?"):
> iso_en ?  That sounds smart...
> 
> English for most of the world that aren't necessarily native English speakers?
> https://en.wikipedia.org/wiki/International_English
> Use ISO dates and stuff, and pick a random spelling. As a Canadian, I'm pretty
> sure about colour, but unclear about whether we should standardize on disc.
> Dates should be iso, even better if it used UTC as the timezone.   This would
> be a default that would include US keyboard bindings (by default.)
> as the easiest thing to default to during installation, etc.. but perhaps I
> should be disqualified, being both a unix greybeard, and a recovering ntp
> admin.

I don't see that this exists as a locale already.  It is probably too
late for buster to introduce it.

Realistically our sensible choices for the default are
  C.UTF-8
  One of en_{AU,GB,NZ}.UTF-8

All of these would be better than en_US.UTF-8 for the reasons given
by Adam (although, Adam, really, could you try to be a little less
rude?).

The middle-endian dates and 12-hour clock are particularly poor
defaults.

Ian.

-- 
Ian Jackson   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Peter Silva
iso_en ?  That sounds smart...

English for most of the world that aren't necessarily native English
speakers? https://en.wikipedia.org/wiki/International_English
Use ISO dates and stuff, and pick a random spelling. As a Canadian, I'm
pretty sure about colour, but unclear about whether we should standardize
on disc. Dates should be iso, even better if it used UTC as the timezone.
 This would be a default that would include US keyboard bindings (by
default.)
as the easiest thing to default to during installation, etc.. but perhaps I
should be disqualified, being both a unix greybeard, and a recovering ntp
admin.


On Thu, Feb 7, 2019 at 8:06 AM Adam Borowski wrote:

> On Thu, Feb 07, 2019 at 02:55:33PM +0500, Roman Mamedov wrote:
> > So for those of us (the entire world), who have been relying on this
> behavior:
> >
> > > * en_US (.UTF-8) is used as the default English locale for all places
> that
> > >   don't have a specific variant (and often even then).  Generally,
> technical
> > >   users use English as a system locale
> >
> > How do we roll-back what you have done here, and still get en_US.UTF-8
> while
> > retaining the proper 24-hour time?
>
> > dpkg-reconfigure locales does not list "C.UTF-8" in the main "locales to
> > generate" list, but does offer it on the next screen as "Default locale
> for the
> > system environment". After selecting it, we get:
> >
> > # locale
> > LANG=C.UTF-8
> > LANGUAGE=
> > LC_TIME="en_US.UTF-8"
> > LC_ALL=en_US.UTF-8
> >
> > But still:
> >
> > # date
> > Thu 07 Feb 2019 09:53:47 AM UTC
>
> The root of this issue is worth raising on debian-devel:
>
> The en_US.UTF-8 locale has two purposes:
> • a locale for a silly country with weird customs (such as time going in
>   four discontinuous segments during the day, writing date in a
>   middle-endian format, a unit being shorter on land than surveyed but
>   longer than that in the air, or another unit changing when measuring wet
>   vs dry vs slightly moist things)
> • base locale for the most of the world save for a few places (UK, AU, ...)
>   that have their specific locale -- and often even they use en_US for
>   consistency reasons.
>
>
> So I wonder what would be the best solution?  I can think of:
> • promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
>   making dpkg-reconfigure locales DTRT, making it the d-i default)
>   -- nice for Unix greybeards, but some users might want case-insensitive
>  sort, etc
> • inventing a new locale "en" without a country bias
>   -- good in the long term but problematic a month before freeze
>   -- could be good to have it anyway but not use it until after buster
> • ask glibc maintainers to revert the cherry-pick in #877900 for buster,
>   then pick a long-term solution
>
>
> One particular regression caused by this change is sorting no longer
> working: "12:01am" "1:01am" "12:01pm" "1:01pm" will be ordered wrong.
>
> On one hand, leftpondians may be entitled to their own locale.  On the
> other, let's punish the bastards for imperialism and imposing their own
> settings on the rest of the world. :p
>
>
> Meow!
> --
> ⢀⣴⠾⠻⢶⣦⠀
> ⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
> ⢿⡄⠘⠷⠚⠋⠀ for Privacy.
> ⠈⠳⣄
>
>


Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Adam Borowski
On Thu, Feb 07, 2019 at 02:55:33PM +0500, Roman Mamedov wrote:
> So for those of us (the entire world), who have been relying on this behavior:
> 
> > * en_US (.UTF-8) is used as the default English locale for all places that
> >   don't have a specific variant (and often even then).  Generally, technical
> >   users use English as a system locale
> 
> How do we roll-back what you have done here, and still get en_US.UTF-8 while
> retaining the proper 24-hour time?

> dpkg-reconfigure locales does not list "C.UTF-8" in the main "locales to
> generate" list, but does offer it on the next screen as "Default locale for 
> the
> system environment". After selecting it, we get:
> 
> # locale
> LANG=C.UTF-8
> LANGUAGE=
> LC_TIME="en_US.UTF-8"
> LC_ALL=en_US.UTF-8
> 
> But still:
> 
> # date
> Thu 07 Feb 2019 09:53:47 AM UTC

The root of this issue is worth raising on debian-devel:

The en_US.UTF-8 locale has two purposes:
• a locale for a silly country with weird customs (such as time going in
  four discontinuous segments during the day, writing date in a
  middle-endian format, a unit being shorter on land than surveyed but
  longer than that in the air, or another unit changing when measuring wet
  vs dry vs slightly moist things)
• base locale for the most of the world save for a few places (UK, AU, ...)
  that have their specific locale -- and often even they use en_US for
  consistency reasons.


So I wonder what would be the best solution?  I can think of:
• promoting C.UTF-8 in our user interfaces (allowing to select it in d-i,
  making dpkg-reconfigure locales DTRT, making it the d-i default)
  -- nice for Unix greybeards, but some users might want case-insensitive
 sort, etc
• inventing a new locale "en" without a country bias
  -- good in the long term but problematic a month before freeze
  -- could be good to have it anyway but not use it until after buster
• ask glibc maintainers to revert the cherry-pick in #877900 for buster,
  then pick a long-term solution


One particular regression caused by this change is sorting no longer
working: "12:01am" "1:01am" "12:01pm" "1:01pm" will be ordered wrong.
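
A quick way to see the problem is to sort the strings as plain text. The
sketch below is only an illustration; to_24h is a hypothetical helper that
rewrites the same times as zero-padded 24-hour values for comparison.

```python
# The four times from above, listed in chronological order.
stamps_12h = ["12:01am", "1:01am", "12:01pm", "1:01pm"]


def to_24h(stamp):
    """Convert text like '1:01pm' to zero-padded 24-hour 'HH:MM'."""
    clock, suffix = stamp[:-2], stamp[-2:].lower()
    hour, minute = (int(part) for part in clock.split(":"))
    hour %= 12              # 12am and 12pm both become 0 here
    if suffix == "pm":
        hour += 12          # afternoon hours are 12..23
    return f"{hour:02d}:{minute:02d}"


# Text sort of the 12-hour form: 12:01pm lands before 1:01am.
print(sorted(stamps_12h))

# Text sort of the 24-hour form matches chronological order.
print(sorted(to_24h(s) for s in stamps_12h))
```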

On one hand, leftpondians may be entitled to their own locale.  On the
other, let's punish the bastards for imperialism and imposing their own
settings on the rest of the world. :p


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Remember, the S in "IoT" stands for Security, while P stands
⢿⡄⠘⠷⠚⠋⠀ for Privacy.
⠈⠳⣄



Bug#877900: How to get 24-hour time on en_US.UTF-8 locale now?

2019-02-07 Thread Roman Mamedov
So for those of us (the entire world), who have been relying on this behavior:

> * en_US (.UTF-8) is used as the default English locale for all places that
>   don't have a specific variant (and often even then).  Generally, technical
>   users use English as a system locale

How do we roll-back what you have done here, and still get en_US.UTF-8 while
retaining the proper 24-hour time?

dpkg-reconfigure locales does not list "C.UTF-8" in the main "locales to
generate" list, but does offer it on the next screen as "Default locale for the
system environment". After selecting it, we get:

# locale
LANG=C.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

But still:

# date
Thu 07 Feb 2019 09:53:47 AM UTC
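
For reading output like the above, it may help to separate the variables
that are explicitly set from the ones that are only implied. A minimal
sketch, assuming the convention of the POSIX locale utility that implied
values are printed in double quotes; parse_locale_output is a hypothetical
helper, and the sample is shortened from the report above.

```python
def parse_locale_output(text):
    """Split `locale` output into explicitly set and implied variables.

    Assumption: implied values (taken from LANG or LC_ALL) appear in
    double quotes, explicitly set ones appear without quotes.
    """
    explicit, implied = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            implied[name] = value[1:-1]
        else:
            explicit[name] = value
    return explicit, implied


report = """\
LANG=C.UTF-8
LANGUAGE=
LC_TIME="en_US.UTF-8"
LC_ALL=en_US.UTF-8
"""

explicit, implied = parse_locale_output(report)
print("explicit:", explicit)
print("implied:", implied)
```

Here LC_ALL is explicitly set to en_US.UTF-8 and LC_TIME is only implied
from it, which is likely why changing LANG alone leaves the date output
unchanged.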

-- 
With respect,
Roman


