Hi Stefan and Todd,
Stefan Sperling wrote on Fri, Apr 17, 2020 at 08:55:29AM +0200:
> On Thu, Apr 16, 2020 at 09:35:18PM +0200, Ingo Schwarze wrote:
>>$ locale -m
>> UTF-8
>>$ locale charmap
>> UTF-8
>>$ LC_ALL=C locale charmap
>> US-ASCII
>>$ LC_ALL=POSIX locale charmap
>> US-ASCII
> I am OK with your diff,
Thanks to both of you for checking, i have put it in.
> and noticed a separate issue with -m which
> is exposed by this change:
>
> If US-ASCII is an available charmap, shouldn't locale -m list "US-ASCII"
> in addition to "UTF-8"?
I'm not completely sure what "available charmaps" is supposed to mean
in the POSIX standard.
Testing on an old Debian system, i see this:
$ locale -m > charmaps.loc
$ wc -l charmaps.loc
235
$ ls /usr/share/i18n/charmaps | sed 's/.gz$//' | sort > charmaps.ls
$ diff -u charmaps.ls charmaps.loc | grep '^[+-][^+-]'
+MAC_CENTRALEUROPE
+NF_Z_62-010_(1973)
+WIN-SAMI-2
$ locale charmap
UTF-8
$ locale -m | grep UTF
UTF-8
$ LC_CTYPE=C locale charmap
ANSI_X3.4-1968
$ locale -m | grep 1968
ANSI_X3.4-1968
So "locale -m" gives almost a directory listing, but not quite;
it produces a few additional entries that aren't in the directory.
The return values from "locale charmap" appear in "locale -m".
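That last check can be repeated on other systems with a few lines of
sh. This is only a rough sketch: is_listed is a helper name made up
here, and the three locale names are examples that may not all be
installed on a given machine.

```shell
# is_listed NAME: succeed if NAME is exactly one of the lines on stdin.
# (Hypothetical helper, not part of any system.)
is_listed() {
	grep -Fxq -- "$1"
}

# The locale names below are only examples; adjust to what is installed.
for loc in C POSIX en_US.UTF-8; do
	# Skip quietly if locale(1) is missing or prints nothing.
	cm=$(LC_ALL=$loc locale charmap 2>/dev/null) || continue
	[ -n "$cm" ] || continue
	if locale -m 2>/dev/null | is_listed "$cm"; then
		echo "$loc: $cm is listed by locale -m"
	else
		echo "$loc: $cm is NOT listed by locale -m"
	fi
done
```

The match is on whole lines, so "UTF-8" does not accidentally match
a longer charmap name that merely contains it.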
Then again, Linux is not a certified UNIX system. So let's try
with something certified:
> uname -a
SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
> locale charmap
UTF-8
> LC_CTYPE=C locale charmap
646
> locale -m | wc
0 0 0
It's a bit difficult because Solaris 11 does not provide locate(1),
but i failed to find any charmap files there. Both UTF-8 and
US-ASCII work (i tested that by compiling and running mandoc)
but still "locale -m" returns nothing.
> uname -a
SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc
SUNW,SPARC-Enterprise-T5220
> locale charmap
646
> LC_CTYPE=en_US.UTF-8 locale charmap
UTF-8
> locale -m
iso_8859_1/charmap.src
> ls -F /usr/lib/localedef/src/
charmaps/ en_US.UTF-8/ extensions/ iso_8859_1/ locales/
> ls -F /usr/lib/localedef/src/charmaps/
charmap.ANSI1251.bz2    charmap.ISO8859-9.bz2    charmap.iso-8859-5.bz2
charmap.ISO8859-1.bz2   charmap.KOI8-R.bz2       charmap.iso-8859-6.bz2
charmap.ISO8859-13.bz2  charmap.UTF-8.bz2        charmap.iso-8859-7.bz2
charmap.ISO8859-15.bz2  charmap.ansi-1251.bz2    charmap.iso-8859-8.bz2
charmap.ISO8859-2.bz2   charmap.ar.bz2@          charmap.iso-8859-9.bz2
charmap.ISO8859-4.bz2   charmap.he.bz2@          charmap.koi8-r.bz2
charmap.ISO8859-5.bz2   charmap.iso-8859-1.bz2   charmap.utf-8.bz2
charmap.ISO8859-6.bz2   charmap.iso-8859-13.bz2  charmap.utf8.bz2@
charmap.ISO8859-7.bz2   charmap.iso-8859-15.bz2
charmap.ISO8859-8.bz2   charmap.iso-8859-2.bz2
Same vendor, different version, different behaviour. Again, both UTF-8
and US-ASCII work, and there are several charmap files, but "locale -m"
returns something that is neither a charmap name nor a filename for any
of the locales, nor a list of anything.
Frankly, i doubt the usefulness of "locale -m" in general, and even more
so on OpenBSD: if i understand correctly, it is supposed to be used to
determine valid input for the -f option of the localedef(1) utility,
which we don't even have.
Naively, it does seem like it would make sense to have "locale -m"
print a list of possible output values of "locale charmap", so i'm
not opposed to adding "US-ASCII" to it. But that doesn't appear to
be how it works elsewhere, at least not everywhere. I found no
documentation stating clearly what it is supposed to do; POSIX feels
murky at best.
Also, look at this:
http://man.bsd.lv/FreeBSD-12.0/locale#BUGS
http://man.bsd.lv/NetBSD-8.1/locale#BUGS
"BUGS
Since FreeBSD does not support charmaps in their POSIX meaning,
locale emulates the -m option using the CODESETs listing of all
available locales."
That does look somewhat similar to what you are suggesting,
but *they* call it a bug!
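To illustrate what "the CODESETs listing of all available locales"
amounts to, the same idea can be approximated in sh. This is a sketch
of the concept only, not the FreeBSD code; list_codesets is a name
made up for this example.

```shell
# list_codesets: print each distinct codeset reported for the locales
# that "locale -a" lists, one per line.  (Hypothetical helper; this
# only approximates the emulation described in the BUGS section.)
list_codesets() {
	locale -a | while IFS= read -r loc; do
		# Ask which codeset this particular locale uses.
		LC_ALL=$loc locale charmap 2>/dev/null
	done | LC_ALL=C sort -u
}
```

Running list_codesets on a system that only ships C, POSIX, and
UTF-8 locales would be expected to print just the codeset names
those locales report, which is the kind of list under discussion.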
Feel free to add "US-ASCII\n" if you like, it does feel as if it
might add some minor clarity, but i hardly expect any real practical
benefit.
Yours,
Ingo