Re: implement locale(1) charmap argument

Ingo Schwarze Fri, 17 Apr 2020 06:06:13 -0700

Hi Stefan and Todd,

Stefan Sperling wrote on Fri, Apr 17, 2020 at 08:55:29AM +0200:
> On Thu, Apr 16, 2020 at 09:35:18PM +0200, Ingo Schwarze wrote:


>>    $ locale -m
>>   UTF-8
>>    $ locale charmap
>>   UTF-8
>>    $ LC_ALL=C locale charmap
>>   US-ASCII
>>    $ LC_ALL=POSIX locale charmap
>>   US-ASCII

> I am OK with your diff,

Thanks to both of you for checking, i have put it in.

> and noticed a separate issue with -m which
> is exposed by this change:
> 
> If US-ASCII is an available charmap, shouldn't locale -m list "US-ASCII"
> in addition to "UTF-8"?

I'm not completely sure what "available charmaps" is supposed to mean
in the POSIX standard.

Testing on an old Debian system, i see this:

   $ locale -m > charmaps.loc
   $ wc -l charmaps.loc
  235
   $ ls /usr/share/i18n/charmaps | sed 's/.gz$//' | sort > charmaps.ls 
   $ diff -u charmaps.ls charmaps.loc | grep '^[+-][^+-]'
  +MAC_CENTRALEUROPE
  +NF_Z_62-010_(1973)
  +WIN-SAMI-2
   $ locale charmap
  UTF-8
   $ locale -m | grep UTF
  UTF-8
   $ LC_CTYPE=C locale charmap
  ANSI_X3.4-1968
   $ locale -m | grep 1968
  ANSI_X3.4-1968

So "locale -m" gives almost a directory listing, but not quite;
it produces a few additional entries that aren't in the directory.
The return values from "locale charmap" appear in "locale -m".
Then again, Linux is not a certified UNIX system.  So let's try
with something certified:

   > uname -a
  SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
   > locale charmap
  UTF-8
   > LC_CTYPE=C locale charmap
  646
   > locale -m | wc
       0       0       0

It's a bit difficult because Solaris 11 does not provide locate(1),
but i failed to find any charmap files there.  Both UTF-8 and
US-ASCII work (i tested that by compiling and running mandoc)
but still "locale -m" returns nothing.

   > uname -a
  SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
   > locale charmap
  646
   > LC_CTYPE=en_US.UTF-8 locale charmap
  UTF-8
   > locale -m
  iso_8859_1/charmap.src
   > ls -F /usr/lib/localedef/src/
  charmaps/     en_US.UTF-8/  extensions/   iso_8859_1/   locales/
   > ls -F /usr/lib/localedef/src/charmaps/
  charmap.ANSI1251.bz2      charmap.ISO8859-9.bz2     charmap.iso-8859-5.bz2
  charmap.ISO8859-1.bz2     charmap.KOI8-R.bz2        charmap.iso-8859-6.bz2
  charmap.ISO8859-13.bz2    charmap.UTF-8.bz2         charmap.iso-8859-7.bz2
  charmap.ISO8859-15.bz2    charmap.ansi-1251.bz2     charmap.iso-8859-8.bz2
  charmap.ISO8859-2.bz2     charmap.ar.bz2@           charmap.iso-8859-9.bz2
  charmap.ISO8859-4.bz2     charmap.he.bz2@           charmap.koi8-r.bz2
  charmap.ISO8859-5.bz2     charmap.iso-8859-1.bz2    charmap.utf-8.bz2
  charmap.ISO8859-6.bz2     charmap.iso-8859-13.bz2   charmap.utf8.bz2@
  charmap.ISO8859-7.bz2     charmap.iso-8859-15.bz2
  charmap.ISO8859-8.bz2     charmap.iso-8859-2.bz2

Same vendor, different version, different behaviour.  Again, both UTF-8
and US-ASCII work, and there are several charmap files, but "locale -m"
returns something that is neither a charmap name nor a filename for any
of the locales, nor a list of anything.

Frankly, i doubt the usefulness of "locale -m" in general, and even more
so on OpenBSD: if i understand correctly, it is supposed to be used to
determine valid input for the -f option of the localedef(1) utility,
which we don't even have.

Naively, it does seem like it would make sense to have "locale -m"
print a list of possible output values of "locale charmap", so i'm
not opposed to adding "US-ASCII" to it.  But that doesn't appear to
be how it works elsewhere, at least not everywhere.  I found no
documentation stating clearly what it is supposed to do; POSIX feels
murky at best.
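
A minimal sketch of that naive expectation, for anyone who wants to
repeat the check on another system: it tests whether the charmap
name of the current locale shows up as a whole line in the "locale -m"
output.  The helper name charmap_listed is made up for illustration;
only "locale charmap", "locale -m" and standard grep(1) options are
assumed.

```shell
# charmap_listed NAME (hypothetical helper):
# succeed if NAME appears as a complete line on standard input.
charmap_listed() {
	grep -Fqx -- "$1"
}

# Ask the system for the charmap of the current locale, then
# look for exactly that name in the list printed by "locale -m".
cs=$(locale charmap 2>/dev/null) || cs=
if locale -m 2>/dev/null | charmap_listed "$cs"; then
	echo "$cs: listed by locale -m"
else
	echo "$cs: not listed by locale -m"
fi
```

Going by the outputs quoted at the top of this mail, running it with
LC_ALL=C on OpenBSD should report US-ASCII as not listed.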

Also, look at this:

  http://man.bsd.lv/FreeBSD-12.0/locale#BUGS
  http://man.bsd.lv/NetBSD-8.1/locale#BUGS

  "BUGS
   Since FreeBSD does not support charmaps in their POSIX meaning,
   locale emulates the -m option using the CODESETs listing of all
   available locales."

That does look somewhat similar to what you are suggesting,
but *they* call it a bug!
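
For illustration only, the emulation described in that BUGS section
can be approximated in shell by asking for the charmap of every
locale reported by "locale -a" and printing the distinct names.  This
is a sketch of the idea, not a description of how the FreeBSD
implementation works internally, and the function name codesets is
made up here.

```shell
# codesets (hypothetical helper): read locale names on standard
# input, one per line, and print the distinct charmap names that
# "locale charmap" reports for them.
codesets() {
	while IFS= read -r loc; do
		# Locales that cannot be queried are silently skipped.
		LC_ALL=$loc locale charmap 2>/dev/null
	done | sort -u
}

# Feed it every locale name the system knows about.
locale -a 2>/dev/null | codesets
```

A list built this way contains the charmap of the C locale by
construction, which is the behaviour being asked about for "US-ASCII".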

Feel free to add "US-ASCII\n" if you like, it does feel as if it
might add some minor clarity, but i hardly expect any real practical
benefit.

Yours,
  Ingo
