Re: implement locale(1) charmap argument

2020-04-17 Stefan Sperling
On Fri, Apr 17, 2020 at 03:05:06PM +0200, Ingo Schwarze wrote:
> Naively, it does seem like it would make sense to have "locale -m"
> print a list of possible output values of "locale charmap", so i'm
> not opposed to adding "US-ASCII" to it.  But that doesn't appear to
> be how it works elsewhere, at least not everywhere.  I found no
> documentation stating clearly what it is supposed to do, POSIX feels
> murky at best.

Good grief! Well, we can leave good enough alone then, I suppose :)

Thank you for doing such elaborate research.



Re: implement locale(1) charmap argument

2020-04-17 Ingo Schwarze
Hi Stefan and Todd,

Stefan Sperling wrote on Fri, Apr 17, 2020 at 08:55:29AM +0200:
> On Thu, Apr 16, 2020 at 09:35:18PM +0200, Ingo Schwarze wrote:

>>$ locale -m
>>   UTF-8
>>$ locale charmap
>>   UTF-8
>>$ LC_ALL=C locale charmap
>>   US-ASCII
>>$ LC_ALL=POSIX locale charmap
>>   US-ASCII

> I am OK with your diff,

Thanks to both of you for checking, i have put it in.

> and noticed a separate issue with -m which
> is exposed by this change:
> 
> If US-ASCII is an available charmap, shouldn't locale -m list "US-ASCII"
> in addition to "UTF-8"?

I'm not completely sure what "available charmaps" is supposed to mean
in the POSIX standard.

Testing on an old Debian system, i see this:

   $ locale -m > charmaps.loc
   $ wc -l charmaps.loc
  235
   $ ls /usr/share/i18n/charmaps | sed 's/.gz$//' | sort > charmaps.ls 
   $ diff -u charmaps.ls charmaps.loc | grep '^[+-][^+-]'
  +MAC_CENTRALEUROPE
  +NF_Z_62-010_(1973)
  +WIN-SAMI-2
   $ locale charmap
  UTF-8
   $ locale -m | grep UTF
  UTF-8
   $ LC_CTYPE=C locale charmap
  ANSI_X3.4-1968
   $ locale -m | grep 1968
  ANSI_X3.4-1968

So "locale -m" gives almost a directory listing, but not quite;
it produces a few additional entries that aren't in the directory.
The return values from "locale charmap" appear in "locale -m".
Then again, Linux is not a certified UNIX system.  So let's try
with something certified:

   > uname -a
  SunOS unstable11s 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise
   > locale charmap
  UTF-8
   > LC_CTYPE=C locale charmap
  646
   > locale -m | wc
   0   0   0

It's a bit difficult because Solaris 11 does not provide locate(1),
but i failed to find any charmap files there.  Both UTF-8 and
US-ASCII work (i tested that by compiling and running mandoc)
but still "locale -m" returns nothing.
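Whether a codeset works can be checked from a program without consulting "locale -m" at all. POSIX specifies nl_langinfo(CODESET), which returns the name of the codeset of the current LC_CTYPE locale. The minimal C sketch below uses a hypothetical helper, charmap_of(), and assumes that this is the same name a system's "locale charmap" prints. That assumption matches the Debian output above, but it is not guaranteed on every system.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of any locale(1) implementation):
 * switch the whole process to the named locale and return the codeset
 * name that POSIX nl_langinfo(CODESET) reports for it, or NULL if
 * setlocale() rejects the name.  The returned string may be overwritten
 * by later setlocale() or nl_langinfo() calls, so copy it if it must
 * be kept.
 */
static const char *
charmap_of(const char *name)
{
	assert(name != NULL);
	if (setlocale(LC_ALL, name) == NULL)
		return NULL;
	return nl_langinfo(CODESET);
}
```

With glibc, charmap_of("C") returns ANSI_X3.4-1968, matching "LC_CTYPE=C locale charmap" on the Debian system. Other systems spell the same codeset differently in their "locale charmap" output (646 on Solaris, US-ASCII on OpenBSD after this change). Because setlocale() changes process-wide state, the sketch suits small test programs rather than library code.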

   > uname -a
  SunOS unstable10s 5.10 Generic_150400-17 sun4v sparc SUNW,SPARC-Enterprise-T5220
   > locale charmap
  646
   > LC_CTYPE=en_US.UTF-8 locale charmap
  UTF-8
   > locale -m
  iso_8859_1/charmap.src
   > ls -F /usr/lib/localedef/src/
  charmaps/ en_US.UTF-8/  extensions/   iso_8859_1/   locales/
   > ls -F /usr/lib/localedef/src/charmaps/
  charmap.ANSI1251.bz2    charmap.ISO8859-9.bz2    charmap.iso-8859-5.bz2
  charmap.ISO8859-1.bz2   charmap.KOI8-R.bz2       charmap.iso-8859-6.bz2
  charmap.ISO8859-13.bz2  charmap.UTF-8.bz2        charmap.iso-8859-7.bz2
  charmap.ISO8859-15.bz2  charmap.ansi-1251.bz2    charmap.iso-8859-8.bz2
  charmap.ISO8859-2.bz2   charmap.ar.bz2@          charmap.iso-8859-9.bz2
  charmap.ISO8859-4.bz2   charmap.he.bz2@          charmap.koi8-r.bz2
  charmap.ISO8859-5.bz2   charmap.iso-8859-1.bz2   charmap.utf-8.bz2
  charmap.ISO8859-6.bz2   charmap.iso-8859-13.bz2  charmap.utf8.bz2@
  charmap.ISO8859-7.bz2   charmap.iso-8859-15.bz2
  charmap.ISO8859-8.bz2   charmap.iso-8859-2.bz2

Same vendor, different version, different behaviour.  Again, both UTF-8
and US-ASCII work, and there are several charmap files, but "locale -m"
returns something that is neither a charmap name nor a filename for any
of the locales, nor a list of anything.

Frankly, i doubt the usefulness of "locale -m" in general, and even more
so on OpenBSD: if i understand correctly, it is supposed to be used to
determine valid input for the -f option of the localedef(1) utility,
which we don't even have.

Naively, it does seem like it would make sense to have "locale -m"
print a list of possible output values of "locale charmap", so i'm
not opposed to adding "US-ASCII" to it.  But that doesn't appear to
be how it works elsewhere, at least not everywhere.  I found no
documentation stating clearly what it is supposed to do, POSIX feels
murky at best.
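The naive reading, in which "locale -m" lists every value "locale charmap" can print, could look roughly like the C sketch below. It assumes that the only two codesets OpenBSD's locale(1) reports are UTF-8 and US-ASCII, as in the session quoted earlier in the thread. The names charmaps, list_charmaps() and is_listed_charmap() are hypothetical, and this is not the code that was committed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical table of every name "locale charmap" can print,
 * assuming only UTF-8 and the C/POSIX locales are supported.
 */
static const char *const charmaps[] = {
	"UTF-8",	/* any UTF-8 locale, e.g. en_US.UTF-8 */
	"US-ASCII",	/* the "C" and "POSIX" locales */
};

#define NCHARMAPS (sizeof(charmaps) / sizeof(charmaps[0]))

/* Print one charmap name per line, as "locale -m" might; return the count. */
static size_t
list_charmaps(FILE *fp)
{
	size_t i;

	assert(fp != NULL);
	for (i = 0; i < NCHARMAPS; i++)
		fprintf(fp, "%s\n", charmaps[i]);
	return NCHARMAPS;
}

/* Return 1 if name appears in the table above, else 0. */
static int
is_listed_charmap(const char *name)
{
	size_t i;

	for (i = 0; i < NCHARMAPS; i++)
		if (strcmp(name, charmaps[i]) == 0)
			return 1;
	return 0;
}
```

Keeping both names in a single table means the -m listing and the charmap output cannot drift apart. Whether that consistency is worth diverging from what other systems do is exactly the open question here.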

Also, look at this:

  http://man.bsd.lv/FreeBSD-12.0/locale#BUGS
  http://man.bsd.lv/NetBSD-8.1/locale#BUGS

  "BUGS
   Since FreeBSD does not support charmaps in their POSIX meaning,
   locale emulates the -m option using the CODESETs listing of all
   available locales."

That does look somewhat similar to what you are suggesting,
but *they* call it a bug!

Feel free to add "US-ASCII\n" if you like, it does feel as if it
might add some minor clarity, but i hardly expect any real practical
benefit.

Yours,
  Ingo



Re: implement locale(1) charmap argument

2020-04-17 Stefan Sperling
On Thu, Apr 16, 2020 at 09:35:18PM +0200, Ingo Schwarze wrote:
>$ locale -m
>   UTF-8
>$ locale charmap
>   UTF-8
>$ LC_ALL=C locale charmap
>   US-ASCII
>$ LC_ALL=POSIX locale charmap
>   US-ASCII

I am OK with your diff, and noticed a separate issue with -m which
is exposed by this change:

If US-ASCII is an available charmap, shouldn't locale -m list "US-ASCII"
in addition to "UTF-8"?



Re: implement locale(1) charmap argument

2020-04-16 Todd C. Miller
Makes sense to me.  OK millert@

 - todd