Re: Rethinking configuration tuples

2023-09-19 Thread Po Lu
John Ericson  writes:

> On 9/19/23 21:07, Po Lu wrote:
>
>  Why not?
>
> I have on hand several programs which use -winnt*, such as many old
> releases of Emacs.  And users should be capable of replacing
> config.sub and config.guess with newer versions without ill effect.
>
> At no point has anyone proposed removing *-winnt-*. And while I think
> a deprecation message is a good idea, no one has yet submitted a patch
> for that either.
>
> With Dmitry's plan, you can still upgrade config.sub in those old
> versions of Emacs if you like without any issue.

Why should config.sub print anything at all?  An extraneous message
constitutes ill effect.



Re: Rethinking configuration tuples

2023-09-19 Thread John Ericson

On 9/19/23 21:07, Po Lu wrote:

> Why not?
>
> I have on hand several programs which use -winnt*, such as many old
> releases of Emacs.  And users should be capable of replacing config.sub
> and config.guess with newer versions without ill effect.


At no point has anyone proposed removing *-winnt-*. And while I think a 
deprecation message is a good idea, no one has yet submitted a patch for 
that either.


With Dmitry's plan, you can still upgrade config.sub in those old 
versions of Emacs if you like without any issue.


John


Re: Rethinking configuration tuples

2023-09-19 Thread Po Lu
"Dmitry V. Levin"  writes:

> I'm inclined to remove windows-gnu from config.sub instead of renaming or
> canonicalizing it because, firstly, there is no GNU libc on windows, and,
> secondly, windows-gnu as used by LLVM means MinGW, but for that we already
> have mingw*, and we should avoid adding new canonical names for the same
> thing.  We could add canonicalization of windows-mingw* into mingw*, but
> if nobody uses the former, why bother?
>
> At the same time, I'm inclined to leave windows-msvc as is because,
> unlike windows-gnu, it does exist, and the only one who objected against
> windows-msvc and suggested to canonicalize windows-msvc into winnt was
> Po Lu, but the arguments provided against windows-msvc were not convincing.

Why not?

I have on hand several programs which use -winnt*, such as many old
releases of Emacs.  And users should be capable of replacing config.sub
and config.guess with newer versions without ill effect.



Re: Rethinking configuration tuples

2023-09-19 Thread John Ericson
Thanks Dmitry. This is an acceptable outcome to me. It is a nice middle 
ground between Po Lu's and my first choice options.


John

On 9/19/23 19:58, Dmitry V. Levin wrote:

> On Thu, Sep 14, 2023 at 12:55:06AM -0400, John Ericson wrote:
>
>> OK here we go:
>>
>>   1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
>>   2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
>>   3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
>>
>> I tried to honestly argue for each of them the best I could in the
>> commit message. I know I prefer (1); I am guessing Jacob prefers (2),
>> and Po Lu prefers (3).
>>
>> Have fun, Dmitry :).
>
> I'm inclined to remove windows-gnu from config.sub instead of renaming or
> canonicalizing it because, firstly, there is no GNU libc on windows, and,
> secondly, windows-gnu as used by LLVM means MinGW, but for that we already
> have mingw*, and we should avoid adding new canonical names for the same
> thing.  We could add canonicalization of windows-mingw* into mingw*, but
> if nobody uses the former, why bother?
>
> At the same time, I'm inclined to leave windows-msvc as is because,
> unlike windows-gnu, it does exist, and the only one who objected against
> windows-msvc and suggested to canonicalize windows-msvc into winnt was
> Po Lu, but the arguments provided against windows-msvc were not convincing.






Re: Rethinking configuration tuples

2023-09-19 Thread Dmitry V. Levin
On Thu, Sep 14, 2023 at 12:55:06AM -0400, John Ericson wrote:
> OK here we go:
> 
>  1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
>  2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
>  3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
> 
> I tried to honestly argue for each of them the best I could in the 
> commit message. I know I prefer (1); I am guessing Jacob prefers (2), 
> and Po Lu prefers (3).
> 
> Have fun, Dmitry :).

I'm inclined to remove windows-gnu from config.sub instead of renaming or
canonicalizing it because, firstly, there is no GNU libc on windows, and,
secondly, windows-gnu as used by LLVM means MinGW, but for that we already
have mingw*, and we should avoid adding new canonical names for the same
thing.  We could add canonicalization of windows-mingw* into mingw*, but
if nobody uses the former, why bother?

At the same time, I'm inclined to leave windows-msvc as is because,
unlike windows-gnu, it does exist, and the only one who objected against
windows-msvc and suggested to canonicalize windows-msvc into winnt was
Po Lu, but the arguments provided against windows-msvc were not convincing.
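A minimal sketch of the outcome proposed above, applied to the OS part of a tuple: windows-msvc passes through unchanged, windows-gnu is rejected in favour of the existing mingw* names, and everything else is left alone. The function name `check_os` and its error text are hypothetical; this is an illustration, not config.sub's actual code.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed treatment of the OS part of a
# tuple.  NOT config.sub's real code; name and message are invented.
check_os () {
  case $1 in
    windows-msvc*)
      # Kept as is: MSVC on Windows does exist.
      echo "$1" ;;
    windows-gnu*)
      # Removed: there is no GNU libc on Windows; MinGW is already mingw*.
      echo "Invalid configuration: '$1' (use mingw* instead)" >&2
      return 1 ;;
    *)
      # Everything else (mingw32, winnt, linux-gnu, ...) passes through.
      echo "$1" ;;
  esac
}

check_os windows-msvc
check_os mingw32
check_os windows-gnu || echo "windows-gnu rejected"
```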


-- 
ldv



Re: Rethinking configuration tuples

2023-09-13 Thread Po Lu
John Ericson  writes:

> I used to do that, but see commit
> f0f728324021f38b0d31de399b9974535300167c : Dmitry opted to switch to
> just using Git's commit messages as the source of truth, and providing
> a make rule to generate the ChangeLog.
>
> The document you linked endorses such a choice, saying
>
>> Projects that maintain such VCS repositories can decide not to
>> maintain separate change log files, and instead rely on the VCS to
>> keep the change log.
>> If you decide not to maintain separate change log files, you should
>> still consider providing them in the release tarballs [...].
>
> I think doing this is a fine decision.

The text you quoted means that you are meant to record the individual
ChangeLog entries within each VCS log message, rather than updating a
separate ChangeLog file with each check-in.  These are subsequently
reproduced in the generated ChangeLog file.

It does not excuse you from writing log entries!

Refer to Emacs commit messages:

  https://git.savannah.gnu.org/cgit/emacs.git/log?h=master

for canonical examples of such log messages.



Re: Rethinking configuration tuples

2023-09-13 Thread John Ericson
I used to do that, but see commit 
f0f728324021f38b0d31de399b9974535300167c : Dmitry opted to switch to 
just using Git's commit messages as the source of truth, and providing a 
make rule to generate the ChangeLog.


The document you linked endorses such a choice, saying

> Projects that maintain such VCS repositories can decide not to
> maintain separate change log files, and instead rely on the VCS to
> keep the change log.
> If you decide not to maintain separate change log files, you should
> still consider providing them in the release tarballs [...].


I think doing this is a fine decision.

John

On 9/14/23 01:37, Po Lu wrote:

> John Ericson  writes:
>
>> I had meant to just deal with windows-gnu in those 3 options,
>> otherwise we have a combinatorial explosion of patches (and commit
>> messages) for me to write :). Once we deal with that one we can deal
>> with the others, right?
>
> Incidentally, if you want to make it easier for others to interpret your
> patches, please provide ChangeLog entries along with them.  Refer to
> `(standards)Change Logs':
>
>    https://www.gnu.org/prep/standards/standards.html#Change-Logs





Re: Rethinking configuration tuples

2023-09-13 Thread Po Lu
John Ericson  writes:

> I had meant to just deal with windows-gnu in those 3 options,
> otherwise we have a combinatorial explosion of patches (and commit
> messages) for me to write :). Once we deal with that one we can deal
> with the others, right?

Incidentally, if you want to make it easier for others to interpret your
patches, please provide ChangeLog entries along with them.  Refer to
`(standards)Change Logs':

  https://www.gnu.org/prep/standards/standards.html#Change-Logs



Re: Rethinking configuration tuples

2023-09-13 Thread John Ericson
I had meant to just deal with windows-gnu in those 3 options, otherwise 
we have a combinatorial explosion of patches (and commit messages) for 
me to write :). Once we deal with that one we can deal with the others, 
right?


John

On 9/14/23 01:00, Po Lu wrote:

> John Ericson  writes:
>
>> OK here we go:
>>
>> 1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
>> 2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
>> 3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
>>
>> I tried to honestly argue for each of them the best I could in the commit
>> message. I know I prefer (1); I am guessing Jacob prefers (2),
>> and Po Lu prefers (3).
>
> I prefer eliminating windows-msvc too.  It's also a misnomer, and we
> already have *-winnt*, which represents MSVC.




Re: Rethinking configuration tuples

2023-09-13 Thread Po Lu
John Ericson  writes:

> OK here we go:
>
> 1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
> 2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
> 3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
>
> I tried to honestly argue for each of them the best I could in the commit 
> message. I know I prefer (1); I am guessing Jacob prefers (2),
> and Po Lu prefers (3).

I prefer eliminating windows-msvc too.  It's also a misnomer, and we
already have *-winnt*, which represents MSVC.



Re: Rethinking configuration tuples

2023-09-13 Thread John Ericson

OK here we go:

1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch

I tried to honestly argue for each of them the best I could in the 
commit message. I know I prefer (1); I am guessing Jacob prefers (2), 
and Po Lu prefers (3).


Have fun, Dmitry :).

I suppose rather than just idly speculating on how nice it would be to 
standardize with LLVM, this might be a good time to actually post to 
their Discourse instance and solicit feedback. If anyone else agrees I 
will happily do so.


Cheers,

John

On 9/11/23 17:55, John Ericson wrote:
> I can submit two patches (effectively amending my prior, landed patch)
> with options that I think people would prefer. Will do that shortly.
>
> On 9/11/23 17:53, Dmitry V. Levin wrote:
>
>> Hi,
>>
>> On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
>>
>>> Where are the config maintainers?  Karl Barry and company?
>>> (I don't remember his e-mail nor do I have it at hand.)
>>>
>>> I would expect them to be actively reading this list, but instead my
>>> original request has been left twisting in the wind.
>>
>> I'm the maintainer and I'm actively reading this list now,
>> a bit surprised to find so many messages at this time of year. :)
>>
>> Apparently, you don't quite like commit
>> 91f6a7f616b161c25ba2001861a40e662e18c4ad that added
>> $cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
>> I understood what exactly do you suggest to change in this case.



Re: Rethinking configuration tuples

2023-09-13 Thread John Ericson
Oops, I had this email as a draft and didn't hit send. The conversation 
has moved on a bit since, but I'll send it anyway.


John



On 9/6/23 19:46, Jacob Bachmeyer wrote:

The problem is that system(3) probably /does/ exist in that 
configuration, but such a system is only usable as a cross-compilation 
target, since building the GNU tools requires a shell.


Agreed! This is exactly what I mean by the "ops time" vs. build time 
question: if we do have system(3), we don't know in general what shell 
it will be hooked up to (cross compilation is the general case).


I would say that even if we are compiling natively and find that the 
shell behind system(3) supports x, y, and z, we shouldn't bake the result 
of that configure-time check into the build; the installed binary might 
be copied to another system with the same libc but a different shell, 
and then system(3) would do something else.


I think configs are mainly useful for the host and target systems, and 
thus should focus on the information needed for them. It sounds like we 
agree that "do we have a shell [that can do x y z]" is a question that is 
fine to ask of the build platform, but not really appropriate to ask of 
the host or target platform. Thus, things like "presence of shell" are 
not really good to include in configs.


This is all to say that telling OSes apart (as opposed to merely libcs) 
in configs seems too fraught to me.


Maybe this could go in the config per the "arbitrarily many components, 
finer distinctions to the right" "converging sequence" approach, but 
then I would want this further to the right, e.g. 
aarch64-unknown-musl-noshell not aarch64-unknown-noshell-musl.


The idea of lacking a shell was intended as an example of something 
that would make a Free system /very/ different from the GNU system, 
enough that it cannot be considered a GNU variant.  In other words, a 
Linux-based system that is clearly /not/ GNU/Linux.
Agreed it is not GNU/Linux. But I think anything that uses glibc should 
use "gnu" in the config because configs have evolved to be about libc 
more than OS.
The choice of system service management is orthogonal to this, 
since it has minimal impact on user programs.  (Unless systemd 
gets even more outrageously invasive...)

Agreed, just wanted to double check.
Of course, if systemd *does* get sufficiently outrageously invasive, 
we might need a *-*-linux-systemd-glibc tuple... (Since systemd 
gleefully makes extensive use of Linux-kernel-specific features, it 
cannot possibly be a standard on the GNU system, which supports 
multiple Free kernels.)


Yes, I agree systemd probably can't be a "bona fide GNU OS", but I draw 
the opposite conclusion: this is evidence that "gnu" meaning glibc is 
more important than "gnu" meaning "true GNU OS".


In this hypothetical (that needs to *stay* hypothetical) example, 
"systemd" has somehow become an "OS" distinct from the GNU system.

Yes.


Except configure usually does not need a "fully disambiguated" 
form---the canonical form produced by config.sub is fine, since 
configure is usually matching against the full tuple using shell 
case patterns.  The flat list with a defined order is optimal for 
this strategy, since it allows to easily check for the presence of 
any tag or combination of tags.
Shell case patterns can be a bit of a footgun. For example, a 
common mistake is doing * instead of *-*.


If the allowed pattern elements are sufficiently unambiguous, there 
is no mistake, since `*' matches text including `-'.  In fact, when 
testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be 
correct.  (I assume that a CPU type will remain required and will 
remain first in the list.)


Sorry I meant as part of a larger pattern. With things like *-stuff-* 
vs *-*-stuff-*-*, the extra dashes are needed to make sure "stuff" 
matches the right component, and even then it only works if one knows 
the exact number of components (which can be accomplished by *-*... 
and the ordering of patterns). It is quite subtle!
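A minimal sketch of the position-independent tag test Jacob describes (`*-foo-* | *-foo`), assuming tags never contain `-` or glob characters and that the CPU always comes first. The helper name `has_tag` is hypothetical, and `aarch64-unknown-musl-noshell` is the hypothetical tuple from earlier in this message.

```sh
#!/bin/sh
# has_tag TUPLE TAG: succeed if TAG is a whole '-'-separated component
# somewhere after the first (CPU) field, wherever it happens to sit.
# Assumes TAG contains neither '-' nor glob characters.
has_tag () {
  case $1 in
    # Quoting "$2" makes it match literally inside the pattern.
    *-"$2"-* | *-"$2") return 0 ;;
    *) return 1 ;;
  esac
}

for t in x86_64-pc-linux-musl aarch64-unknown-musl-noshell \
         x86_64-pc-linux-gnu arm-unknown-linux-musleabihf; do
  if has_tag "$t" musl; then
    echo "$t: has musl"
  else
    echo "$t: no musl"
  fi
done
```

Because the test ignores position, it cannot say which component a tag occupies; positional patterns such as `*-*-stuff-*-*` can, but only when the number of components is known, which is the subtlety raised above.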


For the "converging sequence" model, omitting the extra dashes is 
important, since the number of tags prior to a "floating tag" can 
vary.  (I would actually suggest making "gnu" such a floating tag in 
this model, with an exact definition to be obtained from a later 
discussion that would need to include RMS.)
Yeah I just mean the more components we have, the more a "sparse" 
representation is desirable, vs `something--another_thing` 
explicitly skipping things, because the latter is so annoying (needing 
to count). But that also creates ambiguities.


Allow the hypothetical --parse option to accept a PREFIX argument 
and you are pretty much there:


$ ./config.sub --parse=host x86_64-linux-gnu
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_kernel=linux
host_os=gnu
$

That form should be both easily parsed by other tools and suitable 
for `eval` in shell scripts.
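A minimal sketch of the proposed `--parse=PREFIX` output format, written as a standalone shell function. The name `parse_tuple` is hypothetical and this is not config.sub code: it does no canonicalization (so `x86_64-linux-gnu` would not become `x86_64-pc-linux-gnu`), and it assumes an already-canonical four-component CPU-VENDOR-KERNEL-OS tuple whose parts need no shell quoting.

```sh
#!/bin/sh
# Hypothetical sketch of the proposed --parse=PREFIX output.  Does NOT
# canonicalize: input must already be CPU-VENDOR-KERNEL-OS.
parse_tuple () {
  prefix=$1 tuple=$2
  saved_ifs=$IFS
  IFS=-
  set -f                 # no globbing while splitting
  set -- $tuple          # split the tuple on '-' into $1..$#
  set +f
  IFS=$saved_ifs
  if [ "$#" -ne 4 ]; then
    echo "parse_tuple: expected CPU-VENDOR-KERNEL-OS, got '$tuple'" >&2
    return 1
  fi
  # printf reuses its format for each remaining pair of arguments.
  printf '%s=%s\n' "$prefix" "$tuple" \
    "${prefix}_cpu" "$1" "${prefix}_vendor" "$2" \
    "${prefix}_kernel" "$3" "${prefix}_os" "$4"
}

parse_tuple host x86_64-pc-linux-gnu

# Every output line is a plain assignment, so a script can eval it.
eval "$(parse_tuple host x86_64-pc-linux-gnu)"
echo "cpu=$host_cpu kernel=$host_kernel os=$host_os"
```

For an already-canonical input the first call prints the same five lines as the transcript above.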

Yup! We're in agreement.


I agree testing is more robust, 

Re: Rethinking configuration tuples

2023-09-12 Thread Adam Joseph
Quoting Po Lu (2023-08-24 21:18:13)
> People, the nature and widespread use of config.* precludes any efforts
> aimed at ``rethinking'' the tuples they accept and generate.

+1



Re: Rethinking configuration tuples

2023-09-11 Thread Po Lu
"Dmitry V. Levin"  writes:

> Hi,
>
> On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
>> Where are the config maintainers?  Karl Barry and company?
>> (I don't remember his e-mail nor do I have it at hand.)
>> 
>> I would expect them to be actively reading this list, but instead my
>> original request has been left twisting in the wind.
>
> I'm the maintainer and I'm actively reading this list now,
> a bit surprised to find so many messages at this time of year. :)
>
> Apparently, you don't quite like commit
> 91f6a7f616b161c25ba2001861a40e662e18c4ad that added
> $cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
> I understood what exactly do you suggest to change in this case.

To either revert the change, or to canonicalize them to
CPU-VENDOR-mingw* and CPU-VENDOR-winnt* respectively.  Neither `gnu' nor
`msvc' are appropriate for the operating system field.
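A minimal sketch of the canonicalization alternative requested here: rewriting the new OS names to the long-standing ones instead of accepting them as-is. The target spelling `mingw32` is an assumption (the message only says `mingw*`), the function name `canon_os` is hypothetical, and this is not config.sub's actual code.

```sh
#!/bin/sh
# Hypothetical sketch, not config.sub code: map the OS part of a tuple
# from the newly added names onto existing ones.  "mingw32" is an
# assumed spelling; the request itself only says "mingw*".
canon_os () {
  case $1 in
    windows-gnu)  echo mingw32 ;;   # MinGW, since there is no glibc here
    windows-msvc) echo winnt ;;     # *-winnt* already denotes MSVC
    *)            echo "$1" ;;      # all other names are left alone
  esac
}

for os in windows-gnu windows-msvc linux-gnu; do
  printf '%s -> %s\n' "$os" "$(canon_os "$os")"
done
```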




Re: Rethinking configuration tuples

2023-09-11 Thread Po Lu
"Zack Weinberg"  writes:

> If you could provide me a reference to your original request (e.g. URL
> in the mailing list archive) I will undertake to get it done. If I try
> to find it myself I'm afraid I will pick the wrong thing.
>
> If there is a specific git commit or commits you want reverted, the full
> hashes of those commits would also be very helpful.
>
> zw

This commit:

  https://git.savannah.gnu.org/cgit/config.git/commit/?id=91f6a7f616b161c25ba2001861a40e662e18c4ad

should be reverted or modified to canonicalize such invalid tuples into
either *-windows-mingw* or *-windows-winnt*, which they are intended to
represent.

Thanks.



Re: Rethinking configuration tuples

2023-09-11 Thread John Ericson
I can submit two patches (effectively amending my prior, landed patch) 
with options that I think people would prefer. Will do that shortly.


On 9/11/23 17:53, Dmitry V. Levin wrote:

> Hi,
>
> On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
>
>> Where are the config maintainers?  Karl Barry and company?
>> (I don't remember his e-mail nor do I have it at hand.)
>>
>> I would expect them to be actively reading this list, but instead my
>> original request has been left twisting in the wind.
>
> I'm the maintainer and I'm actively reading this list now,
> a bit surprised to find so many messages at this time of year. :)
>
> Apparently, you don't quite like commit
> 91f6a7f616b161c25ba2001861a40e662e18c4ad that added
> $cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
> I understood what exactly do you suggest to change in this case.






Re: Rethinking configuration tuples

2023-09-11 Thread Dmitry V. Levin
Hi,

On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
> Where are the config maintainers?  Karl Barry and company?
> (I don't remember his e-mail nor do I have it at hand.)
> 
> I would expect them to be actively reading this list, but instead my
> original request has been left twisting in the wind.

I'm the maintainer and I'm actively reading this list now,
a bit surprised to find so many messages at this time of year. :)

Apparently, you don't quite like commit
91f6a7f616b161c25ba2001861a40e662e18c4ad that added
$cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
I understood what exactly do you suggest to change in this case.


-- 
ldv



Re: Rethinking configuration tuples

2023-09-11 Thread Zack Weinberg
On Sun, Sep 10, 2023, at 10:11 PM, Po Lu wrote:
> Where are the config maintainers?  Karl Barry and company? (I don't
> remember his e-mail nor do I have it at hand.)

Karl Berry is the Automake maintainer. I'm not sure if there *is* an
official config.* maintainer. The person most appropriately described as
the de facto maintainer is probably Dmitry V. Levin.

> I would expect them to be actively reading this list, but instead my
> original request has been left twisting in the wind.

If you could provide me a reference to your original request (e.g. URL
in the mailing list archive) I will undertake to get it done. If I try
to find it myself I'm afraid I will pick the wrong thing.

If there is a specific git commit or commits you want reverted, the full
hashes of those commits would also be very helpful.

zw



Re: Rethinking configuration tuples

2023-09-10 Thread Po Lu
Where are the config maintainers?  Karl Barry and company?
(I don't remember his e-mail nor do I have it at hand.)

I would expect them to be actively reading this list, but instead my
original request has been left twisting in the wind.



Re: Rethinking configuration tuples

2023-09-10 Thread Jacob Bachmeyer

Zack Weinberg wrote:

I haven't been following this long discussion very closely but I do have some opinions 
(with my "de facto autoconf maintainer" hat on):

1. As a general rule, it is not safe to change the canonicalization (i.e. the config.sub 
output) of an existing system name, *at all*; in many cases, not even if it is wrong. I 
find that people working on GNU tools often don't realize just how broadly used these 
names are. Changing the canonicalization of "CPU-VENDOR-mingw32", for example, 
is very likely to break things like Ansible playbooks and Travis-style CI build matrices 
-- one-off files that exist by the tens of thousands and there's no practical way to 
*enumerate* them all, let alone get them all changed to satisfy a GNU-internal desire for 
a more consistent naming convention.
  


Perhaps I have been misunderstood; I have been suggesting to change our 
interpretation but to keep all existing tuples as they are.  I am very 
much aware of this issue.



*Very recently introduced* names can be adjusted to correct technical errors.  For 
example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU libc 
port to Windows (see below); config.guess should not produce it and config.sub should not 
convert anything into it.  But if the patch that had introduced this mistake were more 
than a few months old, we would be stuck with it, permanently.
  


Fortunately, this particular error was caught relatively quickly.


2. We should avoid adding any more information to canonical system names.  Things like 
the availability of Bourne shell, which of the several available implementations of 
"init" (Unix PID 1) is in use, etc. should be handled with Autoconf-style 
feature probes.  Yes, it's difficult to run ./configure if you don't have a Bourne shell, 
but I suspect most of the environments where that's an issue are used primarily as 
cross-compilation targets rather than native-build hosts.
  


A platform without a Bourne shell is (as far as the GNU build system is 
concerned) only usable as a cross-compilation target.  Issues like shell 
availability or choice of init(8) are a reasonable use for the "OS" 
field, where an operating system tag is essentially a gestalt summary of 
the target environment.  The combinatorial explosion that would cause in 
modern use is a different issue.



My suggested place to draw the line is, if you reasonably need a cross-compiler 
targeting A to be different from a cross-compiler targeting B, then the 
distinction between A and B can go in the canonical system name; if you don't, 
then it shouldn't.  This should be pretty close to existing practice (because 
that's exactly how GCC uses CSNs, via ./configure --target) and should give us 
concrete reasons to make a decision in each case.
  


Agreed that calling the third field "operating system" is a holdover 
from a past where that actually mattered and operating systems were 
proprietary monoliths.  This also provides a good first guess at a limit 
for what environment details should be in a CSN and what should not:  
if the same cross-compiler targets both environments, they should have 
the same CSN.  However, a system with both GNU libc and Musl libc could 
possibly use GCC's multilib facility instead of separate instances of 
the compiler, so multilib targets probably need some form of disambiguation.



[...]

3. I like the idea of a "--parseable" option to config.sub/guess that makes them 
spit out something easier to parse.  My preferred syntax would be a newline- or 
semicolon-separated sequence of Bourne shell assignment statements, because, if there was 
also a way to ask config.sub/guess to add a prefix to every variable name, that would let 
Autoconf scripts process the output with `eval` rather than the nasty bit of parser goo 
we have now (_AC_CANONICAL_SPLIT, 
https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987).  It 
would need to be something like

$ ./config.guess
aarch64-unknown-linux-gnu
$ ./config.guess --prefix=host --parseable
host_cpu=aarch64
host_vendor=unknown
host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, 
abi, etc), but the existing three (cpu, vendor, os) need to keep emitting 
exactly what they do now.
  


I was proposing adding a --parse option only to config.sub to avoid code 
duplication.  I also do not think of this as a "parseable" form but as a 
pre-parsed form.  I disagree with using --prefix here when --parse could 
easily accept that same prefix as its optional argument, especially 
since config.{sub,guess} are in such close proximity to configure, which 
uses --prefix for a very different purpose.



4. We should deemphasize and possibly explicitly deprecate the vendor component 
of a CSN.  Nowadays, in my experience, it just confuses people.
The problem is that VENDOR was actually important in the dim past and 
could still be useful in some contexts today (I expect it to be 

Re: Rethinking configuration tuples

2023-09-10 Thread connor horman
I'd note that I don't see "rethinking target tuples" as changing how any
given name is assigned, but rather changing how they are defined and how
they are thought about.

We wouldn't break anything by changing the fourth field to mean
"Environment" rather than "Operating System", to be more well-defined -
every existing tuple would still be the same, and even some existing
erroneous ones would be validated rather than existing in a state of being
incorrect, but impossible to change. Any tuple with `elf` as the final
component, for example, would be correct as an Environment, not as an
Operating System, and now those existing tuples would be sound, and not
just "hanging on because things break if they cease to exist".

On Sun, 10 Sept 2023 at 20:56, Po Lu  wrote:

> "Zack Weinberg"  writes:
>
> > I haven't been following this long discussion very closely but I do
> > have some opinions (with my "de facto autoconf maintainer" hat on):
> >
> > 1. As a general rule, it is not safe to change the canonicalization
> > (i.e. the config.sub output) of an existing system name, *at all*; in
> > many cases, not even if it is wrong. I find that people working on GNU
> > tools often don't realize just how broadly used these names
> > are. Changing the canonicalization of "CPU-VENDOR-mingw32", for
> > example, is very likely to break things like Ansible playbooks and
> > Travis-style CI build matrices -- one-off files that exist by the tens
> > of thousands and there's no practical way to *enumerate* them all, let
> > alone get them all changed to satisfy a GNU-internal desire for a more
> > consistent naming convention.
> >
> > *Very recently introduced* names can be adjusted to correct technical
> > errors.  For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as
> > there is no GNU libc port to Windows (see below); config.guess should
> > not produce it and config.sub should not convert anything into it.
> > But if the patch that had introduced this mistake were more than a few
> > months old, we would be stuck with it, permanently.
>
> This mistake is only two months old, thankfully.  I believe it can be
> corrected without consequence.
>


Re: Rethinking configuration tuples

2023-09-10 Thread Po Lu
"Zack Weinberg"  writes:

> I haven't been following this long discussion very closely but I do
> have some opinions (with my "de facto autoconf maintainer" hat on):
>
> 1. As a general rule, it is not safe to change the canonicalization
> (i.e. the config.sub output) of an existing system name, *at all*; in
> many cases, not even if it is wrong. I find that people working on GNU
> tools often don't realize just how broadly used these names
> are. Changing the canonicalization of "CPU-VENDOR-mingw32", for
> example, is very likely to break things like Ansible playbooks and
> Travis-style CI build matrices -- one-off files that exist by the tens
> of thousands and there's no practical way to *enumerate* them all, let
> alone get them all changed to satisfy a GNU-internal desire for a more
> consistent naming convention.
>
> *Very recently introduced* names can be adjusted to correct technical
> errors.  For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as
> there is no GNU libc port to Windows (see below); config.guess should
> not produce it and config.sub should not convert anything into it.
> But if the patch that had introduced this mistake were more than a few
> months old, we would be stuck with it, permanently.

This mistake is only two months old, thankfully.  I believe it can be
corrected without consequence.



Re: Rethinking configuration tuples

2023-09-10 Thread Zack Weinberg
I haven't been following this long discussion very closely but I do have some 
opinions (with my "de facto autoconf maintainer" hat on):

1. As a general rule, it is not safe to change the canonicalization (i.e. the 
config.sub output) of an existing system name, *at all*; in many cases, not 
even if it is wrong. I find that people working on GNU tools often don't 
realize just how broadly used these names are. Changing the canonicalization of 
"CPU-VENDOR-mingw32", for example, is very likely to break things like Ansible 
playbooks and Travis-style CI build matrices -- one-off files that exist by the 
tens of thousands and there's no practical way to *enumerate* them all, let 
alone get them all changed to satisfy a GNU-internal desire for a more 
consistent naming convention.

*Very recently introduced* names can be adjusted to correct technical errors.  
For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU 
libc port to Windows (see below); config.guess should not produce it and 
config.sub should not convert anything into it.  But if the patch that had 
introduced this mistake were more than a few months old, we would be stuck with 
it, permanently.

2. We should avoid adding any more information to canonical system names.  
Things like the availability of Bourne shell, which of the several available 
implementations of "init" (Unix PID 1) is in use, etc. should be handled with 
Autoconf-style feature probes.  Yes, it's difficult to run ./configure if you 
don't have a Bourne shell, but I suspect most of the environments where that's 
an issue are used primarily as cross-compilation targets rather than 
native-build hosts.

My suggested place to draw the line is, if you reasonably need a cross-compiler 
targeting A to be different from a cross-compiler targeting B, then the 
distinction between A and B can go in the canonical system name; if you don't, 
then it shouldn't.  This should be pretty close to existing practice (because 
that's exactly how GCC uses CSNs, via ./configure --target) and should give us 
concrete reasons to make a decision in each case.

For example, this rule says that the combination of Linux kernel with musl libc 
should be identified as "CPU-VENDOR-linux-musl", not 
"CPU-VENDOR-linux-gnu-musl", regardless of whether the overall system uses 
other GNU components.  This is because the presence or absence of GNU libc 
*does* affect cross-compilation of C programs, but the presence or absence of 
other GNU software doesn't.  [Note: I don't know whether RMS has said anything 
about this, and if he has, I don't care.]

A compiled language *other than* the C family might, in the future, want us to 
make a distinction between cross-compilation targets that existing CSNs do not 
capture, but we can worry about that when it actually happens.

3. I like the idea of a "--parseable" option to config.sub/guess that makes them 
spit out something easier to parse.  My preferred syntax would be a newline- or 
semicolon-separated sequence of Bourne shell assignment statements, because, if 
there was also a way to ask config.sub/guess to add a prefix to every variable 
name, that would let Autoconf scripts process the output with `eval` rather 
than the nasty bit of parser goo we have now (_AC_CANONICAL_SPLIT, 
https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987).
It would need to be something like this:

$ ./config.guess
aarch64-unknown-linux-gnu
$ ./config.guess --prefix=host --parseable
host_cpu=aarch64
host_vendor=unknown
host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, 
abi, etc.), but the existing three (cpu, vendor, os) need to keep emitting 
exactly what they do now.
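For illustration, here is a minimal POSIX shell sketch of what such a `--parseable --prefix=host` mode might emit, and how a configure script could then consume it with `eval`. The helper name `emit_parseable` is hypothetical (config.sub and config.guess have no such option today). It assumes its input is already a canonical CPU-VENDOR-OS name whose fields contain only characters that are safe unquoted in a shell assignment.

```sh
#!/bin/sh
# Hypothetical helper: config.sub/config.guess have no such option today.
# Prints the three existing fields of a canonical CPU-VENDOR-OS name as
# PREFIX_cpu=..., PREFIX_vendor=..., PREFIX_os=... lines.
# (POSIX sh has no local variables, so cpu/rest/vendor/os stay set.)
emit_parseable () {
  prefix=$1
  name=$2
  case $name in
    *-*-*) ;;
    *) echo "emit_parseable: not a canonical name: $name" >&2; return 1 ;;
  esac
  cpu=${name%%-*}     # text before the first '-'
  rest=${name#*-}     # text after the first '-'
  vendor=${rest%%-*}  # second field
  os=${rest#*-}       # all remaining fields, e.g. "linux-gnu"
  printf '%s_cpu=%s\n' "$prefix" "$cpu"
  printf '%s_vendor=%s\n' "$prefix" "$vendor"
  printf '%s_os=%s\n' "$prefix" "$os"
}

# A configure-style consumer can eval the output instead of running its
# own splitter.  This assumes the values need no quoting, which holds for
# canonical names (letters, digits, '_', '.', '-').
eval "$(emit_parseable host aarch64-unknown-linux-gnu)"
echo "$host_cpu $host_vendor $host_os"   # prints: aarch64 unknown linux-gnu
```

Because `os` keeps everything after the vendor (here `linux-gnu`), the three existing fields keep exactly their current meaning; extra keys could be appended as further lines without disturbing them.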

4. We should deemphasize and possibly explicitly deprecate the vendor component 
of a CSN.  Nowadays, in my experience, it just confuses people.

zw



Re: Rethinking configuration tuples

2023-09-06 Thread Jacob Bachmeyer

John Ericson wrote:

On 8/30/23 22:24, Jacob Bachmeyer wrote:


John Ericson wrote:

Err I mean, is there an example of a *-*-linux-$nongnu-musl?


I would expect that to name an embedded environment using Musl libc 
and the Linux kernel, but that is not a full system. (Example:  may 
not even have a shell at all)


I suppose except for the system() function in libc, I would consider 
this a distinction not needed for configs. The choice of what other 
programs to run (be they init or shell) feels to me not like a 
build-time / development configuration decision, but a runtime / ops 
configuration decision. They aren't "viral" decisions in the way that 
the choice of libc is (since all shared objects that may be combined 
together need to agree on their deps, most notably libc).


The problem is that system(3) probably /does/ exist in that 
configuration, but such a system is only usable as a cross-compilation 
target, since building the GNU tools requires a shell.


Maybe this could go in the config per the "arbitrary many components, 
finer distinctions to the right" "converging sequence" approach, but 
then I would want this further to the right, e.g. 
aarch64-unknown-musl-noshell not aarch64-unknown-noshell-musl.


The idea of lacking a shell was intended as an example of something that 
would make a Free system /very/ different from the GNU system, enough 
that it cannot be considered a GNU variant.  In other words, a 
Linux-based system that is clearly /not/ GNU/Linux.


The choice of system service management is orthogonal to this, 
since it has minimal impact on user programs.  (Unless systemd gets 
even more outrageously invasive...)

Agreed, just wanted to double check.
Of course, if systemd *does* get sufficiently outrageously invasive, 
we might need a *-*-linux-systemd-glibc tuple...  (Since systemd 
gleefully makes extensive use of Linux-kernel-specific features, it 
cannot possibly be a standard on the GNU system, which supports 
multiple Free kernels.)


Yes, I agree systemd probably can't be a "bona fide GNU OS", but I take 
the opposite conclusion: this is evidence that the "gnu" for glibc 
is more important than the "gnu" for "true GNU OS".


In this hypothetical (that needs to *stay* hypothetical) example, 
"systemd" has somehow become an "OS" distinct from the GNU system.


Except configure usually does not need a "fully disambiguated" 
form---the canonical form produced by config.sub is fine, since 
configure is usually matching against the full tuple using shell 
case patterns.  The flat list with a defined order is optimal for 
this strategy, since it makes it easy to check for the presence of
any tag or combination of tags.
Shell case patterns can be a bit of a footgun. For example, a common 
mistake is doing * instead of *-*.


If the allowed pattern elements are sufficiently unambiguous, there 
is no mistake, since `*' matches text including `-'.  In fact, when 
testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be 
correct.  (I assume that a CPU type will remain required and will 
remain first in the list.)
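A minimal sketch of that predicate as a shell function, assuming (as above) that the CPU is always the first element. `has_tag` is a hypothetical helper, not part of config.sub or Autoconf.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if TAG appears as a whole hyphen-delimited
# element of TUPLE other than the first (CPU) element.  Quoting "$tag"
# makes it match literally even if it contains glob characters.
has_tag () {
  tuple=$1
  tag=$2
  case $tuple in
    *-"$tag"-* | *-"$tag") return 0 ;;
    *) return 1 ;;
  esac
}

for t in x86_64-pc-linux-gnu aarch64-unknown-linux-musl x86_64-pc-linux-gnu-musl; do
  if has_tag "$t" musl; then
    echo "$t: musl"
  else
    echo "$t: no musl"
  fi
done
```

Because the patterns require a hyphen before the tag and either a hyphen or the end of the string after it, only whole tags match: `has_tag x86_64-pc-linux-gnux32 gnu` is false, and the leading CPU field can never match.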


Sorry I meant as part of a larger pattern. With things like *-stuff-* 
vs *-*-stuff-*-*, the extra dashes are needed to make sure "stuff" 
matches the right component, and even then it only works if one knows 
the exact number of components (which can be accomplished by *-*... 
and the ordering of patterns). It is quite subtle!


For the "converging sequence" model, omitting the extra dashes is 
important, since the number of tags prior to a "floating tag" can vary.  
(I would actually suggest making "gnu" such a floating tag in this 
model, with an exact definition to be obtained from a later discussion 
that would need to include RMS.)


Allow the hypothetical --parse option to accept a PREFIX argument and 
you are pretty much there:


$ ./config.sub --parse=host x86_64-linux-gnu
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_kernel=linux
host_os=gnu
$

That form should be both easily parsed by other tools and suitable 
for `eval` in shell scripts.
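To show both kinds of consumer side by side, here is a sketch that fakes the proposed `--parse=host` output (the option does not exist yet) and reads it once via `eval` and once with a plain `read` loop, as a tool that prefers not to `eval` its input might. `parse_output` and `lookup` are hypothetical helpers.

```sh
#!/bin/sh
# Stand-in for the proposed "config.sub --parse=host x86_64-linux-gnu";
# the real option does not exist, so its output is faked here.
parse_output () {
  cat <<'EOF'
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_kernel=linux
host_os=gnu
EOF
}

# 1. Shell scripts such as configure can simply eval the lines.
eval "$(parse_output)"

# 2. Other tools can split each line at the first '=' instead of eval'ing.
#    "lookup" prints the value recorded for one key, or nothing.
lookup () {
  want=$1
  parse_output | while IFS='=' read -r key value; do
    if [ "$key" = "$want" ]; then
      printf '%s\n' "$value"
    fi
  done
}

echo "$host_kernel / $host_os"   # prints: linux / gnu
lookup host_vendor                # prints: pc
```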

Yup! We're in agreement.


I agree testing is more robust, but for better or worse I still do 
see scripts using those host_* variables mentioned above. (Testing 
is possible but requires more care to get right for 
cross-compilation, for one.)




In this case the test is `case $host in ... esac`.

I would say it is better to case on (combinations of) `host_*` 
variables than on `$host`, because then one knows exactly which components 
are being cased upon; there is no ambiguity. I think one should basically 
only use `host` as a black-box identifier (e.g. prefixing binaries), 
and any other time one would like to use `host` one should use the 
`host_*` variables instead.
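A concrete case of the ambiguity, as a sketch: GNU/Hurd canonicalizes to names like `i686-pc-gnu`, while GNU/Linux is `x86_64-pc-linux-gnu`. A whole-tuple pattern intended to mean "the OS is GNU" also matches GNU/Linux, because a `*` can swallow `pc-linux`; matching on the OS field alone (what `$host_os` holds after AC_CANONICAL_HOST) avoids this. The function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical comparison helpers.

# Whole-tuple matching: intended as "OS is GNU (Hurd)", but the second '*'
# can also absorb "pc-linux", so GNU/Linux matches too.
match_full () {
  case $1 in
    *-*-gnu*) echo hurd ;;
    *) echo other ;;
  esac
}

# Matching on the OS field alone leaves no room for that mistake.
match_os () {
  case $1 in
    gnu*) echo hurd ;;
    linux-gnu*) echo gnu-linux ;;
    *) echo other ;;
  esac
}

match_full i686-pc-gnu            # prints: hurd
match_full x86_64-pc-linux-gnu    # prints: hurd (the unintended match)
match_os gnu                      # prints: hurd
match_os linux-gnu                # prints: gnu-linux
```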


This comes back to the "converging sequence" model issue:  what to do 
with the "floating tags" that are not in fixed fields?


The problem is still getting it /into/ config.sub:  config.sub 
expects a single command-line argument, 

Re: Rethinking configuration tuples

2023-09-05 Thread John Ericson

On 8/30/23 22:24, Jacob Bachmeyer wrote:


John Ericson wrote:

Err I mean, is there an example of a *-*-linux-$nongnu-musl?


I would expect that to name an embedded environment using Musl libc 
and the Linux kernel, but that is not a full system. (Example:  may 
not even have a shell at all)


I suppose except for the system() function in libc, I would consider 
this a distinction not needed for configs. The choice of what other 
programs to run (be they init or shell) feels to me not like a 
build-time / development configuration decision, but a runtime / ops 
configuration decision. They aren't "viral" decisions in the way that 
the choice of libc is (since all shared objects that may be combined 
together need to agree on their deps, most notably libc).


Maybe this could go in the config per the "arbitrary many components, 
finer distinctions to the right" "converging sequence" approach, but 
then I would want this further to the right, e.g. 
aarch64-unknown-musl-noshell not aarch64-unknown-noshell-musl.


The choice of system service management is orthogonal to this, since 
it has minimal impact on user programs.  (Unless systemd gets even 
more outrageously invasive...)

Agreed, just wanted to double check.
Of course, if systemd *does* get sufficiently outrageously invasive, 
we might need a *-*-linux-systemd-glibc tuple...  (Since systemd 
gleefully makes extensive use of Linux-kernel-specific features, it 
cannot possibly be a standard on the GNU system, which supports 
multiple Free kernels.)


Yes, I agree systemd probably can't be a "bona fide GNU OS", but I take the 
opposite conclusion: this is evidence that the "gnu" for glibc is 
more important than the "gnu" for "true GNU OS".


Except configure usually does not need a "fully disambiguated" 
form---the canonical form produced by config.sub is fine, since 
configure is usually matching against the full tuple using shell 
case patterns.  The flat list with a defined order is optimal for 
this strategy, since it makes it easy to check for the presence of
any tag or combination of tags.
Shell case patterns can be a bit of a footgun. For example, a common 
mistake is doing * instead of *-*.


If the allowed pattern elements are sufficiently unambiguous, there is 
no mistake, since `*' matches text including `-'.  In fact, when 
testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be 
correct.  (I assume that a CPU type will remain required and will 
remain first in the list.)


Sorry I meant as part of a larger pattern. With things like *-stuff-* vs 
*-*-stuff-*-*, the extra dashes are needed to make sure "stuff" matches 
the right component, and even then it only works if one knows the exact 
number of components (which can be accomplished by *-*... and the 
ordering of patterns). It is quite subtle!


Allow the hypothetical --parse option to accept a PREFIX argument and 
you are pretty much there:


$ ./config.sub --parse=host x86_64-linux-gnu
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_kernel=linux
host_os=gnu
$

That form should be both easily parsed by other tools and suitable for 
`eval` in shell scripts.

Yup! We're in agreement.


I agree testing is more robust, but for better or worse I still do 
see scripts using those host_* variables mentioned above. (Testing is 
possible but requires more care to get right for cross-compilation, 
for one.)




In this case the test is `case $host in ... esac`.

I would say it is better to case on (combinations of) `host_*` variables 
than on `$host`, because then one knows exactly which components are being 
cased upon; there is no ambiguity. I think one should basically only use 
`host` as a black-box identifier (e.g. prefixing binaries), and any other 
time one would like to use `host` one should use the `host_*` variables 
instead.


The problem is still getting it /into/ config.sub:  config.sub expects 
a single command-line argument, while pre-parsed form spans a few lines.


I don't think that is so hard. config.sub accepts --gnu-long-args 
already (without confusing them with configs), so we can simply do 
something like


./config.sub --pre-categorized cpu=x86_64 vendor=pc kernel=linux os=gnu

and then there is no confusing the two forms of input.
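A minimal sketch of how such a pre-categorized mode could assemble a tuple from separate KEY=VALUE arguments, so that no guessing about which hyphen-separated word is which is needed. The `--pre-categorized` option is only a proposal; `from_components` is a hypothetical helper, and defaulting the vendor to `unknown` is an assumption made for illustration. Real normalization of each component (aliases, validation) is left out.

```sh
#!/bin/sh
# Hypothetical sketch of a "--pre-categorized" input mode: each argument
# names its component explicitly, so the "which word is which" step
# disappears.  The "unknown" vendor default is an assumption.
from_components () {
  cpu= vendor=unknown kernel= os=
  for arg in "$@"; do
    case $arg in
      cpu=*)    cpu=${arg#cpu=} ;;
      vendor=*) vendor=${arg#vendor=} ;;
      kernel=*) kernel=${arg#kernel=} ;;
      os=*)     os=${arg#os=} ;;
      *) echo "from_components: unexpected argument: $arg" >&2; return 1 ;;
    esac
  done
  if [ -z "$cpu" ] || [ -z "$os" ]; then
    echo "from_components: cpu= and os= are required" >&2
    return 1
  fi
  # Join the fields in canonical order; omit the kernel when none is given.
  if [ -n "$kernel" ]; then
    printf '%s-%s-%s-%s\n' "$cpu" "$vendor" "$kernel" "$os"
  else
    printf '%s-%s-%s\n' "$cpu" "$vendor" "$os"
  fi
}

from_components cpu=x86_64 vendor=pc kernel=linux os=gnu   # x86_64-pc-linux-gnu
from_components kernel=linux os=musl cpu=aarch64           # aarch64-unknown-linux-musl
```

Argument order does not matter and unknown keys are rejected, which is what makes the two input forms impossible to confuse.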


[...]
I am not entirely certain why, but I know that there is some reason 
we call the common GNU/Linux systems *-*-linux-gnu instead of 
*-*-linux.


To be honest, I think this is basically the "call it GNU/Linux not 
Linux" controversy --- i.e. at the time it was done for social not 
technical reasons. I don't mind, since now that we have multiple 
libcs there /is/ a technical reason to distinguish. But this circles 
back to my hunch that Kernel (syscall interface) + libc (ABI) 
determines OS uniquely enough for config.sub's purposes.




That is possible, but still a valid reason for the GNU Project to stay 
with that angle.


Yeah I have no problem with the term GNU/Linux, I just don't think "OS" 
is useful for 

Re: Rethinking configuration tuples

2023-08-30 Thread Jacob Bachmeyer

John Ericson wrote:

On 8/27/23 23:59, Jacob Bachmeyer wrote:
[...]
This is also the framework in which *-*-linux-gnu-musl makes sense 
for a system that uses Musl libc but is otherwise a GNU/Linux system.


Right but again where do we draw the line? For example, can one use 
systemd and its large entourage of intertwined software, or must one 
use GNU Shepherd or System V init?




In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference 
is the C runtime library (GNU libc vs. Musl libc) such that shared 
objects linked for one ABI are not compatible with the other.  If 
Musl libc were exactly 100% binary compatible with GNU libc, then 
there would be no *-*-linux-gnu-musl platform, since it would be 
indistinguishable from *-*-linux-gnu.


Err I mean, is there an example of a *-*-linux-$nongnu-musl?



I would expect that to name an embedded environment using Musl libc and 
the Linux kernel, but that is not a full system.  (Example:  may not 
even have a shell at all)



[...]

The choice of system service management is orthogonal to this, since 
it has minimal impact on user programs.  (Unless systemd gets even 
more outrageously invasive...)


Agreed, just wanted to double check.



Of course, if systemd *does* get sufficiently outrageously invasive, we 
might need a *-*-linux-systemd-glibc tuple...  (Since systemd gleefully 
makes extensive use of Linux-kernel-specific features, it cannot 
possibly be a standard on the GNU system, which supports multiple Free 
kernels.)


Except configure usually does not need a "fully disambiguated" 
form---the canonical form produced by config.sub is fine, since 
configure is usually matching against the full tuple using shell case 
patterns.  The flat list with a defined order is optimal for this 
strategy, since it makes it easy to check for the presence of any tag 
or combination of tags.


Shell case patterns can be a bit of a footgun. For example, a common 
mistake is doing * instead of *-*.




If the allowed pattern elements are sufficiently unambiguous, there is 
no mistake, since `*' matches text including `-'.  In fact, when testing 
an "is tag FOO present?" predicate `*-foo-* | *-foo' would be correct.  
(I assume that a CPU type will remain required and will remain first in 
the list.)


I would rather case on disambiguated variables. Indeed, 
AC_CANONICAL_HOST computes host_cpu, host_vendor, and host_os for 
precisely that purpose. If config.sub could split out the 
disambiguated form, those variables could be defined more simply and 
robustly.
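For comparison, this is roughly what splitting the canonical name looks like today: a POSIX shell sketch modelled loosely on what Autoconf's `_AC_CANONICAL_SPLIT` does (split on `-`, take two fields, glue the rest back together), not a copy of its code. `split_canonical` is a hypothetical name.

```sh
#!/bin/sh
# Loose re-creation, not Autoconf's actual code.  Relies on the name
# containing no glob characters, since the unquoted $1 below is also
# subject to pathname expansion.
split_canonical () {
  case $1 in
    *-*-*) ;;
    *) echo "split_canonical: invalid canonical name: $1" >&2; return 1 ;;
  esac
  save_IFS=$IFS
  IFS=-
  set x $1          # split on '-'; the leading x keeps set from seeing options
  shift
  host_cpu=$1
  host_vendor=$2
  shift 2
  host_os="$*"      # rejoin the remaining fields with '-', the first IFS char
  IFS=$save_IFS
}

split_canonical x86_64-pc-linux-gnu
echo "$host_cpu | $host_vendor | $host_os"   # prints: x86_64 | pc | linux-gnu
```

The OS field comes out as `linux-gnu`: the split cannot tell a kernel from an OS/libc component, which is precisely the information a pre-parsed output from config.sub could supply directly.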




Allow the hypothetical --parse option to accept a PREFIX argument and 
you are pretty much there:


$ ./config.sub --parse=host x86_64-linux-gnu
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_kernel=linux
host_os=gnu
$

That form should be both easily parsed by other tools and suitable for 
`eval` in shell scripts.



Note that config.sub is itself a shell script, and handling JSON in 
shell is a giant pain.  The most we could reasonably do is what 
config.sub already does:  determine each component as a separate 
variable and then output that by substituting text into a template.
Yes I agree config.sub in its current form (must be highly portable 
across different Bourne-shell derivatives) has no hope of parsing 
JSON. It could output it or it could also output your 
${key}=${value}\n format, and it could also consume your format. 
Your format is ideal for it!
Adding a prefix to each key in the key=value format is trivial and 
would further help shell scripts that want to "parse by eval" but 
configure itself tests predicates rather than caring exactly what 
part of the configuration tuple means what.  Put another way, 
configure is usually looking for a yes/no answer, so a pre-parsed 
form is less useful than a single string that can be used for pattern 
matches.


I agree testing is more robust, but for better or worse I still do see 
scripts using those host_* variables mentioned above. (Testing is 
possible but requires more care to get right for cross-compilation, 
for one.)




In this case the test is `case $host in ... esac`.

There is no reasonable way to feed the key=value format /into/ 
config.sub: configuration tuples are hyphen-delimited lists.


I think there is. The overall algorithm is roughly "(a) decide which 
component is which, (b) sanitize and normalize components according to 
that decision". We would skip step (a) and go straight to step (b) in 
order to do this.


This indicates part of the value of doing this: rather than just 
"system testing" the entirety of config.sub, we would now have 
something closer to a "unit test" of part of it in isolation.


FWIW, this is similar to rearranging the code to support a mode 
where non-normal-form configs are rejected instead of normalized.


The problem is still getting it /into/ config.sub:  config.sub expects a 
single command-line argument, while pre-parsed form spans a few lines.



[...]
I am not entirely 

Re: Rethinking configuration tuples

2023-08-27 Thread John Ericson
On 8/27/23 23:59, Jacob Bachmeyer wrote:
>> I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is 
>> meaningful, the exact output of uname is not, etc. but where do we draw the 
>> line?
> That is a question for which I do not currently have a certain answer.  :/
Thanks, we'll keep trying to tease one out.

>>> This is also the framework in which *-*-linux-gnu-musl makes sense for a 
>>> system that uses Musl libc but is otherwise a GNU/Linux system.
>> 
>> Right but again where do we draw the line? For example, can one use systemd 
>> and its large entourage of intertwined software, or must one use GNU 
>> Shepherd or System V init?
>> 
> 
> In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is the C 
> runtime library (GNU libc vs. Musl libc) such that shared objects linked for 
> one ABI are not compatible with the other.  If Musl libc were exactly 100% 
> binary compatible with GNU libc, then there would be no *-*-linux-gnu-musl 
> platform, since it would be indistinguishable from *-*-linux-gnu.
Err I mean, is there an example of a *-*-linux-$nongnu-musl?

Agreed that if Musl was binary compatible with glibc, there would be no need to 
distinguish at the config level.

> The choice of system service management is orthogonal to this, since it has 
> minimal impact on user programs.  (Unless systemd gets even more outrageously 
> invasive...)
Agreed, just wanted to double check.

> Except configure usually does not need a "fully disambiguated" form---the 
> canonical form produced by config.sub is fine, since configure is usually 
> matching against the full tuple using shell case patterns.  The flat list 
> with a defined order is optimal for this strategy, since it makes it easy to 
> check for the presence of any tag or combination of tags.
Shell case patterns can be a bit of a footgun. For example, a common mistake is 
doing * instead of *-*. I would rather case on disambiguated variables. Indeed, 
AC_CANONICAL_HOST computes host_cpu, host_vendor, and host_os for precisely 
that purpose. If config.sub could split out the disambiguated form, those 
variables could be defined more simply and robustly.

> Note that config.sub is itself a shell script, and handling JSON in shell is 
> a giant pain.  The most we could reasonably do is what config.sub already 
> does:  determine each component as a separate variable and then output that 
> by substituting text into a template.
>> Yes I agree config.sub in its current form (must be highly portable across 
>> different Bourne-shell derivatives) has no hope of parsing JSON. It could 
>> output it or it could also output your ${key}=${value}\n format, and it 
>> could also consume your format. Your format is ideal for it!
> Adding a prefix to each key in the key=value format is trivial and would 
> further help shell scripts that want to "parse by eval" but configure itself 
> tests predicates rather than caring exactly what part of the configuration 
> tuple means what.  Put another way, configure is usually looking for a yes/no 
> answer, so a pre-parsed form is less useful than a single string that can be 
> used for pattern matches.
I agree testing is more robust, but for better or worse I still do see scripts 
using those host_* variables mentioned above. (Testing is possible but requires 
more care to get right for cross-compilation, for one.)

> There is no reasonable way to feed the key=value format *into* config.sub: 
> configuration tuples are hyphen-delimited lists.
I think there is. The overall algorithm is roughly "(a) decide which component 
is which, (b) sanitize and normalize components according to that decision". We
would skip step (a) and go straight to step (b) in order to do this.

This indicates part of the value of doing this: rather than just "system 
testing" the entirety of config.sub, we would now have something closer to a 
"unit test" of part of it in isolation.

FWIW, this is similar to rearranging the code to support a mode where 
non-normal-form configs are rejected instead of normalized.

> Producing key=value format using config.sub's knowledge of valid tuples might 
> be reasonable for *other* systems to use instead of needing their own parsers.
Yes it is definitely necessary for that, and that is a good use-case for sure.

>>> Thank you; as I mentioned above, the goal is to best support heterogeneous 
>>> multi-arch systems, but recognizing a tension here.  For configure, the 
>>> configuration tuple should not contain information that can be determined 
>>> by testing, but for storing multiple binary sets, ABIs do need to be part 
>>> of the name, even if they can be determined by configure tests.
>> 
>> Agreed configure tests are better for the "long tail" of other attributes. 
>> (IMO if we were to define "operating system", it would be something like the 
>> "limit" of all configure checks.) 
>> 
>> But a big part of my "kernel-libc" thinking (and I think also Connor's) is 

Re: Rethinking configuration tuples

2023-08-27 Thread Jacob Bachmeyer

John Ericson wrote:

On 8/27/23 01:06, Jacob Bachmeyer wrote:
[...]
Ah sorry, I shouldn't have made reference to JSON at all --- what I 
really was getting at is the /abstract syntax/. In particular, 
rather than having an abstract syntax of "list of strings" (parsing 
today's concrete syntax by breaking on dash), where the meaning of 
each string is ambiguous / context-sensitive, we would have one of "keys 
mapped to enumerations", i.e. one always knows the meaning of each 
component explicitly / without inspecting it or its context.


JSON or your flat list in canonical ordering (where I assume we are 
careful to never skip a type of component) are both valid concrete 
syntaxes that can be parsed / printed from this abstract syntax.




JSON is far too complicated to use here, except possibly as a 
"pre-parsed" form that config.sub could output on request for 
programs that want a structured form instead of parsing the tuple 
themselves.  But for that case, why use JSON instead of a trivial 
multi-line key=value format?


Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
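A toy POSIX shell model of that example, to make the two steps concrete: (a) decide which word is which, then (b) normalize by filling in a default vendor. It only understands Linux-kernel inputs, and its vendor defaults (pc for x86, unknown otherwise) are assumed to match what config.sub prints for these particular examples; the real config.sub knows far more aliases. `toy_parse` is a hypothetical name.

```sh
#!/bin/sh
# Toy model only: handles CPU-linux-OS and CPU-VENDOR-linux-OS inputs
# that contain no glob characters.
toy_parse () {
  input=$1
  save_IFS=$IFS
  IFS=-
  set x $input      # split on '-'
  IFS=$save_IFS
  shift
  # (a) Decide which word is which.
  if [ $# -eq 3 ] && [ "$2" = linux ]; then
    cpu=$1 vendor= kernel=$2 os=$3
  elif [ $# -eq 4 ] && [ "$3" = linux ]; then
    cpu=$1 vendor=$2 kernel=$3 os=$4
  else
    echo "toy_parse: unsupported input: $input" >&2
    return 1
  fi
  # (b) Normalize: supply the default vendor (assumed: pc for x86 CPUs,
  # unknown for everything else).
  if [ -z "$vendor" ]; then
    case $cpu in
      i?86 | x86_64) vendor=pc ;;
      *) vendor=unknown ;;
    esac
  fi
  printf 'cpu=%s\nvendor=%s\nkernel=%s\nos=%s\n' "$cpu" "$vendor" "$kernel" "$os"
}

toy_parse x86_64-linux-gnu
```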


Yes that looks great to me. This shares the abstract syntax with what 
I had in mind, and anything that understands JSON can easily convert 
back and forth between the two.


I argue for "duck-typing" here from the user's perspective:  if and 
only if the system in all meaningful ways appears to be the GNU 
system, there should be a *-gnu* somewhere in the configuration tuple.


I am OK with duck-typing, but what is "all meaningful ways"? Sure, 
POSIX is meaningful, the exact output of uname is not, etc. but where 
do we draw the line?




That is a question for which I do not currently have a certain answer.  :/

This is also the framework in which *-*-linux-gnu-musl makes sense 
for a system that uses Musl libc but is otherwise a GNU/Linux system.


Right but again where do we draw the line? For example, can one use 
systemd and its large entourage of intertwined software, or must one 
use GNU Shepherd or System V init?




In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is 
the C runtime library (GNU libc vs. Musl libc) such that shared objects 
linked for one ABI are not compatible with the other.  If Musl libc were 
exactly 100% binary compatible with GNU libc, then there would be no 
*-*-linux-gnu-musl platform, since it would be indistinguishable from 
*-*-linux-gnu.  The choice of system service management is orthogonal to 
this, since it has minimal impact on user programs.  (Unless systemd 
gets even more outrageously invasive...)



[...]

I still oppose JSON because it is way too verbose for this:  
configuration tuples need to be both expressive and simple enough 
to type at a shell prompt as arguments to configure.  Using JSON by 
default would also be a very nasty "flag day" that would break all 
existing programs that use config.sub.  Perhaps config.sub could 
accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably 
better for things like Autoconf which needs to parse the output of 
config.sub either way.
No, because Autoconf uses the shell and JSON is a [*profanity 
elided*] to parse using shell constructs.  A flat list of 
hyphen-delimited tags is almost ideal for the parsing that configure 
needs to do.  In fact, with a few restrictions (met by using 
canonical ordering) this is what configure /already/ parses.


Oops, yes I was being sloppy confusing concrete and abstract syntax 
again. Sorry!


I think that while JSON could be better for something like Meson or 
CMake, for Autoconf your ${key}=${value}\n format is perfect. Easy to 
parse and fully disambiguated.


And of course, GNU config should care more about Autoconf than Meson 
or CMake.




Except configure usually does not need a "fully disambiguated" 
form---the canonical form produced by config.sub is fine, since 
configure is usually matching against the full tuple using shell case 
patterns.  The flat list with a defined order is optimal for this 
strategy, since it makes it easy to check for the presence of any tag or 
combination of tags.


Note that config.sub is itself a shell script, and handling JSON in 
shell is a giant pain.  The most we could reasonably do is what 
config.sub already does:  determine each component as a separate 
variable and then output that by substituting text into a template.


Yes I agree config.sub in its current form (must be highly portable 
across different Bourne-shell derivatives) has no hope of parsing 
JSON. It could output it or it could also output your 
${key}=${value}\n format, and it could also consume your format. Your 
format is ideal for it!




Adding a prefix to each key in the key=value format is trivial and would 
further help shell scripts that want to "parse by eval" but configure 
itself tests predicates rather than caring 

Re: Rethinking configuration tuples

2023-08-27 Thread John Ericson

On 8/27/23 01:06, Jacob Bachmeyer wrote:
As I understand the history, Linux was the first clearly Free kernel 
available.  At the time, BSD still had a dark cloud hanging over it 
due to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases 
would not be legally recognized as separate until February 1994, 
although BSD had honestly (almost?) completely diverged from the AT&T 
codebase in June 1991 with Net/2.  Mach was still proprietary; RMS was 
(or would later be) campaigning for its liberation, which would not 
occur until some years later.  It is worth noting that Linux was 
originally a toy kernel, and it only attracted the effort it did and 
grew like it did because it was basically the last missing piece for 
fully Free systems at the time.


Yes that is how I understand it too

Ah sorry, I shouldn't have made reference to JSON at all --- what I 
really was getting at is the /abstract syntax/. In particular, rather 
than having an abstract syntax of "list of strings" (parsing today's 
concrete syntax by breaking on dash), where the meaning of each 
string is ambiguous / context-sensitive, we would have one of "keys mapped to 
enumerations", i.e. one always knows the meaning of each component 
explicitly / without inspecting it or its context.


JSON or your flat list in canonical ordering (where I assume we are 
careful to never skip a type of component) are both valid concrete 
syntaxes that can be parsed / printed from this abstract syntax.




JSON is far too complicated to use here, except possibly as a 
"pre-parsed" form that config.sub could output on request for programs 
that want a structured form instead of parsing the tuple themselves.  
But for that case, why use JSON instead of a trivial multi-line 
key=value format?


Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.


Yes that looks great to me. This shares the abstract syntax with what I 
had in mind, and anything that understands JSON can easily convert back 
and forth between the two.


I argue for "duck-typing" here from the user's perspective:  if and 
only if the system in all meaningful ways appears to be the GNU 
system, there should be a *-gnu* somewhere in the configuration tuple.


I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX 
is meaningful, the exact output of uname is not, etc. but where do we 
draw the line?


This is also the framework in which *-*-linux-gnu-musl makes sense for 
a system that uses Musl libc but is otherwise a GNU/Linux system.


Right but again where do we draw the line? For example, can one use 
systemd and its large entourage of intertwined software, or must one use 
GNU Shepherd or System V init?



Effectively, a different libc is a different ABI.


Agreed, especially when the syscall interface isn't stable, like with 
many non-Windows kernels.


My larger goal here is to smooth the way for multi-arch systems, with 
/usr/CPU-VENDOR-KERNEL-OS-ABI or so as the --prefix for binaries built 
for each architecture.  This means that configuration tuples should be 
detailed enough to allow the needed distinctions, but not so detailed 
as to themselves become an artificial incompatibility.  In larger 
networked environments, even KERNEL and OS could vary.
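A dry-run sketch of that layout: one `--prefix` per configuration tuple. `multiarch_prefix` is a hypothetical helper; it canonicalizes through a `./config.sub` if one is present in the current directory (so that aliases such as `x86_64-linux-gnu` and `x86_64-pc-linux-gnu` share a prefix), and otherwise uses the tuple as given. The tuples listed are arbitrary examples, and the configure commands are only printed.

```sh
#!/bin/sh
# Hypothetical helper: print /usr/TUPLE for one configuration tuple,
# canonicalizing it first when a config.sub script is available.
multiarch_prefix () {
  tuple=$1
  if [ -x ./config.sub ]; then
    tuple=$(./config.sub "$tuple") || return 1
  fi
  printf '/usr/%s\n' "$tuple"
}

# Dry run: show the configure invocation for each architecture.
for t in x86_64-pc-linux-gnu aarch64-unknown-linux-gnu x86_64-pc-linux-musl; do
  echo "./configure --host=$t --prefix=$(multiarch_prefix "$t")"
done
```

Canonicalizing before building the path is what keeps the tuple "detailed enough" without letting mere spelling differences become separate, artificially incompatible prefixes.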


It's a great goal, and mine too! :)


Yeah whatever windows-something we settle on for MinGW, I promise my 
offer still stands to try to get LLVM to (a) accept it, and (b) 
steer people away from windows-gnu towards it.

Thanks.

No problem! :)
This is the major expectation that using *-*-windows-gnu for MinGW 
violates:  GNU implements POSIX and MinGW does not.  Using *-mingnu 
still leaves considerable room for confusion in my view, which using 
*-mingw avoids. 


That is fine with me. Agreed "mingnu" takes the proper noun and turns it 
back into a common noun phrase --- i.e. "minimal GNU" has many valid 
interpretations while "MinGW" avoids that by being a known quantity.


After that, I think we are close enough to convene a working group 
for a JSON/whatever explicit standard. And that would be amazing.


I still oppose JSON because it is way too verbose for this: 
configuration tuples need to be both expressive and simple enough to 
type at a shell prompt as arguments to configure. Using JSON by 
default would also be a very nasty "flag day" that would break all 
existing programs that use config.sub. Perhaps config.sub could 
accept an --as=json parameter for JSON output?
Yes exactly, JSON is a no-go for prefixed binaries, but probably 
better for things like Autoconf which needs to parse the output of 
config.sub either way.
No, because Autoconf uses the shell and JSON is a [*profanity elided*] 
to parse using shell constructs.  A flat list of hyphen-delimited tags 
is almost ideal for the parsing that configure needs to do.  In fact, 
with a few restrictions (met by using canonical ordering) this is what 

Re: Rethinking configuration tuples

2023-08-26 Thread Jacob Bachmeyer

John Ericson wrote:

On 8/24/23 23:54, Jacob Bachmeyer wrote:

John Ericson wrote:


This is why I opened by saying that "Operating System" lacks a coherent
objective definition.


[...]


As I understand, historically, "operating systems" were proprietary 
monoliths and the GNU Project originally expected to produce another 
monolith, but /our/ monolith would be Free Software.  As an interim 
measure, the GNU utilities were designed to be widely portable across 
the various individually-monolithic proprietary operating systems 
then in use across a wide variety of hardware.  The broader Free 
Software Movement unexpectedly shattered that state of affairs, 
leading to the 4-element configuration tuple form, when the Linux 
kernel became available and it was noticed that---oops!---GNU on 
Linux and GNU on HURD would have significant differences that at 
least some of the GNU packages would need to handle.  (For example, 
GNU libc is very different between Linux, where POSIX I/O maps fairly 
directly to underlying syscalls, and HURD, where POSIX I/O must be 
translated to Mach IPC, but both of these are Free GNU systems.)


This means that the GNU system is a somewhat blurry category, with 
many variants possible, and is orthogonal to "Linux":  there are 
GNU/Linux systems, GNU systems using other kernels, and Linux-based 
systems not using GNU at all.  This latter category is fairly common 
in embedded systems, where the GNU utilities are often eschewed for 
lighter-weight alternatives to save flash space (or, less honorably, 
to avoid GPL3).


Yes, I agree with this state of affairs. I sometimes (but not always!)
detect a sort of "Linux scooped us" sentiment in GNU quarters, but as
I see it, portability and diversity of distros were pretty much
inevitable: replacing proprietary Unix userlands with GNU software
was a huge part of how GNU got going in academic/institutional
environments in the early days, and even if Hurd had got there before
Linux, there would have been no reason to rip out that portability.




As I understand the history, Linux was the first clearly Free kernel 
available.  At the time, BSD still had a dark cloud hanging over it due 
to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would
not be legally recognized as separate until February 1994, although BSD
had honestly (almost?) completely diverged from the AT&T codebase in
June 1991 with Net/2.  Mach was still proprietary; RMS was (or would 
later be) campaigning for its liberation, which would not occur until 
some years later.  It is worth noting that Linux was originally a toy 
kernel, and it only attracted the effort it did and grew like it did 
because it was basically the last missing piece for fully Free systems 
at the time.


JSON is pretty much a hard no for me:  it is far too complex for what 
really needs to be a simple structure.  Flat strings work very well 
for the way that GNU software typically expects to parse a 
configuration tuple using shell constructs.  Perhaps it would be 
better to redefine configuration tuples as a flat list of tags with a 
canonical ordering?  (The reason for a canonical ordering is in part 
to ensure that all existing coherent configuration tuple strings 
remain valid and to ensure that text-based pattern matching continues 
to work.)
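To illustrate why a canonical ordering keeps text-based pattern matching working, here is a minimal POSIX shell sketch. It assumes the five-element CPU-VENDOR-KERNEL-OS[-ABI] ordering discussed in this thread; the function name `classify_target` and the labels it prints are hypothetical, not part of config.sub.

```sh
#!/bin/sh
# Hypothetical helper: classify a tuple using only shell `case` patterns.
# A `case` pattern must match the whole string, and (by assumption) tags
# always appear in one fixed order, so each pattern can check the
# KERNEL-OS positions whether or not an optional trailing ABI tag is present.
classify_target() {
  case $1 in
    *-*-linux-gnu | *-*-linux-gnu-*)         echo "GNU/Linux" ;;
    *-*-linux-musl | *-*-linux-musl-*)       echo "Linux without GNU libc" ;;
    *-*-windows-mingw | *-*-windows-mingw-*) echo "MinGW" ;;
    *)                                       echo "other" ;;
  esac
}

classify_target x86_64-pc-linux-gnu          # prints "GNU/Linux"
classify_target i686-pc-windows-mingw-ucrt   # prints "MinGW"
```

Existing four-element strings still match, which is the backward-compatibility property a canonical ordering is meant to preserve.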


Ah sorry, I shouldn't have made reference to JSON at all; what I
really was getting at is the /abstract syntax/. In particular, rather
than having an abstract syntax of "list of strings" (parsing today's
concrete syntax by breaking on dash), where the meaning of each string
is ambiguous / context-sensitive, we would have one of "keys mapped to
enumerations", i.e. one always knows the meaning of each component
explicitly, without inspecting it or its context.


JSON or your flat list in canonical ordering (where I assume we are 
careful to never skip a type of component) are both valid concrete 
syntaxes that can be parsed / printed from this abstract syntax.
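The "flat list in canonical ordering, never skipping a component" concrete syntax maps directly onto named keys, as this hypothetical POSIX shell sketch shows. It assumes a fixed five-element CPU-VENDOR-KERNEL-OS-ABI layout and that no tag itself contains a hyphen; `parse_tuple` is an illustrative name, not an existing tool.

```sh
#!/bin/sh
# Hypothetical sketch: read a canonical five-element tuple into named
# fields.  Because every position always means the same thing, no
# component has to be inspected to learn which key it belongs to.
# Assumption: no individual tag contains '-'.
parse_tuple() {
  old_ifs=$IFS
  IFS=-
  set -f            # disable pathname expansion while splitting
  set -- $1         # split the tuple on '-' into positional parameters
  set +f
  IFS=$old_ifs
  if [ "$#" -ne 5 ]; then
    echo "parse_tuple: expected 5 components, got $#" >&2
    return 1
  fi
  cpu=$1 vendor=$2 kernel=$3 os=$4 abi=$5
}

parse_tuple x86_64-pc-windows-mingw-ucrt && echo "os=$os abi=$abi"
```

Rejecting tuples with the wrong number of components is what makes "never skip a component" enforceable rather than a convention.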




JSON is far too complicated to use here, except possibly as a 
"pre-parsed" form that config.sub could output on request for programs 
that want a structured form instead of parsing the tuple themselves.  
But for that case, why use JSON instead of a trivial multi-line 
key=value format?


Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
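A line-oriented key=value format like the one above needs nothing beyond `read` and `case` to consume. The sketch below assumes that hypothetical `--parse` output; since no released config.sub provides that option, the output is simulated by a made-up `fake_config_sub_parse` function.

```sh
#!/bin/sh
# Stand-in for the hypothetical `config.sub --parse x86_64-linux-gnu`.
fake_config_sub_parse() {
  printf '%s\n' cpu=x86_64 vendor=pc kernel=linux os=gnu
}

# Read one key=value pair per line.  Only known keys are assigned,
# so no `eval` of untrusted text is needed.
load_parsed_tuple() {
  while IFS='=' read -r key value; do
    case $key in
      cpu)    cpu=$value ;;
      vendor) vendor=$value ;;
      kernel) kernel=$value ;;
      os)     os=$value ;;
      abi)    abi=$value ;;
      *)      echo "ignoring unknown key: $key" >&2 ;;
    esac
  done
}

# A here-document (not a pipe) keeps the loop in the current shell,
# so the assignments are still visible afterwards.
load_parsed_tuple <<EOF
$(fake_config_sub_parse)
EOF
echo "$cpu $kernel $os"
```

Doing the same with JSON would require an external parser, which is the point being made about shell-based configure scripts.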


[...]
I know Po Lu doesn't like them, because they overlap with existing
ones. But what about you two, Adam and Jacob? I am trying to
compromise between what various things do already, and also
correct things like windows-gnu (even if there is no such thing as
the GNU operating system (only multiple GNU Hurd-supporting
distros), I agree that MinGW is clearly not a complete enough set
of GNU software to earn the right to drop the "minimal" part).


The logical problem with your parenthetical is that it ignores 
GNU/Linux, which *is* also a GNU 

Re: Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)

2023-08-26 Thread John Ericson


On 8/24/23 23:54, Jacob Bachmeyer wrote:

John Ericson wrote:


This is why I opened by saying that "Operating System" lacks a coherent
objective definition.


[...]


As I understand, historically, "operating systems" were proprietary 
monoliths and the GNU Project originally expected to produce another 
monolith, but /our/ monolith would be Free Software.  As an interim 
measure, the GNU utilities were designed to be widely portable across 
the various individually-monolithic proprietary operating systems then 
in use across a wide variety of hardware.  The broader Free Software 
Movement unexpectedly shattered that state of affairs, leading to the 
4-element configuration tuple form, when the Linux kernel became 
available and it was noticed that---oops!---GNU on Linux and GNU on 
HURD would have significant differences that at least some of the GNU 
packages would need to handle.  (For example, GNU libc is very 
different between Linux, where POSIX I/O maps fairly directly to 
underlying syscalls, and HURD, where POSIX I/O must be translated to 
Mach IPC, but both of these are Free GNU systems.)


This means that the GNU system is a somewhat blurry category, with 
many variants possible, and is orthogonal to "Linux":  there are 
GNU/Linux systems, GNU systems using other kernels, and Linux-based 
systems not using GNU at all.  This latter category is fairly common 
in embedded systems, where the GNU utilities are often eschewed for 
lighter-weight alternatives to save flash space (or, less honorably, 
to avoid GPL3).


Yes, I agree with this state of affairs. I sometimes (but not always!)
detect a sort of "Linux scooped us" sentiment in GNU quarters, but as I
see it, portability and diversity of distros were pretty much inevitable:
replacing proprietary Unix userlands with GNU software was a huge
part of how GNU got going in academic/institutional environments in the
early days, and even if Hurd had got there before Linux, there would have
been no reason to rip out that portability.


JSON is pretty much a hard no for me:  it is far too complex for what 
really needs to be a simple structure.  Flat strings work very well 
for the way that GNU software typically expects to parse a 
configuration tuple using shell constructs.  Perhaps it would be 
better to redefine configuration tuples as a flat list of tags with a 
canonical ordering?  (The reason for a canonical ordering is in part 
to ensure that all existing coherent configuration tuple strings 
remain valid and to ensure that text-based pattern matching continues 
to work.)


Ah sorry, I shouldn't have made reference to JSON at all; what I
really was getting at is the /abstract syntax/. In particular, rather
than having an abstract syntax of "list of strings" (parsing today's
concrete syntax by breaking on dash), where the meaning of each string
is ambiguous / context-sensitive, we would have one of "keys mapped to
enumerations", i.e. one always knows the meaning of each component
explicitly, without inspecting it or its context.


JSON or your flat list in canonical ordering (where I assume we are 
careful to never skip a type of component) are both valid concrete 
syntaxes that can be parsed / printed from this abstract syntax.





---

Concretely, I think these are pretty clear configs:

CPU-VENDOR-windows-mingnu  # MinGW, MS C + GNU C++ and other GNU-ish
                           # things; TODO: distinguish MSVCRT and UCRT




I say that this one really should just be *-mingw.


Sure. I went with mingnu because the "w" is redundant with the
"windows", but ultimately I care more about the pattern than the exact
choice of identifiers / enumeration tags. (As we say in programming
language land, I care about the thing "up to alpha-renaming".)


Note that there are both MinGW32 and MinGW64, corresponding to 32-bit 
and 64-bit Windows APIs.  Should that be included or should the CPU 
type be used to distinguish?  (e.g.  i686-pc-windows-mingw is MinGW32 
and x86_64-pc-windows-mingw is MinGW64?)


Yes, I think so. If you look at https://www.mingw-w64.org/downloads/ one
even sees "x86_64-w64-mingw32", which is quite something, since it is 64-bit!


I think what happened is that "w32" was chosen to mean the then-new
Win32 API/ABI, as opposed to DOS. Win64, as I understand it, is necessarily
a new ABI because of the change in CPU arch, but not really a new API,
being more of a "let's make the minimal amount of changes so the
source/headers are portable" situation. So a combination of "same API"
and "too lazy to update GNU config" made "mingw32" stick around.


Commit f16804b79ee5a23a9994a1cdc760cd9ba813148a added mingw64 to GNU
config in 2012, long after the advent of 64-bit Windows.


In the proposed five-element form, MSVCRT and UCRT are easily 
distinguished.  Example:


i686-pc-windows-mingw-msvcrt
i686-pc-windows-mingw-ucrt
x86_64-pc-windows-mingw-msvcrt
x86_64-pc-windows-mingw-ucrt
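Under the five-element form, both questions raised here (MinGW32 vs. MinGW64, MSVCRT vs. UCRT) can be answered from separate fields. This POSIX shell sketch assumes the API width is inferred from the CPU field and the C runtime is the fifth field; `describe_mingw` and its "win32"/"win64" labels are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch for tuples like x86_64-pc-windows-mingw-ucrt.
# Assumptions: 32- vs 64-bit Windows API comes from the CPU field,
# not from the OS tag; the C runtime is the last (fifth) field.
describe_mingw() {
  case $1 in
    *-*-windows-mingw-*) ;;
    *) echo "not a five-element MinGW tuple: $1" >&2; return 1 ;;
  esac
  case ${1%%-*} in          # first field: the CPU
    i?86)   api=win32 ;;
    x86_64) api=win64 ;;
    *)      api=unknown ;;
  esac
  crt=${1##*-}              # last field: msvcrt or ucrt
  echo "$api $crt"
}

describe_mingw i686-pc-windows-mingw-msvcrt   # prints "win32 msvcrt"
describe_mingw x86_64-pc-windows-mingw-ucrt   # prints "win64 ucrt"
```

Keeping bitness out of the OS tag avoids the "x86_64-w64-mingw32" oddity mentioned in the thread.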


That is very true, I will grant you that :)


CPU-VENDOR-windows-cygnus # Cygwin


Re: Rethinking configuration tuples

2023-08-24 Thread Jacob Bachmeyer

Po Lu wrote:

People, the nature and widespread use of config.* precludes any efforts
aimed at ``rethinking'' the tuples they accept and generate.  If you
want your own format, then by all means, proceed with your own project.
But please leave config.* in peace.
  


I will say right now that backwards compatibility, specifically that 
existing tuples remain unchanged as much as possible (blatantly 
incorrect tuples such as *-windows-gnu for MinGW excepted) is an 
absolute requirement here.


Existing code expects the existing strings.  Those must be preserved.


-- Jacob




Re: Rethinking configuration tuples

2023-08-24 Thread Po Lu
People, the nature and widespread use of config.* precludes any efforts
aimed at ``rethinking'' the tuples they accept and generate.  If you
want your own format, then by all means, proceed with your own project.
But please leave config.* in peace.



Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)

2023-08-24 Thread Jacob Bachmeyer

John Ericson wrote:


This is why I opened by saying that "Operating System" lacks a coherent
objective definition.


The more pugilistic message is to say the rest of the world doesn't
think the GNU operating system exists: that there is simply a
choice of kernel (Linux, k*BSD, Hurd, something else...) and choices
of libraries and system components on top of that, and many
combinations are possible. The rest of the world might say this in a
mean way, but I say it is actually a /good/ thing: software freedom
means one /can/ choose one's components à la carte, and only a lack of
software freedom results in a kernel and mass of libraries outside
one's control blurring together into a scary "take it or leave it"
monolith we call an operating system.




As I understand, historically, "operating systems" were proprietary 
monoliths and the GNU Project originally expected to produce another 
monolith, but /our/ monolith would be Free Software.  As an interim 
measure, the GNU utilities were designed to be widely portable across 
the various individually-monolithic proprietary operating systems then 
in use across a wide variety of hardware.  The broader Free Software 
Movement unexpectedly shattered that state of affairs, leading to the 
4-element configuration tuple form, when the Linux kernel became 
available and it was noticed that---oops!---GNU on Linux and GNU on HURD 
would have significant differences that at least some of the GNU 
packages would need to handle.  (For example, GNU libc is very different 
between Linux, where POSIX I/O maps fairly directly to underlying 
syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but 
both of these are Free GNU systems.)


This means that the GNU system is a somewhat blurry category, with many 
variants possible, and is orthogonal to "Linux":  there are GNU/Linux 
systems, GNU systems using other kernels, and Linux-based systems not 
using GNU at all.  This latter category is fairly common in embedded 
systems, where the GNU utilities are often eschewed for lighter-weight 
alternatives to save flash space (or, less honorably, to avoid GPL3).



On 8/24/23 08:51, Adam Joseph wrote:

[...]
It seems like a lot of the proposals in this thread are being evaluated not
based on whether or not they are coherent, but rather on whether or not they
take us a few nanometers closer to whatever LLVM's internal
implementation details happen to be this week.



I care about coherence. The reason I like to see what LLVM does is that
working from a parsed representation forces the software to be much
more honest. Since GNU config doesn't reveal its categories but just
spits out another opaque string, there is no external pressure for its
categorization to be any good. LLVM, on the other hand, dispenses with
strings entirely and just uses the enums, so it is forced to make sure
those enums make sense and work for the branching the program has to do.


LLVM's parsing of configs is ad hoc, Postel's-law stuff like everyone
else's, but its internal representation is actually quite stable.
Parsing is the ugly, nasty part that gets you to the pristine, clear
ontology on the other side.
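The "liberal parser, strict internal representation" idea can be sketched in POSIX shell: several accepted spellings collapse onto a small closed set, and anything else is rejected. The function name `normalize_system` and the mapping (MinGW spellings, including LLVM's windows-gnu, onto the windows-mingw form proposed in this thread) are hypothetical; this is neither config.sub's nor LLVM's actual behavior.

```sh
#!/bin/sh
# Hypothetical sketch: map the KERNEL-OS part of a tuple onto a closed
# set of canonical values.  Downstream code then branches only on
# these values, never on the raw input spellings.
normalize_system() {
  case $1 in
    mingw32 | mingw64 | windows-mingw | windows-gnu) echo windows-mingw ;;
    linux-gnu)                                       echo linux-gnu ;;
    *) echo "normalize_system: unknown system '$1'" >&2; return 1 ;;
  esac
}

normalize_system windows-gnu   # prints "windows-mingw"
normalize_system mingw64       # prints "windows-mingw"
```

Rejecting unknown input, instead of passing it through, is what keeps the output set small enough to branch on reliably.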


Ultimately I would like to convene everyone to commit to an agreed-upon
internal representation too. E.g. clang and GNU config could both
spit out some JSON that is unambiguous and should match. I think that
would alleviate a lot of Adam's concerns about "following LLVM". But I
don't think it is possible to convene the working group needed to
standardize such a format yet, because there is little trust between
parties. Moving us "a few nanometers closer" on each side
demonstrates that there is willingness to compromise.




JSON is pretty much a hard no for me:  it is far too complex for what 
really needs to be a simple structure.  Flat strings work very well for 
the way that GNU software typically expects to parse a configuration 
tuple using shell constructs.  Perhaps it would be better to redefine 
configuration tuples as a flat list of tags with a canonical ordering?  
(The reason for a canonical ordering is in part to ensure that all 
existing coherent configuration tuple strings remain valid and to ensure 
that text-based pattern matching continues to work.)



---

Concretely, I think these are pretty clear configs:

CPU-VENDOR-windows-mingnu  # MinGW, MS C + GNU C++ and other GNU-ish
                           # things; TODO: distinguish MSVCRT and UCRT




I say that this one really should just be *-mingw.  Note that there are 
both MinGW32 and MinGW64, corresponding to 32-bit and 64-bit Windows 
APIs.  Should that be included or should the CPU type be used to 
distinguish?  (e.g.  i686-pc-windows-mingw is MinGW32 and 
x86_64-pc-windows-mingw is MinGW64?)


In the proposed five-element form, MSVCRT and UCRT are easily 
distinguished.  Example:


i686-pc-windows-mingw-msvcrt
i686-pc-windows-mingw-ucrt
x86_64-pc-windows-mingw-msvcrt