Re: Rethinking configuration tuples
John Ericson writes:

> On 9/19/23 21:07, Po Lu wrote:
>
> > Why not?
> >
> > I have on my hand several programs which use -winnt*, such as many old
> > releases of Emacs. And users should be capable of replacing
> > config.sub and config.guess with newer versions without ill effect.
>
> At no point has anyone proposed removing *-winnt-*. And while I think
> a deprecation message is a good idea, no one has submitted yet a patch
> for that either.
>
> With Dmitry's plan, you can still upgrade config.sub in those old
> versions of Emacs if you like without any issue.

Why should config.sub print anything at all? An extraneous message
constitutes ill effect.
Re: Rethinking configuration tuples
On 9/19/23 21:07, Po Lu wrote:

> Why not?
>
> I have on my hand several programs which use -winnt*, such as many old
> releases of Emacs. And users should be capable of replacing
> config.sub and config.guess with newer versions without ill effect.

At no point has anyone proposed removing *-winnt-*. And while I think a
deprecation message is a good idea, no one has submitted yet a patch for
that either.

With Dmitry's plan, you can still upgrade config.sub in those old
versions of Emacs if you like without any issue.

John
Re: Rethinking configuration tuples
"Dmitry V. Levin" writes:

> I'm inclined to remove windows-gnu from config.sub instead of renaming or
> canonicalizing it because, firstly, there is no GNU libc on windows, and,
> secondly, windows-gnu as used by LLVM means MinGW, but for that we already
> have mingw*, and we should avoid adding new canonical names for the same
> thing. We could add canonicalization of windows-mingw* into mingw*, but
> if nobody uses the former, why bother?
>
> At the same time, I'm inclined to leave windows-msvc as is because,
> unlike windows-gnu, it does exist, and the only one who objected against
> windows-msvc and suggested to canonicalize windows-msvc into winnt was
> Po Lu, but the arguments provided against windows-msvc were not convincing.

Why not?

I have on my hand several programs which use -winnt*, such as many old
releases of Emacs. And users should be capable of replacing config.sub
and config.guess with newer versions without ill effect.
Re: Rethinking configuration tuples
Thanks Dmitry. This is an acceptable outcome to me. It is a nice middle
ground between Po Lu's and my first choice options.

John

On 9/19/23 19:58, Dmitry V. Levin wrote:

> On Thu, Sep 14, 2023 at 12:55:06AM -0400, John Ericson wrote:
>
> > OK here we go:
> >
> > 1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
> > 2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
> > 3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
> >
> > I tried to honestly argue for each of them the best I could in the
> > commit message. I know I prefer (1); I am guessing Jacob prefers (2),
> > and Po Lu prefers (3).
> >
> > Have fun, Dmitry :).
>
> I'm inclined to remove windows-gnu from config.sub instead of renaming or
> canonicalizing it because, firstly, there is no GNU libc on windows, and,
> secondly, windows-gnu as used by LLVM means MinGW, but for that we already
> have mingw*, and we should avoid adding new canonical names for the same
> thing. We could add canonicalization of windows-mingw* into mingw*, but
> if nobody uses the former, why bother?
>
> At the same time, I'm inclined to leave windows-msvc as is because,
> unlike windows-gnu, it does exist, and the only one who objected against
> windows-msvc and suggested to canonicalize windows-msvc into winnt was
> Po Lu, but the arguments provided against windows-msvc were not convincing.
Re: Rethinking configuration tuples
On Thu, Sep 14, 2023 at 12:55:06AM -0400, John Ericson wrote:

> OK here we go:
>
> 1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
> 2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
> 3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
>
> I tried to honestly argue for each of them the best I could in the
> commit message. I know I prefer (1); I am guessing Jacob prefers (2),
> and Po Lu prefers (3).
>
> Have fun, Dmitry :).

I'm inclined to remove windows-gnu from config.sub instead of renaming or
canonicalizing it because, firstly, there is no GNU libc on windows, and,
secondly, windows-gnu as used by LLVM means MinGW, but for that we already
have mingw*, and we should avoid adding new canonical names for the same
thing. We could add canonicalization of windows-mingw* into mingw*, but
if nobody uses the former, why bother?

At the same time, I'm inclined to leave windows-msvc as is because,
unlike windows-gnu, it does exist, and the only one who objected against
windows-msvc and suggested to canonicalize windows-msvc into winnt was
Po Lu, but the arguments provided against windows-msvc were not convincing.

-- 
ldv
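For readers wondering what "canonicalization of windows-mingw* into mingw*" would amount to, here is a minimal sketch in POSIX shell. It is not taken from config.sub and was not proposed as a patch in this thread; the function name `canon_windows_mingw` is hypothetical, and the sketch assumes the only rewrite wanted is dropping the `windows-` part in front of `mingw`.

```shell
#!/bin/sh
# Hypothetical illustration only; not config.sub code.
# Rewrites CPU-VENDOR-windows-mingwSUFFIX to CPU-VENDOR-mingwSUFFIX and
# passes every other name through unchanged.
canon_windows_mingw () {
    case "$1" in
        *-windows-mingw*)
            # Text before "-windows-mingw", then "-mingw", then the suffix
            # that followed "mingw" in the input (e.g. "32").
            printf '%s\n' "${1%%-windows-mingw*}-mingw${1##*-windows-mingw}"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

canon_windows_mingw x86_64-pc-windows-mingw32
canon_windows_mingw x86_64-pc-linux-gnu
```

As the message above notes, such a rule only earns its keep if someone actually uses the `windows-mingw*` spelling.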
Re: Rethinking configuration tuples
John Ericson writes:

> I used to do that, but see commit
> f0f728324021f38b0d31de399b9974535300167c: Dmitry opted to switch to
> just using Git's commit messages as the source of truth, and providing
> a make rule to generate the ChangeLog.
>
> The document you linked endorses such a choice, saying
>
>> Projects that maintain such VCS repositories can decide not to
>> maintain separate change log files, and instead rely on the VCS to
>> keep the change log.
>> If you decide not to maintain separate change log files, you should
>> still consider providing them in the release tarballs [...].
>
> I think doing this is a fine decision.

The text you quoted means that you are meant to record the individual
ChangeLog entries within each VCS log message, rather than updating a
separate ChangeLog file with each check-in. These are subsequently
reproduced in the generated ChangeLog file.

It does not excuse you from writing log entries! Refer to Emacs commit
messages:

  https://git.savannah.gnu.org/cgit/emacs.git/log?h=master

for canonical examples of such log messages.
Re: Rethinking configuration tuples
I used to do that, but see commit
f0f728324021f38b0d31de399b9974535300167c: Dmitry opted to switch to just
using Git's commit messages as the source of truth, and providing a make
rule to generate the ChangeLog.

The document you linked endorses such a choice, saying

> Projects that maintain such VCS repositories can decide not to
> maintain separate change log files, and instead rely on the VCS to
> keep the change log.
> If you decide not to maintain separate change log files, you should
> still consider providing them in the release tarballs [...].

I think doing this is a fine decision.

John

On 9/14/23 01:37, Po Lu wrote:

> John Ericson writes:
>
> > I had meant to just deal with windows-gnu in those 3 options,
> > otherwise we have a combinatorial explosion of patches (and commit
> > messages) for me to write :). Once we deal with that one we can deal
> > with the others, right?
>
> Incidentally, if you want to make it easier for others to interpret
> your patches, please provide ChangeLog entries along with them. Refer
> to `(standards)Change Logs':
>
>   https://www.gnu.org/prep/standards/standards.html#Change-Logs
Re: Rethinking configuration tuples
John Ericson writes:

> I had meant to just deal with windows-gnu in those 3 options,
> otherwise we have a combinatorial explosion of patches (and commit
> messages) for me to write :). Once we deal with that one we can deal
> with the others, right?

Incidentally, if you want to make it easier for others to interpret your
patches, please provide ChangeLog entries along with them. Refer to
`(standards)Change Logs':

  https://www.gnu.org/prep/standards/standards.html#Change-Logs
Re: Rethinking configuration tuples
I had meant to just deal with windows-gnu in those 3 options, otherwise
we have a combinatorial explosion of patches (and commit messages) for
me to write :). Once we deal with that one we can deal with the others,
right?

John

On 9/14/23 01:00, Po Lu wrote:

> John Ericson writes:
>
> > OK here we go:
> >
> > 1 https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
> > 2 https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
> > 3 https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
> >
> > I tried to honestly argue for each of them the best I could in the
> > commit message. I know I prefer (1); I am guessing Jacob prefers (2),
> > and Po Lu prefers (3).
>
> I prefer eliminating windows-msvc too. It's also a misnomer, and we
> already have *-winnt*, which represents MSVC.
Re: Rethinking configuration tuples
John Ericson writes:

> OK here we go:
>
> 1 https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
> 2 https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
> 3 https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch
>
> I tried to honestly argue for each of them the best I could in the commit
> message. I know I prefer (1); I am guessing Jacob prefers (2),
> and Po Lu prefers (3).

I prefer eliminating windows-msvc too. It's also a misnomer, and we
already have *-winnt*, which represents MSVC.
Re: Rethinking configuration tuples
OK here we go:

1. https://github.com/ericson2314/gnu-config/commit/windows-mingnu.patch
2. https://github.com/ericson2314/gnu-config/commit/windows-mingw.patch
3. https://github.com/ericson2314/gnu-config/commit/no-windows-gnu.patch

I tried to honestly argue for each of them the best I could in the
commit message. I know I prefer (1); I am guessing Jacob prefers (2),
and Po Lu prefers (3).

Have fun, Dmitry :).

I suppose rather than just idly speculating on how nice it would be to
standardize with LLVM, this might be a good time to actually post to
their Discourse instance and solicit feedback. If anyone else agrees I
will happily do so.

Cheers,

John

On 9/11/23 17:55, John Ericson wrote:

> I can submit two patches (effectively amending my prior, landed patch)
> with options that I think people would prefer. Will do that shortly.
>
> On 9/11/23 17:53, Dmitry V. Levin wrote:
>
> > Hi,
> >
> > On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
> > > Where are the config maintainers? Karl Barry and company?
> > > (I don't remember his e-mail nor do I have it at hand.)
> > >
> > > I would expect them to be actively reading this list, but instead my
> > > original request has been left twisting in the wind.
> >
> > I'm the maintainer and I'm actively reading this list now,
> > a bit surprised to find so many messages at this time of year. :)
> >
> > Apparently, you don't quite like commit
> > 91f6a7f616b161c25ba2001861a40e662e18c4ad that added
> > $cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
> > I understood what exactly do you suggest to change in this case.
Re: Rethinking configuration tuples
Oops I had this email as draft and didn't hit send. The conversation has moved on a bit since, but I'll send it anyways.

John

On 9/6/23 19:46, Jacob Bachmeyer wrote:

The problem is that system(3) probably /does/ exist in that configuration, but such a system is only usable as a cross-compilation target, since building the GNU tools requires a shell.

Agree! This is exactly what I mean by the "ops time" vs build time question: if we do have system(3) we don't know what shell it will be hooked up to in general (cross compilation is the general case). I would say even if we are native compiling, and we find that the shell via system(3) supports x y z, we shouldn't bake the results of a configure-time check for that in at build time --- the installed binary might be copied to another system with the same libc but a different shell, and then system(3) would do something else.

I think configs are mainly useful for the host and target systems, and thus should focus on information needed for them. Sounds like we agree that "do we have a shell [that can do x y z]" is a question that is fine to ask of the build platform, but not really appropriate to ask of the host or target platform. Thus, things like "presence of shell" are not really good to include in configs.

This is all to say that telling apart OSes (as opposed to merely libcs) seems too fraught for configs, to me.

Maybe this could go in the config per the "arbitrary many components, finer distinctions to the right" "converging sequence" approach, but then I would want this further to the right, e.g. aarch64-unknown-musl-noshell not aarch64-unknown-noshell-musl.

The idea of lacking a shell was intended as an example of something that would make a Free system /very/ different from the GNU system, enough that it cannot be considered a GNU variant. In other words, a Linux-based system that is clearly /not/ GNU/Linux.

Agreed it is not GNU/Linux. But I think anything that uses glibc should use "gnu" in the config because configs have evolved to be about libc more than OS.

The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...)

Agreed, just wanted to double check.

Of course, if systemd *does* get sufficiently outrageously invasive, we might need a *-*-linux-systemd-glibc tuple... (Since systemd gleefully makes extensive use of Linux-kernel-specific features, it cannot possibly be a standard on the GNU system, which supports multiple Free kernels.)

Yes I agree systemd probably can't be "bona fide GNU OS", but I take the opposite conclusion that this is evidence that the "gnu" for glibc is more important than the "gnu" for "true GNU OS".

In this hypothetical (that needs to *stay* hypothetical) example, "systemd" has somehow become an "OS" distinct from the GNU system.

Yes.

Except configure usually does not need a "fully disambiguated" form---the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. The flat list with a defined order is optimal for this strategy, since it allows easily checking for the presence of any tag or combination of tags.

Shell case patterns can be a bit of a footgun. For example, a common mistake is doing * instead of *-*.

If the allowed pattern elements are sufficiently unambiguous, there is no mistake, since `*' matches text including `-'. In fact, when testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be correct. (I assume that a CPU type will remain required and will remain first in the list.)

Sorry I meant as part of a larger pattern. With things like *-stuff-* vs *-*-stuff-*-*, the extra dashes are needed to make sure "stuff" matches the right component, and even then it only works if one knows the exact number of components (which can be accomplished by *-*... and the ordering of patterns). It is quite subtle!

For the "converging sequence" model, omitting the extra dashes is important, since the number of tags prior to a "floating tag" can vary. (I would actually suggest making "gnu" such a floating tag in this model, with an exact definition to be obtained from a later discussion that would need to include RMS.)

Yeah I just mean the more components we have, the more a "sparse" representation is desirable, vs `something--another_thing` explicitly skipping things, because the latter is so annoying (needing to count).

But that also creates ambiguities.

Allow the hypothetical --parse option to accept a PREFIX argument and you are pretty much there:

    $ ./config.sub --parse=host x86_64-linux-gnu
    host=x86_64-pc-linux-gnu
    host_cpu=x86_64
    host_vendor=pc
    host_kernel=linux
    host_os=gnu
    $

That form should be both easily parsed by other tools and suitable for `eval` in shell scripts.

Yup! We're in agreement.

I agree testing is more robust,
Re: Rethinking configuration tuples
Quoting Po Lu (2023-08-24 21:18:13) > People, the nature and widespread use of config.* precludes any efforts > aimed at ``rethinking'' the tuples they accept and generate. +1
Re: Rethinking configuration tuples
"Dmitry V. Levin" writes:

> Hi,
>
> On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
>> Where are the config maintainers? Karl Barry and company?
>> (I don't remember his e-mail nor do I have it at hand.)
>>
>> I would expect them to be actively reading this list, but instead my
>> original request has been left twisting in the wind.
>
> I'm the maintainer and I'm actively reading this list now,
> a bit surprised to find so many messages at this time of year. :)
>
> Apparently, you don't quite like commit
> 91f6a7f616b161c25ba2001861a40e662e18c4ad that added
> $cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
> I understood what exactly do you suggest to change in this case.

To either revert the change, or to canonicalize them to
CPU-VENDOR-mingw* and CPU-VENDOR-winnt* respectively. Neither `gnu' nor
`msvc' is appropriate for the operating system field.
Re: Rethinking configuration tuples
"Zack Weinberg" writes:

> If you could provide me a reference to your original request (e.g. URL
> in the mailing list archive) I will undertake to get it done. If I try
> to find it myself I'm afraid I will pick the wrong thing.
>
> If there is a specific git commit or commits you want reverted, the full
> hashes of those commits would also be very helpful.
>
> zw

This commit:

  https://git.savannah.gnu.org/cgit/config.git/commit/?id=91f6a7f616b161c25ba2001861a40e662e18c4ad

should be reverted or modified to canonicalize such invalid tuples into
either *-windows-mingw* or *-windows-winnt*, which they are intended to
represent.

Thanks.
Re: Rethinking configuration tuples
I can submit two patches (effectively amending my prior, landed patch)
with options that I think people would prefer. Will do that shortly.

On 9/11/23 17:53, Dmitry V. Levin wrote:

> Hi,
>
> On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
> > Where are the config maintainers? Karl Barry and company?
> > (I don't remember his e-mail nor do I have it at hand.)
> >
> > I would expect them to be actively reading this list, but instead my
> > original request has been left twisting in the wind.
>
> I'm the maintainer and I'm actively reading this list now,
> a bit surprised to find so many messages at this time of year. :)
>
> Apparently, you don't quite like commit
> 91f6a7f616b161c25ba2001861a40e662e18c4ad that added
> $cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
> I understood what exactly do you suggest to change in this case.
Re: Rethinking configuration tuples
Hi,

On Mon, Sep 11, 2023 at 10:11:39AM +0800, Po Lu wrote:
> Where are the config maintainers? Karl Barry and company?
> (I don't remember his e-mail nor do I have it at hand.)
>
> I would expect them to be actively reading this list, but instead my
> original request has been left twisting in the wind.

I'm the maintainer and I'm actively reading this list now,
a bit surprised to find so many messages at this time of year. :)

Apparently, you don't quite like commit
91f6a7f616b161c25ba2001861a40e662e18c4ad that added
$cpu-$vendor-windows-{gnu,msvc} support to config.sub, but I'm not sure
I understood what exactly do you suggest to change in this case.

-- 
ldv
Re: Rethinking configuration tuples
On Sun, Sep 10, 2023, at 10:11 PM, Po Lu wrote:
> Where are the config maintainers? Karl Barry and company? (I don't
> remember his e-mail nor do I have it at hand.)

Karl Berry is the Automake maintainer. I'm not sure if there *is* an
official config.* maintainer. The person most appropriately described as
the de facto maintainer is probably Dmitry V. Levin.

> I would expect them to be actively reading this list, but instead my
> original request has been left twisting in the wind.

If you could provide me a reference to your original request (e.g. URL
in the mailing list archive) I will undertake to get it done. If I try
to find it myself I'm afraid I will pick the wrong thing.

If there is a specific git commit or commits you want reverted, the full
hashes of those commits would also be very helpful.

zw
Re: Rethinking configuration tuples
Where are the config maintainers? Karl Barry and company? (I don't remember his e-mail nor do I have it at hand.) I would expect them to be actively reading this list, but instead my original request has been left twisting in the wind.
Re: Rethinking configuration tuples
Zack Weinberg wrote:

> I haven't been following this long discussion very closely but I do
> have some opinions (with my "de facto autoconf maintainer" hat on):
>
> 1. As a general rule, it is not safe to change the canonicalization
> (i.e. the config.sub output) of an existing system name, *at all*; in
> many cases, not even if it is wrong. I find that people working on GNU
> tools often don't realize just how broadly used these names are.
> Changing the canonicalization of "CPU-VENDOR-mingw32", for example, is
> very likely to break things like Ansible playbooks and Travis-style CI
> build matrices -- one-off files that exist by the tens of thousands and
> there's no practical way to *enumerate* them all, let alone get them
> all changed to satisfy a GNU-internal desire for a more consistent
> naming convention.

Perhaps I have been misunderstood; I have been suggesting to change our interpretation but to keep all existing tuples as they are. I am very much aware of this issue.

> *Very recently introduced* names can be adjusted to correct technical
> errors. For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as
> there is no GNU libc port to Windows (see below); config.guess should
> not produce it and config.sub should not convert anything into it.
> But if the patch that had introduced this mistake were more than a few
> months old, we would be stuck with it, permanently.

Fortunately, this particular error was caught relatively quickly.

> 2. We should avoid adding any more information to canonical system
> names. Things like the availability of Bourne shell, which of the
> several available implementations of "init" (Unix PID 1) is in use,
> etc. should be handled with Autoconf-style feature probes. Yes, it's
> difficult to run ./configure if you don't have a Bourne shell, but I
> suspect most of the environments where that's an issue are used
> primarily as cross-compilation targets rather than native-build hosts.
A platform without a Bourne shell is (as far as the GNU build system is concerned) only usable as a cross-compilation target. Issues like shell availability or choice of init(8) are a reasonable use for the "OS" field, where an operating system tag is essentially a gestalt summary of the target environment. The combinatorial explosion that would cause in modern use is a different issue.

> My suggested place to draw the line is, if you reasonably need a
> cross-compiler targeting A to be different from a cross-compiler
> targeting B, then the distinction between A and B can go in the
> canonical system name; if you don't, then it shouldn't. This should be
> pretty close to existing practice (because that's exactly how GCC uses
> CSNs, via ./configure --target) and should give us concrete reasons to
> make a decision in each case.

Agreed that calling the third field "operating system" is a holdover from a past where that actually mattered and operating systems were proprietary monoliths. This also provides a good first guess at a limit for what environment details should be in a CSN and what should not: if the same cross-compiler targets both environments, they should have the same CSN. However, a system with both GNU libc and Musl libc could possibly use GCC's multilib facility instead of separate instances of the compiler, so multilib targets probably need some form of disambiguation.

> [...]
>
> 3. I like the idea of a "--parseable" option to config.sub/guess that
> makes them spit out something easier to parse. My preferred syntax
> would be a newline- or semicolon-separated sequence of Bourne shell
> assignment statements, because, if there was also a way to ask
> config.sub/guess to add a prefix to every variable name, that would let
> Autoconf scripts process the output with `eval` rather than the nasty
> bit of parser goo we have now (_AC_CANONICAL_SPLIT,
> https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987).
> It would need to be something like
>
>     $ ./config.guess
>     aarch64-unknown-linux-gnu
>     $ ./config.guess --prefix=host --parseable
>     host_cpu=aarch64
>     host_vendor=unknown
>     host_os=linux-gnu
>
> It would be OK to introduce additional key=value pairs at that point
> (kernel, abi, etc), but the existing three (cpu, vendor, os) need to
> keep emitting exactly what they do now.

I was proposing adding a --parse option only to config.sub to avoid code duplication. I also do not think of this as a "parseable" form but as a pre-parsed form. I disagree with using --prefix here when --parse could easily accept that same prefix as its optional argument, especially since config.{sub,guess} are in such close proximity to configure, which uses --prefix for a very different purpose.

> 4. We should deemphasize and possibly explicitly deprecate the vendor
> component of a CSN. Nowadays, in my experience, it just confuses
> people.

The problem is that VENDOR was actually important in the dim past and could still be useful in some contexts today (I expect it to be
Re: Rethinking configuration tuples
I'd note that I don't see "rethinking target tuples" as changing how any given name is assigned, but rather changing how they are defined and how they are thought about.

We wouldn't break anything by changing the fourth field to mean "Environment" rather than "Operating System", to be more well-defined - every existing tuple would still be the same, and even some existing erroneous ones would be validated rather than existing in a state of being incorrect, but impossible to change. Any tuple with `elf` as the final component, for example, would be correct as an Environment, not as an Operating System, and now those existing tuples would be sound, and not just "hanging on because things break if they cease to exist".

On Sun, 10 Sept 2023 at 20:56, Po Lu wrote:

> "Zack Weinberg" writes:
>
> > I haven't been following this long discussion very closely but I do
> > have some opinions (with my "de facto autoconf maintainer" hat on):
> >
> > 1. As a general rule, it is not safe to change the canonicalization
> > (i.e. the config.sub output) of an existing system name, *at all*; in
> > many cases, not even if it is wrong. I find that people working on GNU
> > tools often don't realize just how broadly used these names
> > are. Changing the canonicalization of "CPU-VENDOR-mingw32", for
> > example, is very likely to break things like Ansible playbooks and
> > Travis-style CI build matrices -- one-off files that exist by the tens
> > of thousands and there's no practical way to *enumerate* them all, let
> > alone get them all changed to satisfy a GNU-internal desire for a more
> > consistent naming convention.
> >
> > *Very recently introduced* names can be adjusted to correct technical
> > errors. For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as
> > there is no GNU libc port to Windows (see below); config.guess should
> > not produce it and config.sub should not convert anything into it.
> > But if the patch that had introduced this mistake were more than a few
> > months old, we would be stuck with it, permanently.
>
> This mistake is only two months old, thankfully. I believe it can be
> corrected without consequence.
Re: Rethinking configuration tuples
"Zack Weinberg" writes:

> I haven't been following this long discussion very closely but I do
> have some opinions (with my "de facto autoconf maintainer" hat on):
>
> 1. As a general rule, it is not safe to change the canonicalization
> (i.e. the config.sub output) of an existing system name, *at all*; in
> many cases, not even if it is wrong. I find that people working on GNU
> tools often don't realize just how broadly used these names
> are. Changing the canonicalization of "CPU-VENDOR-mingw32", for
> example, is very likely to break things like Ansible playbooks and
> Travis-style CI build matrices -- one-off files that exist by the tens
> of thousands and there's no practical way to *enumerate* them all, let
> alone get them all changed to satisfy a GNU-internal desire for a more
> consistent naming convention.
>
> *Very recently introduced* names can be adjusted to correct technical
> errors. For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as
> there is no GNU libc port to Windows (see below); config.guess should
> not produce it and config.sub should not convert anything into it.
> But if the patch that had introduced this mistake were more than a few
> months old, we would be stuck with it, permanently.

This mistake is only two months old, thankfully. I believe it can be
corrected without consequence.
Re: Rethinking configuration tuples
I haven't been following this long discussion very closely but I do have some opinions (with my "de facto autoconf maintainer" hat on):

1. As a general rule, it is not safe to change the canonicalization (i.e. the config.sub output) of an existing system name, *at all*; in many cases, not even if it is wrong. I find that people working on GNU tools often don't realize just how broadly used these names are. Changing the canonicalization of "CPU-VENDOR-mingw32", for example, is very likely to break things like Ansible playbooks and Travis-style CI build matrices -- one-off files that exist by the tens of thousands and there's no practical way to *enumerate* them all, let alone get them all changed to satisfy a GNU-internal desire for a more consistent naming convention.

*Very recently introduced* names can be adjusted to correct technical errors. For example, "CPU-VENDOR-windows-gnu" is a misnomer IMHO as there is no GNU libc port to Windows (see below); config.guess should not produce it and config.sub should not convert anything into it. But if the patch that had introduced this mistake were more than a few months old, we would be stuck with it, permanently.

2. We should avoid adding any more information to canonical system names. Things like the availability of Bourne shell, which of the several available implementations of "init" (Unix PID 1) is in use, etc. should be handled with Autoconf-style feature probes. Yes, it's difficult to run ./configure if you don't have a Bourne shell, but I suspect most of the environments where that's an issue are used primarily as cross-compilation targets rather than native-build hosts.

My suggested place to draw the line is, if you reasonably need a cross-compiler targeting A to be different from a cross-compiler targeting B, then the distinction between A and B can go in the canonical system name; if you don't, then it shouldn't. This should be pretty close to existing practice (because that's exactly how GCC uses CSNs, via ./configure --target) and should give us concrete reasons to make a decision in each case.

For example, this rule says that the combination of Linux kernel with musl libc should be identified as "CPU-VENDOR-linux-musl", not "CPU-VENDOR-linux-gnu-musl", regardless of whether the overall system uses other GNU components. This is because the presence or absence of GNU libc *does* affect cross-compilation of C programs, but the presence or absence of other GNU software doesn't. [Note: I don't know whether RMS has said anything about this, and if he has, I don't care.]

A compiled language *other than* the C family might, in the future, want us to make a distinction between cross-compilation targets that existing CSNs do not capture, but we can worry about that when it actually happens.

3. I like the idea of a "--parseable" option to config.sub/guess that makes them spit out something easier to parse. My preferred syntax would be a newline- or semicolon-separated sequence of Bourne shell assignment statements, because, if there was also a way to ask config.sub/guess to add a prefix to every variable name, that would let Autoconf scripts process the output with `eval` rather than the nasty bit of parser goo we have now (_AC_CANONICAL_SPLIT, https://git.savannah.gnu.org/cgit/autoconf.git/tree/lib/autoconf/general.m4#n1987). It would need to be something like

    $ ./config.guess
    aarch64-unknown-linux-gnu
    $ ./config.guess --prefix=host --parseable
    host_cpu=aarch64
    host_vendor=unknown
    host_os=linux-gnu

It would be OK to introduce additional key=value pairs at that point (kernel, abi, etc), but the existing three (cpu, vendor, os) need to keep emitting exactly what they do now.

4. We should deemphasize and possibly explicitly deprecate the vendor component of a CSN. Nowadays, in my experience, it just confuses people.

zw
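To make the `eval` idea in point 3 concrete, here is a minimal sketch in POSIX shell. Neither config.sub nor config.guess has a `--parseable` or `--prefix` option today, so the sketch stands in for them with a hypothetical helper, `split_tuple`, that assumes its input is already a canonical CPU-VENDOR-OS name and emits only the three existing keys (cpu, vendor, os).

```shell
#!/bin/sh
# Hypothetical stand-in for the proposed "parseable" output; this is not
# an existing config.sub or config.guess feature.
# Usage: split_tuple PREFIX CPU-VENDOR-OS
split_tuple () {
    prefix=$1
    tuple=$2
    cpu=${tuple%%-*}       # text before the first dash
    rest=${tuple#*-}       # text after the first dash
    vendor=${rest%%-*}     # text before the second dash
    os=${rest#*-}          # everything after the second dash, e.g. linux-gnu
    # Single-quote the values so the output is safe to eval; this assumes
    # canonical names never contain a single quote.
    printf "%s_cpu='%s'\n%s_vendor='%s'\n%s_os='%s'\n" \
        "$prefix" "$cpu" "$prefix" "$vendor" "$prefix" "$os"
}

# A configure-like caller evaluates the assignments directly.
eval "$(split_tuple host aarch64-unknown-linux-gnu)"
echo "cpu=$host_cpu vendor=$host_vendor os=$host_os"
```

The os value stays `linux-gnu`, in line with the requirement that the three existing fields keep emitting exactly what they do now; any extra keys (kernel, abi) would be additions alongside them.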
Re: Rethinking configuration tuples
John Ericson wrote: On 8/30/23 22:24, Jacob Bachmeyer wrote: John Ericson wrote: Err I mean, is there an example of a *-*-linux-$nongnu-musl? I would expect that to name an embedded environment using Musl libc and the Linux kernel, but that is not a full system. (Example: may not even have a shell at all) I suppose except for the system() function in libc, I would consider this a distinction not needed for configs. The choice of what other programs to run (be they init or shell) feels to me not like a build-time / development configuration decision, but a runtime / ops configuration decision. They aren't "viral" decisions in the way that the choice of libc is (since all shared objects that may be combined together need to agree on their deps, most notably libc). The problem is that system(3) probably /does/ exist in that configuration, but such a system is only usable as a cross-compilation target, since building the GNU tools requires a shell. Maybe this could go in the config per the "arbitrary many components, finer distinctions to the right" "converging sequence" approach, but then I would want this further to the right, e.g. aarch64-unknown-musl-noshell not aarch64-unknown-noshell-musl. The idea of lacking a shell was intended as an example of something that would make a Free system /very/ different from the GNU system, enough that it cannot be considered a GNU variant. In other words, a Linux-based system that is clearly /not/ GNU/Linux. The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...) Agreed, just wanted to double check. Of course, if systemd *does* get sufficiently outrageously invasive, we might need a *-*-linux-systemd-glibc tuple... (Since systemd gleefully makes extensive use of Linux-kernel-specific features, it cannot possibly be a standard on the GNU system, which supports multiple Free kernels.) 
Yes I agree systemd probably can't be "bona fide GNU OS", but I take the opposite conclusion that this is evidence that the "gnu" for glibc is more important than the "gnu" for "true GNU OS". In this hypothetical (that needs to *stay* hypothetical) example, "systemd" has somehow become an "OS" distinct from the GNU system. Except configure usually does not need a "fully disambiguated" form---the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. The flat list with a defined order is optimal for this strategy, since it allows to easily check for the presence of any tag or combination of tags. Shell case patterns can be a bit of a footgun. For example, a common mistake is doing * instead of *-*. If the allowed pattern elements are sufficiently unambiguous, there is no mistake, since `*' matches text including `-'. In fact, when testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be correct. (I assume that a CPU type will remain required and will remain first in the list.) Sorry I meant as part of a larger pattern. With things like *-stuff-* vs *-*-stuff-*-*, the extra dashes are needed to make sure "stuff" matches the right component, and even then it only works if one knows the exact number of components (which can be accomplished by *-*... and the ordering of patterns). It is quite subtle! For the "converging sequence" model, omitting the extra dashes is important, since the number of tags prior to a "floating tag" can vary. (I would actually suggest making "gnu" such a floating tag in this model, with an exact definition to be obtained from a later discussion that would need to include RMS.) 
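The "is tag FOO present?" predicate mentioned above can be wrapped in a small helper. This is only a sketch of the pattern from the thread: `has_tag` is a hypothetical name, not part of config.sub or Autoconf, and it assumes the CPU stays first so that a tag always follows a dash.

```shell
#!/bin/sh
# has_tag TUPLE TAG: succeed if TAG appears as a whole dash-delimited
# component after the CPU field. The position of TAG may vary, which
# is what a "floating tag" would need.
has_tag() {
    case $1 in
        *-"$2"-* | *-"$2") return 0 ;;
        *) return 1 ;;
    esac
}

for t in x86_64-pc-linux-gnu aarch64-unknown-linux-musl arm-unknown-linux-gnueabi; do
    if has_tag "$t" gnu; then
        echo "$t: gnu tag present"
    else
        echo "$t: no gnu tag"
    fi
done
```

Note that whole-component matching treats `gnueabi` as a different tag from `gnu`. Existing configure scripts often match `*-gnu*` instead, so a floating-tag scheme would have to decide which behaviour it wants.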
Allow the hypothetical --parse option to accept a PREFIX argument and you are pretty much there:

    $ ./config.sub --parse=host x86_64-linux-gnu
    host=x86_64-pc-linux-gnu
    host_cpu=x86_64
    host_vendor=pc
    host_kernel=linux
    host_os=gnu
    $

That form should be both easily parsed by other tools and suitable for `eval` in shell scripts. Yup! We're in agreement. I agree testing is more robust, but for better or worse I still do see scripts using those host_* variables mentioned above. (Testing is possible but requires more care to get right for cross-compilation, for one.) In this case the test is `case $host in ... esac`. I would say it is better to case on (combinations of) `host_*` variables than on `$host`, because then one knows exactly which components are being cased upon; there is no ambiguity. I think one should basically only use `host` as a black-box identifier (e.g. prefixing binaries), and any other time one would like to use `host`, one should use the `host_*` variables instead. This comes back to the "converging sequence" model issue: what to do with the "floating tags" that are not in fixed fields? The problem is still getting it /into/ config.sub: config.sub expects a single command-line argument,
Re: Rethinking configuration tuples
On 8/30/23 22:24, Jacob Bachmeyer wrote: John Ericson wrote: Err I mean, is there an example of a *-*-linux-$nongnu-musl? I would expect that to name an embedded environment using Musl libc and the Linux kernel, but that is not a full system. (Example: may not even have a shell at all) I suppose except for the system() function in libc, I would consider this a distinction not needed for configs. The choice of what other programs to run (be they init or shell) feels to me not like a build-time / development configuration decision, but a runtime / ops configuration decision. They aren't "viral" decisions in the way that the choice of libc is (since all shared objects that may be combined together need to agree on their deps, most notably libc). Maybe this could go in the config per the "arbitrary many components, finer distinctions to the right" "converging sequence" approach, but then I would want this further to the right, e.g. aarch64-unknown-musl-noshell not aarch64-unknown-noshell-musl. The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...) Agreed, just wanted to double check. Of course, if systemd *does* get sufficiently outrageously invasive, we might need a *-*-linux-systemd-glibc tuple... (Since systemd gleefully makes extensive use of Linux-kernel-specific features, it cannot possibly be a standard on the GNU system, which supports multiple Free kernels.) Yes I agree systemd probably can't be "bona fide GNU OS", but I take the opposite conclusion that this is evidence that the "gnu" for glibc is more important than the "gnu" for "true GNU OS". Except configure usually does not need a "fully disambiguated" form---the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. 
The flat list with a defined order is optimal for this strategy, since it allows to easily check for the presence of any tag or combination of tags. Shell case patterns can be a bit of a footgun. For example, a common mistake is doing * instead of *-*. If the allowed pattern elements are sufficiently unambiguous, there is no mistake, since `*' matches text including `-'. In fact, when testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be correct. (I assume that a CPU type will remain required and will remain first in the list.) Sorry I meant as part of a larger pattern. With things like *-stuff-* vs *-*-stuff-*-*, the extra dashes are needed to make sure "stuff" matches the right component, and even then it only works if one knows the exact number of components (which can be accomplished by *-*... and the ordering of patterns). It is quite subtle! Allow the hypothetical --parse option to accept a PREFIX argument and you are pretty much there:

    $ ./config.sub --parse=host x86_64-linux-gnu
    host=x86_64-pc-linux-gnu
    host_cpu=x86_64
    host_vendor=pc
    host_kernel=linux
    host_os=gnu
    $

That form should be both easily parsed by other tools and suitable for `eval` in shell scripts. Yup! We're in agreement. I agree testing is more robust, but for better or worse I still do see scripts using those host_* variables mentioned above. (Testing is possible but requires more care to get right for cross-compilation, for one.) In this case the test is `case $host in ... esac`. I would say it is better to case on (combinations of) `host_*` variables than on `$host`, because then one knows exactly which components are being cased upon; there is no ambiguity. I think one should basically only use `host` as a black-box identifier (e.g. prefixing binaries), and any other time one would like to use `host`, one should use the `host_*` variables instead. 
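To illustrate the two styles being compared, the sketch below classifies the same system once by matching the whole `$host` string and once by matching the split fields. The variables are set by hand here as an assumption, standing in for what AC_CANONICAL_HOST would compute in a real configure script; the result names (`whole`, `split`, `combo`) are made up for the example.

```shell
#!/bin/sh
# Values set by hand; in configure they would come from AC_CANONICAL_HOST.
host=x86_64-pc-linux-gnu
host_cpu=x86_64
host_vendor=pc
host_os=linux-gnu

# Matching the whole string: the dashes carry the meaning, so the
# pattern only works if the number and order of fields is known.
case $host in
    *-*-linux-gnu*)  whole=glibc-on-linux ;;
    *-*-linux-musl*) whole=musl-on-linux ;;
    *)               whole=other ;;
esac

# Matching a split field: the pattern names the component it tests.
case $host_os in
    linux-gnu*)  split=glibc-on-linux ;;
    linux-musl*) split=musl-on-linux ;;
    *)           split=other ;;
esac

# Matching a combination of fields, joined with a separator that
# cannot occur inside either field.
case $host_cpu/$host_os in
    x86_64/linux-gnu*) combo=yes ;;
    *)                 combo=no ;;
esac

echo "whole=$whole split=$split combo=$combo"
```

Both styles give the same answer for a well-formed four-part name; the difference shows up only when the number of components varies, which is the case the thread is worried about.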
The problem is still getting it /into/ config.sub: config.sub expects a single command-line argument, while pre-parsed form spans a few lines. I don't think that is so hard. config.sub accepts --gnu-long-args already (without confusing them with configs) so we can simply do something like ./config.sub --pre-categorized cpu=x86_64 vendor=pc kernel=linux os=gnu and then there is no confusing the two forms of input. [...] I am not entirely certain why, but I know that there is some reason we call the common GNU/Linux systems *-*-linux-gnu instead of *-*-linux. To be honest, I think this is basically the "call it GNU/Linux not Linux" controversy --- i.e. at the time it was done for social not technical reasons. I don't mind, since now that we have multiple libcs there /is/ a technical reason to distinguish. But this circles back to my hunch that Kernel (syscall interface) + libc (ABI) determines OS uniquely enough for config.sub's purposes. That is possible, but still a valid reason for the GNU Project to stay with that angle. Yeah I have no problem with the term GNU/Linux, I just don't think "OS" is useful for
Re: Rethinking configuration tuples
John Ericson wrote: On 8/27/23 23:59, Jacob Bachmeyer wrote: [...] This is also the framework in which *-*-linux-gnu-musl makes sense for a system that uses Musl libc but is otherwise a GNU/Linux system. Right but again where do we draw the line? For example, can one use systemd and its large entourage of intertwined software, or must one use GNU Shepherd or System V init? In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is the C runtime library (GNU libc vs. Musl libc) such that shared objects linked for one ABI are not compatible with the other. If Musl libc were exactly 100% binary compatible with GNU libc, then there would be no *-*-linux-gnu-musl platform, since it would be indistinguishable from *-*-linux-gnu. Err I mean, is there an example of a *-*-linux-$nongnu-musl? I would expect that to name an embedded environment using Musl libc and the Linux kernel, but that is not a full system. (Example: may not even have a shell at all) [...] The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...) Agreed, just wanted to double check. Of course, if systemd *does* get sufficiently outrageously invasive, we might need a *-*-linux-systemd-glibc tuple... (Since systemd gleefully makes extensive use of Linux-kernel-specific features, it cannot possibly be a standard on the GNU system, which supports multiple Free kernels.) Except configure usually does not need a "fully disambiguated" form---the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. The flat list with a defined order is optimal for this strategy, since it allows to easily check for the presence of any tag or combination of tags. Shell case patterns can be a bit of a footgun. For example, a common mistake is doing * instead of *-*. 
If the allowed pattern elements are sufficiently unambiguous, there is no mistake, since `*' matches text including `-'. In fact, when testing an "is tag FOO present?" predicate `*-foo-* | *-foo' would be correct. (I assume that a CPU type will remain required and will remain first in the list.) I would rather case on disambiguated variables. Indeed, AC_CANONICAL_HOST computes host_cpu, host_vendor, and host_os for precisely that purpose. If config.sub could split out the disambiguated form, those variables could be defined more simply and robustly. Allow the hypothetical --parse option to accept a PREFIX argument and you are pretty much there: $ ./config.sub --parse=host x86_64-linux-gnu host=x86_64-pc-linux-gnu host_cpu=x86_64 host_vendor=pc host_kernel=linux host_os=gnu $ That form should be both easily parsed by other tools and suitable for `eval` in shell scripts. Note that config.sub is itself a shell script, and handling JSON in shell is a giant pain. The most we could reasonably do is what config.sub already does: determine each component as a separate variable and then output that by substituting text into a template. Yes I agree config.sub in its current form (must be highly portable across different Bourne-shell derivatives) has no hope of parsing JSON. It could output it or it could also output your ${key}=${value}\n format, and it could also consume your format. Your format is ideal for it! Adding a prefix to each key in the key=value format is trivial and would further help shell scripts that want to "parse by eval" but configure itself tests predicates rather than caring exactly what part of the configuration tuple means what. Put another way, configure is usually looking for a yes/no answer, so a pre-parsed form is less useful than a single string that can be used for pattern matches. I agree testing is more robust, but for better or worse I still do see scripts using those host_* variables mentioned above. 
(Testing is possible but requires more care to get right for cross-compilation, for one.) In this case the test is `case $host in ... esac`. There is no reasonable way to feed the key=value format /into/ config.sub: configuration tuples are hyphen-delimited lists. I think there is. The overall algorithm is roughly "(a) decide which component is which, (b) sanitize and normalize the components according to that decision". We would skip step (a) and go straight to step (b) in order to do this. This indicates part of the value of doing this: rather than just "system testing" the entirety of config.sub, we would now have something closer to a "unit test" of part of it in isolation. FWIW, this is similar to rearranging the code to support a mode where non-normal-form configs are rejected instead of normalized. The problem is still getting it /into/ config.sub: config.sub expects a single command-line argument, while pre-parsed form spans a few lines. [...] I am not entirely
Re: Rethinking configuration tuples
On 8/27/23 23:59, Jacob Bachmeyer wrote: >> I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is >> meaningful, the exact output of uname is not, etc. but where do we draw the >> line? > That is a question for which I do not currently have a certain answer. :/ Thanks, we'll keep trying to tease one out. >>> This is also the framework in which *-*-linux-gnu-musl makes sense for a >>> system that uses Musl libc but is otherwise a GNU/Linux system. >> >> Right but again where do we draw the line? For example, can one use systemd >> and its large entourage of intertwined software, or must one use GNU >> Shepherd or System V init? >> > > In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is the C > runtime library (GNU libc vs. Musl libc) such that shared objects linked for > one ABI are not compatible with the other. If Musl libc were exactly 100% > binary compatible with GNU libc, then there would be no *-*-linux-gnu-musl > platform, since it would be indistinguishable from *-*-linux-gnu. Err I mean, is there an example of a *-*-linux-$nongnu-musl? Agreed that if Musl was binary compatible with glibc, there would be no need to distinguish at the config level. > The choice of system service management is orthogonal to this, since it has > minimal impact on user programs. (Unless systemd gets even more outrageously > invasive...) Agreed, just wanted to double check. > Except configure usually does not need a "fully disambiguated" form---the > canonical form produced by config.sub is fine, since configure is usually > matching against the full tuple using shell case patterns. The flat list > with a defined order is optimal for this strategy, since it allows to easily > check for the presence of any tag or combination of tags. Shell case patterns can be a bit of a footgun. For example, a common mistake is doing * instead of *-*. I would rather case on disambiguated variables. 
Indeed, AC_CANONICAL_HOST computes host_cpu, host_vendor, and host_os for precisely that purpose. If config.sub could split out the disambiguated form, those variables could be defined more simply and robustly. > Note that config.sub is itself a shell script, and handling JSON in shell is > a giant pain. The most we could reasonably do is what config.sub already > does: determine each component as a separate variable and then output that > by substituting text into a template. >> Yes I agree config.sub in its current form (must be highly portable across >> different Bourne-shell derivatives) has no hope of parsing JSON. It could >> output it or it could also output your ${key}=${value}\n format, and it >> could also consume your format. Your format is ideal for it! > Adding a prefix to each key in the key=value format is trivial and would > further help shell scripts that want to "parse by eval" but configure itself > tests predicates rather than caring exactly what part of the configuration > tuple means what. Put another way, configure is usually looking for a yes/no > answer, so a pre-parsed form is less useful than a single string that can be > used for pattern matches. I agree testing is more robust, but for better or worse I still do see scripts using those host_* variables mentioned above. (Testing is possible but requires more care to get right for cross-compilation, for one.) > There is no reasonable way to feed the key=value format *into* config.sub: > configuration tuples are hyphen-delimited lists. I think there is. The overall algorithm is roughly "(a) decide which component is which, (b) sanitize and normalize the components according to that decision". We would skip step (a) and go straight to step (b) in order to do this. This indicates part of the value of doing this: rather than just "system testing" the entirety of config.sub, we would now have something closer to a "unit test" of part of it in isolation. 
FWIW, this is similar to rearranging the code to support a mode where non-normal-form configs are rejected instead of normalized. > Producing key=value format using config.sub's knowledge of valid tuples might > be reasonable for *other* systems to use instead of needing their own parsers. Yes it is definitely necessary for that, and that is a good use-case for sure. >>> Thank you; as I mentioned above, the goal is to best support heterogeneous >>> multi-arch systems, but recognizing a tension here. For configure, the >>> configuration tuple should not contain information that can be determined >>> by testing, but for storing multiple binary sets, ABIs do need to be part >>> of the name, even if they can be determined by configure tests. >> >> Agreed configure tests are better for the "long tail" of other attributes. >> (IMO if we were to define "operating system", it would be something like the >> "limit" of all configure checks.) >> >> But a big part of my "kernel-libc" thinking (and I think also Connor's) is
Re: Rethinking configuration tuples
John Ericson wrote: On 8/27/23 01:06, Jacob Bachmeyer wrote: [...] Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context. JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax. JSON is far too complicated to use here, except possibly as a "pre-parsed" form that config.sub could output on request for programs that want a structured form instead of parsing the tuple themselves. But for that case, why use JSON instead of a trivial multi-line key=value format? Hypothetical Example:

    $ config.sub --parse x86_64-linux-gnu
    cpu=x86_64
    vendor=pc
    kernel=linux
    os=gnu
    $

Note that this example both canonicalizes and parses. Yes that looks great to me. This shares the abstract syntax with what I had in mind, and anything that understands JSON can easily convert back and forth between the two. I argue for "duck-typing" here from the user's perspective: if and only if the system in all meaningful ways appears to be the GNU system, there should be a *-gnu* somewhere in the configuration tuple. I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is meaningful, the exact output of uname is not, etc. but where do we draw the line? That is a question for which I do not currently have a certain answer. :/ This is also the framework in which *-*-linux-gnu-musl makes sense for a system that uses Musl libc but is otherwise a GNU/Linux system. Right but again where do we draw the line? 
For example, can one use systemd and its large entourage of intertwined software, or must one use GNU Shepherd or System V init? In the case of *-*-linux-gnu and *-*-linux-gnu-musl, the difference is the C runtime library (GNU libc vs. Musl libc) such that shared objects linked for one ABI are not compatible with the other. If Musl libc were exactly 100% binary compatible with GNU libc, then there would be no *-*-linux-gnu-musl platform, since it would be indistinguishable from *-*-linux-gnu. The choice of system service management is orthogonal to this, since it has minimal impact on user programs. (Unless systemd gets even more outrageously invasive...) [...] I still oppose JSON because it is way too verbose for this: configuration tuples need to be both expressive and simple enough to type at a shell prompt as arguments to configure. Using JSON by default would also be a very nasty "flag day" that would break all existing programs that use config.sub. Perhaps config.sub could accept an --as=json parameter for JSON output? Yes exactly, JSON is a no-go for prefixed binaries, but probably better for things like Autoconf which needs to parse the output of config.sub either way. No, because Autoconf uses the shell and JSON is a [*profanity elided*] to parse using shell constructs. A flat list of hyphen-delimited tags is almost ideal for the parsing that configure needs to do. In fact, with a few restrictions (met by using canonical ordering) this is what configure /already/ parses. Oops, yes I was being sloppy confusing concrete and abstract syntax again. Sorry! I think while that for something like Meson or CMake JSON could be better, for Autoconf your ${key}=${value}\n format is perfect. Easy to parse and fully disambiguated. And of course, GNU config should care more about Autoconf than Meson or CMake. 
Except configure usually does not need a "fully disambiguated" form---the canonical form produced by config.sub is fine, since configure is usually matching against the full tuple using shell case patterns. The flat list with a defined order is optimal for this strategy, since it allows to easily check for the presence of any tag or combination of tags. Note that config.sub is itself a shell script, and handling JSON in shell is a giant pain. The most we could reasonably do is what config.sub already does: determine each component as a separate variable and then output that by substituting text into a template. Yes I agree config.sub in its current form (must be highly portable across different Bourne-shell derivatives) has no hope of parsing JSON. It could output it or it could also output your ${key}=${value}\n format, and it could also consume your format. Your format is ideal for it! Adding a prefix to each key in the key=value format is trivial and would further help shell scripts that want to "parse by eval" but configure itself tests predicates rather than caring
Re: Rethinking configuration tuples
On 8/27/23 01:06, Jacob Bachmeyer wrote: As I understand the history, Linux was the first clearly Free kernel available. At the time, BSD still had a dark cloud hanging over it due to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would not be legally recognized as separate until February 1994, although BSD had honestly (almost?) completely diverged from the AT&T codebase in June 1991 with Net/2. Mach was still proprietary; RMS was (or would later be) campaigning for its liberation, which would not occur until some years later. It is worth noting that Linux was originally a toy kernel, and it only attracted the effort it did and grew like it did because it was basically the last missing piece for fully Free systems at the time. Yes that is how I understand it too. Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context. JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax. JSON is far too complicated to use here, except possibly as a "pre-parsed" form that config.sub could output on request for programs that want a structured form instead of parsing the tuple themselves. But for that case, why use JSON instead of a trivial multi-line key=value format? Hypothetical Example:

    $ config.sub --parse x86_64-linux-gnu
    cpu=x86_64
    vendor=pc
    kernel=linux
    os=gnu
    $

Note that this example both canonicalizes and parses. Yes that looks great to me. 
This shares the abstract syntax with what I had in mind, and anything that understands JSON can easily convert back and forth between the two. I argue for "duck-typing" here from the user's perspective: if and only if the system in all meaningful ways appears to be the GNU system, there should be a *-gnu* somewhere in the configuration tuple. I am OK with duck-typing, but what is "all meaningful ways"? Sure, POSIX is meaningful, the exact output of uname is not, etc. but where do we draw the line? This is also the framework in which *-*-linux-gnu-musl makes sense for a system that uses Musl libc but is otherwise a GNU/Linux system. Right but again where do we draw the line? For example, can one use systemd and its large entourage of intertwined software, or must one use GNU Shepherd or System V init? Effectively, a different libc is a different ABI. Agreed, especially when the syscall interface isn't stable, like with many non-Windows kernels. My larger goal here is to smooth the way for multi-arch systems, with /usr/CPU-VENDOR-KERNEL-OS-ABI or so as the --prefix for binaries built for each architecture. This means that configuration tuples should be detailed enough to allow the needed distinctions, but not so detailed as to themselves become an artificial incompatibility. In larger networked environments, even KERNEL and OS could vary. It's a great goal, and mine too! :) Yeah whatever windows-something we settle on for MinGW, I promise my offer still stands to try to get LLVM to (a) accept it, and (b) steer people away from windows-gnu towards it. Thanks. No problem! :) This is the major expectation that using *-*-windows-gnu for MinGW violates: GNU implements POSIX and MinGW does not. Using *-mingnu still leaves considerable room for confusion in my view, which using *-mingw avoids. That is fine with me. Agreed "mingnu" takes the proper noun and turns it back into a common noun phrase --- i.e. "minimal GNU" has many valid interpretations while "MinGW" avoids that by being a known quantity. After that, I think we are close enough to convene a working group for a JSON/whatever explicit standard. And that would be amazing. I still oppose JSON because it is way too verbose for this: configuration tuples need to be both expressive and simple enough to type at a shell prompt as arguments to configure. Using JSON by default would also be a very nasty "flag day" that would break all existing programs that use config.sub. Perhaps config.sub could accept an --as=json parameter for JSON output? Yes exactly, JSON is a no-go for prefixed binaries, but probably better for things like Autoconf which needs to parse the output of config.sub either way. No, because Autoconf uses the shell and JSON is a [*profanity elided*] to parse using shell constructs. A flat list of hyphen-delimited tags is almost ideal for the parsing that configure needs to do. In fact, with a few restrictions (met by using canonical ordering) this is what
Re: Rethinking configuration tuples
John Ericson wrote: On 8/24/23 23:54, Jacob Bachmeyer wrote: John Ericson wrote: This is why I opened with "Operating System" lacks a coherent objective definition. [...] As I understand, historically, "operating systems" were proprietary monoliths and the GNU Project originally expected to produce another monolith, but /our/ monolith would be Free Software. As an interim measure, the GNU utilities were designed to be widely portable across the various individually-monolithic proprietary operating systems then in use across a wide variety of hardware. The broader Free Software Movement unexpectedly shattered that state of affairs, leading to the 4-element configuration tuple form, when the Linux kernel became available and it was noticed that---oops!---GNU on Linux and GNU on HURD would have significant differences that at least some of the GNU packages would need to handle. (For example, GNU libc is very different between Linux, where POSIX I/O maps fairly directly to underlying syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but both of these are Free GNU systems.) This means that the GNU system is a somewhat blurry category, with many variants possible, and is orthogonal to "Linux": there are GNU/Linux systems, GNU systems using other kernels, and Linux-based systems not using GNU at all. This latter category is fairly common in embedded systems, where the GNU utilities are often eschewed for lighter-weight alternatives to save flash space (or, less honorably, to avoid GPL3). Yes I agree with this state of affairs. I sometimes (but not always!) detect a sort of "Linux Scooped us" sentiment in GNU quarters, but as I see it portability and diversity of distros was pretty much inevitable --- replacing proprietary Unix userlands with GNU software was a huge point in how GNU got going in academic/institutional environments in the early days, and even if Hurd got there before Linux there would be no reason to rip out that portability. 
As I understand the history, Linux was the first clearly Free kernel available. At the time, BSD still had a dark cloud hanging over it due to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would not be legally recognized as separate until February 1994, although BSD had honestly (almost?) completely diverged from the AT&T codebase in June 1991 with Net/2. Mach was still proprietary; RMS was (or would later be) campaigning for its liberation, which would not occur until some years later. It is worth noting that Linux was originally a toy kernel, and it only attracted the effort it did and grew like it did because it was basically the last missing piece for fully Free systems at the time. JSON is pretty much a hard no for me: it is far too complex for what really needs to be a simple structure. Flat strings work very well for the way that GNU software typically expects to parse a configuration tuple using shell constructs. Perhaps it would be better to redefine configuration tuples as a flat list of tags with a canonical ordering? (The reason for a canonical ordering is in part to ensure that all existing coherent configuration tuple strings remain valid and to ensure that text-based pattern matching continues to work.) Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context. JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax. 
JSON is far too complicated to use here, except possibly as a "pre-parsed" form that config.sub could output on request for programs that want a structured form instead of parsing the tuple themselves. But for that case, why use JSON instead of a trivial multi-line key=value format? Hypothetical Example:

    $ config.sub --parse x86_64-linux-gnu
    cpu=x86_64
    vendor=pc
    kernel=linux
    os=gnu
    $

Note that this example both canonicalizes and parses. [...] I know Po Lu doesn't like them, because they overlap with existing ones. But what about you two, Adam and Jacob? I am trying to compromise between what various things do already, and also correct things like windows-gnu (even if there is no such thing as the GNU operating system (only multiple GNU Hurd-supporting distros), I agree that MinGW is clearly not a complete enough set of GNU software to earn the right to drop the "minimal" part). The logical problem with your parenthetical is that it ignores GNU/Linux, which *is* also a GNU
Re: Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)
On 8/24/23 23:54, Jacob Bachmeyer wrote: John Ericson wrote: This is why I opened with "Operating System" lacks a coherent objective definition. [...] As I understand, historically, "operating systems" were proprietary monoliths and the GNU Project originally expected to produce another monolith, but /our/ monolith would be Free Software. As an interim measure, the GNU utilities were designed to be widely portable across the various individually-monolithic proprietary operating systems then in use across a wide variety of hardware. The broader Free Software Movement unexpectedly shattered that state of affairs, leading to the 4-element configuration tuple form, when the Linux kernel became available and it was noticed that---oops!---GNU on Linux and GNU on HURD would have significant differences that at least some of the GNU packages would need to handle. (For example, GNU libc is very different between Linux, where POSIX I/O maps fairly directly to underlying syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but both of these are Free GNU systems.) This means that the GNU system is a somewhat blurry category, with many variants possible, and is orthogonal to "Linux": there are GNU/Linux systems, GNU systems using other kernels, and Linux-based systems not using GNU at all. This latter category is fairly common in embedded systems, where the GNU utilities are often eschewed for lighter-weight alternatives to save flash space (or, less honorably, to avoid GPL3). Yes I agree with this state of affairs. I sometimes (but not always!) detect a sort of "Linux Scooped us" sentiment in GNU quarters, but as I see it portability and diversity of distros was pretty much inevitable --- replacing proprietary Unix userlands with GNU software was a huge point in how GNU got going in academic/institutional environments in the early days, and even if Hurd got there before Linux there would be no reason to rip out that portability. 
JSON is pretty much a hard no for me: it is far too complex for what really needs to be a simple structure. Flat strings work very well for the way that GNU software typically expects to parse a configuration tuple using shell constructs. Perhaps it would be better to redefine configuration tuples as a flat list of tags with a canonical ordering? (The reason for a canonical ordering is in part to ensure that all existing coherent configuration tuple strings remain valid and to ensure that text-based pattern matching continues to work.) Ah sorry, I shouldn't have made reference to JSON at all --- what I really was getting at is the /abstract syntax/. In particular, rather than having an abstract syntax of "list of strings" (parsing today's concrete syntax by breaking on dash), where the meaning of each string is ambiguous / context-sensitive, we have one of "keys mapped to enumerations", i.e. one always knows the meaning of each component explicitly / without inspecting it or its context. JSON or your flat list in canonical ordering (where I assume we are careful to never skip a type of component) are both valid concrete syntaxes that can be parsed / printed from this abstract syntax. --- Concretely, I think these are pretty clear configs: CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish things, TODO distinguish between MSVCRT and UCRT I say that this one really should just be *-mingw. Sure. I went with mingnu because the "w" is redundant with the "windows", but ultimately I care more about the pattern than the exact choice of identifiers / enumeration tags. (As we say in programming language land, I care about the thing "up to alpha-renaming"). Note that there are both MinGW32 and MinGW64, corresponding to 32-bit and 64-bit Windows APIs. Should that be included or should the CPU type be used to distinguish? (e.g. i686-pc-windows-mingw is MinGW32 and x86_64-pc-windows-mingw is MinGW64?) Yes I think so. 
If you look at https://www.mingw-w64.org/downloads/ one even sees x86_64-w64-mingw32 which is quite something, and 64-bit! I think what happened is that "w32" was chosen to mean the then-new win32 API/ABI, as opposed to DOS. Win64 as I understand is necessarily a new ABI because of the change in CPU arch, but not really a new API, being more of a "let's make the minimal amount of changes so the source/headers are portable" situation. So a combination of "same API" and "too lazy to update GNU config" made "mingw32" stick around. f16804b79ee5a23a9994a1cdc760cd9ba813148a added mingw64 to GNU config in 2012, which is long after the advent of 64-bit Windows. In the proposed five-element form, MSVCRT and UCRT are easily distinguished. Example: i686-pc-windows-mingw-msvcrt i686-pc-windows-mingw-ucrt x86_64-pc-windows-mingw-msvcrt x86_64-pc-windows-mingw-ucrt That is very true, I will grant you that :) CPU-VENDOR-windows-cygnus # Cygwin
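As a rough illustration of how a consumer might tell MSVCRT from UCRT in the five-element form quoted above, here is a small Python sketch. The key names (`env` and `crt` in particular) and the rule that the fifth component is simply optional are assumptions made for the example; the thread does not name these slots, and nothing here reflects what config.sub does today.

```python
# Hypothetical key names for the proposed form; the last one is optional.
KEYS = ("cpu", "vendor", "kernel", "env", "crt")


def split_tuple(text):
    """Split a four- or five-element tuple into a key -> value mapping."""
    parts = text.split("-")
    if len(parts) not in (4, 5):
        raise ValueError(f"expected 4 or 5 components in {text!r}")
    # zip stops at the shorter sequence, so a four-element tuple
    # simply has no "crt" entry.
    return dict(zip(KEYS, parts))


if __name__ == "__main__":
    for example in ("i686-pc-windows-mingw-msvcrt",
                    "x86_64-pc-windows-mingw-ucrt",
                    "x86_64-pc-windows-mingw"):
        print(example, "->", split_tuple(example))
```

The CPU component alone separates the 32-bit and 64-bit cases here, and the C runtime only appears when the fifth component is present, so four-element strings of this shape still split the same way.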
Re: Rethinking configuration tuples
Po Lu wrote: People, the nature and widespread use of config.* precludes any efforts aimed at ``rethinking'' the tuples they accept and generate. If you want your own format, then by all means, proceed with your own project. But please leave config.* in peace. I will say right now that backwards compatibility, specifically that existing tuples remain unchanged as much as possible (blatantly incorrect tuples such as *-windows-gnu for MinGW excepted) is an absolute requirement here. Existing code expects the existing strings. Those must be preserved. -- Jacob
Re: Rethinking configuration tuples
People, the nature and widespread use of config.* precludes any efforts aimed at ``rethinking'' the tuples they accept and generate. If you want your own format, then by all means, proceed with your own project. But please leave config.* in peace.
Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)
John Ericson wrote: This is why I opened with "Operating System" lacks a coherent objective definition. The more pugilistic message is to say the rest of the world doesn't think the GNU operating system exists --- that there is simply a choice of kernel (Linux, k*BSD, Hurd, something else...) and choices of libraries and system components on top of that, and many combinations are possible. The rest of the world might say this in a mean way, but I say it is actually a /good/ thing --- software freedom means one /can/ choose one's components à la carte, and only a lack of software freedom results in a kernel and mass of libraries outside one's control blurring together into a scary "take it or leave it" monolith we call an operating system. As I understand, historically, "operating systems" were proprietary monoliths and the GNU Project originally expected to produce another monolith, but /our/ monolith would be Free Software. As an interim measure, the GNU utilities were designed to be widely portable across the various individually-monolithic proprietary operating systems then in use across a wide variety of hardware. The broader Free Software Movement unexpectedly shattered that state of affairs, leading to the 4-element configuration tuple form, when the Linux kernel became available and it was noticed that---oops!---GNU on Linux and GNU on HURD would have significant differences that at least some of the GNU packages would need to handle. (For example, GNU libc is very different between Linux, where POSIX I/O maps fairly directly to underlying syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but both of these are Free GNU systems.) This means that the GNU system is a somewhat blurry category, with many variants possible, and is orthogonal to "Linux": there are GNU/Linux systems, GNU systems using other kernels, and Linux-based systems not using GNU at all. 
This latter category is fairly common in embedded systems, where the GNU utilities are often eschewed for lighter-weight alternatives to save flash space (or, less honorably, to avoid GPL3). On 8/24/23 08:51, Adam Joseph wrote: [...] It seems like a lot of the proposals in this thread are being evaluated not based on whether or not they are coherent, but rather on whether or not they take us a few nanometers closer to whatever LLVM's internal implementation details happen to be this week. I care about coherence; the reason I like to see what LLVM does is that working from a parsed representation forces the software to be much more honest. Since GNU config doesn't reveal its categories but just spits out another opaque string, there is no external pressure for its categorization to be any good. LLVM, on the other hand, dispenses with strings entirely and just uses the enums, so it is forced to make sure those enums make sense and work for the branching the program has to do. LLVM parsing of configs is ad-hoc Postel's law stuff like everyone else's, but its internal representation is actually quite stable. Parsing is the ugly nasty part that gets to the pristine clear ontology on the other side. Ultimately I would like to convene everyone to commit to an agreed upon internal representation too. E.g. clang and GNU config could both spit out some JSON that is unambiguous and should match. I think that would alleviate a lot of Adam's concerns about "following LLVM". But I don't think it is possible to convene the working group needed to standardize such a format yet, because there is little trust between parties. Moving us "a few nanometers closer" on each side demonstrates that there is willingness to compromise. JSON is pretty much a hard no for me: it is far too complex for what really needs to be a simple structure. Flat strings work very well for the way that GNU software typically expects to parse a configuration tuple using shell constructs. 
Perhaps it would be better to redefine configuration tuples as a flat list of tags with a canonical ordering? (The reason for a canonical ordering is in part to ensure that all existing coherent configuration tuple strings remain valid and to ensure that text-based pattern matching continues to work.) --- Concretely, I think these are pretty clear configs: CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish things, TODO distinguish between MSVCRT and UCRT I say that this one really should just be *-mingw. Note that there are both MinGW32 and MinGW64, corresponding to 32-bit and 64-bit Windows APIs. Should that be included or should the CPU type be used to distinguish? (e.g. i686-pc-windows-mingw is MinGW32 and x86_64-pc-windows-mingw is MinGW64?) In the proposed five-element form, MSVCRT and UCRT are easily distinguished. Example: i686-pc-windows-mingw-msvcrt i686-pc-windows-mingw-ucrt x86_64-pc-windows-mingw-msvcrt