Re: config.sub should normalize *-*-windows-*

2023-08-26 Thread Po Lu
connor horman  writes:

> It seems to me reading this thread that we've come into two
> conflicting realities:
>
> * There exist targets that need to be distinguished, and
> * They are not distinct in any component that config.sub has,
>   therefore they cannot and should not be distinguished.
>
> mingw and msvc both use the NT kernel, and the windows operating
> system. So it seems to me that windows, the OS, is the correct way to
> describe them. According to the discussions on this thread, they
> should thusly both canonicalize to the same target. And yet, not only
> is there desire to separate these targets, they already are.
>
> LLVM (as well as my own target parsing tool) refer to the last two
> components as "sys" with two subcomponents (of which at least one
> exists), being os and env. IMO, this seems a far more coherent
> definition that satisfies the requirements, and even more correctly
> matches targets that already exist.

The objective is to keep the status quo unchanged till Hell freezes
over, so that no programs will ever be broken.

> musl is another extreme example: There is no musl OS. The last
> component being musl refers to the use of the musl libc. The resulting
> binary can then be used on either a GNU system or a non-GNU linux
> system like alpine, void, or iglunix. Thus musl cannot be regarded as
> an "OPERATING_SYSTEM" but rather an environment.
>
> Even on linux-gnu the definition is murky at best. While I won't
> dispute the existence of a GNU operating system running atop the linux
> kernel, in many cases, the actual linux-gnu tag merely refers to
> glibc. Few things using targets nowadays actually care about the rest
> of the tools, and when they do care that they exist (on --host or even
> --target), they typically don't care that they're provided by GNU, and
> even may not care that they match the interface of the tool provided
> by GNU. Only on --build are the tools really cared about, and I don't
> see many things matching the build tuple or even canonicalizing it. If
> we thus define an "Operating System" as "kernel+libc+tools atop that"
> it becomes clear to me that few things written nowadays care about the
> "GNU Operating System" and only really care about the "GNU
> Environment".

For the purpose of compiling programs, systems using the GNU libc are
equivalent to GNU systems.  config.* does not draw excessively fine
distinctions between them.

In keeping with that, systems using the Musl libc are so similar that
they may as well be considered as a single operating system.  This
contrasts with MinGW and MSVC, whose discrepancies are of sufficient
consequence to warrant individual identification by config.*.

And as configurations which embody these distinctions _already exist_,
they should never change, nor be supplanted by new and purportedly
``improved'' configurations.  I reiterate, until the very end of time...



Re: Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)

2023-08-26 Thread John Ericson


On 8/24/23 23:54, Jacob Bachmeyer wrote:
> John Ericson wrote:
>> This is why I opened by saying that "Operating System" lacks a
>> coherent objective definition.
>
> [...]
>
> As I understand, historically, "operating systems" were proprietary
> monoliths and the GNU Project originally expected to produce another
> monolith, but /our/ monolith would be Free Software.  As an interim
> measure, the GNU utilities were designed to be widely portable across
> the various individually-monolithic proprietary operating systems then
> in use across a wide variety of hardware.  The broader Free Software
> Movement unexpectedly shattered that state of affairs, leading to the
> 4-element configuration tuple form, when the Linux kernel became
> available and it was noticed that---oops!---GNU on Linux and GNU on
> HURD would have significant differences that at least some of the GNU
> packages would need to handle.  (For example, GNU libc is very
> different between Linux, where POSIX I/O maps fairly directly to
> underlying syscalls, and HURD, where POSIX I/O must be translated to
> Mach IPC, but both of these are Free GNU systems.)
>
> This means that the GNU system is a somewhat blurry category, with
> many variants possible, and is orthogonal to "Linux":  there are
> GNU/Linux systems, GNU systems using other kernels, and Linux-based
> systems not using GNU at all.  This latter category is fairly common
> in embedded systems, where the GNU utilities are often eschewed for
> lighter-weight alternatives to save flash space (or, less honorably,
> to avoid GPL3).

Yes, I agree with this state of affairs. I sometimes (but not always!)
detect a sort of "Linux scooped us" sentiment in GNU quarters, but as I
see it, portability and a diversity of distros were pretty much
inevitable. Replacing proprietary Unix userlands with GNU software was a
big part of how GNU got going in academic/institutional environments in
the early days, and even if Hurd had got there before Linux, there would
have been no reason to rip out that portability.


> JSON is pretty much a hard no for me:  it is far too complex for what
> really needs to be a simple structure.  Flat strings work very well
> for the way that GNU software typically expects to parse a
> configuration tuple using shell constructs.  Perhaps it would be
> better to redefine configuration tuples as a flat list of tags with a
> canonical ordering?  (The reason for a canonical ordering is in part
> to ensure that all existing coherent configuration tuple strings
> remain valid and to ensure that text-based pattern matching continues
> to work.)
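To illustrate why a canonical ordering keeps text-based matching working, here is a minimal Python sketch that mimics the shell `case` globbing configure scripts commonly use, via the standard library's `fnmatch.fnmatchcase`. The patterns, labels and the `classify` helper are hypothetical, and the five-element tuples are the style proposed in this thread, not tuples config.sub is assumed to accept today. In this sketch the extra CRT tag is appended at the end, so patterns written for the four-element form still match.

```python
from fnmatch import fnmatchcase

# Hypothetical shell-style patterns, of the kind a configure script
# might use in a `case "$host" in ... esac` block.
PATTERNS = {
    "*-*-linux-gnu*": "glibc-based Linux",
    "*-*-windows-mingw*": "MinGW",
    "*-*-windows-msvc*": "MSVC",
}


def classify(config: str) -> list[str]:
    """Return the label of every pattern that matches the tuple string."""
    return [label for pattern, label in PATTERNS.items()
            if fnmatchcase(config, pattern)]


# The CRT tag is appended at the end of the longer tuples, so the
# patterns written for the shorter form keep matching them.
for config in ["x86_64-pc-linux-gnu",
               "x86_64-pc-windows-mingw",
               "x86_64-pc-windows-mingw-ucrt",
               "i686-pc-windows-mingw-msvcrt"]:
    print(config, classify(config))
```

A real configure script would express the same thing as a shell `case` statement; Python is used here only so the behaviour is easy to check. The guarantee holds only as long as new tags are added after the existing ones; inserting a tag in the middle would break patterns like these.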


Ah, sorry, I shouldn't have made reference to JSON at all. What I
really was getting at is the /abstract syntax/. In particular, rather
than having an abstract syntax of "list of strings" (parsing today's
concrete syntax by breaking on dashes), where the meaning of each string
is ambiguous / context-sensitive, we have one of "keys mapped to
enumerations", i.e. one always knows the meaning of each component
explicitly, without inspecting it or its context.

JSON or your flat list in canonical ordering (where I assume we are
careful to never skip a type of component) are both valid concrete
syntaxes that can be parsed into / printed from this abstract syntax.
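A minimal Python sketch of the distinction drawn above: one abstract record whose fields have fixed, named roles (two of them enumerations), with two interchangeable concrete syntaxes, a dash-separated string in a fixed order and JSON. The field names (`cpu`, `vendor`, `os`, `env`), the enumeration members and the `Config` class are assumptions chosen for illustration, not an agreed scheme.

```python
import json
from dataclasses import dataclass
from enum import Enum


# Hypothetical enumerations; a real scheme would need many more members.
class OS(Enum):
    LINUX = "linux"
    WINDOWS = "windows"


class Env(Enum):
    GNU = "gnu"
    MUSL = "musl"
    MINGW = "mingw"
    MSVC = "msvc"


@dataclass(frozen=True)
class Config:
    """Abstract syntax: every component has a fixed, named role."""
    cpu: str
    vendor: str
    os: OS
    env: Env

    # Concrete syntax 1: a flat string in a fixed canonical order.
    def to_flat(self) -> str:
        return "-".join([self.cpu, self.vendor, self.os.value, self.env.value])

    @classmethod
    def from_flat(cls, text: str) -> "Config":
        # No component is ever skipped, so each position has one meaning.
        cpu, vendor, os_name, env = text.split("-")
        return cls(cpu, vendor, OS(os_name), Env(env))

    # Concrete syntax 2: JSON with explicit keys.
    def to_json(self) -> str:
        return json.dumps({"cpu": self.cpu, "vendor": self.vendor,
                           "os": self.os.value, "env": self.env.value})

    @classmethod
    def from_json(cls, text: str) -> "Config":
        d = json.loads(text)
        return cls(d["cpu"], d["vendor"], OS(d["os"]), Env(d["env"]))


c = Config.from_flat("x86_64-pc-windows-mingw")
print(c.to_json())
# {"cpu": "x86_64", "vendor": "pc", "os": "windows", "env": "mingw"}
```

Both printers and parsers go through the same record, so neither concrete form is privileged; an unknown value fails loudly (`ValueError` from the `Enum` lookup) instead of being silently reinterpreted by position.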





>> ---
>>
>> Concretely, I think these are pretty clear configs:
>>
>> CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish
>> things, TODO distinguish between MSVCRT and UCRT
>
> I say that this one really should just be *-mingw.

Sure. I went with mingnu because the "w" is redundant with the
"windows", but ultimately I care more about the pattern than the exact
choice of identifiers / enumeration tags. (As we say in programming
language land, I care about the thing "up to alpha-renaming".)

> Note that there are both MinGW32 and MinGW64, corresponding to 32-bit
> and 64-bit Windows APIs.  Should that be included or should the CPU
> type be used to distinguish?  (e.g.  i686-pc-windows-mingw is MinGW32
> and x86_64-pc-windows-mingw is MinGW64?)

Yes, I think so. If you look at https://www.mingw-w64.org/downloads/ one
even sees x86_64-w64-mingw32, which is quite something given that it is
64-bit!

I think what happened is that "w32" was chosen to mean the then-new
Win32 API/ABI, as opposed to DOS. Win64, as I understand it, is
necessarily a new ABI because of the change in CPU architecture, but not
really a new API, being more of a "let's make the minimal amount of
changes so the source/headers are portable" situation. So a combination
of "same API" and "too lazy to update GNU config" made "mingw32" stick
around.

Commit f16804b79ee5a23a9994a1cdc760cd9ba813148a added mingw64 to GNU
config in 2012, well after the advent of 64-bit Windows.

> In the proposed five-element form, MSVCRT and UCRT are easily
> distinguished.  Example:
>
> i686-pc-windows-mingw-msvcrt
> i686-pc-windows-mingw-ucrt
> x86_64-pc-windows-mingw-msvcrt
> x86_64-pc-windows-mingw-ucrt

That is very true, I will grant you that :)
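The five-element examples quoted above can be decoded mechanically. A minimal Python sketch, assuming the hypothetical `CPU-VENDOR-windows-mingw-CRT` layout and the suggestion that the CPU field alone distinguishes MinGW32 from MinGW64; the two-entry CPU table and the `describe_mingw` helper are illustrative only, not anything config.sub provides.

```python
# Hypothetical table: which CPU names imply the 64-bit Windows ABI.
CPU_BITS = {"i686": 32, "x86_64": 64}
CRTS = {"msvcrt", "ucrt"}


def describe_mingw(config: str) -> tuple[str, str]:
    """Decode a hypothetical CPU-VENDOR-windows-mingw-CRT tuple."""
    cpu, _vendor, os_name, env, crt = config.split("-")
    if (os_name, env) != ("windows", "mingw"):
        raise ValueError(f"not a MinGW tuple: {config!r}")
    if cpu not in CPU_BITS:
        raise ValueError(f"CPU not in this sketch's table: {cpu!r}")
    if crt not in CRTS:
        raise ValueError(f"unknown C runtime: {crt!r}")
    # MinGW32 vs MinGW64 comes from the CPU field; no extra tag is needed.
    return f"MinGW{CPU_BITS[cpu]}", crt


for config in ["i686-pc-windows-mingw-msvcrt",
               "x86_64-pc-windows-mingw-ucrt"]:
    print(config, describe_mingw(config))
```

The C runtime needs its own component because nothing else in the tuple implies it, whereas the 32/64-bit split is already carried by the CPU.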


>> CPU-VENDOR-windows-cygnus # Cygwin


Re: config.sub should normalize *-*-windows-*

2023-08-26 Thread John Ericson

Thanks Connor. I think we are both on the same page!

On 8/24/23 14:51, connor horman wrote:
> It seems to me reading this thread that we've come into two
> conflicting realities:
>
> * There exist targets that need to be distinguished, and
> * They are not distinct in any component that config.sub has,
>   therefore they cannot and should not be distinguished.
>
> mingw and msvc both use the NT kernel, and the windows operating
> system. So it seems to me that windows, the OS, is the correct way to
> describe them. According to the discussions on this thread, they
> should thusly both canonicalize to the same target. And yet, not only
> is there desire to separate these targets, they already are.

Agreed. We can have our cake and eat it too, both (a) distinguishing
things which are already distinguished and (b) having configs follow
consistent conventions.

> LLVM (as well as my own target parsing tool) refer to the last two
> components as "sys" with two subcomponents (of which at least one
> exists), being os and env. IMO, this seems a far more coherent
> definition that satisfies the requirements, and even more correctly
> matches targets that already exist.

Agreed!
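To make the "sys = os + env" reading concrete, here is a minimal Python sketch that splits a tuple positionally into arch, vendor, os and an optional env. It is a deliberate simplification: the `Target` type and `split_sys` helper are hypothetical, and LLVM's own `llvm::Triple` parsing is considerably more involved. Under this reading mingw and msvc share os=windows and differ only in env, and musl lands in env rather than os.

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    arch: str
    vendor: str
    os: str
    env: Optional[str]  # the optional second half of "sys"


def split_sys(triple: str) -> Target:
    """Naive positional split: arch-vendor-os[-env].

    Illustration only; real parsers recognize components by name.
    """
    parts = triple.split("-")
    if len(parts) == 3:
        return Target(parts[0], parts[1], parts[2], None)
    if len(parts) == 4:
        return Target(*parts)
    raise ValueError(f"expected 3 or 4 components: {triple!r}")


for t in ["x86_64-pc-linux-gnu", "x86_64-pc-linux-musl",
          "x86_64-pc-windows-msvc", "x86_64-unknown-freebsd"]:
    print(t, "->", split_sys(t))
```

Note that `linux-gnu` and `linux-musl` yield the same `os`, which is exactly the point about musl being an environment rather than an operating system.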


> musl is another extreme example: There is no musl OS. The last
> component being musl refers to the use of the musl libc. The resulting
> binary can then be used on either a GNU system or a non-GNU linux
> system like alpine, void, or iglunix. Thus musl cannot be regarded as
> an "OPERATING_SYSTEM" but rather an environment.

Agreed!

> Even on linux-gnu the definition is murky at best. While I won't
> dispute the existence of a GNU operating system running atop the linux
> kernel, in many cases, the actual linux-gnu tag merely refers to
> glibc. Few things using targets nowadays actually care about the rest
> of the tools, and when they do care that they exist (on --host or even
> --target), they typically don't care that they're provided by GNU, and
> even may not care that they match the interface of the tool provided
> by GNU. Only on --build are the tools really cared about, and I don't
> see many things matching the build tuple or even canonicalizing it. If
> we thus define an "Operating System" as "kernel+libc+tools atop that"
> it becomes clear to me that few things written nowadays care about the
> "GNU Operating System" and only really care about the "GNU Environment".

Agreed! Well put. Even if we were to find a rigorous objective
definition for "Operating System" in general, encompassing a long tail
of auxiliary interfaces, it would be far more specific than what the
things inspecting the output of config.sub actually care about.

(FWIW I am also fine saying there exists the "GNU Operating System", but
to me "Operating System" is always an exercise in branding, tying
together disparate components which could always, in principle (e.g. if
we had the source code), be mixed and matched in other ways.)

> I would like this very much to happen, along with the Rust project
> which has its own target defs (but similar as well).

I am glad I am not the only one!

John


Re: Rethinking configuration tuples

2023-08-26 Thread Jacob Bachmeyer

John Ericson wrote:
> On 8/24/23 23:54, Jacob Bachmeyer wrote:
>> John Ericson wrote:
>>> This is why I opened by saying that "Operating System" lacks a
>>> coherent objective definition.
>>
>> [...]
>>
>> As I understand, historically, "operating systems" were proprietary
>> monoliths and the GNU Project originally expected to produce another
>> monolith, but /our/ monolith would be Free Software.  As an interim
>> measure, the GNU utilities were designed to be widely portable across
>> the various individually-monolithic proprietary operating systems
>> then in use across a wide variety of hardware.  The broader Free
>> Software Movement unexpectedly shattered that state of affairs,
>> leading to the 4-element configuration tuple form, when the Linux
>> kernel became available and it was noticed that---oops!---GNU on
>> Linux and GNU on HURD would have significant differences that at
>> least some of the GNU packages would need to handle.  (For example,
>> GNU libc is very different between Linux, where POSIX I/O maps fairly
>> directly to underlying syscalls, and HURD, where POSIX I/O must be
>> translated to Mach IPC, but both of these are Free GNU systems.)
>>
>> This means that the GNU system is a somewhat blurry category, with
>> many variants possible, and is orthogonal to "Linux":  there are
>> GNU/Linux systems, GNU systems using other kernels, and Linux-based
>> systems not using GNU at all.  This latter category is fairly common
>> in embedded systems, where the GNU utilities are often eschewed for
>> lighter-weight alternatives to save flash space (or, less honorably,
>> to avoid GPL3).
>
> Yes, I agree with this state of affairs. I sometimes (but not always!)
> detect a sort of "Linux scooped us" sentiment in GNU quarters, but as
> I see it, portability and a diversity of distros were pretty much
> inevitable. Replacing proprietary Unix userlands with GNU software was
> a big part of how GNU got going in academic/institutional
> environments in the early days, and even if Hurd had got there before
> Linux, there would have been no reason to rip out that portability.




As I understand the history, Linux was the first clearly Free kernel
available.  At the time, BSD still had a dark cloud hanging over it due
to its (distant) origins at AT&T; the BSD and AT&T UNIX codebases would
not be legally recognized as separate until February 1994, although BSD
had honestly (almost?) completely diverged from the AT&T codebase in
June 1991 with Net/2.  Mach was still proprietary; RMS was (or would
later be) campaigning for its liberation, which would not occur until
some years later.  It is worth noting that Linux was originally a toy
kernel, and it only attracted the effort it did and grew like it did
because it was basically the last missing piece for fully Free systems
at the time.


>> JSON is pretty much a hard no for me:  it is far too complex for what
>> really needs to be a simple structure.  Flat strings work very well
>> for the way that GNU software typically expects to parse a
>> configuration tuple using shell constructs.  Perhaps it would be
>> better to redefine configuration tuples as a flat list of tags with a
>> canonical ordering?  (The reason for a canonical ordering is in part
>> to ensure that all existing coherent configuration tuple strings
>> remain valid and to ensure that text-based pattern matching continues
>> to work.)
>
> Ah, sorry, I shouldn't have made reference to JSON at all. What I
> really was getting at is the /abstract syntax/. In particular, rather
> than having an abstract syntax of "list of strings" (parsing today's
> concrete syntax by breaking on dashes), where the meaning of each
> string is ambiguous / context-sensitive, we have one of "keys mapped
> to enumerations", i.e. one always knows the meaning of each component
> explicitly, without inspecting it or its context.
>
> JSON or your flat list in canonical ordering (where I assume we are
> careful to never skip a type of component) are both valid concrete
> syntaxes that can be parsed into / printed from this abstract syntax.




JSON is far too complicated to use here, except possibly as a 
"pre-parsed" form that config.sub could output on request for programs 
that want a structured form instead of parsing the tuple themselves.  
But for that case, why use JSON instead of a trivial multi-line 
key=value format?


Hypothetical Example:
$ config.sub --parse x86_64-linux-gnu
cpu=x86_64
vendor=pc
kernel=linux
os=gnu
$

Note that this example both canonicalizes and parses.
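A consumer of key=value output like the hypothetical example above needs only a few lines. This Python sketch assumes exactly the format shown (one `key=value` pair per line, keys `cpu`, `vendor`, `kernel`, `os` in that order); `--parse` is the hypothetical option proposed above, not an existing config.sub flag, so the sketch includes a matching emitter and starts from an already-canonical tuple, leaving the canonicalization step (filling in `vendor=pc`) to config.sub. The `emit` and `parse` names are illustrative.

```python
KEYS = ("cpu", "vendor", "kernel", "os")  # field order from the example above


def emit(canonical: str) -> str:
    """Render an already-canonical four-element tuple as key=value lines."""
    values = canonical.split("-")
    if len(values) != len(KEYS):
        raise ValueError(f"expected {len(KEYS)} components: {canonical!r}")
    return "".join(f"{key}={value}\n" for key, value in zip(KEYS, values))


def parse(text: str) -> dict[str, str]:
    """Read key=value lines back into a dict, skipping blank lines."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        fields[key] = value
    return fields


out = emit("x86_64-pc-linux-gnu")
print(out, end="")
# cpu=x86_64
# vendor=pc
# kernel=linux
# os=gnu
```

Compared with JSON, the reader needs nothing beyond splitting each line at its first `=`, which is the simplicity being argued for.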


> [...]
> I know Po Lu doesn't like them, because they overlap with existing
> ones. But what about you two, Adam and Jacob? I am trying to
> compromise between what various things do already, and also
> correct things like windows-gnu (even if there is no such thing as
> the GNU operating system (only multiple GNU Hurd-supporting
> distros), I agree that MinGW is clearly not a complete enough
> set of GNU software to earn the right to drop the "minimal" part).

The logical problem with your parenthetical is that it ignores
GNU/Linux, which *is* also a GNU