Re: Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)

2023-08-26 Thread John Ericson


On 8/24/23 23:54, Jacob Bachmeyer wrote:

John Ericson wrote:


This is why I opened by saying that "Operating System" lacks a coherent, 
objective definition.


[...]


As I understand, historically, "operating systems" were proprietary 
monoliths and the GNU Project originally expected to produce another 
monolith, but /our/ monolith would be Free Software.  As an interim 
measure, the GNU utilities were designed to be widely portable across 
the various individually-monolithic proprietary operating systems then 
in use across a wide variety of hardware.  The broader Free Software 
Movement unexpectedly shattered that state of affairs, leading to the 
4-element configuration tuple form, when the Linux kernel became 
available and it was noticed that---oops!---GNU on Linux and GNU on 
HURD would have significant differences that at least some of the GNU 
packages would need to handle.  (For example, GNU libc is very 
different between Linux, where POSIX I/O maps fairly directly to 
underlying syscalls, and HURD, where POSIX I/O must be translated to 
Mach IPC, but both of these are Free GNU systems.)


This means that the GNU system is a somewhat blurry category, with 
many variants possible, and is orthogonal to "Linux":  there are 
GNU/Linux systems, GNU systems using other kernels, and Linux-based 
systems not using GNU at all.  This latter category is fairly common 
in embedded systems, where the GNU utilities are often eschewed for 
lighter-weight alternatives to save flash space (or, less honorably, 
to avoid GPL3).


Yes, I agree with this state of affairs. I sometimes (but not always!) 
detect a sort of "Linux scooped us" sentiment in GNU quarters, but as I 
see it, portability and a diversity of distros were pretty much inevitable: 
replacing proprietary Unix userlands with GNU software was a big part of 
how GNU got going in academic and institutional environments in the 
early days, and even if Hurd had gotten there before Linux, there would 
have been no reason to rip out that portability.


JSON is pretty much a hard no for me:  it is far too complex for what 
really needs to be a simple structure.  Flat strings work very well 
for the way that GNU software typically expects to parse a 
configuration tuple using shell constructs.  Perhaps it would be 
better to redefine configuration tuples as a flat list of tags with a 
canonical ordering?  (The reason for a canonical ordering is in part 
to ensure that all existing coherent configuration tuple strings 
remain valid and to ensure that text-based pattern matching continues 
to work.)
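The canonical-ordering idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the five field names, the hypothetical `canonical_tuple` helper, and the rule that only trailing fields may be omitted (one way to keep existing strings valid). `fnmatch.fnmatchcase` stands in for a shell `case` pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical canonical field order (illustrative names, not GNU config's).
FIELD_ORDER = ("cpu", "vendor", "kernel", "system", "runtime")


def canonical_tuple(tags: dict) -> str:
    """Print tags as a flat, dash-separated string in canonical order."""
    parts = [tags.get(field) for field in FIELD_ORDER]
    # Only trailing fields may be left out, so an existing four-element
    # tuple is always a prefix of its five-element refinement.
    while parts and parts[-1] is None:
        parts.pop()
    if None in parts:
        raise ValueError("only trailing fields may be omitted")
    return "-".join(parts)


four = canonical_tuple({"cpu": "x86_64", "vendor": "pc",
                        "kernel": "windows", "system": "mingw"})
five = canonical_tuple({"cpu": "x86_64", "vendor": "pc",
                        "kernel": "windows", "system": "mingw",
                        "runtime": "ucrt"})

# fnmatchcase plays the role of a shell `case` pattern written for the old form.
pattern = "*-*-windows-mingw*"
print(four, fnmatchcase(four, pattern))  # x86_64-pc-windows-mingw True
print(five, fnmatchcase(five, pattern))  # x86_64-pc-windows-mingw-ucrt True
```

Because the new element is appended rather than inserted, a pattern written against the four-element form keeps matching the refined five-element string.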


Ah, sorry, I shouldn't have made reference to JSON at all; what I 
really was getting at is the /abstract syntax/. In particular, rather 
than having an abstract syntax of "list of strings" (parsing today's 
concrete syntax by splitting on dashes), where the meaning of each string 
is ambiguous / context-sensitive, we would have one of "keys mapped to 
enumerations", i.e. one always knows the meaning of each component 
explicitly, without inspecting it or its context.


JSON and your flat list in canonical ordering (where I assume we are 
careful never to skip a type of component) are both valid concrete 
syntaxes that can be parsed into / printed from this abstract syntax.
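A minimal Python sketch of "keys mapped to enumerations" with two concrete syntaxes printed from it. The enum names, their tags, and the `Config` class are hypothetical, chosen only to show that the flat string and the JSON carry the same keyed information.

```python
import json
from dataclasses import dataclass
from enum import Enum


# Hypothetical enumerations; the tags are illustrative, not a proposal.
class Cpu(Enum):
    I686 = "i686"
    X86_64 = "x86_64"


class Vendor(Enum):
    PC = "pc"
    UNKNOWN = "unknown"


class Kernel(Enum):
    LINUX = "linux"
    WINDOWS = "windows"


class System(Enum):
    GNU = "gnu"
    MINGW = "mingw"


@dataclass(frozen=True)
class Config:
    """Abstract syntax: each component is keyed by what it is, so its
    meaning never depends on its position or its neighbours."""
    cpu: Cpu
    vendor: Vendor
    kernel: Kernel
    system: System

    # Concrete syntax 1: flat dash-separated tags in canonical order,
    # never skipping a component.
    def to_flat(self) -> str:
        return "-".join(c.value for c in (self.cpu, self.vendor, self.kernel, self.system))

    @classmethod
    def from_flat(cls, text: str) -> "Config":
        cpu, vendor, kernel, system = text.split("-")
        # Enum lookup by value raises ValueError for unknown tags.
        return cls(Cpu(cpu), Vendor(vendor), Kernel(kernel), System(system))

    # Concrete syntax 2: JSON, printed from the same abstract syntax.
    def to_json(self) -> str:
        return json.dumps({"cpu": self.cpu.value, "vendor": self.vendor.value,
                           "kernel": self.kernel.value, "system": self.system.value})


cfg = Config.from_flat("x86_64-pc-linux-gnu")
print(cfg.kernel is Kernel.LINUX)  # True
print(cfg.to_flat())               # x86_64-pc-linux-gnu
print(cfg.to_json())
```

Code that branches on `cfg.kernel` never has to guess which dash-separated slot it is looking at; the flat form only matters at the edges, when parsing or printing.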





---

Concretely, I think these are pretty clear configs:

CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish things, TODO distinguish between MSVCRT and UCRT




I say that this one really should just be *-mingw.


Sure. I went with mingnu because the "w" is redundant with 
"windows", but ultimately I care more about the pattern than the exact 
choice of identifiers / enumeration tags. (As we say in programming 
language land, I care about the thing "up to alpha-renaming".)


Note that there are both MinGW32 and MinGW64, corresponding to 32-bit 
and 64-bit Windows APIs.  Should that be included or should the CPU 
type be used to distinguish?  (e.g.  i686-pc-windows-mingw is MinGW32 
and x86_64-pc-windows-mingw is MinGW64?)


Yes, I think so. If you look at https://www.mingw-w64.org/downloads/ you 
even see x86_64-w64-mingw32, which is quite something: a 64-bit target 
still named "mingw32"!


I think what happened is that "w32" was chosen to mean the then-new 
Win32 API/ABI, as opposed to DOS. Win64, as I understand it, is necessarily 
a new ABI because of the change in CPU architecture, but not really a new 
API, being more of a "let's make the minimal amount of changes so the 
source/headers are portable" situation. So a combination of "same API" 
and "too lazy to update GNU config" made "mingw32" stick around.


Commit f16804b79ee5a23a9994a1cdc760cd9ba813148a added mingw64 to GNU 
config in 2012, long after the advent of 64-bit Windows.


In the proposed five-element form, MSVCRT and UCRT are easily 
distinguished.  Example:


i686-pc-windows-mingw-msvcrt
i686-pc-windows-mingw-ucrt
x86_64-pc-windows-mingw-msvcrt
x86_64-pc-windows-mingw-ucrt
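Under the reading suggested above (CPU field for 32- vs 64-bit, fifth element for the C runtime), both distinctions fall out of the tuple without extra tags. A minimal Python sketch; the field names, `parse_five`, `describe_mingw`, and the two-entry `POINTER_BITS` table are all hypothetical.

```python
from typing import NamedTuple


class FiveTuple(NamedTuple):
    cpu: str
    vendor: str
    kernel: str
    system: str
    runtime: str


def parse_five(text: str) -> FiveTuple:
    parts = text.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 components, got {len(parts)}: {text!r}")
    return FiveTuple(*parts)


# Assumed mapping: the word size comes from the CPU field, so no separate
# "mingw32"/"mingw64" tag is needed.
POINTER_BITS = {"i686": 32, "x86_64": 64}


def describe_mingw(text: str) -> str:
    t = parse_five(text)
    if (t.kernel, t.system) != ("windows", "mingw"):
        raise ValueError(f"not a MinGW configuration: {text!r}")
    return f"MinGW{POINTER_BITS[t.cpu]} with {t.runtime.upper()}"


for config in ("i686-pc-windows-mingw-msvcrt", "x86_64-pc-windows-mingw-ucrt"):
    print(config, "->", describe_mingw(config))
# i686-pc-windows-mingw-msvcrt -> MinGW32 with MSVCRT
# x86_64-pc-windows-mingw-ucrt -> MinGW64 with UCRT
```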


That is very true, I will grant you that :)


CPU-VENDOR-windows-cygnus # Cygwin

CPU-VENDOR-

Rethinking configuration tuples (was: Re: config.sub should normalize *-*-windows-*)

2023-08-24 Thread Jacob Bachmeyer

John Ericson wrote:


This is why I opened by saying that "Operating System" lacks a coherent, 
objective definition.


The more pugilistic message is to say the rest of the world doesn't 
think the GNU operating system exists: that there is simply a 
choice of kernel (Linux, k*BSD, Hurd, something else...) and choices 
of libraries and system components on top of that, and many 
combinations are possible. The rest of the world might say this in a 
mean way, but I say it is actually a /good/ thing: software freedom 
means one /can/ choose one's components à la carte, and only a lack of 
software freedom results in a kernel and a mass of libraries outside 
one's control blurring together into a scary "take it or leave it" 
monolith we call an operating system.




As I understand, historically, "operating systems" were proprietary 
monoliths and the GNU Project originally expected to produce another 
monolith, but /our/ monolith would be Free Software.  As an interim 
measure, the GNU utilities were designed to be widely portable across 
the various individually-monolithic proprietary operating systems then 
in use across a wide variety of hardware.  The broader Free Software 
Movement unexpectedly shattered that state of affairs, leading to the 
4-element configuration tuple form, when the Linux kernel became 
available and it was noticed that---oops!---GNU on Linux and GNU on HURD 
would have significant differences that at least some of the GNU 
packages would need to handle.  (For example, GNU libc is very different 
between Linux, where POSIX I/O maps fairly directly to underlying 
syscalls, and HURD, where POSIX I/O must be translated to Mach IPC, but 
both of these are Free GNU systems.)


This means that the GNU system is a somewhat blurry category, with many 
variants possible, and is orthogonal to "Linux":  there are GNU/Linux 
systems, GNU systems using other kernels, and Linux-based systems not 
using GNU at all.  This latter category is fairly common in embedded 
systems, where the GNU utilities are often eschewed for lighter-weight 
alternatives to save flash space (or, less honorably, to avoid GPL3).



On 8/24/23 08:51, Adam Joseph wrote:

[...]
It seems like a lot of the proposals in this thread are being evaluated not
based on whether or not they are coherent, but rather on whether or not they
take us a few nanometers closer to whatever LLVM's internal
implementation details happen to be this week.



I care about coherence; the reason I like to see what LLVM does is that 
working from a parsed representation forces the software to be much 
more honest. Since GNU config doesn't reveal its categories but just 
spits out another opaque string, there is no external pressure for its 
categorization to be any good. LLVM, on the other hand, dispenses with 
strings entirely and just uses the enums, so it is forced to make sure 
those enums make sense and work for the branching the program has to do.


LLVM's parsing of configs is ad hoc, Postel's-law stuff like everyone 
else's, but its internal representation is actually quite stable. 
Parsing is the ugly, nasty part that gets us to the pristine, clear 
ontology on the other side.
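The split between lenient parsing and a strict internal representation can be sketched as a lookup that funnels many spellings into one canonical tuple. This is not how LLVM or config.sub is implemented; the alias table and `normalize` are hypothetical, and "cygnus" is the tag proposed for Cygwin elsewhere in this thread.

```python
from typing import Tuple

# Hypothetical alias table: several legacy spellings of the OS part all
# map onto one strict (kernel, system) pair.
OS_ALIASES = {
    "mingw32": ("windows", "mingw"),
    "mingw64": ("windows", "mingw"),
    "windows-mingw": ("windows", "mingw"),
    "cygwin": ("windows", "cygnus"),
    "linux-gnu": ("linux", "gnu"),
}


def normalize(config: str) -> Tuple[str, str, str, str]:
    """Liberal in what it accepts, strict in what it returns."""
    cpu, vendor, os_part = config.split("-", 2)
    try:
        kernel, system = OS_ALIASES[os_part]
    except KeyError:
        raise ValueError(f"unrecognized OS part {os_part!r} in {config!r}") from None
    return (cpu, vendor, kernel, system)


print(normalize("x86_64-w64-mingw32"))     # ('x86_64', 'w64', 'windows', 'mingw')
print(normalize("i686-pc-windows-mingw"))  # ('i686', 'pc', 'windows', 'mingw')
```

All the messiness lives in the alias table; everything downstream of `normalize` only ever sees the canonical pair.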


Ultimately I would like to convene everyone to commit to an agreed-upon 
internal representation too. E.g. clang and GNU config could both 
spit out some JSON that is unambiguous and should match. I think that 
would alleviate a lot of Adam's concerns about "following LLVM". But I 
don't think it is possible to convene the working group needed to 
standardize such a format yet, because there is little trust between 
parties. Moving "a few nanometers closer" on each side 
demonstrates that there is willingness to compromise.
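One way "unambiguous and should match" could be checked: if both tools emitted the same keyed record, a canonical serialization would let anyone test agreement with a plain string comparison. The message proposes such output rather than describing an existing feature, so the record fields and both inputs below are hypothetical.

```python
import json


def canonical_json(config: dict) -> str:
    # Sorted keys and fixed separators give exactly one spelling per
    # configuration, so two tools' outputs can be compared as plain text.
    return json.dumps(config, sort_keys=True, separators=(",", ":"))


# Hypothetical outputs from two different tools for the same target,
# with keys emitted in different orders.
tool_a = {"cpu": "x86_64", "vendor": "pc", "kernel": "linux", "system": "gnu"}
tool_b = {"system": "gnu", "kernel": "linux", "vendor": "pc", "cpu": "x86_64"}

print(canonical_json(tool_a))  # {"cpu":"x86_64","kernel":"linux","system":"gnu","vendor":"pc"}
print(canonical_json(tool_a) == canonical_json(tool_b))  # True
```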




JSON is pretty much a hard no for me:  it is far too complex for what 
really needs to be a simple structure.  Flat strings work very well for 
the way that GNU software typically expects to parse a configuration 
tuple using shell constructs.  Perhaps it would be better to redefine 
configuration tuples as a flat list of tags with a canonical ordering?  
(The reason for a canonical ordering is in part to ensure that all 
existing coherent configuration tuple strings remain valid and to ensure 
that text-based pattern matching continues to work.)



---

Concretely, I think these are pretty clear configs:

CPU-VENDOR-windows-mingnu # MinGW, MS C + GNU C++ and other GNU-ish things, TODO distinguish between MSVCRT and UCRT




I say that this one really should just be *-mingw.  Note that there are 
both MinGW32 and MinGW64, corresponding to 32-bit and 64-bit Windows 
APIs.  Should that be included or should the CPU type be used to 
distinguish?  (e.g.  i686-pc-windows-mingw is MinGW32 and 
x86_64-pc-windows-mingw is MinGW64?)


In the proposed five-element form, MSVCRT and UCRT are easily 
distinguished.  Example:


i686-pc-windows-mingw-msvcrt
i686-pc-windows-mingw-ucrt
x86_64-pc-windows-mingw-msvcrt
x86_64-pc-windows-mingw-ucrt