On 10/13/24 23:04, John Millikin wrote:
On 2024-10-13 01:28, Rob Landley <r...@landley.net> wrote:
I need to write a FAQ entry about scripts/prereq/build.sh and maybe a
section of the README. It's regenerated each release by
scripts/recreate-prereq.sh (yes I'm checking in a generated file) and
the current documentation is the commit messages on the initial commit:

That's a very useful pointer, thank you! The `scripts/prereq/generated/*.h'
files were exactly what I was looking for.

Updating that is part of my release checklist.

(Yes the release is late. Between selling my house and moving, job hunting that wound up with me going back to a previous employer, and an overenthusiastic developer taking the fun out of reading the mailing list, reinstalling my build environment, yet another round of covid, the looming election... I seem to have burned out fairly hard this year. Trying to spin back up to speed...)

My specific proposal is to convert some or all of the $SED processing into
C code and either put it in its own binary, or unify it with
`scripts/mkflags.c' / `scripts/config2help.c' / etc.

I was actually looking to get away from those and move it _more_ towards
sed. (And maybe awk now that we have one of those but I have to read
through it and promote it out of pending first.)

That said, I'm working on replacing kconfig/* with a new
scripts/config.c so maybe it will inherit some of those functions, not
sure yet...

Ah, in that case, would an acceptable middle ground be to move the sed
invocations into a separate `.sh' script that can be invoked directly?

I tried to do that some years ago (make header generation a separate script), and it complicated the build because both the header generation and the compile part needed to populate the list of files and libraries. (And no you can't just write them in to generated/ because the lifetimes are wrong. Plus at best you're splitting 80% of the code off from the remaining 20%, you're actually factoring out the _build_, not breaking up the header generation. And reading multiple different files instead of one didn't make what it was doing clearer, the only reason portability.sh is a separate file is multiple things like single.sh and install.sh read it. I was actually looking at folding genconfig.sh BACK in because there's much less of it now that C11's __has_include() let me move most of the probes it was doing to preprocessor directives...)

That said, the compile loop itself is fairly generic and might be of use to other projects. And it only exists at all because "cc -j $(nproc)" isn't recognized by compilers, which you'd THINK would be providing their own SMP internally in 2024. Alas even _that_ needs the scripts/portability.sh nonsense because MacOS hasn't got "nproc" and so on, so it's not REALLY generic. It's got dependencies...
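
To make the shape of such a loop concrete, here is a minimal sketch in plain sh. It is not the loop from scripts/make.sh: the names cpu_count, run_batched and demo_job are invented for illustration, the getconf and sysctl fallbacks are my assumption about what a portability layer might try, and it waits for a whole batch at a time instead of refilling slots as jobs finish. Plain "wait" also discards the jobs' exit codes, so a real build would need extra failure tracking.

```shell
#!/bin/sh
# Hypothetical helper: report the number of CPUs without relying on nproc alone.
cpu_count() {
  n=$(nproc 2>/dev/null) ||
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=1
  # Fall back to 1 if whatever we got back is not a positive integer.
  case "$n" in ''|*[!0-9]*) n=1 ;; esac
  [ "$n" -ge 1 ] || n=1
  echo "$n"
}

# Hypothetical helper: run "$cmd arg" for each argument, at most $max at once.
# Jobs are launched in batches and plain "wait" collects each batch, so no
# "wait -n" is needed. Exit codes of the jobs are NOT checked here.
run_batched() {
  max=$1 cmd=$2
  shift 2
  running=0
  for arg in "$@"; do
    "$cmd" "$arg" &
    running=$((running + 1))
    if [ "$running" -ge "$max" ]; then
      wait
      running=0
    fi
  done
  wait
}

# Stand-in for a real compile step such as: cc -c "$1" -o "${1%.c}.o"
demo_job() {
  echo "built $1"
}

# Output order within a batch is not deterministic, hence the sort.
run_batched "$(cpu_count)" demo_job a.c b.c c.c | LC_ALL=C sort
```

Batching keeps the sketch runnable in older shells, at the cost of idle CPUs while the slowest job in each batch finishes.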

What would the advantage be? It's not _all_ sed invocations (despite some cleanup passes recently ala 1e3708a91268 and 567f8daac6e7 and d52e93c94784 and the various library probe redo commits). A relevant blog entry would be https://landley.net/notes-2024.html#08-02-2024

It's still building and running mkflags.c and config2help.c because the processing those do is a bit beyond what I could beat out of sed. The help text processing probably gets lumped into the kconfig rewrite. The flags thing is mostly about getting code to drop out when you switch off config symbols: the build depends pretty heavily on the compiler doing dead code elimination for if (0) {blah;} which even Turbo C for DOS did fine (and the -ffunction-sections -fdata-sections -Wl,--gc-sections compiler directives tell the linker to strip out unreferenced functions and global variables from lib, so I don't have to #ifdef around them), but it helps if FLAG(F) macros turn into 0 and that requires USE_CFGSYM("blah") wrappers and some plumbing. Happy to do it in a more elegant way if someone can think of one...

(And yes, MacOS compiler doesn't support --gc-sections despite it being in gcc for TWENTY YEARS and llvm.ld supporting it from early on. The problem is they don't use ELF, they use their own proprietary mach-o object format, so they're stuck with a linker from the dawn of time. They DO have a thing that does this, -dead_strip, it's just gratuitously incompatibly renamed and is another reason portability.sh exists...)

For example, `scripts/make.sh' currently contains this code to produce
`generated/newtoys.h':

     if isnewer newtoys.h toys
     then
       # The multiplexer is the first element in the array
       echo "USE_TOYBOX(NEWTOY(toybox, 0, TOYFLAG_STAYROOT|TOYFLAG_NOHELP))" \
         > "$GENDIR"/newtoys.h
       # Sort rest by name for binary search (copy name to front, sort,
       # remove copy)
       $SED -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' toys/*/*.c \
         | sort -s -k 1,1 | $SED 's/[^ ]* //'  >> "$GENDIR"/newtoys.h
       [ $? -ne 0 ] && exit 1
     fi
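
The "copy name to front, sort, remove copy" step in the snippet above can be tried in isolation. What follows is a minimal sketch, not part of the toybox build: it uses plain sed where make.sh uses $SED, the function name sort_toys_by_name is made up for illustration, and the input lines are invented sample data shaped like NEWTOY() declarations.

```shell
#!/bin/sh
# Hypothetical helper (not in toybox): sort NEWTOY()/OLDTOY() lines read from
# stdin by command name, using the same two sed expressions as make.sh.
sort_toys_by_name() {
  # 1. Copy the command name to the front: "ls USE_LS(NEWTOY(ls, ..."
  #    Lines that do not match are dropped by -n ... p.
  # 2. Stable sort on that first field only.
  # 3. Strip the copied name again.
  sed -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' \
    | LC_ALL=C sort -s -k 1,1 \
    | sed 's/[^ ]* //'
}

# Invented sample input; the real input comes from toys/*/*.c.
sort_toys_by_name <<'EOF'
USE_LS(NEWTOY(ls, "la", TOYFLAG_BIN))
int unrelated_line;
USE_CAT(NEWTOY(cat, "u", TOYFLAG_BIN))
USE_ECHO(NEWTOY(echo, "n", TOYFLAG_BIN))
EOF
```

The unrelated line is filtered out and the remaining three come back in the order cat, echo, ls, with the temporary sort key removed.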

It's difficult to get there without having the rest of `make.sh' tag along
(the hostcmp and environment probing), but if the code were adjusted to
something like this:

     # make-generated.sh
     gen_newtoys_h() {
       # The multiplexer is the first element in the array
       echo "USE_TOYBOX(NEWTOY(toybox, 0, TOYFLAG_STAYROOT|TOYFLAG_NOHELP))" \
         > "$GENDIR"/newtoys.h
       # Sort rest by name for binary search (copy name to front, sort,
       # remove copy)
       $SED -n 's/^\(USE_[^(]*(.*TOY(\)\([^,]*\)\(,.*\)/\2 \1\2\3/p' toys/*/*.c \
         | sort -s -k 1,1 | $SED 's/[^ ]* //'  >> "$GENDIR"/newtoys.h
       [ $? -ne 0 ] && exit 1
     }

     # scripts/make.sh
     source scripts/make-generated.sh
     # [...]
     if isnewer newtoys.h toys
     then
       gen_newtoys_h
     fi

This would let a hermetic build system handle the C compilation of the helper
tools, then call into `make-generated.sh' for the sedding.

So you propose building a parallel build system that sources subsets of my scripts, broken down into yet more files that import each other. And both files hardwire the name "newtoys.h", so farther from the general "single point of truth" concept...

The only advantage I can see here is granularity: you want to be able to reproduce some scripts but not others. Is there a downside to creating them all and cherry picking what you need? Or if some don't build at all in a given environment, anything doing isnewer can be skipped via "touch".
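
A rough illustration of the "touch" trick, as a sketch under stated assumptions: needs_rebuild below is a hypothetical stand-in, not the isnewer from scripts/make.sh, and it assumes the check means "the target is missing, or some source is newer than it". The file names are scratch files made up for the demo.

```shell
#!/bin/sh
# Hypothetical stand-in for make.sh's isnewer: succeeds when the target is
# missing or any file under the source paths is newer than the target.
needs_rebuild() {
  target=$1
  shift
  [ ! -e "$target" ] || [ -n "$(find "$@" -newer "$target" 2>/dev/null)" ]
}

# Demo in a scratch directory so no real files are touched.
dir=$(mktemp -d)
echo 'source' > "$dir/toy.c"
touch -t 200001010000 "$dir/newtoys.h"   # pretend the header is very old
needs_rebuild "$dir/newtoys.h" "$dir/toy.c" && echo "stale: would regenerate"
touch "$dir/newtoys.h"                   # mark it up to date by hand
needs_rebuild "$dir/newtoys.h" "$dir/toy.c" || echo "fresh: regeneration skipped"
rm -rf "$dir"
```

An external build system that supplies its own copy of a generated header could touch it the same way, so a timestamp check of this kind leaves it alone.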

I'm all for simplifying the build, but when I figure out how I tend to do it. Making gears that need to interlock for the benefit of environments I will never personally regression test isn't always simplifying...

Alas the one thing the build still needs is /bin/bash because toysh
isn't quite ready yet. I'm working on that too, but this year's kind of
gotten away from me. (I sold my house and moved, my wife graduated and
got a full time job, I went back to work for the j-core guys...)

I did some light testing and found that the generated code portions of
`scripts/make.sh' are mostly portable. There were two minor Bash-isms that
were easy to replace with POSIX equivalents,

Mostly because MacOS is using an ancient version of bash (last GPLv2 release, from 2007) which doesn't understand things like wait -n and thus it has a probe and workaround.

and I could successfully run the
build using either dash-0.5.12

Sigh, breaking the Defective Annoying SHell strikes me as a bonus but I'm biased there.

http://lists.landley.net/pipermail/toybox-landley.net/2020-March/027641.html

or mksh-R59c (which are both much easier to
build in an isolated chroot than Bash).

Android's using mksh, the test plumbing at least gets a workout over there. In the test suite, I apply patches removing bashisms because I haven't finished toysh yet and they have to run with mksh on device. The build does not yet run on device (but I'm working on that, including building the kernel).

Personally, I'd really like to finish toysh, but see "burnout" above. (And the whole red queen's race thing where I really don't WANT to maintain kernel rust removal patches the way I did perl removal patches, and wasn't planning to implement my own crypt() but glibc decided posix schmozix they were yanking it, and upgrading from devuan bronchitis to devuan dermatitis broke a bunch of "TEST_HOST=1 make tests" I still haven't entirely sorted through...)

Do you have any interest in patches to make `scripts/make.sh' (and/or an
extracted `make-generated.sh') POSIX-compatible-er?

I'm writing a bash compatible shell. Dogfooding it (the toybox build working under toybox's shell) is probably my goal for a 0.9 release. (0.8.twodigits is _embarrassing_, and doesn't sort right in directories).

Removing bash-isms I intend to implement just makes me go "I need to do more shell work". That said, I can see an argument for running the build under mksh. What specific features is mksh missing?

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net
