Re: single-binary for execline programs?

2023-02-02 Thread Laurent Bercot

Yes, this is only possible because you did a very good job in the first
place. Good work! This cannot be said enough.


 Thanks.
 I managed to de-global the arrays in trap.c, so now the only
unavoidable global is in forstdin: a pointer to a structure accessed by
a signal handler.
 You'd think with all the siginfo stuff, POSIX would have thought of
mandating a void * auxiliary pointer you'd give to sigaction() and that
would be stored and provided to the signal handler, but no, there's just
no room to pass user data other than globally. Yet another example of
wonderful, user-friendly design.

 But yeah, 8 bytes of bss/data for the whole thing is pretty good, the
crt and the libc are basically the only static RAM users, so there's
nothing more to do here.



I was also curious about starting time and should have done that in my
previous mail; it's a bit slower, as expected.


 Yeah, a 0.2 ms difference is fine, I think. :P But I'm not sure if
it's possible to get an accurate benchmark, because the cost of 4-5
strcmp()s is negligible next to the cost of the execve()s in the first
place. I suspect at least half of the difference comes from mapping a
bigger executable.



I think the main reason to like shared libraries as a distribution is
that if you upgrade it, you get the upgrade for all programs that depend
on it -- which isn't really a problem for this.


 Oh, absolutely, and that's why it's hard to advocate static linking to
distributions. It's a very reasonable argument for dynamic linking.



At the risk of repeating myself, I'll be happy to help with anything
related to this -- that's the least I can do given I brought it up.


 Thank you. I might seriously take you up on that offer further down
the road. :)
 But really, since the "cat everything together" method works in this
case, there's not much more to do except pay attention when writing
or editing normal programs in the future.

 I pushed "multicall-strip" and "multicall-install" targets in git,
and documented the setup in the INSTALL file, marked as experimental,
because although I *think* everything is working, there may still be
some interaction I've missed.

--
 Laurent



Re: single-binary for execline programs?

2023-02-01 Thread Dominique Martinet
Laurent Bercot wrote on Wed, Feb 01, 2023 at 06:59:04PM +:
> > Look, here's a trivial, suboptimal wrapper, far from pretty:
> > (...)
> > (look, I said it wasn't pretty -- there are at least a dozen of problems
> > with this, but nothing a day of work I offered to do can't fix; I wrote
> > this because it was faster than talking without a concrete example to
> > have some figures, and that took me less time than the rest of this mail)
> 
>  Damn you to the nine circles of Hell, one by one, slowly, then all of
> them at the same time.
>  You piqued my curiosity, so I did it, and I spent the day making it work.

I'm sorry -- I also know the feeling of doing myself something someone
suggested :-D

If that's any consolation, I'm also here testing it at 5AM...

>  That said, you were right: that's some pretty hefty saving of disk
> space. The execline binary is 169kB, statically linked against musl on
> x86_64. That's neat. I expected it to be at least twice bigger. And
> the data/bss isn't too bad either: only 2 pages. But that's because
> execline programs use very little global memory in the first place -
> the only places where globals are used is when state needs to be
> accessed by signal handlers, and there's nothing I can do about that in
> short-lived programs. (Long-lived programs use a selfpipe, so only
> one int of global data is ever needed for them.)

Yes, this is only possible because you did a very good job in the first
place. Good work! This cannot be said enough.


>  So, all in all, much better results than I expected, it was a pleasant
> surprise. Still, concatenating all the code feels really clunky, and
> a real multicall program needs to be designed for this from the start,
> which won't happen for execline in the foreseeable future, so this is
> as much as you get for now.
> 
>  If you're interested in hacking the thing, the magic happens in
> tools/gen-multicall.sh.

Will have a further look tomorrow.

I was also curious about starting time and should have done that in my
previous mail; it's a bit slower, as expected.
Running 'execline-cd /' 50 times followed by 'true', I get:

$ hyperfine --warmup 5 -N 'cd / cd / ... true'
(multi)
  Time (mean ± σ):     21.9 ms ±  2.1 ms    [User: 5.0 ms, System: 16.7 ms]
  Range (min … max):   17.2 ms … 26.7 ms    148 runs
(original)
  Time (mean ± σ):     21.7 ms ±  1.9 ms    [User: 5.0 ms, System: 16.5 ms]
  Range (min … max):   16.8 ms … 24.9 ms    120 runs

so the original binary is slightly faster to load in a benchmark; but
I'm curious about the benefits one would get from not having to look
for and read multiple binaries in more realistic cases...
Either way it probably will be hard to notice.

> >  libexecline is statically linked, so these pages aren't shared afaik?
> 
>  That's right, I forgot it was always statically linked. If it helps,
> changing ${LIBEXECLINE} to -lexecline in the src/execline/deps-exe
> files, then running ./tools/gen-deps.sh > package/deps.mak, should
> allow you to link against a dynamic libexecline. Can you do it, see
> how much space you gain? That's a configuration I would definitely
> support, even if it's slower - people usually love shared libraries.

I think the main reason to like shared libraries as a distribution is
that if you upgrade it, you get the upgrade for all programs that depend
on it -- which isn't really a problem for this.

Regardless, curiosity triumphs; the gains aren't as big as I thought: text
going from 231914 to 186776, which is still twice over the multi binary
(75066 in this configuration).
The rest will be C runtime...

(data is slightly lower (~400 bytes), bss slightly bigger (~50 bytes), but
not different enough to matter)

>  Sigh. I shouldn't feel that way, and any potential improvement should
> be a source of joy, not dread - but really I wish the results weren't
> so good. Now Pandora's box has been opened and everyone will want to
> use the multicall exclusively, so at some point I'll have to support it,
> i.e. ensure it's actually correct and enhance its maintainability.
> And that means a lot more work. :(

At the risk of repeating myself, I'll be happy to help with anything
related to this -- that's the least I can do given I brought it up.

I understand you probably consider it faster to do it yourself, and
it's your baby, but this is probably orthogonal enough to be worth
delegating a bit. Well, ultimately it's up to you :)


Cheers,
-- 
Dominique


Re: single-binary for execline programs?

2023-02-01 Thread Laurent Bercot




allow you to link against a dynamic libexecline. Can you do it, see
how much space you gain? That's a configuration I would definitely
support, even if it's slower - people usually love shared libraries.


 I'm tired. This configuration is obviously already supported, and no
need to patch. You just need to ./configure --disable-allstatic.

 Compared to fully static binaries with musl, a fully static multicall
is a size gain of 87%. Compared to fully dynamic binaries, it's a gain
of 69%. Still impressive.

 A fully dynamic multicall binary is only 80 kB on x86_64, but it's
a pretty stupid configuration because it's the only user of libexecline
so nothing is gained by sharing it. There still may be a case for
sharing libskarnet; but toolchains make it so difficult to link against
some libraries statically and some others dynamically that supporting
that configuration is just not worth it.

--
 Laurent



Re: single-binary for execline programs?

2023-02-01 Thread Laurent Bercot

Look, here's a trivial, suboptimal wrapper, far from pretty:

> (...)

(look, I said it wasn't pretty -- there are at least a dozen of problems
with this, but nothing a day of work I offered to do can't fix; I wrote
this because it was faster than talking without a concrete example to
have some figures, and that took me less time than the rest of this mail)


 Damn you to the nine circles of Hell, one by one, slowly, then all of
them at the same time.
 You piqued my curiosity, so I did it, and I spent the day making it 
work.


 The execline git now has a 'multicall' make target. It will make an
"execline" binary that has *everything* in it. You can symlink it to
the name of an execline program and it will do what you expect. You can
also call the subcommand as argv[1]: "execline exit 3" will exit 3.

 No install targets, no automatic stripping, no symlinks, nothing.
I don't want to officially support this configuration, because I *know*
it will be a time sink - every ricer on the planet will want me to
change something. So you get the binary for your own enjoyment, and
that's it.
Have fun. If it breaks, you get to keep both pieces.

 It's really rough: it only marginally improves on your model, fixing
the most glaring problems. The only fancy thing it does is find the
applet via bsearch(), because that's easy and it saves about 20 strcmp()
per call. Apart from that, it's super dumb.

 That said, you were right: that's some pretty hefty saving of disk
space. The execline binary is 169kB, statically linked against musl on
x86_64. That's neat. I expected it to be at least twice bigger. And
the data/bss isn't too bad either: only 2 pages. But that's because
execline programs use very little global memory in the first place -
the only places where globals are used is when state needs to be
accessed by signal handlers, and there's nothing I can do about that in
short-lived programs. (Long-lived programs use a selfpipe, so only
one int of global data is ever needed for them.)

 So, all in all, much better results than I expected, it was a pleasant
surprise. Still, concatenating all the code feels really clunky, and
a real multicall program needs to be designed for this from the start,
which won't happen for execline in the foreseeable future, so this is
as much as you get for now.

 If you're interested in hacking the thing, the magic happens in
tools/gen-multicall.sh.



 libexecline is statically linked, so these pages aren't shared afaik?


 That's right, I forgot it was always statically linked. If it helps,
changing ${LIBEXECLINE} to -lexecline in the src/execline/deps-exe
files, then running ./tools/gen-deps.sh > package/deps.mak, should
allow you to link against a dynamic libexecline. Can you do it, see
how much space you gain? That's a configuration I would definitely
support, even if it's slower - people usually love shared libraries.



I really don't see what's different between e.g. execline and coreutils,
who apparently thought it was worth it;


 coreutils also thought it was worth it to implement true --help and
true --version, so I'll leave to your imagination how much I value their
technical judgment.
 The only way to know for sure whether it will be worth it is to stop
speculating and start profiling, which is what I did. And it appears
the results are interesting, so, that's great!

 Sigh. I shouldn't feel that way, and any potential improvement should
be a source of joy, not dread - but really I wish the results weren't
so good. Now Pandora's box has been opened and everyone will want to
use the multicall exclusively, so at some point I'll have to support it,
i.e. ensure it's actually correct and enhance its maintainability.
And that means a lot more work. :(



But, unfortunately for you, the full openrc suite is 2.2MB (5MB on arm
with bloated aarch64), which is a bit less than the s6 suite :-D


 No, that's fair. It's true that s6 takes a bit more disk space.
Where OpenRC loses is RAM and CPU, because it does everything in
shell scripts. And shell scripts definitely win on disk space. :)

--
 Laurent



Re: single-binary for execline programs?

2023-02-01 Thread Dominique Martinet
Laurent Bercot wrote on Wed, Feb 01, 2023 at 10:41:39AM +:
> > I'd go out on a limb and say if you only support single-binary mode, some
> > of the code could be simplified further by sharing some argument
> > handling, but it's hard to do simpler than your exlsn_main wrapper so
> > it'll likely be identical with individual programs not changing at all,
> > with just an extra shim to wrap them all; it's not like busybox where
> > individual binaries can be selected so a static wrapper would be dead
> > simple.
> 
>  I doubt much sharing would be possible.
> 
>  The main problem I have with multicall is that the program's
> functionality changes depending on argv[0]. You need to first select
> on argv[0], and *then* you can parse options and handle arguments.
> Note that each exlsn_* function needs its own call to subgetopt_r(),
> despite the options being very similar because they all fill an
> eltransforminfo_t structure.

Yes, as I've mentioned you've already done a great job at sharing as
much as possible.

I'm not expecting any change to the program.

Look, here's a trivial, suboptimal wrapper, far from pretty:

$ cd execline; ./configure
$ cd src
$ grep -l 'int main' */*.c | while read f; do
b=${f##*/}; b=${b%.c}; b=${b//-/_}
sed -i -e 's/int main/int main_'"$b"'/' "$f"
echo "$b" >> programs
done

$ {
  cat <<EOF
#include <string.h>

int main (int argc, const char **argv, char const *const *envp)
{
   const char *app = strrchr(argv[0], '/');
   if (app) app++; else app=argv[0];
#define APP(name) if (strcmp(app, #name) == 0) return main_##name(argc, argv, envp);
EOF
  sed -e 's/.*/APP(&)/' < programs
  cat <<EOF
   return 1;
}
EOF
} > wrapper.c
$ gcc -O2 -I$PWD/include -I$PWD/include-local \
*/*.c wrapper.c -o wrapper \
-lskarnet -Wno-implicit-function-declaration
$ ln wrapper execline_cd
$ ln wrapper if
$ ./if true '' ./execline_cd / ls


(look, I said it wasn't pretty -- there are at least a dozen of problems
with this, but nothing a day of work I offered to do can't fix; I wrote
this because it was faster than talking without a concrete example to
have some figures, and that took me less time than the rest of this mail)

$ size wrapper
   text    data   bss     dec     hex filename
  97167    2860  1136  101163   18b2b wrapper (glibc)
  98529    2836  1264  102629   190e5 wrapper (musl)

>  Having a shim over *all* the execline programs would be that,
> multiplied by the number of programs; at the source level, there would
> not be any significant refactoring, because each program is pretty much
> its own thing. An executable is its own atomic unit, more or less.
> 
>  If anything, execline is the package that's the *least* adapted to
> multicall because of this. There is no possible sharing between
> "if" and "piperw", for instance, because these are two small units with
> very distinct functionality. The only way to make execline suited to
> multicall would be to entirely refactor the code of the executables and
> make a giant library, à la busybox. And I am familiar enough with
> analyzing and patching busybox that I certainly do not want to add that
> kind of maintenance nightmare to execline.

I don't think any more refactoring would be useful; I don't see the
problem of looking at argv[0] first independently... And gcc still found
quite a bit to share as the sum of all text segments of all binaries
goes to ~235000; many binaries really do sum up.
(And that's before ELF/ld overhead)

>  Anything that can be shared in execline is pretty much already shared
> in libexecline. If you build execline with full shared libraries, you
> get as much code sharing as is reasonably accessible without a complete
> rearchitecture.

libexecline is statically linked, so these pages aren't shared afaik?

My understanding is that if any symbol from a compilation unit (a .lo in
the .a) is used, the whole unit is going to be duplicated there, and the
runtime has no way of figuring that out.
Of course, the C runtime also probably accounts for a part of that
difference.

>  The "one unique binary" argument applies better to some of my other
> software; for instance, the latest s6-instance-* additions to s6.
> I considered making a unique "s6-instance" binary, with varying
> functionality depending on an argv[1] subcommand; I eventually decided
> against it because it would have broken UI consistency with the rest of
> s6, but it would have been a reasonable choice for this set of programs -
> which are already thin wrappers around library calls and share a lot
> of code. Same thing with s6-fdholder-*.
>  execline binaries, by contrast, are all over the place, and *not* good
> candidates for multicall.

I really don't see what's different between e.g. execline and coreutils,
who apparently thought it was worth it; but, sure, there are other
targets (and as said below some that you aren't working on)

Then again, a multicall coreutils does not seem to care about data/bss:
   text    data     bss     dec     hex filename
1208372  

Re: single-binary for execline programs?

2023-02-01 Thread Laurent Bercot

I believe I did my homework looking first -- are there other discussion
channels than this list that one should be aware of?


 The lists are definitely the only place you *should* be aware of, but
there are a lot of informal spaces where discussions happen, because not
everyone is as well-behaved as you are :) Github issues, webforums of
other projects, IRC channels, etc.
 The important stuff normally only happens here, but I'm getting user
feedback from several sources.



I'd go out on a limb and say if you only support single-binary mode, some
of the code could be simplified further by sharing some argument
handling, but it's hard to do simpler than your exlsn_main wrapper so
it'll likely be identical with individual programs not changing at all,
with just an extra shim to wrap them all; it's not like busybox where
individual binaries can be selected so a static wrapper would be dead
simple.


 I doubt much sharing would be possible.

 The main problem I have with multicall is that the program's
functionality changes depending on argv[0]. You need to first select
on argv[0], and *then* you can parse options and handle arguments.
Note that each exlsn_* function needs its own call to subgetopt_r(),
despite the options being very similar because they all fill an
eltransforminfo_t structure.

 Having a shim over *all* the execline programs would be that,
multiplied by the number of programs; at the source level, there would
not be any significant refactoring, because each program is pretty much
its own thing. An executable is its own atomic unit, more or less.

 If anything, execline is the package that's the *least* adapted to
multicall because of this. There is no possible sharing between
"if" and "piperw", for instance, because these are two small units with
very distinct functionality. The only way to make execline suited to
multicall would be to entirely refactor the code of the executables and
make a giant library, à la busybox. And I am familiar enough with
analyzing and patching busybox that I certainly do not want to add that
kind of maintenance nightmare to execline.

 Anything that can be shared in execline is pretty much already shared
in libexecline. If you build execline with full shared libraries, you
get as much code sharing as is reasonably accessible without a complete
rearchitecture.
 Any significant disk space you would gain in a multicall binary
compared to a bunch of dynamically linked executables would come from
the deduplication of unavoidable ELF boilerplate and C run-time, and
that's basically it.

 The "one unique binary" argument applies better to some of my other
software; for instance, the latest s6-instance-* additions to s6.
I considered making a unique "s6-instance" binary, with varying
functionality depending on an argv[1] subcommand; I eventually decided
against it because it would have broken UI consistency with the rest of
s6, but it would have been a reasonable choice for this set of programs -
which are already thin wrappers around library calls and share a lot
of code. Same thing with s6-fdholder-*.
 execline binaries, by contrast, are all over the place, and *not* good
candidates for multicall.



Hmm, I'd need to do some measurements, but my impression would be that
since the overall size is smaller it should pay off for any pipeline
calling more than a handful of binaries, as you'll benefit from running
the same binary multiple times rather than having to look through
multiple binaries (even without optimizing the execs out).


 Yes, you might win a few pages by sharing the text, but I'm more
concerned about bss and data. Although I take some care in minimizing
globals, I know that in my typical small programs, it won't matter if
I add an int global, because the amount of global data I need will
never reach 4k, so it won't map an extra page.

 When you start aggregating applets, the cost of globals skyrockets.
You need to pay extra attention to every piece of data. Let me bring
the example of busybox again: vda, the maintainer, does an excellent
job of keeping the bss/data overhead low (only 2 pages of global
private/dirty), but that's at the price of keeping it front and
center, always, when reviewing and merging patches, and nacking stuff
that would otherwise be a significant improvement. It's *hard*, and
hampers code agility in a serious way. I don't want that.

 Sure, you can say that globals are a bad idea anyway, but a lot of
programs need *some* state, if local to a TU - and the C and ELF models
make it so that TU-local variables still end up in the global data
section.



Even almost 1MB (the x86_64 version that doesn't have the problem,
package currently 852KB installed size + filesystem overhead..) is
still something I consider big for the systems I'm building, even
without the binutils issue it's getting harder to fit in a complete
rootfs in 100MB.


 I will never understand how disk space is an issue for execline and s6.
 RAM absolutely is, because 

Re: single-binary for execline programs?

2023-01-31 Thread Dominique Martinet
alice wrote on Wed, Feb 01, 2023 at 07:22:14AM +0100:
> > Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
> > s6-linux-init) I'm looking at a 3MB increase (because I won't be able to
> > get rid of openrc for compatibility with user scripts it'll have to live
> > in compat hooks...) ; being able to shave ~700KB of that would be
> > very interesting for me (number from linking all .c together with a
> > dummy main wrapper, down 148KB)
> > (s6-* dozen of binaries being another similar target and would shave a
> > bit more as well, build systems being similar I was hoping it could go
> > next if this had been well received)
> 
> out of (somewhat off-topic) curiosity, what is the layout here?
> the general answer to such a case generally is:
> "sure, it's 3MB. but it's a well-implemented well-oiled well-used 3MB, and the
> 'business software' is hundreds of times that", but maybe this is something
> special?
> 
> given the (below) talk of inexperienced users, it makes me wonder if 
> everything
> is in this 100mb, or if it's only a reserved rootfs for you while the rest if
> customer-used.

Exactly, we have a double-copy rootfs that we try to keep as small as
possible and update atomically; then the rest of the eMMC/SD card is
left for user containers in a separate partition.

(I actually lied a bit here because the container runtime itself depends
on podman, which is a huge go binary that in itself doubles the rootfs
size; and we cut some slack space that made the alpine 3.17 update
possible, but I'm definitely counting each MB at this point, it's
getting difficult to just install a debug kernel now...
I don't want to waste anyone's time, which is why I offered to do it,
but reviews still take time and as said previously I won't push
further.

I'd suggest "there's more information here", but it's all in Japanese:
https://armadillo.atmark-techno.com/guide/armadillo-base-os
You'd probably learn more from the rootfs[1] and update scripts[2]
directly:
[1] (rootfs content binary, 57MB)
https://armadillo.atmark-techno.com/files/downloads/armadillo-iot-g4/baseos/baseos-x2-3.17.1-at.2.tar.zst
[2] (rootfs and image builder)
https://armadillo.atmark-techno.com/files/downloads/armadillo-iot-g4/tool/build-rootfs-v3.17-at.2.tar.gz
[3] https://github.com/atmark-techno/mkswu/

There's probably plenty to improve and I never got any external
feedback, so feel free to break everything and ask if you have time, I'd
be curious if you can make sense of it without the japanese docs)

> > >  You're not the only one who is uncomfortable with it, but it's really a
> > > perception thing. There has never been a problem caused by it. Shells
> > > don't get confused. External tools don't get confused. On this aspect,
> > > Unix is a lot more correct and resilient than you give it credit for. :)
> >
> > Shells and external tools would definitely be fine, they're not looking
> > there in the first place.
> > I think you're underestimating what users who haven't used a unix before
> > can do though; I can already picture some rummaging in /bin and
> > wondering why posix-cd "doesn't work" or something... We get impressive
> > questions sometimes.
> 
> i definitely feel for you there (in regards to inexperienced user questions),
> but i'd say that generally very low level (relatively) systems integration
> software is not the place where "inexperienced user support" is in-scope. the
> easiest answer is that if somebody runs into such a scenario, they'll just 
> have
> to learn the answer to the question, not have an "answer" pre-implemented for
> them via a workaround (such as this one) that removes it.
> 
> that is to say, resolving this (question-)case specifically would not be a
> benefit for execline itself.

Right, this is also completely off topic, I probably shouldn't have
started digging this hole :)
I'm not arguing for removing the installed tools here, just saying I
will likely get some questions about it once we make the switch.

Nothing that cannot be dealt with, but I am a greedy dev so I started
asking for too much.

-- 
Dominique Martinet | Asmadeus


Re: single-binary for execline programs?

2023-01-31 Thread alice
On Wed Feb 1, 2023 at 6:58 AM CET, Dominique Martinet wrote:
> Laurent Bercot wrote on Wed, Feb 01, 2023 at 04:49:47AM +:
> > > It should be fairly easy to do something like coreutils'
> > > --enable-single-binary without much modification
> > 
> >  The subject has come up a few times recently,
>
> I believe I did my homework looking first -- are there other discussion
> channels than this list that one should be aware of?
>
> > so, at the risk of being
> > blunt, I will make it very clear and definitive, for future reference:
> >
> >  No. It will not happen.
>
> Well, thanks for the clear answer, I'm glad I asked first!
>
> I'm a sore loser though, so I'll develop a bit more below. You've
> probably got better to do so feel free to just say you're not changing
> your mind or pointing me at the other discussions and I'll stop bugging
> you.
>
> >  The fact that toolchains are becoming worse and worse is not imputable
> > to execline, or to the way I write or package software. It has always
> > been possible, and reasonable, to provide a lot of small binaries.
> > Building a binary is not inherently more complicated today than it was
> > 20 years ago. There is no fundamental reason why this should change; the
> > only reason why people are even thinking this is that there is an
> > implicit assumption that software always becomes better with time, and
> > using the latest versions is always a good idea. I am guilty of this
> > too.
> > 
> >  This assumption is true when it comes to bugs, but it becomes false if
> > the main functionality of a project is impacted.
> >  If a newer version of binutils is unable to produce reasonably small
> > binaries, to the point that it incites software developers to change
> > their packaging to accommodate the tool, then it's not an improvement,
> > it's a recession. And the place to fix it is binutils.
>
> I definitely agree with this, I reported the problem in the bz I linked,
> and the reception has been rather good -- I trust we'll get back to
> smaller binaries in the next version or otherwise near future.
>  
> >  Multicall binaries have costs, mostly maintainability costs.
> > Switching from a multiple binaries model to a multicall binary model
> > because the tooling is making the multiple binaries model unusably
> > expensive is basically moving the burden from the tooling to the
> > maintainer. Here's a worse tool, do more effort to accommodate it!
>
> I guess it isn't completely free, but it certainly isn't heavy if the
> abstraction isn't done too badly.
>
> I'd go out on a limb and say if you only support single-binary mode, some
> of the code could be simplified further by sharing some argument
> handling, but it's hard to do simpler than your exlsn_main wrapper so
> it'll likely be identical with individual programs not changing at all,
> with just an extra shim to wrap them all; it's not like busybox where
> individual binaries can be selected so a static wrapper would be dead
> simple.
>
> >  Additionally to maintainability costs, multicall binaries also have a
> > small cost in CPU usage (binary starting time) and RAM usage (larger
> > mappings, fewer memory optimizations) compared to multiple binaries.
> > These costs are paid not by the maintainer, but by the users.
>
> Hmm, I'd need to do some measurements, but my impression would be that
> since the overall size is smaller it should pay off for any pipeline
> calling more than a handful of binaries, as you'll benefit from running
> the same binary multiple times rather than having to look through
> multiple binaries (even without optimizing the execs out).
>
> Memory in particular ought to be shared for r-x pages, or there's some
> problem with the system. I'm not sure if it'll lazily load only the
> pages it requires for execution or if some readahead will read it all
> (it probably should), but once it's read it shouldn't take space
> multiple times, so multiple binaries is likely to take more space when
> you include vfs cache as soon as you call a few in a row; memory usage
> should be mostly identical to disk usage in practice.
>
> Anyway, I'll concede that in doubt, let's call it a space vs. speed
> tradeoff where I'm favoring space.
>
> >  Well, no. If having a bunch of execline binaries becomes more expensive
> > in disk space because of an "upgrade" in binutils, that is a binutils
> > problem, and the place to fix it is binutils.
>
> I shouldn't have brought up the binutils bug.
> Even almost 1MB (the x86_64 version that doesn't have the problem,
> package currently 852KB installed size + filesystem overhead..) is
> still something I consider big for the systems I'm building, even
> without the binutils issue it's getting harder to fit in a complete
> rootfs in 100MB.
>
> Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
> s6-linux-init) I'm looking at a 3MB increase (because I won't be able to
> get rid of openrc for compatibility with user scripts it'll have to live
> in compat 

Re: single-binary for execline programs?

2023-01-31 Thread Dominique Martinet
Laurent Bercot wrote on Wed, Feb 01, 2023 at 04:49:47AM +:
> > It should be fairly easy to do something like coreutils'
> > --enable-single-binary without much modification
> 
>  The subject has come up a few times recently,

I believe I did my homework looking first -- are there other discussion
channels than this list that one should be aware of?

> so, at the risk of being
> blunt, I will make it very clear and definitive, for future reference:
>
>  No. It will not happen.

Well, thanks for the clear answer, I'm glad I asked first!

I'm a sore loser though, so I'll elaborate a bit more below. You've
probably got better things to do, so feel free to just say you're not
changing your mind or point me at the other discussions, and I'll stop
bugging you.

>  The fact that toolchains are becoming worse and worse is not imputable
> to execline, or to the way I write or package software. It has always
> been possible, and reasonable, to provide a lot of small binaries.
> Building a binary is not inherently more complicated today than it was
> 20 years ago. There is no fundamental reason why this should change; the
> only reason why people are even thinking this is that there is an
> implicit assumption that software always becomes better with time, and
> using the latest versions is always a good idea. I am guilty of this
> too.
> 
>  This assumption is true when it comes to bugs, but it becomes false if
> the main functionality of a project is impacted.
>  If a newer version of binutils is unable to produce reasonably small
> binaries, to the point that it incites software developers to change
> their packaging to accommodate the tool, then it's not an improvement,
> it's a recession. And the place to fix it is binutils.

I definitely agree with this; I reported the problem in the bz I linked,
and the reception has been rather good -- I trust we'll get back to
smaller binaries in the next version or in the near future.
 
>  Multicall binaries have costs, mostly maintainability costs.
> Switching from a multiple binaries model to a multicall binary model
> because the tooling is making the multiple binaries model unusably
> expensive is basically moving the burden from the tooling to the
> maintainer. Here's a worse tool, do more effort to accommodate it!

I guess it isn't completely free, but it certainly isn't heavy if the
abstraction isn't done too badly.

I'd go out on a limb and say that if you only supported single-binary
mode, some of the code could be simplified further by sharing some
argument handling; but it's hard to do simpler than your exlsn_main
wrapper, so the individual programs would likely not change at all,
with just an extra shim to wrap them all. It's not like busybox, where
individual binaries can be selected, so a static wrapper would be dead
simple.

>  In addition to maintainability costs, multicall binaries also have a
> small cost in CPU usage (binary starting time) and RAM usage (larger
> mappings, fewer memory optimizations) compared to multiple binaries.
> These costs are paid not by the maintainer, but by the users.

Hmm, I'd need to do some measurements, but my impression is that since
the overall size is smaller, it should pay off for any pipeline calling
more than a handful of binaries: you benefit from running the same
binary multiple times rather than having to load several different
binaries (even without optimizing the execs out).

Memory in particular ought to be shared for r-x pages, or there's some
problem with the system. I'm not sure whether it'll lazily load only the
pages required for execution or whether some readahead will read it all
(it probably should), but once it's read it shouldn't take space
multiple times, so multiple binaries are likely to take more space as
soon as you call a few in a row, once you include the vfs cache; memory
usage should be mostly identical to disk usage in practice.

Anyway, I'll concede that, when in doubt, we can call it a space
vs. speed tradeoff where I'm favoring space.

>  Well, no. If having a bunch of execline binaries becomes more expensive
> in disk space because of an "upgrade" in binutils, that is a binutils
> problem, and the place to fix it is binutils.

I shouldn't have brought up the binutils bug.
Even almost 1MB (the x86_64 version, which doesn't have the problem: the
package is currently 852KB installed size + filesystem overhead...) is
still something I consider big for the systems I'm building; even
without the binutils issue it's getting harder to fit a complete
rootfs in 100MB.

Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
s6-linux-init) I'm looking at a 3MB increase (because I won't be able to
get rid of openrc, for compatibility with user scripts, so it'll have to
live in compat hooks...); being able to shave ~700KB of that would be
very interesting for me (number from linking all the .c files together
with a dummy main wrapper, down to 148KB)
(s6-* dozen of binaries being another similar target and would shave a
bit more 

Re: single-binary for execline programs?

2023-01-31 Thread Laurent Bercot

In particular there's a "feature" with recent binutils that makes every
binary be at least 64KB on arm/aarch64[1], so the execline package is a
whopping 3.41MB[2] there (... and still 852KB on x86_64[3]) -- whereas
just doing a dummy sed to avoid conflict on main and bundling all .c
together in a single binary yields just 148KB (x86_64 but should be
similar on all archs -- we're talking x20 bloat from aarch64/armv7
sizes! Precious memory and disk space!)

> (...)

It should be fairly easy to do something like coreutils'
--enable-single-binary without much modification


 The subject has come up a few times recently, so, at the risk of being
blunt, I will make it very clear and definitive, for future reference:

 No. It will not happen.

 The fact that toolchains are becoming worse and worse is not imputable
to execline, or to the way I write or package software. It has always
been possible, and reasonable, to provide a lot of small binaries.
Building a binary is not inherently more complicated today than it was
20 years ago. There is no fundamental reason why this should change; the
only reason why people are even thinking this is that there is an
implicit assumption that software always becomes better with time, and
using the latest versions is always a good idea. I am guilty of this
too.

 This assumption is true when it comes to bugs, but it becomes false if
the main functionality of a project is impacted.
 If a newer version of binutils is unable to produce reasonably small
binaries, to the point that it incites software developers to change
their packaging to accommodate the tool, then it's not an improvement,
it's a recession. And the place to fix it is binutils.
 The tooling should be at the service of programmers, not the other way
around.

 It is a similar issue when glibc makes it expensive in terms of RAM to
run a large number of copies of the same process. Linux, like other
Unix-like kernels, is very efficient at this, and shares everything that
can be shared, but glibc performs *a lot* of private mappings that incur
considerable overhead. (See the thread around this message:
https://skarnet.org/lists/supervision/2804.html
for an example.)
 Does that mean that running 100 copies of the same binary is a bad
model? No, it just means that glibc is terrible at that and needs
improvement.

 Back in the day when Solaris was relevant, it had an incredibly
expensive implementation of fork(), which made it difficult, especially
with the processing power of 1990s-era Sun hardware, to write servers
that forked and still served a reasonable number of connections.
It led to emerging "good practices", that were taught by my (otherwise
wonderful) C/Unix programming teacher, and that were: fork as little as
possible, use a single process to do everything. And that's how most
userspace on Solaris worked indeed.
 It did a lot of harm to the ecosystem, turning programs into giant
messes because people did not want to use the primitives that were
available to them for fear of inefficiency, and jumping through hoops
to work around it at the expense of maintainability.
 Switching to Linux and its efficient fork() was a relief.

 Multicall binaries have costs, mostly maintainability costs.
Switching from a multiple binaries model to a multicall binary model
because the tooling is making the multiple binaries model unusably
expensive is basically moving the burden from the tooling to the
maintainer. Here's a worse tool, do more effort to accommodate it!

 In addition to maintainability costs, multicall binaries also have a
small cost in CPU usage (binary starting time) and RAM usage (larger
mappings, fewer memory optimizations) compared to multiple binaries.
These costs are paid not by the maintainer, but by the users.
Everyone loses.

 Well, no. If having a bunch of execline binaries becomes more expensive
in disk space because of an "upgrade" in binutils, that is a binutils
problem, and the place to fix it is binutils.



In the long run this could also provide a workaround for conflicting
names, cf. old 2016 thread[4], if we'd prefer either running the
appropriate main directly or re-exec'ing into the current binary after
setting argv[0] appropriately for "builtins".


 There have been no conflicts since "import". I do not expect more name
conflicts in the future, and in any case, that is not an issue that
multicall binaries can solve any better than multiple binaries. These
are completely orthogonal things.



(I assume you wouldn't like the idea of not installing the individual
commands, but that'd become a possibility as well. I'm personally a bit
uncomfortable having something in $PATH for 'if' and other commands that
have historically been shell builtins, but have a different usage for
execline...)


 You're not the only one who is uncomfortable with it, but it's really a
perception thing. There has never been a problem caused by it. Shells
don't get confused. External tools don't get confused. On this 

single-binary for execline programs?

2023-01-31 Thread Dominique Martinet
Hello,

I'm currently having a fresh look at s6 on alpine (thanks for the recent
work with dynamic services! Looking forward to seeing it mature!).

One thing that surprised me is how many small executables the programs
come with.
In particular there's a "feature" with recent binutils that makes every
binary be at least 64KB on arm/aarch64[1], so the execline package is a
whopping 3.41MB[2] there (... and still 852KB on x86_64[3]) -- whereas
just doing a dummy sed to avoid conflict on main and bundling all .c
together in a single binary yields just 148KB (x86_64 but should be
similar on all archs -- we're talking x20 bloat from aarch64/armv7
sizes! Precious memory and disk space!)
I'm hopeful that binutils will eventually improve this, but it never
hurts to aim higher :)

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28824
[2] https://pkgs.alpinelinux.org/package/edge/main/aarch64/execline
[3] https://pkgs.alpinelinux.org/package/edge/main/x86_64/execline

It should be fairly easy to do something like coreutils'
--enable-single-binary without much modification: for example, compile
each executable with -Dmain=main_$program and have a small wrapper that
forwards to main_$argv0 (or just rename them if that becomes the
default behaviour right away; that'd be even simpler).
I would be happy to contribute to that if you're not against the idea.


In the long run this could also provide a workaround for conflicting
names, cf. old 2016 thread[4], if we'd prefer either running the
appropriate main directly or re-exec'ing into the current binary after
setting argv[0] appropriately for "builtins".
(I assume you wouldn't like the idea of not installing the individual
commands, but that'd become a possibility as well. I'm personally a bit
uncomfortable having something in $PATH for 'if' and other commands that
have historically been shell builtins, but have a different usage for
execline...)

[4] https://skarnet.org/lists/skaware/0737.html


Cheers,
-- 
Dominique