Re: single-binary for execline programs?

2023-01-31 Thread Dominique Martinet
alice wrote on Wed, Feb 01, 2023 at 07:22:14AM +0100:
> > Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
> > s6-linux-init) I'm looking at a 3MB increase (because I won't be able to
> > get rid of openrc for compatibility with user scripts it'll have to live
> > in compat hooks...) ; being able to shave ~700KB of that would be
> > very interesting for me (number from linking all .c together with a
> > dummy main wrapper, down 148KB)
> > (s6-* dozen of binaries being another similar target and would shave a
> > bit more as well, build systems being similar I was hoping it could go
> > next if this had been well received)
> 
> out of (somewhat off-topic) curiosity, what is the layout here?
> the general answer to such a case is:
> "sure, it's 3MB. but it's a well-implemented well-oiled well-used 3MB, and the
> 'business software' is hundreds of times that", but maybe this is something
> special?
> 
> given the (below) talk of inexperienced users, it makes me wonder if
> everything is in this 100mb, or if it's only a reserved rootfs for you
> while the rest is customer-used.

Exactly, we have a double-copy rootfs that we try to keep as small as
possible and update atomically; then the rest of the eMMC/SD card is
left for user containers in a separate partition.

(I actually lied a bit here because the container runtime itself depends
on podman, which is a huge Go binary that in itself doubles the rootfs
size; and we cut some slack space that made the alpine 3.17 update
possible, but I'm definitely counting each MB at this point; it's
getting difficult to just install a debug kernel now...
I don't want to waste anyone's time, which is why I offered to do it,
but reviews still take time and, as said previously, I won't push
further.

I'd suggest "there's more information here", but it's all in Japanese:
https://armadillo.atmark-techno.com/guide/armadillo-base-os
You'd probably learn more from the rootfs[1] and update scripts[2]
directly:
[1] (rootfs content binary, 57MB)
https://armadillo.atmark-techno.com/files/downloads/armadillo-iot-g4/baseos/baseos-x2-3.17.1-at.2.tar.zst
[2] (rootfs and image builder)
https://armadillo.atmark-techno.com/files/downloads/armadillo-iot-g4/tool/build-rootfs-v3.17-at.2.tar.gz
[3] https://github.com/atmark-techno/mkswu/

There's probably plenty to improve and I never got any external
feedback, so feel free to break everything and ask if you have time; I'd
be curious if you can make sense of it without the Japanese docs)

> > >  You're not the only one who is uncomfortable with it, but it's really a
> > > perception thing. There has never been a problem caused by it. Shells
> > > don't get confused. External tools don't get confused. On this aspect,
> > > Unix is a lot more correct and resilient than you give it credit for. :)
> >
> > Shells and external tools would definitely be fine, they're not looking
> > there in the first place.
> > I think you're underestimating what users who haven't used a unix before
> > can do though; I can already picture some rummaging in /bin and
> > wondering why posix-cd "doesn't work" or something... We get impressive
> > questions sometimes.
> 
> i definitely feel for you there (in regards to inexperienced user questions),
> but i'd say that generally very low level (relatively) systems integration
> software is not the place where "inexperienced user support" is in-scope. the
> easiest answer is that if somebody runs into such a scenario, they'll just 
> have
> to learn the answer to the question, not have an "answer" pre-implemented for
> them via a workaround (such as this one) that removes it.
> 
> that is to say, resolving this (question-)case specifically would not be a
> benefit for execline itself.

Right, this is also completely off topic, I probably shouldn't have
started digging this hole :)
I'm not arguing for removing the installed tools here, just saying I
will likely get some questions about it once we make the switch.

Nothing that cannot be dealt with, but I am a greedy dev so I started
asking for too much.

-- 
Dominique Martinet | Asmadeus


Re: single-binary for execline programs?

2023-01-31 Thread Dominique Martinet
Laurent Bercot wrote on Wed, Feb 01, 2023 at 04:49:47AM +0000:
> > It should be fairly easy to do something like coreutils'
> > --enable-single-binary without much modification
> 
>  The subject has come up a few times recently,

I believe I did my homework looking first -- are there other discussion
channels than this list that one should be aware of?

> so, at the risk of being
> blunt, I will make it very clear and definitive, for future reference:
>
>  No. It will not happen.

Well, thanks for the clear answer, I'm glad I asked first!

I'm a sore loser though, so I'll develop a bit more below. You've
probably got better things to do, so feel free to just say you're not
changing your mind, or point me at the other discussions, and I'll stop
bugging you.

>  The fact that toolchains are becoming worse and worse is not imputable
> to execline, or to the way I write or package software. It has always
> been possible, and reasonable, to provide a lot of small binaries.
> Building a binary is not inherently more complicated today than it was
> 20 years ago. There is no fundamental reason why this should change; the
> only reason why people are even thinking this is that there is an
> implicit assumption that software always becomes better with time, and
> using the latest versions is always a good idea. I am guilty of this
> too.
> 
>  This assumption is true when it comes to bugs, but it becomes false if
> the main functionality of a project is impacted.
>  If a newer version of binutils is unable to produce reasonably small
> binaries, to the point that it incites software developers to change
> their packaging to accommodate the tool, then it's not an improvement,
> it's a recession. And the place to fix it is binutils.

I definitely agree with this, I reported the problem in the bz I linked,
and the reception has been rather good -- I trust we'll get back to
smaller binaries in the next version or otherwise near future.
 
>  Multicall binaries have costs, mostly maintainability costs.
> Switching from a multiple binaries model to a multicall binary model
> because the tooling is making the multiple binaries model unusably
> expensive is basically moving the burden from the tooling to the
> maintainer. Here's a worse tool, do more effort to accommodate it!

I guess it isn't completely free, but it certainly isn't heavy if the
abstraction isn't done too badly.

I'd go out on a limb and say that if you only supported single-binary
mode, some of the code could be simplified further by sharing some
argument handling; but it's hard to do simpler than your exlsn_main
wrapper, so it'll likely be identical, with individual programs not
changing at all and just an extra shim to wrap them all. It's not like
busybox, where individual binaries can be selected, so a static wrapper
would be dead simple.

>  Additionally to maintainability costs, multicall binaries also have a
> small cost in CPU usage (binary starting time) and RAM usage (larger
> mappings, fewer memory optimizations) compared to multiple binaries.
> These costs are paid not by the maintainer, but by the users.

Hmm, I'd need to do some measurements, but my impression would be that
since the overall size is smaller it should pay off for any pipeline
calling more than a handful of binaries, as you'll benefit from running
the same binary multiple times rather than having to look through
multiple binaries (even without optimizing the execs out).

Memory in particular ought to be shared for r-x pages, or there's some
problem with the system. I'm not sure if it'll lazily load only the
pages it requires for execution or if some readahead will read it all
(it probably should), but once it's read it shouldn't take space
multiple times, so multiple binaries are likely to take more space when
you include vfs cache as soon as you call a few in a row; memory usage
should be mostly identical to disk usage in practice.

Anyway, since I'm in doubt, I'll concede: let's call it a space vs.
speed tradeoff where I'm favoring space.

>  Well, no. If having a bunch of execline binaries becomes more expensive
> in disk space because of an "upgrade" in binutils, that is a binutils
> problem, and the place to fix it is binutils.

I shouldn't have brought up the binutils bug.
Even almost 1MB (the x86_64 version, which doesn't have the problem:
the package is currently 852KB installed size + filesystem overhead...)
is still something I consider big for the systems I'm building; even
without the binutils issue it's getting harder to fit a complete
rootfs in 100MB.

Just looking at the s6 suite (s6, s6-rc, execline, skalibs,
s6-linux-init) I'm looking at a 3MB increase (because I won't be able to
get rid of openrc for compatibility with user scripts it'll have to live
in compat hooks...) ; being able to shave ~700KB of that would be
very interesting for me (number from linking all .c together with a
dummy main wrapper, down 148KB)
(s6-* dozen of binaries being another similar target and would shave a
bit more as well, build systems being similar I was hoping it could go
next if this had been well received)

Re: s6 instanced services are "forgotten" after s6-rc-update

2023-01-31 Thread Laurent Bercot




> Agree on avoiding restarting old instances. If instances were atomic
> services, s6-rc-update wouldn't restart them either.
>
> OTOH, the template's files are copied, not symlinked, which means
> restarting old instances will use the old template. Does this call for
> an s6-instance-update program?


 The fix I currently have in git does exactly that: instances are now
correctly transmitted across s6-rc-update, and not restarted; the new
template is copied, but it's not copied to existing instances, it will
only be used for new ones. To get the new template on an existing
instance, you need s6-instance-delete + s6-instance-create.

 There may indeed be some value to an s6-instance-update program that
would provide a new template to an existing instance, with an option
to immediately restart the instance or not. I'll think about it some
more, inputs welcome.

--
 Laurent



Re: single-binary for execline programs?

2023-01-31 Thread Laurent Bercot

> In particular there's a "feature" with recent binutils that makes every
> binary be at least 64KB on arm/aarch64[1], so the execline package is a
> whopping 3.41MB[2] there (... and still 852KB on x86_64[3]) -- whereas
> just doing a dummy sed to avoid conflict on main and bundling all .c
> together in a single binary yields just 148KB (x86_64 but should be
> similar on all archs -- we're talking x20 bloat from aarch64/armv7
> sizes! Precious memory and disk space!)
>
> (...)
>
> It should be fairly easy to do something like coreutils'
> --enable-single-binary without much modification


 The subject has come up a few times recently, so, at the risk of being
blunt, I will make it very clear and definitive, for future reference:

 No. It will not happen.

 The fact that toolchains are becoming worse and worse is not imputable
to execline, or to the way I write or package software. It has always
been possible, and reasonable, to provide a lot of small binaries.
Building a binary is not inherently more complicated today than it was
20 years ago. There is no fundamental reason why this should change; the
only reason why people are even thinking this is that there is an
implicit assumption that software always becomes better with time, and
using the latest versions is always a good idea. I am guilty of this
too.

 This assumption is true when it comes to bugs, but it becomes false if
the main functionality of a project is impacted.
 If a newer version of binutils is unable to produce reasonably small
binaries, to the point that it incites software developers to change
their packaging to accommodate the tool, then it's not an improvement,
it's a recession. And the place to fix it is binutils.
 The tooling should be at the service of programmers, not the other way
around.

 It is a similar issue when glibc makes it expensive in terms of RAM to
run a large number of copies of the same process. Linux, like other
Unix-like kernels, is very efficient at this, and shares everything that
can be shared, but glibc performs *a lot* of private mappings that incur
considerable overhead. (See the thread around this message:
https://skarnet.org/lists/supervision/2804.html
for an example.)
 Does that mean that running 100 copies of the same binary is a bad
model? No, it just means that glibc is terrible at that and needs
improvement.

 Back in the day when Solaris was relevant, it had an incredibly
expensive implementation of fork(), which made it difficult, especially
with the processing power of 1990s-era Sun hardware, to write servers
that forked and still served a reasonable number of connections.
It led to emerging "good practices", that were taught by my (otherwise
wonderful) C/Unix programming teacher, and that were: fork as little as
possible, use a single process to do everything. And that's how most
userspace on Solaris worked indeed.
 It did a lot of harm to the ecosystem, turning programs into giant
messes because people did not want to use the primitives that were
available to them for fear of inefficiency, and jumping through hoops
to work around it at the expense of maintainability.
 Switching to Linux and its efficient fork() was a relief.

 Multicall binaries have costs, mostly maintainability costs.
Switching from a multiple binaries model to a multicall binary model
because the tooling is making the multiple binaries model unusably
expensive is basically moving the burden from the tooling to the
maintainer. Here's a worse tool, do more effort to accommodate it!

 Additionally to maintainability costs, multicall binaries also have a
small cost in CPU usage (binary starting time) and RAM usage (larger
mappings, fewer memory optimizations) compared to multiple binaries.
These costs are paid not by the maintainer, but by the users.
Everyone loses.

 Well, no. If having a bunch of execline binaries becomes more expensive
in disk space because of an "upgrade" in binutils, that is a binutils
problem, and the place to fix it is binutils.



> In the long run this could also provide a workaround for conflicting
> names, cf. old 2016 thread[4], if we'd prefer either running the
> appropriate main directly or re-exec'ing into the current binary after
> setting argv[0] appropriately for "builtins".


 There have been no conflicts since "import". I do not expect more name
conflicts in the future, and in any case, that is not an issue that
multicall binaries can solve any better than multiple binaries. These
are completely orthogonal things.



> (I assume you wouldn't like the idea of not installing the individual
> commands, but that'd become a possibility as well. I'm personally a bit
> uncomfortable having something in $PATH for 'if' and other commands that
> have historically been shell builtins, but have a different usage for
> execline...)


 You're not the only one who is uncomfortable with it, but it's really a
perception thing. There has never been a problem caused by it. Shells
don't get confused. External tools don't get confused. On this aspect,
Unix is a lot more correct and resilient than you give it credit for. :)

single-binary for execline programs?

2023-01-31 Thread Dominique Martinet
Hello,

I'm currently having a fresh look at s6 on alpine (thanks for the recent
work with dynamic services! Looking forward to seeing it mature!).

One thing that surprised me is how many small executables the programs
come with.
In particular there's a "feature" with recent binutils that makes every
binary be at least 64KB on arm/aarch64[1], so the execline package is a
whopping 3.41MB[2] there (... and still 852KB on x86_64[3]) -- whereas
just doing a dummy sed to avoid conflict on main and bundling all .c
together in a single binary yields just 148KB (x86_64 but should be
similar on all archs -- we're talking x20 bloat from aarch64/armv7
sizes! Precious memory and disk space!)
I'm hopeful that binutils will eventually improve this, but it never
hurts to aim higher :)

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28824
[2] https://pkgs.alpinelinux.org/package/edge/main/aarch64/execline
[3] https://pkgs.alpinelinux.org/package/edge/main/x86_64/execline

It should be fairly easy to do something like coreutils'
--enable-single-binary without much modification, for example
compile each executable with -Dmain=main_$program and have a small
wrapper that forwards to main_$argv0 (or just rename if that becomes the
default behaviour right away, that'd be even simpler).
I would be happy to contribute to that if you're not against the idea.


In the long run this could also provide a workaround for conflicting
names, cf. old 2016 thread[4], if we'd prefer either running the
appropriate main directly or re-exec'ing into the current binary after
setting argv[0] appropriately for "builtins".
(I assume you wouldn't like the idea of not installing the individual
commands, but that'd become a possibility as well. I'm personally a bit
uncomfortable having something in $PATH for 'if' and other commands that
have historically been shell builtins, but have a different usage for
execline...)

[4] https://skarnet.org/lists/skaware/0737.html


Cheers,
-- 
Dominique


Re: s6 instanced services are "forgotten" after s6-rc-update

2023-01-31 Thread Carlos Eduardo
Agree on avoiding restarting old instances. If instances were atomic
services, s6-rc-update wouldn't restart them either.

OTOH, the template's files are copied, not symlinked, which means
restarting old instances will use the old template. Does this call for
an s6-instance-update program?

On Tue, Jan 31, 2023 at 08:08, Laurent Bercot
 wrote:
>
>
> >I can provide an strace of s6-rc-update if needed. Looking into it, it
> >seems s6-rc-update "uncritically" unlinks the live instance/ and instances/
> >folders and replaces them with brand-new copies from the compiled database.
>
>   I can confirm that this happens and that it was an oversight; I'm now
> in the process of fixing it (which will involve a few changes to s6
> ending
> up in a major update, I'm afraid).
>
>   A question I have is: what should s6-rc-update do when the template has
> changed? The template will obviously be changed in the new service, but
> should the old instances stay alive, with the old template? My natural
> inclination is to say yes; if the user wants the service restarted they
> can say so explicitly in the conversion file. But maybe there are better
> alternatives I haven't thought about.
>
> --
>   Laurent
>


Re: s6 instanced services are "forgotten" after s6-rc-update

2023-01-31 Thread Laurent Bercot




> I can provide an strace of s6-rc-update if needed. Looking into it, it
> seems s6-rc-update "uncritically" unlinks the live instance/ and instances/
> folders and replaces them with brand-new copies from the compiled database.


 I can confirm that this happens and that it was an oversight; I'm now
in the process of fixing it (which will involve a few changes to s6
ending up in a major update, I'm afraid).

 A question I have is: what should s6-rc-update do when the template has
changed? The template will obviously be changed in the new service, but
should the old instances stay alive, with the old template? My natural
inclination is to say yes; if the user wants the service restarted they
can say so explicitly in the conversion file. But maybe there are better
alternatives I haven't thought about.

--
 Laurent