Re: [PATCH] Fix typo

2024-04-17 Thread Laurent Bercot



 Fixed, thanks!
 (I assume you meant in the s6 package. :))

--
 Laurent



Re: [announce] small skarnet.org Spring 2024 update

2024-04-16 Thread Laurent Bercot




Thank you! Sorry for the rather bare initial report - was very much
one of trying to work out what had gone wrong initially!


 It's all good - I'm supposed to catch these things and I failed,
so the next best thing is to get them fixed as quickly as possible :)

--
 Laurent



Re: [announce] small skarnet.org Spring 2024 update

2024-04-16 Thread Laurent Bercot

I can confirm that the patch worked:


 Thanks, execline-2.9.5.1 is out now.

--
 Laurent



Re: [announce] small skarnet.org Spring 2024 update

2024-04-16 Thread Laurent Bercot

Running backtick with gdb reveals that the crash is caused by the
`memcpy' at line 63 of src/libexecline/el_modifs_and_exec.c


 Thanks for doing my work for me :D
 (these are the bugs I usually catch before release, but, laziness.)

 The latest execline git head should fix it. If it works for you,
I'll cut the 2.9.5.1 release.

--
 Laurent



Re: [announce] small skarnet.org Spring 2024 update

2024-04-16 Thread Laurent Bercot




backtick -E A_LONGISH_NAME { s6-echo foo }

It fails with:


 Huh. I must have missed something. Thanks for the report, will
investigate and fix.

--
 Laurent



[announce] small skarnet.org Spring 2024 update

2024-04-15 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available.
 A very light update this time, just keeping the lights on.

skalibs-2.14.1.1   (release)
execline-2.9.5.0   (minor)
s6-2.12.0.4        (release)
tipidee-0.0.4.0    (minor)

 skalibs and s6 get tiny bugfixes.

 Alongside minor bugfixes as well, the execline release makes
backtick add its child's exit code to the ? environment variable
(as foreground does) when used with an option making it continue
when the child fails.

 tipidee can now be used with sites whose root is served via a
unique CGI script. The Server: header is now overridable, for
people who don't want to broadcast the exact version of their web
server in an HTTP response. And there's a new ls.cgi program, that
can be used as an index.cgi to serve a list of all the files in
a directory.

 Enjoy,
 Bug-reports welcome.

--
 Laurent



Re: Update: s6 and utmps rpm package

2024-04-12 Thread Laurent Bercot

I would like to package an example service for s6. Could you suggest one?


 tipidee would be a good one. I plan to release tipidee-0.0.4.0 very soon
and have an Alpine package for it early next week, if you want to have
example scripts.



So, is s6-rc a good candidate for rpm package? I am preparing to build s6-rc 
package in next few days.


 Having a s6-rc package won't hurt. I'm not sure it would be useful,
because there's no point in having an alternative service manager when
you already have systemd, but it's not doing any harm.



Thanks for your suggestion, when I see the content of utmp-prepare and 
utmp-init,
I have the same question: will this conflict with the existing utmp/wtmp 
service?


 Yes, this will conflict. Anything that's in the utmps package is made to
work with a utmps installation, not with a regular glibc utmp installation.
On Fedora, utmp works with glibc as is, so you shouldn't add anything.

--
 Laurent



Re: Update: s6 and utmps rpm package

2024-04-12 Thread Laurent Bercot

1. Run btmpd, utmpd, wtmpd as s6 service. But this option will add s6 as extra 
dependency.

2. Run  btmpd, utmpd, wtmpd as systemd service. The dependency is minimal. Only
depends on s6-ipcserver.


 On Alpine, s6-ipcserver is in a separate package because Alpine is very
careful about disk space, so much that they wanted me to make utmps
available without the bulk of s6. (Yes, I find this pretty hypocritical
given other decisions they make, but I was tired of arguing with them.)

 On RedHat, you will not have the same concern: s6 is a drop in the water
compared to the amount of disk space you need to boot anyway. So it does
not make sense to separate s6 from s6-ipcserver, and I suggest making the
utmps package depend on the s6 package anyway.

 This is a separate question from running the [uwb]tmpd services under
s6-svscan or systemd. Both approaches have advantages.

 Running the utmps services under systemd:
- they start earlier
- you can make any systemd service depend on them

 Running the utmps services under s6:
- independence from systemd, can be portable anywhere
- shows an example of how to run a service under s6



3. Run  btmpd, utmpd, wtmpd as s6-rc service. Add two more dependencies: s6 and 
s6-rc.


 That option, on the other hand, isn't a good one. There is an argument
for running a s6 supervision tree under systemd, but there is little
argument for running s6-rc and having a parallel service manager
ecosystem - this probably adds more complexity than it's worth. (Unless
it's for transitional purposes, but transitioning Fedora out of systemd
isn't happening.)

...

 All of that being said, however, my opinion is that you *should not*
package utmps for Fedora. utmp management is a distro-wide decision:
the utmp database is unique and accessed by several components in the
system. Fedora uses glibc, and glibc has its own utmp implementation,
and all the existing Fedora packages expect utmp to be managed by the
glibc implementation. Adding utmps, and packages that will use utmps,
will introduce conflict, and break things. (The utmp databases won't
have the correct permissions, glibc will access the files directly
without the locking that utmps does and concurrent access will cause
file corruption, etc.)

 utmps isn't something that you can add like this and have some packages
depend on it and others not. It has to be a concerted effort by the whole
distribution, to decide if they switch to it or not. Alpine uses it
because musl doesn't provide a real utmp implementation; the transition
could be done incrementally without conflicting. glibc-based distros are
another story, a transition would need to be done atomically. And unless
you submit a proposal to Fedora and it is discussed and accepted by the
Powers That Be, it's not happening.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-10 Thread Laurent Bercot



 Please note that this list isn't meant for real-time debugging.
 If you want real-time help, please join IRC (#s6 on OFTC), that's what
it is for.



Apr 10 22:06:53 rpm-builder s6.systemd-boot[15235]: s6-supervise s6-svscan-log: 
warning: unable to spawn ./run (waiting 60 seconds): No such file or directory


 Looks like your scandir isn't empty.
 Again, do not use s6-svscanboot or anything similar. Start s6-svscan
directly.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-10 Thread Laurent Bercot



I prefer this way. Some packages prefer s6 as their process supervisor, some
packages prefer systemd. With the help of s6 rpm package, other rpm packages
who depend on s6 can install their service in s6’s service directory. We just
pave for the community, the choice is in their hands.


 All right. Then, as Guillaume says, you cannot use /run/service as your
scandir, because everything under /run will be wiped at boot time. You
should pick a permanent place, such as (for instance) /var/lib/s6/service,
and make it your scandir.
 All the packages that have services they want s6 to manage will need to
create their own service directories, and symlink them into the scandir.
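
 The layout described above can be sketched as follows. This is only an
illustration under assumptions: the function name, the package directory and
the run script contents are hypothetical, and a real package would create
these files at install time rather than from a script.

```python
import os
import tempfile


def install_service(package_root, scandir, name, run_script):
    """Create a service directory with an executable run script,
    then symlink it into the scandir. Returns the symlink path."""
    servicedir = os.path.join(package_root, name)
    os.makedirs(servicedir, exist_ok=True)

    run_path = os.path.join(servicedir, "run")
    with open(run_path, "w") as f:
        f.write(run_script)
    os.chmod(run_path, 0o755)  # the run script must be executable

    os.makedirs(scandir, exist_ok=True)
    link = os.path.join(scandir, name)
    if not os.path.islink(link):
        os.symlink(servicedir, link)
    return link


if __name__ == "__main__":
    # Temporary directories stand in for the package's own directory
    # and for a permanent scandir such as /var/lib/s6/service.
    root = tempfile.mkdtemp()
    link = install_service(
        os.path.join(root, "pkg"),
        os.path.join(root, "scandir"),
        "example",  # hypothetical service name
        "#!/bin/sh\nexec sleep 1000\n",  # hypothetical run script
    )
    print(link, "->", os.readlink(link))
```

 Once the symlink exists, s6-svscan still has to notice it; as far as I
know, running "s6-svscanctl -a" on the scandir triggers a rescan.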


Could you tell me how to verify s6-svscan works well as there is 
nothing in log.


 Did you start the process? If you did, it's running, and it's working.
If it isn't, it's a bug, either in the way you started it, or in the code.
Do not runtime-check the result of your policies. If your policy says that
s6-svscan is started, then you should assume it is working. If it isn't,
it's not something that should be handled at run time, it's something that
should be handled before anything ships.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-10 Thread Laurent Bercot



 Since your s6-svscan doesn't run as pid 1, you don't need a finish or a
crash script. Not creating the .s6-svscan directory at all is good: the
default behaviour is suitable for running s6-svscan as a normal service.

 The answer to the rest of your questions implies policy decisions. In
other words, what do you want the package to do, exactly, and how do you
want it to interact with other services that you want to run under s6?

 - Do you want to start an empty supervision tree at boot, and then
have the service manager (probably systemd, here) populate it with various
symlinks to service directories as it brings services up?
 - Do you want to start a pre-populated supervision tree that survives
across reboots, and have packages install their services there and not be
handled by the service manager at all?
 - Do you want another behaviour?

 If you're going to make a package, you first need to think about exactly
what it is you're trying to accomplish. In accurate, painful technical
detail.


--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-08 Thread Laurent Bercot




I notice s6-svscanboot is the start script for s6-svscan. I am not an execline 
expert, but I can see that s6-svscanboot prepare log directories and start 
s6-svscan. If systemd provides log service for s6-svscan. Do we need 
s6-svscanboot for rpm package?


 No, you don't.
 As I said in a previous mail: you probably want to throw out everything
that isn't the original sources, and make a fresh start. This includes
throwing out s6-svscanboot, which sets up a catch-all logger because
sysvinit/busybox init + openrc doesn't provide one. Since systemd has its
own catch-all logger, you don't need s6-svscanboot.



The next question is about s6.pre-install, s6.pre-upgrade script. they do the 
same thing (setup catchlog user/group), why we need s6.pre-upgrade script if 
the previous installation already setup the catchlog user/group?


 As Hoël said, it's a legacy script, for very old installations that need
to upgrade. It could probably be removed. In any case, in a new rpm, you
don't need it. And since you're not setting up a catch-all logger, you
don't need a user for it either, so you can do away with the install
scripts entirely.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-06 Thread Laurent Bercot




One last question: do we need the s6-openrc rpm package? I know systemd is more 
popular for Redhat and Fedora. Any suggestion?


 I doubt anyone is going to run openrc on Fedora. If you're going to package
s6 for a given distribution, you should integrate it properly with that
distribution, not copy what is done on other distributions. To that end,
you should forget about openrc, and probably assume systemd is running.

 That means:

- removing all files other than the sources.
- making a suitable unit file to start s6-svscan.
- taking advantage of the fact that systemd, unlike openrc, has a logging
mechanism, so you don't need to set up a catch-all logger for the
supervision tree - on the contrary, you can just let all the logs fall
through to stderr.

 I would help you do all that, except I have no experience with rpm
(except from 20 years ago, that is).

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-03 Thread Laurent Bercot

RHEL and Fedora have an alternatives system:

* https://docs.fedoraproject.org/en-US/packaging-guidelines/Alternatives/
* https://www.linux.org/docs/man8/alternatives.html


 Then it looks like the correct way to proceed, if Eric can coordinate with
the maintainers of the filesystem and bash packages.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-03 Thread Laurent Bercot

There have been some discussions, starting at Fedora, about unifying
the bin and sbin directories:
https://fedoraproject.org/wiki/Changes/Unify_bin_and_sbin


 Ha. 25 years later, they understand that the separation makes no sense,
and *just* when we were going to use that silly separation to work around
an even sillier idiosyncrasy.

 Talk about timing.



Also, even apart from unifying the directories, there are various
people who have expressed concern about having different programs with
the same name in /usr/bin and /usr/sbin, thus making it something of
a potluck which one will be invoked depending on the user's search path.
I have to admit that I am kind of in agreement with that: different
binaries with the same name in directories that are both meant to be in
the search path seems... a bit fishy to me, and, yeah, with
the potential for problems if the directories are reordered
(I have seen arguments for both sides: "things in /sbin are more
important, so it should come before /bin; things in /bin are used
much more often, so it should come before /sbin").


 I agree with all this. In principle, /usr/bin and /usr/sbin should not be
distinct, for all these reasons.
 The thing is, we're not in the realm of "good design", here. We're in the
realm of "work around the braindeadness and use the cracks to uglyhack
something that works".

 If rpm doesn't have an alternatives system to get the useless binaries out
of the way, and if /usr/sbin is unusable, then there's nothing left but
"add another directory to the global PATH", which is super invasive.

--
 Laurent



Re: s6-rc user services on Gentoo

2024-04-03 Thread Laurent Bercot




2) The presence of a notification-fd file tells s6 that dbus-daemon
can be somehow coerced into producing an s6-style readiness
notification using file descriptor 3 without changing its code, are
you sure that's the case with this script? My service definition for
the system-wide message bus polls for readiness using s6-notifyoncheck
and a dbus-send command...


 "dbus-daemon --print-address=3" should produce a suitable notification.
The address of the bus can only be printed once the bus exists. :)
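
 For readers unfamiliar with the mechanism: as I understand it, s6 considers
a service ready once it has written a newline-terminated line to the
descriptor named in notification-fd, which is why printing the bus address
on fd 3 is enough. A minimal sketch of both sides, using a plain pipe
instead of a real supervisor (the function names are hypothetical):

```python
import os


def notify_ready(fd):
    """Daemon side: write one newline to the notification fd, then close it."""
    os.write(fd, b"\n")
    os.close(fd)


def wait_for_ready(fd):
    """Supervisor side of the sketch: block until a newline arrives.
    Returns False if the writer closes the fd without notifying."""
    while True:
        chunk = os.read(fd, 64)
        if not chunk:
            return False
        if b"\n" in chunk:
            return True


if __name__ == "__main__":
    reader, writer = os.pipe()
    notify_ready(writer)
    print("ready:", wait_for_ready(reader))
    os.close(reader)
```

 Anything printed before the newline is ignored by this sketch, which
mirrors the fact that the content of the address line does not matter for
readiness purposes.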

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-02 Thread Laurent Bercot



After check the installed package of execline on alpine. I choose to 
install main part of execline to /usr/bin.
Create /usr/sbin directory, create relative symbol link for cd, umask 
and wait to /usr/bin/execline.


 Does that mean you're using --enable-multicall? You can, it's just
surprising for a distribution that uses glibc and doesn't care much about
size. :)

 And yes, I think you have it right. Put the binary and most of the symlinks
in /usr/bin, and only use /usr/sbin for the cd, umask and wait symlinks.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-01 Thread Laurent Bercot



 And yes, since execline-provided cd, umask and wait, when called via a
PATH search (not that a shell will ever do that, but execvp() can), will
substitute themselves for Fedora-provided POSIX binaries, it is necessary
to build execline with --enable-pedantic-posix in order to prevent trouble
with whatever pathological case Fedora could come up with.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-01 Thread Laurent Bercot

[packager@rpm-builder etc]$ env | grep PATH
PATH=/home/packager/.local/bin:/home/packager/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

I guess /usr/local/bin or /usr/local/sbin is our first choice? Do we need 
--enable-pedantic-posix for /usr/local/bin or /usr/local/sbin?


 No, /usr/local is reserved, as the name implies, for local installations:
packaged software cannot use it.

 If the default PATH has /usr/sbin before /usr/bin for all users, then the
best thing is probably to install cd, umask and wait into /usr/sbin. It's
not exactly clean, but at this point we're not trying to be clean, we're
trying to make things work. And it wouldn't be the first time a binary that's
available to all users gets installed in /usr/sbin.
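
 The reason the order of PATH entries matters can be shown with a small
experiment. This is a sketch under assumptions: it builds throwaway
directories that merely imitate /usr/sbin and /usr/bin, and it uses Python's
shutil.which, which performs the same kind of left-to-right search that
execvp() does. The helper names are hypothetical.

```python
import os
import shutil
import tempfile


def make_tool(directory, name):
    """Create a dummy executable called `name` in `directory`."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, 0o755)
    return path


def first_match(name, search_dirs):
    """Return the first executable called `name` found in search_dirs,
    scanning the directories in order, or None if there is none."""
    return shutil.which(name, path=os.pathsep.join(search_dirs))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    sbin = os.path.join(root, "usr", "sbin")  # stands in for /usr/sbin
    ubin = os.path.join(root, "usr", "bin")   # stands in for /usr/bin
    make_tool(sbin, "cd")  # plays the role of the execline cd
    make_tool(ubin, "cd")  # plays the role of the bash-provided cd
    print(first_match("cd", [sbin, ubin]))
    print(first_match("cd", [ubin, sbin]))
```

 With /usr/sbin listed first, the copy installed there shadows the one in
/usr/bin; swapping the two entries swaps the result.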

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-01 Thread Laurent Bercot

In my (admittedly ugly) package, I simply delete execline's `cd' and
`umask'; `wait' is renamed to `execline-wait', just like `execline-cd'
and `execline-umask' (which are not conflicting and so not deleted).


 This means that your execline package cannot run execline scripts that
use cd, umask or wait. It may work for you, but it is not suitable as a
general audience package.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-01 Thread Laurent Bercot

file /usr/bin from install of execline-2.9.4.0-1.fc39.x86_64 conflicts 
with file from package filesystem-3.18-6.fc39.x86_64
file /usr/bin/cd from install of execline-2.9.4.0-1.fc39.x86_64 
conflicts with file from package bash-5.2.26-1.fc39.x86_64
file /usr/bin/umask from install of execline-2.9.4.0-1.fc39.x86_64 
conflicts with file from package bash-5.2.26-1.fc39.x86_64
file /usr/bin/wait from install of execline-2.9.4.0-1.fc39.x86_64 
conflicts with file from package bash-5.2.26-1.fc39.x86_64


 Oh, Red Hat, never change.

 The correct answer to this problem is that these binaries from the
"filesystem" and "bash" packages should not exist. They are just never
called - any instance of "cd", "umask" or "wait" in a shell script calls a
shell builtin, and they *have to* be builtins - they cannot work when called
as external tools. The only reason why these binaries exist is to comply
with a broad statement in POSIX that every builtin must also be provided as
an external tool, even those where it does not make sense.
 Red Hat-based distributions are the only ones that do this. Other ones have
understood that these binaries are useless.

 But obviously you cannot remove these binaries unless you're the
"filesystem" and "bash" maintainer, so workarounds must be found.

 The best workaround is to use an alternatives system if available. I don't
know if rpm provides this. The idea is to offer execline as an alternative
source for the cd, umask and wait binaries. If you build execline with
--enable-pedantic-posix (which you should do in this case), the binaries
provided by execline are fully compatible with the POSIX requirement, and
can replace the default Fedora binaries entirely; they also provide actually
useful functionality when used in ways not explicitly covered by POSIX (i.e.
when chain-loading, which is how they're used in execline scripts).

 You may need to work with the filesystem and bash maintainers for this.

 Short of that, the only possible workaround is to find a place that appears
*before* /usr/bin in the default PATH, and install, or link, execline
binaries there. This may be difficult to find, because /usr/bin is generally
one of the first locations in PATH. If you cannot find this, then the only
way is to install execline binaries in their own directory (e.g.
/var/lib/execline/bin) *and* add that directory to the default PATH of every
user, before /usr/bin, which is a lot more invasive.

 (If there is no policy that forbids creation of subdirectories in /, you
could consider building skaware with --enable-slashpackage, and adding
/command at the top of the user PATH. execline would have its binaries in
/package/admin/execline/command, accessible via /command, nothing would
conflict with stuff living in FHS, and as long as /command is before
/usr/bin in PATH, things would work. That is what I do on my machines. But
unfortunately, most distributions can be pretty anal about /package and
/command - which is hypocritical considering they have no problem with
/media and /srv, but that's a fight for another day - so it's doubtful you
can do that.)

 If everything else fails, document somewhere that execline, *and* packages
that depend on execline, will not be usable unless that directory is added
*at the top of* the PATH. It's not only about finding the binaries, it's
about making sure that the correct cd, umask and wait binaries are found; if
a binary from "filesystem" or "bash" is found instead, this will break
execline scripts.

 Note that what some distros did, i.e. putting the execline binaries in
/var/lib/execline/bin and adding a /usr/bin/execlineb wrapper that prepends
PATH with /var/lib/execline/bin before executing
/var/lib/execline/bin/execlineb, is explicitly NOT correct. execline
binaries aren't supposed to be accessible only when called by execlineb;
they're supposed to be accessible via the default PATH, and some parts of
skaware will break if they're not. Putting them in /var/lib/execline/bin is
fine, but then /var/lib/execline/bin needs to be in the default PATH, and
*before* /usr/bin, instead of being activated by a wrapper.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-01 Thread Laurent Bercot

Yes, skalibs, execline are different projects. The GitHub site is just a 
central and temporary place to hold the spec files.

For skalibs project, I build 4 rpm packages: skalibs, skalibs-devel, 
skalibs-devel-static, skalibs-doc.
skalibs-devel depends on skalibs. Just follow the aports counterpart packages 
dependency rule.

The rpmbuild tool supports building different packages in one directory.


 OK, thanks for the clarification.

--
 Laurent



Re: Update: rpm package for utmps, skalibs.

2024-04-01 Thread Laurent Bercot



 I haven't looked in detail, but I'm not sure why you want everything
in one single RPM.

 skalibs, utmps, execline and s6 are different projects. A package should
be one project, not a set of projects. A package manager will handle
dependencies between packages and install all the rpms that are needed by
a given project.

 Or isn't it how the rpm packaging system handles dependencies?

--
 Laurent



Re: is there any rpm package for utmps, skalibs?

2024-03-27 Thread Laurent Bercot

my first question is: does skalibs support glibc? alpine only support musl.


 Yes. skalibs supports everything that makes a good attempt to be
POSIX-conformant, so that includes glibc.

--
 Laurent



Re: is there any rpm package for utmps, skalibs?

2024-03-27 Thread Laurent Bercot



 Hi Wang,

 Your e-mail client seems to be broken. It sends HTML entities as text/plain,
and it makes the content of your mail unreadable. Please fix this, if you
can.


 From what I can understand, you're looking for rpm packages for skalibs
and utmps. I don't know if there are any; I haven't had any contact from a
Fedora maintainer or user.

 If anyone's interested in making rpm packages for skaware, or in helping
Wang make them, please show yourself :)

--
 Laurent



Re: version information output

2024-01-12 Thread Laurent Bercot

there is no version information option (like say "-V") for
the s6 utils. such a command line option should make the
tool output its version number and terminate.

it would be nice if such an option could be added to the tools.


 It would also add boilerplate to every single binary, which would make
them bigger, as well as longer and more annoying to write.

 Most of the time, the version information is available elsewhere;
typically, in your package manager. Or in the filesystem if you're using
slashpackage. (That's one of the issues with FHS, it requires an
additional system such as a package manager to retain the meta-information
it loses by having binaries in fixed directories.)

 Binaries are not the place to store meta-information. There is nothing
you can programmatically do with version information; if you require a
specific minimal version of a tool, then by policy you should have it on
your system, and you should assume that your requirements are met (and
it's a bug if they are not).

 "true --help" and "true --version" are often mentioned for laughs; there
is a reason for that.

--
 Laurent



Re: [announce] skalibs-2.14.1.0, tipidee-0.0.3.0, shibari-0.0.1.0

2023-12-23 Thread Laurent Bercot


Additionally, the shibari documentation has been ported:

* https://git.sr.ht/~flexibeast/shibari-man-pages/refs/v0.0.1.0.1

(For those wondering, porting the two man pages for shibari took me roughly an 
hour.)


 You are awesome.


Re: [announce] skalibs-2.14.1.0, tipidee-0.0.3.0, shibari-0.0.1.0

2023-12-21 Thread Laurent Bercot




The difference in UDP is that not having a connection makes it harder to model
with the stdin/stdout method of UCSPI, right?


 Yes. A super-server model makes sense for TCP because you can spawn
one server to handle one stream; not so much for UDP, because there is
no stream, only packets, and you don't want to spawn a process for
every packet.

 A UDP server doesn't have to deal with the complexity of multiplexing
exchanges with different clients, either, since it only needs to
respond to every packet in sequence. No parallelism needed, it's all
very straightforward and simple to write.

 djbdns does the exact same: axfrdns is spawned by a super-server, but
tinydns binds to its UDP socket itself.
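
 To make the "respond to every packet in sequence" point concrete, here is
a minimal sketch of such a loop. It is not how shibari or tinydns are
written; the function name and the handler are hypothetical, and a real DNS
server would parse the query instead of transforming bytes.

```python
import socket


def serve_packets(sock, handle, max_packets):
    """Answer datagrams one after another on an already-bound UDP socket.
    No per-client state and no child process: each packet is read,
    handled, and answered before the next one is looked at."""
    for _ in range(max_packets):
        data, peer = sock.recvfrom(65535)
        sock.sendto(handle(data), peer)


if __name__ == "__main__":
    # The server binds to its UDP socket itself, on an ephemeral
    # loopback port chosen by the kernel.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5)
    client.sendto(b"query", server.getsockname())

    serve_packets(server, lambda d: d.upper(), 1)
    print(client.recvfrom(64)[0])

    client.close()
    server.close()
```

 The whole server is a single loop around recvfrom and sendto, which is the
contrast with the one-process-per-stream model used for TCP.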

--
 Laurent



[announce] skalibs-2.14.1.0, tipidee-0.0.3.0, shibari-0.0.1.0

2023-12-21 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available.
 This is mostly a bugfix release, with some new features.

skalibs-2.14.1.0       (minor)
s6-2.12.0.3            (release)
s6-dns-2.3.7.1         (release)
s6-networking-2.7.0.1  (release)
tipidee-0.0.3.0        (minor)
shibari-0.0.1.0        (new!)


 * skalibs-2.14.1.0
   ----------------

 Despite the minor bump, that was necessary for one of the bug fixes,
this is still a bugfix release - but an important one. All users
should upgrade.

 The upgrade breaks the build of old s6 and s6-networking versions,
despite not being a major upgrade. This is intentional; the 'broken'
functionality actually never worked, and the old interfaces *could*
never work, so, better get rid of them and expose problems at build time
rather than run time. The new versions of s6 and s6-networking use the
new, working interfaces. Other packages are not impacted.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


 * tipidee-0.0.3.0
   ---------------

 tipidee now supports ranges! And also a new XXX_no_translate
configuration option, which - as the name implies - is pretty dangerous:
it disables path translations and interprets the requested URI as is,
which allows symlinks to documents located outside of the server's root.

 https://skarnet.org/software/tipidee/
 git://git.skarnet.org/tipidee


 * shibari-0.0.1.0
   ---------------

 A brand new project, because I clearly don't have enough on my plate.
 shibari is a suite of DNS tools, a successor to s6-dns. Eventually it
will fully replace s6-dns, but for now it simply depends on it. This
first version of shibari comes with two DNS server programs (one for UDP
and one for TCP), that are more or less drop-in replacements for djb's
tinydns and axfrdns. (I wrote them because it was a better long-term
solution than adding a patch to fix a bug in axfrdns.)

 https://skarnet.org/software/shibari/
 git://git.skarnet.org/shibari


 Enjoy,
 Bug-reports welcome - but I probably won't be working much on them
during the end of the year.

 Merry Christmas, happy holidays, and happy new year!

--
 Laurent



Re: A define program using blocks for execline?

2023-12-17 Thread Laurent Bercot

Yes, it can be done with current execline tools through options like
-s in define and importas, but I feel something like this would be
clearer:

block-define var { 1 2 3 }
printf "%s\n" "This is ${var}"

Does this already exist?


 Not really, but that sounds like a possible addition, the model sounds
sane. Thanks for the suggestion, I'll think about it.

--
 Laurent



Re: How does s6-linux-init-shutdownd.c contact pid 1?

2023-11-20 Thread Laurent Bercot

I've been trying to find out why my "finish" script is not working
(or perhaps it is working but not printing output anywhere I can see)


 The ways of shutdown are mysterious. :)



However, I don't think kill(-1, n) *works* for pid 1.


 Indeed, it does not. That kill is supposed to be sent to every process
*except* s6-svscan, which will survive and restart the supervisor for
s6-linux-init-shutdownd, which will restart s6-linux-init-shutdownd,
which will then execute stage 4.

 If you're not in a container, stage 4 will unmount the filesystems
then hard halt/poweroff/reboot the machine.

https://git.skarnet.org/cgi-bin/cgit.cgi/s6-linux-init/tree/src/shutdown/s6-linux-init-shutdownd.c#n193

 s6-linux-init-shutdownd never tells s6-svscan to exit, so if you're
running s6-linux-init, it's normal that your .s6-svscan/finish script
is not executed.

 The place where you want to hack things is /etc/rc.shutdown.final,
which is run by the stage 4 script right before the hard reboot. At
this point nothing is mounted anymore and the process tree is only

s6-svscan
 \_ s6-supervise s6-linux-init-shutdownd
 \_ foreground { rc.shutdown.final } reboot
 \_ rc.shutdown.final

so you can do dirty stuff like "rm -f /run/service/s6-linux-init-shutdownd
&& s6-svc -dxwD /run/service/s6-linux-init-shutdownd &&
s6-svscanctl -b /run/service" which should clean up the s6-supervise
and the foreground, and give control to .s6-svscan/finish.

 Start your finish script with "wait { }" because s6-svscan will probably
exec into it before rc.shutdown.final dies, and you don't want a zombie
hanging around.

--
 Laurent



[announce] s6-2.12.0.2

2023-11-20 Thread Laurent Bercot



 Hello,

 I don't normally spam all of you for bugfix releases, but this one is
important. You definitely want to grab the 2.12.0.2 version of s6, not
the 2.12.0.1 one. The bug could prevent a shutdown from completing.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6

 Sorry about that,

--
 Laurent



[announce] but what about *second* skarnet.org November 2023 release?

2023-11-19 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available.
 This is mostly a bugfix release, addressing the problems that were
reported since the big release two weeks ago.

 Despite that, s6-dns got a minor version bump because the fixes
needed an additional interface; and s6-networking got a major bump,
because it needed an interface change. Nothing that *should* impact
you, the changes are pretty innocuous; but see below.

skalibs-2.14.0.1       (release)
s6-2.12.0.1            (release)
s6-dns-2.3.7.0         (minor)
s6-networking-2.7.0.0  (major)
tipidee-0.0.2.0        (minor)


 * skalibs-2.14.0.1
   ----------------

 This release is important if you want the fixes in s6-dns: the
ipv6 parsing code has been revamped.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


 * s6-2.12.0.1
   -----------

 It's only a bugfix, but you want to grab this version, because the
bug was impactful (s6-svscanctl -an not working as intended).

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


 * s6-dns-2.3.7.0
   --------------

 - The parsing of /etc/hosts now ignores link-local addresses instead
of refusing to process the whole file.
 - New interface to only process /etc/hosts if a client requires it.

 https://skarnet.org/software/s6-dns/
 git://git.skarnet.org/s6-dns
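
 The first change above (skip link-local entries rather than reject the
whole file) can be illustrated with a short sketch. This is not the s6-dns
code, which is written in C; the function name is hypothetical, and the
choice to silently skip malformed lines is my own assumption for the
example.

```python
import ipaddress


def parse_hosts(text):
    """Parse /etc/hosts-style text into (address, name) pairs,
    ignoring link-local addresses instead of failing on them."""
    entries = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) < 2:
            continue  # blank line or address without a name
        try:
            # Strip a zone suffix such as "%eth0" before parsing.
            addr = ipaddress.ip_address(fields[0].split("%", 1)[0])
        except ValueError:
            continue  # assumption: skip lines that do not parse
        if addr.is_link_local:
            continue  # ignore the entry, keep processing the file
        for name in fields[1:]:
            entries.append((str(addr), name))
    return entries


if __name__ == "__main__":
    sample = (
        "127.0.0.1 localhost\n"
        "fe80::1%eth0 router  # link-local, ignored\n"
        "::1 ip6-localhost ip6-loopback\n"
    )
    for address, name in parse_hosts(sample):
        print(address, name)
```

 The point of the design is that one unusable line costs only that line:
the remaining entries stay available to clients.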


 * s6-networking-2.7.0.0
   ---------------------

 - s6-tlsc-io has changed interfaces; now it's directly usable from a
terminal. This change should be invisible unless you were using
s6-tlsc-io without going through s6-tlsc (which, until now, there was
no reason to do).
 - s6-tcpserverd now logs "accept" and "reject" instead of "allow" and
"deny", this terminology now being reserved to s6-tcpserver-access.
 - The -h option to s6-tcpclient and s6-tcpserver-access has changed
semantics. Previously it was used to require a DNS lookup, and was hardly
ever specified since it was the default (with -H disabling DNS lookups).
Now it means that DNS lookups must be preceded by a lookup in the
hosts database.
 - A new pair of options, -J|-j, are accepted by s6-tlsc-io and
s6-tlsd-io, and by extension the whole TLS chain of tools. -J means that
s6-tls[cd]-io should exit nonzero with an error message if the peer fails
to send a close_notify before closing the connection; -j, which is the
default, means ignore it and exit normally.
 - The TLS tunnels work as intended in more corner cases and
pathological situations.

 https://skarnet.org/software/s6-networking/
 git://git.skarnet.org/s6-networking


 * tipidee-0.0.2.0
   ---------------

 - Bugfixes.
 - New configuration options: "log x-forwarded-for", to log the contents
of the X-Forwarded-For header, if any, along with the request; and
"global executable_means_cgi", to treat any executable file as a CGI
script (which is useful when you control the document hierarchy, but
dangerous when it's left to third-party content manager programs).

 https://skarnet.org/software/tipidee/
 git://git.skarnet.org/tipidee


 Enjoy,
 As always, bug-reports welcome.

--
 Laurent



Re: [announce] skarnet.org November 2023 release

2023-11-10 Thread Laurent Bercot




Minor issue, the version linked from the web page
(https://skarnet.org/software/skalibs/) needs a bump


 Whoops. Fixed.

--
 Laurent



Re: tipidee - uri parse when port missing

2023-11-09 Thread Laurent Bercot




 Hi Vincent,

 I'm not sure if you're testing with the released version of tipidee
or not. Please make sure to only report bugs against the released
version or the git head.

 In any case, the absence of a port in the Host field is certainly not
the reason why tipidee would answer a 400. There has to be something
else in the request coming from the proxy that it doesn't like. Can
you post the full request?

--
 Laurent



[announce] skarnet.org November 2023 release

2023-11-06 Thread Laurent Bercot



 Hello,

 New versions of all the skarnet.org packages are available.
 This is a big one, fixing a lot of small bugs, optimizing a lot behind
the scenes, adding some functionality. Some major version bumps were
necessary, which means compatibility with previous versions is not
guaranteed; updating the whole stack is strongly recommended.

 Also, tipidee is out! If you've been looking for a small inetd-like
Web server that is still standards-compliant and fast, you should
definitely check it out.

skalibs-2.14.0.0              (major)
nsss-0.2.0.4                  (release)
utmps-0.1.2.2                 (release)
execline-2.9.4.0              (minor)
s6-2.12.0.0                   (major)
s6-rc-0.5.4.2                 (release)
s6-linux-init-1.1.2.0         (minor)
s6-portable-utils-2.3.0.3     (release)
s6-linux-utils-2.6.2.0        (minor)
s6-dns-2.3.6.0                (minor)
s6-networking-2.6.0.0         (major)
mdevd-0.1.6.3                 (release)
smtpd-starttls-proxy-0.0.1.3  (release)
bcnm-0.0.1.7                  (release)
dnsfunnel-0.0.1.6             (release)
tipidee-0.1.0.0               (new!)


 * skalibs-2.14.0.0
   ----------------

 This version of skalibs adds a lot of new sysdeps, a lot of new
functions, and changes to existing functions, in order to support
the new features in other packages.
 The most important change is the new cspawn() function, providing
an interface to posix_spawn() with support for most of its options
with a fork() fallback for systems that do not have it.
 What this means is that on systems supporting posix_spawn(), the
number of calls to fork() in the whole skarnet.org stack has been
significantly reduced. This is important for programs where spawning
a new process is in a hot path - typically s6-tcpserver.

 Updating skalibs is a prerequisite for updating any other part of
the skarnet.org stack.
 Once you've updated skalibs, you probably don't *have to* update
the rest; old versions of packages should generally build with the new
skalibs as is, and if indeed they do, nothing should break. But it is
a major update, so there are no guarantees; please update to the
latest versions at your convenience.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


 * execline-2.9.4.0
   ----------------

 - execlineb now has a dummy -e option (it does nothing). This is so
it can be used as a replacement for a shell in more environments.
Also, execline programs use fork() a lot less, so overall execline
script performance is better.
 - The multicall setup did not properly install symbolic links for
execline programs; this is fixed, and is fixed as well in the other
packages supporting a multicall setup (s6-portable-utils and
s6-linux-utils).

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


 * s6-2.12.0.0
   -----------

 - s6 programs use fork() less.
 - New -s option to s6-svc, to send a signal by name or number.
 - s6-svscan has been entirely rewritten, in order to handle logged
services in a more logical, less ad-hoc way. It should also be more
performant when running as init for a system with lots of s6-supervise
processes (improved reaping routine).
 - The obsolete (and clunky) s6lockd subsystem has been deleted.
s6-setlock now implements timed locking in a much simpler way.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


 * s6-linux-init-1.1.2.0
   ---------------------

 - New -v option to s6-linux-init-maker, setting the boot verbosity.
 - Several small bugfixes, one of them being crucial: now your
systems shut down one second faster!

 https://skarnet.org/software/s6-linux-init/
 git://git.skarnet.org/s6-linux-init


 * s6-linux-utils-2.6.2.0
   ----------------------

 - Support for the minflt and majflt fields in s6-ps.

 https://skarnet.org/software/s6-linux-utils/
 git://git.skarnet.org/s6-linux-utils


 * s6-dns-2.3.6.0
   --------------

 - Support for on-demand /etc/hosts data in s6-dnsip and s6-dnsname.
It is achieved by first processing /etc/hosts into a cdb, then looking
up data in the cdb. You can, if you so choose, perform this processing
in advance via a new binary: s6-dns-hosts-compile.

 https://skarnet.org/software/s6-dns/
 git://git.skarnet.org/s6-dns


 * s6-networking-2.6.0.0
   ---------------------

 This is the package that has undergone the biggest changes.

 - No more s6-tcpserver{4,6}[d]. IPv4 and IPv6 are now handled by the
same program, s6-tcpserver, which chainloads into a unique long-lived
one, s6-tcpserverd.
 - s6-tcpserver now exports TCPLOCALIP and TCPLOCALPORT without the
need to invoke s6-tcpserver-access.
 - s6-tcpserver-access does not hardcode a warning when it is
invoked without a ruleset. It can now just be used for additional data
gathering (such as TCPREMOTEHOST) without jumping through hoops.
 - s6-tcpserverd has been thoroughly optimized for performance. It will
handle as heavy a load as the underlying system will allow.
 - Yes, this means you can now use s6-tcpserver to serve 

Re: tipidee/s6-tlsserver crash with tls launch

2023-11-05 Thread Laurent Bercot



 Fixed in latest s6-networking git head. It was an invocation of
tls_error() with the wrong context.
 When run with the fixed version, s6-tlsd-io prints this error:

 s6-tlsd-io: fatal: unable to tls_configure: failed to read private key

which means there's an issue with your fd.key file, probably the format.

But this shows it's important to test error paths too! Thanks for that :)


--
 Laurent



Re: [announce] tipidee is now in open beta!

2023-10-24 Thread Laurent Bercot



 The release date of tipidee is approaching.

 Since the last announcement, there have been some significant changes
to tipidee, including:

 - a more flexible logging configuration
 - custom error pages (by domain)
 - custom headers

as well as many bugfixes, thanks to everyone's reports.

 These changes were the last important ones. There will be no more
major additions before the release.

 But the new features need testing! In particular, custom error pages
and custom headers are new enough and complex enough that they could
use more people stress-testing them. So please, if you can, grab
tipidee's git head, and check that it works for you and your twisted
mind.

 (Some parts of the /etc/tipidee.conf syntax have changed, you will
need to perform minor edits to your configuration and run tipidee-config
again.)

 Thanks a lot in advance! The more testing, the sooner we can put a
number on it. :)

 (Alexis, the documentation is in a good place now and I do not expect
any important changes to it before the release. There will probably be
minor tweaks and rephrasing, but that's it.)

--
 Laurent



Re: tipidee/s6-tlsserver crash with tls launch

2023-10-13 Thread Laurent Bercot
Just tried with latest s6-networking HEAD (and deps) and also libressl 3.7.3.

Unfortunately same issue.

I hope it was not due to my certs. Those are generated with openssl 3rd 
book (self signed certs).


 LibreSSL hardcodes its list of trusted anchors so it won't be able to
*verify* self-signed certs, but it should be able to *serve* them, which
is what you are doing. And in any case, it shouldn't crash.

 I'm going to need your full sysdeps file (from skalibs), as well as
your kernel and libc versions, please, to see if I can reproduce this.

--
 Laurent



Re: Any chainloading program to call prctl(PR_SET_PDEATHSIG, signal)

2023-10-11 Thread Laurent Bercot

It turns out there is this Linux specific syscall (prctl(PR_SET_PDEATHSIG, 
signal)) to set the saner behavior of actually being informed if your parent 
dies and react to it so s6 is able to bring service back up, but it’s opt-in. 
Is there any tool in the s6 ecosystem or otherwise that I can use to call it 
before exec’ing to the service itself? I couldn’t find in s6-linux-utils and I 
would guess it’s not part of the portable tools, being Linux specific. Is there 
a portable equivalent? Is there any interest in receiving a patch, 
alternatively?


 The short answer is: no, not yet, sorry.

 The long answer is: prctl is complicated, and I do intend to do
something with it in the long run because there are useful things in
there, but it requires thought and planning. Typically, what I'd want
is to check what functionality is available on other systems than Linux
and how reasonably easy it is to implement, then add that functionality
as part of a portable program suite, and add the rest, or a useful subset
of it, as part of s6-linux-utils or something.

 Also, I'd like to see whether there isn't a better abstraction level
than blindly implementing all the options and flags in prctl. So, I
don't want to rush this, especially not while focused on other things.

 For your use case, however, you can certainly implement a
prctl(PR_SET_PDEATHSIG, SIGTERM) wrapper yourself, that doesn't sound
difficult. Or you could use cgroups: have your run script put itself
in a dedicated process cgroup, and a finish script ensure that all the
processes in the cgroup are killed when the service goes down.
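
 For reference, the core of such a wrapper could look like the sketch
below. It is Linux-only, it is not part of any skarnet.org package, and
the function names set_pdeathsig and pdeathsig_exec are hypothetical.
A real chainloading program would call pdeathsig_exec(SIGTERM, argv + 1)
from main() and report the error if it returns.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Ask the kernel to send sig to the calling process when its parent dies.
   Returns 0 on success, -1 with errno set on failure. */
int set_pdeathsig(int sig)
{
  return prctl(PR_SET_PDEATHSIG, (unsigned long)sig);
}

/* Set the parent-death signal, then exec into argv.
   Only returns (with -1 and errno set) if something failed. */
int pdeathsig_exec(int sig, char *const argv[])
{
  if (set_pdeathsig(sig) == -1) return -1;
  execvp(argv[0], argv);   /* the setting survives a normal execve() */
  return -1;
}
```

 Two caveats: the setting does nothing if the parent is already dead
when prctl() is called, and the kernel clears it when executing a
set-user-ID or set-group-ID binary.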

--
 Laurent



Re: tipidee/s6-tlsserver crash with tls launch

2023-10-04 Thread Laurent Bercot
I used Libressl 3.8.1 with all official releases (skalibs, execline, 
s6, s6-networking, s6-dns and s6-portable-utils).

Except for tipidee with skalibs all on HEAD.


 Thanks.

 I cannot reproduce the crash with the s6-networking git head. Can you
please test with it? (Even if there's a bug in s6-networking-2.5.1.3,
the git head is where it would be fixed; and the next release is close.)
You'll need to build skalibs and s6 from git in order for s6-networking
to build.

 And if s6-tlsd-io from git still crashes, can you please test with
libressl-3.7.3 (the stable libressl release) as well? 3.8.1 is a
development release; it has worked for me so far, but you never know.

--
 Laurent



Re: tipidee/s6-tlsserver crash with tls launch

2023-10-04 Thread Laurent Bercot




In that situation it produces a SIGSEGV during s6-tlsd-io execution. In 
attachment 2 strace log outputs (pid 14138 for the caller of s6-tlsd-io, pid 
14141 for s6-tlsd-io itself).
Why this s6-tlsd-io is always crashing (some credential/Id's)?


 It's a libtls crash during the preparation of the tunnel (before
the handshake).
 What version of LibreSSL are you using?
 And what version of s6-networking are you using? There was such a
crash in an old version of s6-networking, that should be fixed in
2.5.1.3 (and is of course also fixed in the git head).

 In general, if you're going to run a piece of skaware from git - and
thank you for that! Beta testing is helpful! - then you need to
build the whole skaware stack from the git head.

--
 Laurent



Re: [announce] tipidee is now in open beta!

2023-09-25 Thread Laurent Bercot



- is it possible to customise error pages as static pages? Currently I 
think not but is it forecasted?


 I initially wanted to *specifically* avoid this, because some Web servers
return a 200 status when serving their customized error page, which is
a terrible idea. But then I realized you didn't have to do that, and
could serve a customized page while still returning the proper error code.

 So, yeah, why not - maybe not right now, but in a later version, sure.
I'll add that to the future.html documentation page.


- in other words tipidee does only return error codes and no content 
linked to?


 There currently _is_ content returned with the 404 response. It's just
so minimal you get the impression there is none. :)

--
 Laurent



Re: [announce] tipidee is now in open beta!

2023-09-22 Thread Laurent Bercot

An mdoc(7) port of the documentation is now also available:

https://sr.ht/~flexibeast/tipidee-man-pages/


 Thanks a lot - what speed! :D
 But please be aware that everything can still be very much in flux
until the official release. Doc is getting fixed, completed,
reworded, as much as code is. ;)

--
 Laurent






[announce] tipidee is now in open beta!

2023-09-21 Thread Laurent Bercot



 Hi folks,

 For those who don't know, I've been working on a very normal, very sane
project, not rabbit-holey or scope-creepy *at all*: a web server.

 It's named tipidee, and I just made the switch - it is now serving
the skarnet.org site.

 It's in a good enough place that I can now declare it's in beta, which
means stable enough for other people to download and test it. I've done
all the debugging I could, now I need other people to misuse it in ways
I would never have imagined.

 Please download it, use it, abuse it, and send me all the bug-reports
you can.

 https://skarnet.org/software/tipidee/
 git://git.skarnet.org/tipidee/

 When it's good enough for a numbered version, there will be a *huge*
skarnet.org release, and then we'll move on to Good Things.

 Thanks a lot,

--
 Laurent



Re: [OT] djbdns + musl

2023-08-19 Thread Laurent Bercot

$ cat conf-cc
/opt/bin/musl-gcc -static -Os -march=x86-64 -fomit-frame-pointer -pipe
-Wall -Wno-trampolines -Wno-maybe-uninitialized -Werror=overflow
-mpreferred-stack-boundary=4 -falign-functions=1 -falign-jumps=1
-falign-loops=1 -fno-unwind-tables -fdata-sections -ffunction-sections
-Wl,--gc-sections -fno-asynchronous-unwind-tables -fstrict-aliasing
-Wstrict-aliasing=2 -Wno-unused-function -foptimize-sibling-calls
-std=c89 -fno-pic -Wl,-z,noseparate-code -fPIE

$ cat conf-ld
/opt/bin/musl-gcc -s


 That's off-topic indeed, but the answer is easy: -static is a linking
option, not a compilation option. Put -static in your conf-ld, not in
your conf-cc. :)

--
 Laurent



Re: Question about s6-linux-init

2023-08-15 Thread Laurent Bercot

While following the guide for the init part I noticed the init scripts seem to 
be shell scripts. Is there any particular reason they are not execline scripts? 
I’ve become much more fond of those while trying the waters before.
Would it be sensible for me to rewrite them in execline? And, similarly, would 
you be interested in receiving patches if I do?


 Hi Mario,

 execline is best used in very specific situations:
 - when scripts are *very* simple
 - for automatically generated scripts (execline is easier to generate
than shell)
 - for power users

 Scripts such as /etc/rc.init don't fit in these categories. They can
contain a lot of stuff (look at s6-overlay's rc.init for an example of
something generic enough to work in a Docker container), they're meant
to be written manually, and by non-specialists of the s6 ecosystem.
 That makes execline a bad fit.

 I obviously like execline as much as you do, but it has shown to
generally be an obstacle to s6 adoption, because people who are new to
the ecosystem see it, rightly or wrongly, as an additional hurdle.
There's a diehard piece of FUD about s6 that says you *have to* learn
execline in order to use it; it has never been true, but the myth
lingers nonetheless, and showing that you can write scripts in whatever
language you want helps dispel it.

 For this reason, the init script examples are provided in shell, and
will remain so. Of course, you're free to write your own scripts in
execline, and encouraged to do so if you're comfortable with it and
it makes them more efficient :)

--
 Laurent



Re: [PATCH] s6-tlsserver: actually pass on -Y to s6-tlsd

2023-08-09 Thread Laurent Bercot

The -Y flag was being treated as if it means the default of not asking
for a client cert.


 Thanks! Applied with a slightly different style.

 I should really have used a different name for the optional client
certificate. As is, -Y/-y is asymmetrical between s6-tlsc and s6-tlsd,
and that's ugly (and the reason for the bug, because I copied the
template for s6-tlsserver from s6-tlsclient and failed to fix the -Y
discrepancy).

 And yes, you may well be the first to use it. It's uncommon that a
server requires a client certificate - generally only people with a
serious PKI setup bother with this, which means big orgs, and those
haven't switched to s6-tlsserver yet. ;)

--
 Laurent



Re: s6-svstat up, down and ready time not correct after system timestamp update.

2023-07-20 Thread Laurent Bercot

This can be even worse than that: the timestamp from a GPS source can take 
several tens of seconds to stabilize, depending on the accuracy of your GPS 
system and available satellites. Until then, the system date can jump back and 
forth around the actual time.


 Ew. That's pretty bad indeed.
 Forward clock jumps aren't much of a problem, software is generally
resilient to that; but *backward* clock jumps are the devil. They can
mess up log sequentiality properties; they can make software unresponsive;
they can introduce errors in file access/modification times; etc.

 Basically no important operation should be performed on a system until
we know that the system clock isn't going to jump backwards. I don't know
how existing GPS-based systems manage that.

--
 Laurent



Re: s6-svstat up, down and ready time not correct after system timestamp update.

2023-07-18 Thread Laurent Bercot

I am setting up s6 to manage services on my embedded Linux system.
Everything is fine, but I faced an issue related to system datetime changes.
My system does not have an RTC, but it has a GNSS module (managed by gpsd).
After GNSS gets the location and time, the chronyd service updates the system time.


 And there's your problem: you cannot rely on timestamp data across
system clock changes, so any service you would need accurate timestamp
data for would need to be started *after* chronyd first updates your
system clock.

 I know it's a pain. I have modified s6, and all my software, to rely
on CLOCK_REALTIME as little as possible. But sometimes it's unavoidable,
and then you need to be able to trust your system clock, or only use it
after it has been updated to something reasonable.



I found that s6-svstat uses CLOCK_REALTIME and I think it will be more robust 
to use CLOCK_MONOTONIC.


 Unfortunately, no, it will not. The timestamps written by s6-supervise,
and used by s6-svstat, are snapshots of the absolute date at a given
moment: this is exactly what CLOCK_REALTIME is for. It's the same kind
of time data that, for instance, s6-log prepends log lines with when you
use the t directive: an absolute timestamp. It is really wallclock time,
not stopwatch time.

 If stopwatch time (i.e. CLOCK_MONOTONIC) were to be used here, sure,
you would get more stability across system clock changes for
s6-svstat -o updownfor,readyfor ; but you would lose all meaning for
s6-svstat -o updownsince,readysince as well as s6-svdt, s6-permafailon
and maybe others. Stopwatch time can only be used to compute intervals,
never to share absolute timestamps.

 There is no absolute point of reference for sharing CLOCK_MONOTONIC
time across processes, unless you store an offset in the filesystem
at boot time and by convention all your software accesses this offset.
This isn't specified anywhere, because sharing absolute time across
processes is already covered by CLOCK_REALTIME, which is significantly
simpler - it only gets messy when the system performs a significant
system clock change, which should only happen at most once and in the
early life of a system. You are in this case and I know it's ugh, but
there is nothing I can do about that.



Is the patch correct? Maybe I miss something (maybe some other utils also need 
to be patched).


 Even if you wanted to use stopwatch time, no, the patch would not be
correct, because CLOCK_MONOTONIC has no absolute meaning - and a tain
containing a raw CLOCK_MONOTONIC value would be unusable.
 The function you are looking for is tain_stopwatch_read():
https://git.skarnet.org/cgi-bin/cgit.cgi/skalibs/tree/src/libstddjb/tain_stopwatch.c
but it has to be paired with an initial call to tain_stopwatch_init(),
which computes the offset so that you get reasonable absolute times -
but this offset is only valid for the current process, and you should
never share these absolute times with other processes, because they
will be increasingly inaccurate with clock drift and difference between
time of creation and time of use.

 In fact, s6-supervise *does* use CLOCK_MONOTONIC: there is an initial
call to tain_now_set_stopwatch_g() here:
https://git.skarnet.org/cgi-bin/cgit.cgi/s6/tree/src/supervision/s6-supervise.c#n836
This means that afterwards, timestamps obtained by calls to tain_now_g()
will use CLOCK_MONOTONIC. But these timestamps are only used internally,
so that the timeout computations for iopause() (the main event loop)
are resilient to system clock changes - because s6-supervise, as you are
experiencing, can be used very early, before the system clock is properly
set, so there is a reason to use CLOCK_MONOTONIC here.
 The timestamps that are meant to be shared with other processes,
however, are all obtained via tain_wallclock_read(), which uses
CLOCK_REALTIME, and that is on purpose.

 I understand this is not the answer you were looking for, but it's the
only one I've got. If you cannot live with the inaccurate report of
s6-svstat -o updownfor,readyfor then my suggestion is to use s6-rc to
delay the start of your sys-dbus (and friends) services until your
initial system clock change has happened.

--
 Laurent



Re: [PATCH] configure: Catch all of variable values

2023-07-09 Thread Laurent Bercot

-*=*) eval "$arg" ;;
+*=*) eval "${arg%%=*}=\${arg#*=}" ;;


 I'm going to check, but that's probably correct. Thanks!

--
 Laurent



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-29 Thread Laurent Bercot

Actually I mean a *directory* that is guaranteed to exist (and meanwhile
unexecutable): so /dev here.


 Indeed, /dev should work; but using it still makes me queasier than
crafting a nonexistent path. The mkstemp thing works, so, not going to
change it to save a couple of syscalls in a configure test. :)



Well I was intending to suggest that we simply avoided posix_spawn*()
where it disagreed with posix_spawn(3p); that is to say simply replacing
all previous `#ifdef HASPOSIXSPAWN' conditions with `#if (defined
HASPOSIXSPAWN) && (!defined SKALIBS_HASPOSIXSPAWNEARLYRETURN)'.  After
all it seems to me child_spawn*() is not used that prevalently, so the
performance penalty is really minor; of course, feel free to correct me.


 Yes, falling back to fork+exec when posix_spawn is bad is an option,
and I would probably have done just that if I hadn't been pointed to
the existence of waitid() to achieve the "test whether a child is dead
without reaping it" thing, without which there can be no workaround.

 But posix_spawn is more than a performance thing. The point of this
interface is that its implementation doesn't have to be vfork+exec
internally; it was precisely designed to allow spawning processes on
nommu machines, where vfork and fork are basically impossible. So,
using posix_spawn wherever possible helps with portability as well.

 Of course, it doesn't matter for glibc, and it doesn't matter for s6
which needs fork anyway. And chances are that platforms that
implement posix_spawn() with internals that are *not* fork+exec will
not make it return before the spawning has really succeeded. But still,
it's nice to make sure it can be used wherever it exists.

 If you don't like the workaround, nobody's preventing you from using
--with-sysdep-posixspawn=no manually. ;)

--
 Laurent



Re: posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-29 Thread Laurent Bercot



 Fixes pushed to git, thanks!

 When given an unexecutable path, child_spawn() returns 0, but errno
is unset... that's on purpose. Unfortunately, in the parent there is
no way to know the child's execve() error code; all we have is the
exit status, 127, and we cannot report the reason for the failure.
Rather than set errno to something that may be wrong and prompt the
caller to take inadequate measures, I'd rather set it to 0, which
glibc reports as "success" but really means "no error information"
except in a few, well-known contexts; and let the caller deal with
the lack of more accurate reporting. I know it's not satisfying, but
we can't do any better.



I have realised that a simpler unexecutable path can be, for example,
/etc (is it mandated in POSIX?); this can save the mkstemp() call
in the sysdep test.


 POSIX doesn't mandate any path other than /dev/null and /dev/console
and I'd rather not try executing them, who knows what weird permissions
they may have on obscure OSes.
 It's a sysdep test, it's not performance-critical; I'd rather use
mkstemp() to be *sure* we have a path that does not exist.
(Of course the user could always race the program, but we're not trying
to harden against stupidity here.)



(And frankly I personally do not really find it much worthwhile to
introduce this amount of complexity for the broken dependency of a
quite minor performance optimisation...)


 I agree it's a lot of work for not much, but as you said, the
behaviour is arguably conformant, and your experience proves that old
glibcs are still around, so I'd rather make posix_spawn usable where
it exists instead of placing the burden of --with-sysdep-posixspawn=no
on users who have a bad version.

 As shown by the qemu bug I linked above, this impacts s6-svscan,
which relies on correct child_spawn() reporting when running custom
signal handlers, so not working around bad posix_spawn QoI may lead
to buggy signal management in s6-svscan, and nobody wants that.


 A cursory web search appears to say that glibc-2.27 is when they fixed
the posix_spawn QoI; 2.17 being bad is consistent with that. But I can't
be bothered to go spelunk in glibc code to check and/or bisect, so if
someone could confirm, thank you, otherwise, no big deal.

--
 Laurent



posix_spawn (was: Bugs with execline/s6 documentation and skalibs functions using posix_spawn())

2023-06-28 Thread Laurent Bercot



 I pushed a workaround to the skalibs git.
 Could you please try a build on a machine that exhibits the early
return behaviour and tell me if
 - the behaviour is correctly detected by ./configure (the last sysdep)
 - the child_spawn*() family of functions now works properly even on
this machine?

 Also, can you please tell me what version of glibc these distribution
versions are running?

 Thanks!

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

Actually I copied the fragment of posix_spawn(3) from a Devuan Chimaera
machine, so the problem may be not specific to CentOS 7.  I did not test
CentOS 6 or other distro (version)s, for example; but on Rocky Linux 8,
which I unfortunately also need to support at work, the behaviour is
as expected.  Attached is a simple test.


 It may be a bug in some old glibc, then.



If we assume posix_spawn(3) and posix_spawn(3p) were the only possible
behaviours (which is frankly not that reliable, judging from how
neither manpage noted the violation of conformance), then the two
behaviours could be distinguished with the attached test.


 They're not the only possible behaviours: for instance, [1] shows that
under some buggy qemu, posix_spawn() always returns early. But that
behaviour can also be caught by the same workaround as the glibc
behaviour you're observing, so it's fine.

 Since the bug is more widespread than "one old version of one distro",
is visible in production environments used at large, and seems constrained
to "posix_spawn succeeds even if exec fails", which is testable,
I'll add a sysdep to detect it and a workaround in child_spawn*, but it
will mean additional manual --with-sysdep-foobar=blah noise for
skalibs cross-builds.

[1]: https://skarnet.org/lists/skaware/1658.html

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

 Testing the behaviour may be challenging, however, because I suspect
the CentOS 7 implementation of posix_spawn() is just racy, and they
simply documented that they don't care.


 Thinking about it more, I'm afraid it's not a testable behaviour.
Not only isn't there any way to force the race since it entirely
happens inside a libc function, but also, the test would require
running code on the build machine, which doesn't work for cross-builds
and people would have to manually set the sysdep anyway.

 It seems like --with-sysdep-posixspawn=no, as you did, is the easiest
workaround.

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

As a more general fix, I think tryposixspawn.c should at least try
spawning a probably unexecutable path (like the one above) as well,
which corrects the sysdep on systems where the expected conformance
is broken.


 Adding a sysdep to detect that case is a good idea indeed!
 Rather than pretending it doesn't exist, though, I'd rather add a
different sysdep that tests its behaviour, so it can still be used
with the proper workaround.

 Testing the behaviour may be challenging, however, because I suspect
the CentOS 7 implementation of posix_spawn() is just racy, and they
simply documented that they don't care.

--
 Laurent



Re: Bugs with execline/s6 documentation and skalibs functions using posix_spawn()

2023-06-27 Thread Laurent Bercot

* In `trap.html', there is a reference to the removed `timeout' keyword.


 Fixed.



* In `s6-svscan-not-1.html', the systemd unit (traumatic experience with
  it, as you may easily expect) lacks a `KillMode = process'.


 I believe the correct setting is actually KillMode=mixed; and the
ExecStop= line is incorrect as well since ExecStop expects a synchronous
command, not an asynchronous one. Better let systemd just send a SIGTERM
to s6-svscan, wait for the supervision tree to exit on its own, and
SIGKILL the stragglers. I pushed a fix accordingly.



* The child_spawn*() family of functions, depending on using posix_spawn
  or not, exhibit different behaviours on CentOS 7 (trauma again), as
  posix_spawnp() may return 0 with argv pointing to unexecutable paths.
  This, for example, results in s6-svscan not exiting on SIGTERM when
  .s6-svscan/SIGTERM is absent.  The behaviour of posix_spawnp() on
  CentOS 7 does not conform to posix_spawn(3p), but is documented in
  posix_spawn(3): "Even when these functions return a success status,
  the child process may still fail for a plethora of reasons related to
  its pre-exec() initialization.  In addition, the exec(3) may fail."


 Yeah, well, tough for non-conforming systems.
 That said, I also pushed a change last week that should have fixed
this issue as a side effect, so it's all good. If you feel like it,
you can try the s6-svscan version in the latest s6 git. :)


> --with-sysdep-devurandom

 Also fixed.

 Thanks a lot for these reports!

--
 Laurent



Re: utmps privilege

2023-06-25 Thread Laurent Bercot


 What's happening is that utmps-utmpd only checks the value of the
*primary* gid of the client. It does not check supplementary groups.
I agree that it's counter-intuitive, and will see if I can fix that.


 Unfortunately, no, that's not fixable. The credentials-passing
mechanism used by s6-ipcserverd (the superserver for utmps-utmpd) only
transmits the primary gid, not the supplementary groups; and I'm not
aware of another reasonably portable credentials-passing mechanism,
let alone that transmits supplementary groups - except the suid
mechanism, which, no.

 So you're going to have to keep setting your *primary* group to utmp
if you want to modify the utmp database as a regular user. Sorry.
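
 To make the limitation concrete, here is a minimal sketch of one
credentials-passing mechanism, Linux's SO_PEERCRED socket option. This is
an assumption chosen for illustration: which mechanism s6-ipcserverd
actually uses is not stated here. The point is that the peer is described
by one pid, one uid and one gid, with no room for supplementary groups.

```python
import os
import socket
import struct

def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a Unix-domain socket,
    as reported by Linux's SO_PEERCRED (struct ucred)."""
    fmt = "iII"  # pid_t, uid_t, gid_t
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)

if __name__ == "__main__":
    a, b = socket.socketpair()  # both ends belong to this process
    pid, uid, gid = peer_credentials(a)
    print("gid seen by the server side:", gid)
    print("supplementary groups, never transmitted:", os.getgroups())
    a.close()
    b.close()
```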

--
 Laurent



Re: utmps privilege

2023-06-24 Thread Laurent Bercot



 Please avoid using an HTML client, it looks like your converter is
buggy and giving some garbled output (your top output is unreadable).

 What's happening is that utmps-utmpd only checks the value of the
*primary* gid of the client. It does not check supplementary groups.
I agree that it's counter-intuitive, and will see if I can fix that.
Thanks for the report.

--
 Laurent



Re: s6-linux-init without virtual consoles

2023-06-07 Thread Laurent Bercot



 Thanks for the kind words, Oli :)
 It's all fine, really. In all fairness, yes, I *was* a little cheeky,
because Esben sounded very dramatic about a harmless warning.

 But there's a legitimate UX takeaway here: the warning is indeed
needlessly scary. So it will be changed in the next s6-l-i release.
And yes, I suppose I can add the verbosity setting while I'm at it
(it doesn't sound super useful, but it's not expensive either, so if
people want it, why not.)

--
 Laurent



Re: s6-linux-init without virtual consoles

2023-06-01 Thread Laurent Bercot




I think it would be fair to be able to configure s6-linux-init so that
it does not rely on specific details about what hardware is available.


 Then I have some good news for you: s6-linux-init already does not
rely on specific details about what hardware is available.

 Because if it did, and assumed that you have a virtual console, and you
didn't, then it would crash. And you would be a very sad panda. And
so would I.

 But it doesn't.

 What you're seeing is known as a run-time test: the existence of a
/dev/tty0 device is tested. And if such a device exists, then s6-l-i
attempts to support kbrequest on it. See? conditional support. It's
nice and sweet and simple and has fewer failure cases (because the
more configuration switches you have, the more you risk human error.)

 When you don't have a virtual console, s6-l-i works perfectly fine.

 If there was no warning message, you would never have noticed the extra
system call, and you wouldn't be here asking for offline configuration
where online configuration works. But there is a warning message, and
that's what you don't like.

 So yes, the problem you have *is* the warning message per se, not the
fact that s6-l-i performs one completely undetectable superfluous open()
call in headless systems.

 So let's talk about the message.
 I agree it's not particularly elegant to print a warning on every boot
in a normal configuration. So it could be refined: if devtmpfs can be
relied on to always provide /dev/tty0 when a console exists, then
when there's no such device, instead of "warning: missing device",
s6-l-i could print "info: headless system detected".
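
 As a rough sketch of that run-time test and of the proposed wording
(Python for brevity; the function name, the open flags and the fallback
warning text are assumptions, not s6-linux-init's code):

```python
import errno
import os

def probe_console(path="/dev/tty0"):
    """Try to open the console device.
    Return (fd, None) on success, or (None, message) when the device
    cannot be used."""
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    except OSError as e:
        if e.errno == errno.ENOENT:
            # No such device: a normal situation on a headless system.
            return None, "info: headless system detected"
        # Any other failure is unexpected and still deserves a warning.
        return None, "warning: unable to open %s: %s" % (path, e.strerror)
    return fd, None

if __name__ == "__main__":
    fd, message = probe_console()
    if message:
        print(message)
    else:
        os.close(fd)
```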

 I think that would be less scary than a "warning", and users of headless
and headful (?) systems could keep living together in peace and harmony.

 What do you think?

--
 Laurent



Re: Re: lastlog support

2023-06-01 Thread Laurent Bercot

I checked the shadow-utils site. It provides a lastlog CLI, but it lacks
a lastlogd similar to utmpd/wtmpd.


 The lastlog file isn't managed by utmp, but by the login program, with
or without assistance from PAM. It's an entirely different operation,
and I don't understand why you'd want utmps to be involved.

--
 Laurent



Re: s6-linux-init without virtual consoles

2023-06-01 Thread Laurent Bercot




While that might make sense when the system is expected to have a
/dev/tty0 device, it is kind of messy to see that on systems that are not
supposed to have /dev/tty0.


 Kernels and various parts of init systems print warning messages all
the time for similar reasons (some operation failed because it's not
supported in the current configuration), I don't think it's fair to
single out this one. I would prefer to do nothing.

 That said, if it's important for more users, I could probably add a
verbosity setting, where -v0 would silence warning messages.
 The problem is that it would do so for *all* warning messages, you'd
have no way to tell whether you missed a warning that was actually
relevant to you.

 And no, I'm not adding a separate switch for every warning message
in the program :P

--
 Laurent



Re: lastlog support

2023-05-31 Thread Laurent Bercot

is there any plan to support lastlog in utmps project?


 lastlog uses a separate /var/log/lastlog file, so it's not directly
tied to utmp. If anything, it *uses* utmp, so it's the other way around:
the shadow-utils package should support utmps.

--
 Laurent



Re: s6-log not responding to signals

2023-05-30 Thread Laurent Bercot

And the timeout is only going to start during exit, right?


 Naturally. :)

--
 Laurent



Re: s6-log not responding to signals

2023-05-26 Thread Laurent Bercot

While that would make s6-log nicer to integrate with s6-rc, I still
think that the current behavior of potentially blocking SIGTERM forever
is undesirable, so some kind of timeout in s6-log could still be a good
idea.


 That's why I was suggesting a timeout. And since logging a partial line
as a complete line is always strictly better than dropping the partial
line, once you have the timeout feature, you don't need anything else:
set the timeout to 1 ms if you want to exit immediately even with
partial lines.

--
 Laurent



Re: s6-log not responding to signals

2023-05-26 Thread Laurent Bercot


The goal is to never write partial lines.  So if the process is sent a
signal to exit while a partial line has been received, simply exit
without writing anything to file.


 One of the goals is not to write a partial line if it can be avoided;
but it defers to the more important goal of not losing any data.
Your suggestion goes against that more important goal.



I would vote for simply dropping it.  And as we are shutting down, the
whole thing is a kind of race anyway, so the first part of the line
could just as well have been not received at all, so I think we can
safely just throw it away without even waiting for it.


 Nope. Not happening.

 Certainly, on shutdown, it doesn't matter whether you get that last
log line or not. But loggers don't only get killed on shutdown. There
are other, good, reasons why you would want to kill (and restart) an
s6-log process, and not losing any data is important in these cases.

--
 Laurent



Re: s6-log not responding to signals

2023-05-26 Thread Laurent Bercot

How are you thinking changes to termination behaviour will interact with the 
existing -p option?


 There would be no specific interaction.
 -p only makes s6-log ignore SIGTERM. The signal is received, but does
nothing.
 The new timeout option would make it wait on receipt of an exit signal,
be it SIGTERM or SIGHUP. So, with -p, it would only trigger the new
behaviour on SIGHUP, and keep doing nothing on SIGTERM.



As suggested by the documentation, when s6-log is waiting for a newline
to arrive, its behaviour could be influenced by a) EOF on stdin,
b) termination signal.

Are you thinking of adding the timeout only if there is a termination signal,
but EOF has not yet been detected?


 There are two exit conditions for s6-log:
 1. it reads EOF on stdin;
 2. it receives a SIGTERM (unless -p), or a SIGHUP.

 On EOF, s6-log exits *immediately*. If it has a partial line in its
buffer, it will process and log it as a full line before exiting. It
does not wait because there is no reason to: the producer closed the
data stream, so s6-log is never getting any more data to finish the
line.

 On a termination signal, the producer isn't necessarily done sending
logs; the signal comes from a third party (the administrator). s6-log's
goal is to exit asap but without losing any data, and on a line boundary.
If there's nothing in its buffer, it exits immediately, but if there's
a partial line, it will wait for the producer to send it the rest of
the line, process this line, and then exit without reading anything
more.
 (If the producer has more to send, it can do so if the pipe to s6-log
is being fd-held; the next s6-log incarnation then resumes where the
old one has stopped. If the pipe isn't being fd-held, then the producer
gets a broken pipe error, but knows exactly what it has successfully
sent and what it has not: no data has been lost in a buffer.)

 My suggestion is to add a timeout in the only case s6-log doesn't
exit immediately: when it gets a termination signal and there is a
partial line in the buffer. The wait is meant to give some leeway for
the producer to send the rest of the line before s6-log exits, but if
no such rest of the line is coming, it would be better for s6-log not
to wait forever.
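
 The rules above, including the suggested timeout, can be modelled as a
small state machine. This is a toy model in Python with made-up names and
an explicit clock; it is not s6-log's code, and the timeout did not exist
yet when this was written.

```python
class LineLogger:
    """Toy model of the exit rules: EOF, termination signal, timeout."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.buf = ""          # partial line, if any
        self.logged = []       # lines written to the log so far
        self.deadline = None   # set when a signal arrives mid-line
        self.exited = False

    def _flush_partial(self):
        # Log a partial line as if it were a full one: no data is lost.
        if self.buf:
            self.logged.append(self.buf)
            self.buf = ""

    def feed(self, data):
        """The producer wrote `data` to the pipe."""
        if self.exited:
            return
        self.buf += data
        while "\n" in self.buf:
            line, self.buf = self.buf.split("\n", 1)
            self.logged.append(line)
            if self.deadline is not None:
                # A signal was pending and a line boundary was reached:
                # exit now. Whatever follows counts as never read.
                self.exited = True
                return

    def eof(self):
        """Producer closed the stream: flush and exit immediately."""
        self._flush_partial()
        self.exited = True

    def terminate(self, now_ms):
        """SIGTERM or SIGHUP received at time now_ms."""
        if not self.buf:
            self.exited = True               # nothing buffered
        else:
            self.deadline = now_ms + self.timeout_ms

    def tick(self, now_ms):
        """Called as time passes; enforces the timeout."""
        if (not self.exited and self.deadline is not None
                and now_ms >= self.deadline):
            self._flush_partial()
            self.exited = True
```

 With a 100 ms timeout, a logger holding "part" when the signal arrives
exits either when the newline shows up or at the deadline, logging "part"
as a full line in the second case.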

--
 Laurent



Re: s6-log not responding to signals

2023-05-25 Thread Laurent Bercot

The problem is that until a new-line is received, s6-log will not
respond to SIGHUP and SIGTERM.  I assume this is not as expected.


 This is expected; the goal is to finish reading partial lines
before exiting. This is useful with services that are writing a
large amount of logs, where the buffer length does not necessarily
align with a newline: after receiving the signal, the logger reads
until the next newline, processes the line, then exits.

 No service should ever write a partial line at the end of their
lifetime.

 However, I agree that the situation you're describing is not ideal
and s6-log should be more robust. I'm thinking of adding a timeout:
if s6-log hasn't received the end of a partial line n milliseconds
after receiving a terminating signal, then it should process the
partial line anyway and exit. What do you think?

--
 Laurent



Re: s6-linux-init-man-pages

2023-04-07 Thread Laurent Bercot

An mdoc(7) port of the documentation for s6-linux-init is now available:

https://git.sr.ht/~flexibeast/s6-linux-init-man-pages/archive/v1.1.1.0.1.tar.gz





Re: *-man-pages: s6-rc port, Makefile fix

2023-04-04 Thread Laurent Bercot




An mdoc(7) port of the documentation for s6-rc is now available:


 That's awesome, thanks a lot Alexis!

--
 Laurent



[announce] April 2023 bugfix release

2023-04-02 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available. They fix a few
visible bugs, so users are encouraged to upgrade.

 I usually do not announce bugfix releases. This e-mail is sent because
two new functionalities were also in git when the bugfixes needed to be
made, so they're now available:

 - A new -D option to execline's elgetopt, allowing customization of the
value for the ELGETOPT_n variable when the -n option is given and does
not expect an argument.

 - A new -R option to s6-linux-init-maker, allowing you to set hard
resource limits for the system at boot time.

skalibs-2.13.1.1           (release)
execline-2.9.3.0           (minor)
s6-2.11.3.2                (release)
s6-linux-init-1.1.1.0      (minor)
s6-portable-utils-2.3.0.2  (release)


 Enjoy,
 Bug-reports always welcome.

--
 Laurent



Re: [PATCH] Multicall improvements didn't improve trap

2023-03-28 Thread Laurent Bercot

Sending signals to the trap process does nothing. If I revert
9d55d49dad0f4cb90e6ff2f9b1c3bc46a6fcf05f, trap works as expected. After
some debugging I think that the pids array in trap.c contains garbage
since it isn't initialized statically anymore (see the attached patch).


 Your diagnosis is correct indeed, the change from static to automatic
removed the implicit initialization to 0.

 Patch applied, thanks.

--
 Laurent



Re: [execline] Conditional export

2023-03-23 Thread Laurent Bercot



 I'm going to regret this.

ifthenelse -s { eltest -f ${FILE} } { export EXISTS ${FILE} } { }
env

 This construct is purposefully not documented, because it breaks
syntactic and logic assumptions that are true in the rest of execline.
But it can simplify your life in a handful of cases, like this one.

 What it does: it will *prepend the rest of your script* with the
contents of the second or the third block, depending on whether the
test in the first block is true.
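
 Purely as a mental model of that description, with the script seen as an
argv list (the function name is made up and this says nothing about how
execline implements it):

```python
def ifthenelse_s(test_ok, then_block, else_block, rest):
    """Model of the behaviour described above: the selected block is
    prepended to the rest of the script, which then runs as one
    command line."""
    chosen = then_block if test_ok else else_block
    return chosen + rest

if __name__ == "__main__":
    rest = ["env"]
    # File exists: the export wraps the rest of the script.
    print(ifthenelse_s(True, ["export", "EXISTS", "/some/file"], [], rest))
    # File missing: the empty block leaves the rest untouched.
    print(ifthenelse_s(False, ["export", "EXISTS", "/some/file"], [], rest))
```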

 Do not overuse it. Do not ask for support about it. If it makes your
script easier to maintain, enjoy. If you start feeling like a sorcerer
and are tempted to explore what kind of magical feats you can accomplish
with it, don't - it will not end well.

--
 Laurent



Re: [PATCH] execline: multicall: make sort independent of locale

2023-02-18 Thread Laurent Bercot



 Can you please tell me what locale you're using, for testing purposes?

--
 Laurent



Re: [PATCH] execline: multicall: make sort independent of locale

2023-02-18 Thread Laurent Bercot

reset LC_ALL to avoid locale dependent sorting.
This is critical to ensure bsort() works reliably.

In my locale "execline-cd" was sorted after "execlineb"


... lol. Changing the sorting order for ASCII characters is probably the
most insane misdesign in locales. Good catch!

 Thanks for the patch. Going to apply with a slight modification: making
the change global to the whole script, for easier maintainability.
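
 A small sketch of why the order matters. The "punctuation-blind" key
below is only a stand-in for a locale collation that ignores '-'; the real
rules of the reporter's locale are not known here. The lookup assumes
byte order, as a strcmp()-based binary search would.

```python
import bisect

names = ["execlineb", "execline-cd", "exit", "export"]

def punctuation_blind(name):
    """Stand-in for a locale collation that ignores '-' (assumption)."""
    return name.replace("-", "")

# Code-point order; for ASCII names this matches strcmp().
byte_sorted = sorted(names)
# What a locale-aware sort might produce instead.
locale_like = sorted(names, key=punctuation_blind)

def lookup(table, name):
    """Binary search that assumes `table` is in byte order."""
    i = bisect.bisect_left(table, name)
    return i < len(table) and table[i] == name

if __name__ == "__main__":
    print(byte_sorted)   # "execline-cd" comes before "execlineb"
    print(locale_like)   # "execline-cd" comes after "execlineb"
    print(lookup(byte_sorted, "execline-cd"))
    print(lookup(locale_like, "execline-cd"))
```

 With the table in the second order, the binary search misses entries
that are actually present.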

--
 Laurent



[announce] skarnet.org February 2023 release

2023-02-17 Thread Laurent Bercot



 Hello,

 New versions of some skarnet.org packages are available. It hasn't been
long since the last release, but lots of small things have happened and
it doesn't make much sense to let them rot in git.

 The main addition is a new multicall configuration for the execline,
s6-portable-utils and s6-linux-utils packages. When you give the
--enable-multicall option to configure, a single binary is compiled,
and 'make install' installs this binary and creates symlinks to it.
This is useful for setups that focus on saving disk space.

 Credit for this addition goes to Dominique Martinet, who nerd-sniped me
into actually testing such a configuration; and it turned out the disk
space gains were very impressive for execline (up to 87%!).
I applied the idea to the s6-portable-utils and s6-linux-utils packages,
which are also made of small, very simple, independent programs, to see
whether it was viable in the general case; but as I suspected, the gains
were not as impressive, and making it work required a significant
refactoring effort. Since other skarnet.org packages would have an even
worse gains/effort ratio, the experiment is stopping there. execline is
an outlier, with a 177 kB amd64 static binary being able to replace a
1.3 MB set of binaries; that's much better than I thought it would be,
so it's worth supporting. Enjoy.

 Other changes include mostly bugfixes and quality-of-life improvements.

 The new versions are the following:

skalibs-2.13.1.0           (minor)
nsss-0.2.0.3               (release)
execline-2.9.2.0           (minor)
s6-2.11.3.0                (minor)
s6-rc-0.5.4.0              (minor)
s6-linux-init-1.1.0.0      (major)
s6-portable-utils-2.3.0.0  (major)
s6-linux-utils-2.6.1.0     (minor)
s6-networking-2.5.1.3      (release)
mdevd-0.1.6.2              (release)

 Details of some of these package changes follow.

* skalibs-2.13.1.0
  ----------------

 - Bugfixes.
 - New function: sals, listing the contents of a directory in a stralloc.
Straightforward, but a large-ish piece of code that was used in multiple
places and needed to be factored.

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


* execline-2.9.2.0
  ----------------

 - New --enable-multicall configure option. This is the big one for
some distributions, that don't want to spend 1 MB of disk space on
execline binaries. (They already know my position on that.)

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


* s6-2.11.3.0
  -----------

 - Bugfixes.
 - Instance-related internal changes. Instanced service directories
need to be recreated with the new version of s6-instance-maker.
 - New s6-svc -Q command, instructing s6-supervise not to restart the
service when it dies (like -O) and to additionally create a ./down file
in the service directory.
 - s6-ioconnect will now always shutdown() socket endpoints at EOF time;
the -0, -1, -6 and -7 options are still supported, but deprecated.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


* s6-rc-0.5.4.0
  -------------

 - Bugfixes. In particular, s6-rc-update now conserves the existing
instances in an instanced service, whether the service is currently
active or not. In case of a live update, the current instances keep
running, but will restart with the new template next time they die
(which can be forced by a s6-instance-control -r invocation).
 - New s6-rc subcommands: start and stop, equivalent to "-u change"
and "-d change" respectively.

 https://skarnet.org/software/s6-rc/
 git://git.skarnet.org/s6-rc


* s6-linux-init-1.1.0.0
  ---------------------

 - s6-linux-init-maker: -U option removed. No early utmpd script is
created. The reason for this change is that distros using utmps need
stage 2 utmp services anyway (because the wtmp database needs to be
persistent so wtmpd and btmpd can only be started after a log filesystem
has been mounted), so utmp is unusable before that point no matter what.
Distros should have an utmpd service started at the same time as wtmpd
and btmpd; so utmp management goes entirely out of scope for
s6-linux-init.

 https://skarnet.org/software/s6-linux-init/
 git://git.skarnet.org/s6-linux-init


* s6-portable-utils-2.3.0.0
  -------------------------

 - s6-test removed, hence the major update.
 - New --enable-multicall configure option.

 https://skarnet.org/software/s6-portable-utils/
 git://git.skarnet.org/s6-portable-utils


* s6-linux-utils-2.6.1.0
  ----------------------

 - s6-mount option support updated.
 - New --enable-multicall configure option.

 https://skarnet.org/software/s6-linux-utils/
 git://git.skarnet.org/s6-linux-utils


 Enjoy,
 Bug-reports welcome.

--
 Laurent



Re: s6 instanced services are "forgotten" after s6-rc-update

2023-02-09 Thread Laurent Bercot

Am I missing something?


 Sorry, I had failed to push the changes. Fixed now.


Re: s6 instanced services are "forgotten" after s6-rc-update

2023-02-07 Thread Laurent Bercot




On s6-instance-update, as a user I'd expect it to exist and be run
automatically as part of s6-rc-update. Since restarting applies the
new definition for longruns, having to do one extra step (or two if
s6-instance-update isn't made) per instance of a templated longrun
would be counterintuitive.


 It took a bit of creative juice, and significant refactoring, but I
think I finally got it down.

 The current s6-rc git will remember your created instances from one
compiled db to the next across s6-rc-update, no matter the state of the
service at the time of the update.

 If the updated instanced service is live, then the new template will
be copied to all the instances, but the instances will still be
running on the old template until they're killed via s6-instance-control,
at which point they will restart on the new template.
 This is in line with the principle of maximizing service uptime and
waiting for admin input before killing processes.

 Please try it and tell me if it's working for you. You'll need to
build against the latest git of skalibs and s6.

--
 Laurent



Re: single-binary for execline programs?

2023-02-02 Thread Laurent Bercot

Yes, this is only possible because you did a very good job in the first
place. Good work! This cannot be said enough.


 Thanks.
 I managed to de-global the arrays in trap.c, so now the only unavoidable
global is in forstdin: a pointer to a structure accessed by a signal
handler.
 You'd think with all the siginfo stuff, POSIX would have thought of
mandating a void * auxiliary pointer you'd give to sigaction() and that
would be stored and provided to the signal handler, but no, there's just
no room to pass user data other than globally. Yet another example of
wonderful, user-friendly design.

 But yeah, 8 bytes of bss/data for the whole thing is pretty good, the
crt and the libc are basically the only static RAM users, so there's
nothing more to do here.



I was also curious about starting time and should have done that in my
previous mail, it's a bit slower as expected.


 Yeah, a 0.2 ms difference is fine, I think. :P But I'm not sure if
it's possible to get an accurate benchmark, because the cost of 4-5
strcmp()s is negligible compared to the cost of the execve's in the
first place. I suspect at least half of the difference comes from
mapping a bigger executable.



I think the main reason to like shared libraries as a distribution is
that if you upgrade it, you get the upgrade for all programs that depend
on it -- which isn't really a problem for this.


 Oh, absolutely, and that's why it's hard to advocate static linking to
distributions. It's a very reasonable argument for dynamic linking.



At the risk of repeating myself, I'll be happy to help with anything
related to this -- that's the least I can do given I brought it up.


 Thank you. I might seriously take you up on that offer further down
the road. :)
 But really, since the "cat everything together" method works in this
case, there's not much more to do except pay attention when writing
or editing normal programs in the future.

 I pushed "multicall-strip" and "multicall-install" targets in git,
and documented the setup in the INSTALL file. As experimental, because
although I *think* everything is working, there may still be some
interaction I've missed.

--
 Laurent



Re: single-binary for execline programs?

2023-02-01 Thread Laurent Bercot




allow you to link against a dynamic libexecline. Can you do it, see
how much space you gain? That's a configuration I would definitely
support, even if it's slower - people usually love shared libraries.


 I'm tired. This configuration is obviously already supported, and no
need to patch. You just need to ./configure --disable-allstatic.

 Compared to fully static binaries with musl, a fully static multicall
is a size gain of 87%. Compared to fully dynamic binaries, it's a gain
of 69%. Still impressive.

 A fully dynamic multicall binary is only 80 kB on x86_64, but it's
a pretty stupid configuration because it's the only user of libexecline
so nothing is gained by sharing it. There still may be a case for
sharing libskarnet; but toolchains make it so difficult to link against
some libraries statically and some others dynamically that supporting
that configuration is just not worth it.

--
 Laurent



Re: single-binary for execline programs?

2023-02-01 Thread Laurent Bercot

Look, here's a trivial, suboptimal wrapper, far from pretty:

> (...)

(look, I said it wasn't pretty -- there are at least a dozen of problems
with this, but nothing a day of work I offered to do can't fix; I wrote
this because it was faster than talking without a concrete example to
have some figures, and that took me less time than the rest of this mail)


 Damn you to the nine circles of Hell, one by one, slowly, then all of
them at the same time.
 You piqued my curiosity, so I did it, and I spent the day making it work.

 The execline git now has a 'multicall' make target. It will make an
"execline" binary that has *everything* in it. You can symlink it to
the name of an execline program and it will do what you expect. You can
also call the subcommand as argv[1]: "execline exit 3" will exit 3.

 No install targets, no automatic stripping, no symlinks, nothing.
I don't want to officially support this configuration, because I *know*
it will be a time sink - every ricer on the planet will want me to change
something. So you get the binary for your own enjoyment, and that's it.
Have fun. If it breaks, you get to keep both pieces.

 It's really rough: it only marginally improves on your model, fixing
the most glaring problems. The only fancy thing it does is find the
applet via bsearch(), because that's easy and it saves about 20 strcmp()
per call. Apart from that, it's super dumb.
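
 The dispatch just described (pick the applet from the invocation name,
fall back to argv[1], find it by binary search in a sorted table) can be
sketched as follows. This is Python, not the generated C; "noop" is a
made-up applet and 100 is an arbitrary exit code chosen for the sketch.

```python
import bisect
import os

def applet_exit(args):
    """Stand-in for the "exit" program: exit with the given code."""
    return int(args[0]) if args else 0

def applet_noop(args):
    """Made-up applet, present only to give the table a second entry."""
    return 0

# Both lists are parallel; NAMES must stay sorted in byte order
# for the binary search to work.
NAMES = ["exit", "noop"]
FUNCS = [applet_exit, applet_noop]

def find_applet(name):
    i = bisect.bisect_left(NAMES, name)
    if i < len(NAMES) and NAMES[i] == name:
        return FUNCS[i]
    return None

def multicall(argv):
    """Return the exit code the multicall binary would produce."""
    args = argv[1:]
    # Invoked through a symlink: the applet is the program name.
    applet = find_applet(os.path.basename(argv[0]))
    if applet is None and args:
        # Invoked as "execline subcommand args...".
        applet = find_applet(args[0])
        args = args[1:]
    if applet is None:
        return 100  # unknown applet (arbitrary code for this sketch)
    return applet(args)

if __name__ == "__main__":
    print(multicall(["execline", "exit", "3"]))
    print(multicall(["/usr/bin/exit", "5"]))
```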

 That said, you were right: that's some pretty hefty saving of disk
space. The execline binary is 169kB, statically linked against musl on
x86_64. That's neat. I expected it to be at least twice as big. And
the data/bss isn't too bad either: only 2 pages. But that's because
execline programs use very little global memory in the first place -
the only places where globals are used are when state needs to be
accessed by signal handlers, and there's nothing I can do about that in
short-lived programs. (Long-lived programs use a selfpipe, so only
one int of global data is ever needed for them.)

 So, all in all, much better results than I expected, it was a pleasant
surprise. Still, concatenating all the code feels really clunky, and
a real multicall program needs to be designed for this from the start,
which won't happen for execline in the foreseeable future, so this is
as much as you get for now.

 If you're interested in hacking the thing, the magic happens in
tools/gen-multicall.sh.



 libexecline is statically linked, so these pages aren't shared afaik?


 That's right, I forgot it was always statically linked. If it helps,
changing ${LIBEXECLINE} to -lexecline in the src/execline/deps-exe
files, then running ./tools/gen-deps.sh > package/deps.mak, should
allow you to link against a dynamic libexecline. Can you do it, see
how much space you gain? That's a configuration I would definitely
support, even if it's slower - people usually love shared libraries.



I really don't see what's different between e.g. execline and coreutils,
who apparently thought it was worth it;


 coreutils also thought it was worth it to implement true --help and
true --version, so I'll leave to your imagination how much I value their
technical judgment.
 The only way to know for sure whether it will be worth it is to stop
speculating and start profiling, which is what I did. And it appears
the results are interesting, so, that's great!

 Sigh. I shouldn't feel that way, and any potential improvement should
be a source of joy, not dread - but really I wish the results weren't
so good. Now Pandora's box has been opened and everyone will want to
use the multicall exclusively, so at some point I'll have to support it,
i.e. ensure it's actually correct and enhance its maintainability.
And that means a lot more work. :(



But, unfortunately for you, the full openrc suite is 2.2MB (5 on arm
with bloated aarch64), which is a bit less than the s6 suite :-D


 No, that's fair. It's true that s6 takes a bit more disk space.
Where OpenRC loses is RAM and CPU, because it does everything in
shell scripts. And shell scripts definitely win on disk space. :)

--
 Laurent



Re: single-binary for execline programs?

2023-02-01 Thread Laurent Bercot

I believe I did my homework looking first -- are there other discussion
channels than this list that one should be aware of?


 The lists are definitely the only place you *should* be aware of, but
there are a lot of informal spaces where discussions happen, because not
everyone is as well-behaved as you are :) Github issues, webforums of
other projects, IRC channels, etc.
 The important stuff normally only happens here, but I'm getting user
feedback from several sources.



I'd go out on a limb and say if you only support single-binary mode, some
of the code could be simplified further by sharing some argument
handling, but it's hard to do simpler than your exlsn_main wrapper so
it'll likely be identical with individual programs not changing at all,
with just an extra shim to wrap them all; it's not like busybox where
individual binaries can be selected so a static wrapper would be dead
simple.


 I doubt much sharing would be possible.

 The main problem I have with multicall is that the program's
functionality changes depending on argv[0]. You need to first select
on argv[0], and *then* you can parse options and handle arguments.
Note that each exlsn_* function needs its own call to subgetopt_r(),
despite the options being very similar because they all fill an
eltransforminfo_t structure.

 Having a shim over *all* the execline programs would be that,
multiplied by the number of programs; at the source level, there would
not be any significant refactoring, because each program is pretty much
its own thing. An executable is its own atomic unit, more or less.

 If anything, execline is the package that's the *least* adapted to
multicall because of this. There is no possible sharing between
"if" and "piperw", for instance, because these are two small units with
very distinct functionality. The only way to make execline suited to
multicall would be to entirely refactor the code of the executables and
make a giant library, à la busybox. And I am familiar enough with
analyzing and patching busybox that I certainly do not want to add that
kind of maintenance nightmare to execline.

 Anything that can be shared in execline is pretty much already shared
in libexecline. If you build execline with full shared libraries, you
get as much code sharing as is reasonably accessible without a complete
rearchitecture.
 Any significant disk space you would gain in a multicall binary
compared to a bunch of dynamically linked executables would come from
the deduplication of unavoidable ELF boilerplate and C run-time, and
that's basically it.

 The "one unique binary" argument applies better to some of my other
software; for instance, the latest s6-instance-* additions to s6.
I considered making a unique "s6-instance" binary, with varying
functionality depending on an argv[1] subcommand; I eventually decided
against it because it would have broken UI consistency with the rest of
s6, but it would have been a reasonable choice for this set of programs -
which are already thin wrappers around library calls and share a lot
of code. Same thing with s6-fdholder-*.
 execline binaries, by contrast, are all over the place, and *not* good
candidates for multicall.



Hmm, I'd need to do some measurements, but my impression would be that
since the overall size is smaller it should pay off for any pipeline
calling more than a handful of binaries, as you'll benefit from running
the same binary multiple times rather than having to look through
multiple binaries (even without optimizing the execs out).


 Yes, you might win a few pages by sharing the text, but I'm more
concerned about bss and data. Although I take some care in minimizing
globals, I know that in my typical small programs, it won't matter if
I add an int global, because the amount of global data I need will
never reach 4k, so it won't map an extra page.

 When you start aggregating applets, the cost of globals skyrockets.
You need to pay extra attention to every piece of data. Let me bring
the example of busybox again: vda, the maintainer, does an excellent
job of keeping the bss/data overhead low (only 2 pages of global
private/dirty), but that's at the price of keeping it front and
center, always, when reviewing and merging patches, and nacking stuff
that would otherwise be a significant improvement. It's *hard*, and
hampers code agility in a serious way. I don't want that.

 Sure, you can say that globals are a bad idea anyway, but a lot of
programs need *some* state, if local to a TU - and the C and ELF models
make it so that TU-local variables still end up in the global data
section.



Even almost 1MB (the x86_64 version that doesn't have the problem,
package currently 852KB installed size + filesystem overhead..) is
still something I consider big for the systems I'm building, even
without the binutils issue it's getting harder to fit in a complete
rootfs in 100MB.


 I will never understand how disk space is an issue for execline and s6.
 RAM absolutely is, because 

Re: s6 instanced services are "forgotten" after s6-rc-update

2023-01-31 Thread Laurent Bercot




Agree on avoiding restarting old instances. If instances were atomic
services, s6-rc-update wouldn't restart them either.

OTOH, the template's files are copied, not symlinked, which means
restarting old instances will use the old template. Does this call for
an s6-instance-update program?


 The fix I currently have in git does exactly that: instances are now
correctly transmitted across s6-rc-update, and not restarted; the new
template is copied, but it's not copied to existing instances, it will
only be used for new ones. To get the new template on an existing
instance, you need s6-instance-delete + s6-instance-create.

 There may indeed be some value to an s6-instance-update program that
would provide a new template to an existing instance, with an option
to immediately restart the instance or not. I'll think about it some
more, inputs welcome.

--
 Laurent



Re: single-binary for execline programs?

2023-01-31 Thread Laurent Bercot

In particular there's a "feature" with recent binutils that makes every
binary be at least 64KB on arm/aarch64[1], so the execline package is a
whopping 3.41MB[2] there (... and still 852KB on x86_64[3]) -- whereas
just doing a dummy sed to avoid conflict on main and bundling all .c
together in a single binary yields just 148KB (x86_64 but should be
similar on all archs -- we're talking x20 bloat from aarch64/armv7
sizes! Precious memory and disk space!)

> (...)

It should be fairly easy to do something like coreutils'
--enable-single-binary without much modification


 The subject has come up a few times recently, so, at the risk of being
blunt, I will make it very clear and definitive, for future reference:

 No. It will not happen.

 The fact that toolchains are becoming worse and worse is not imputable
to execline, or to the way I write or package software. It has always
been possible, and reasonable, to provide a lot of small binaries.
Building a binary is not inherently more complicated today than it was
20 years ago. There is no fundamental reason why this should change; the
only reason why people are even thinking this is that there is an
implicit assumption that software always becomes better with time, and
using the latest versions is always a good idea. I am guilty of this
too.

 This assumption is true when it comes to bugs, but it becomes false if
the main functionality of a project is impacted.
 If a newer version of binutils is unable to produce reasonably small
binaries, to the point that it incites software developers to change
their packaging to accommodate the tool, then it's not an improvement,
it's a regression. And the place to fix it is binutils.
 The tooling should be at the service of programmers, not the other way
around.

 It is a similar issue when glibc makes it expensive in terms of RAM to
run a large number of copies of the same process. Linux, like other
Unix-like kernels, is very efficient at this, and shares everything that
can be shared, but glibc performs *a lot* of private mappings that incur
considerable overhead. (See the thread around this message:
https://skarnet.org/lists/supervision/2804.html
for an example.)
 Does that mean that running 100 copies of the same binary is a bad
model? No, it just means that glibc is terrible at that and needs
improvement.

 Back in the day when Solaris was relevant, it had an incredibly
expensive implementation of fork(), which made it difficult, especially
with the processing power of 1990s-era Sun hardware, to write servers
that forked and still served a reasonable number of connections.
It led to emerging "good practices", that were taught by my (otherwise
wonderful) C/Unix programming teacher, and that were: fork as little as
possible, use a single process to do everything. And that's how most
userspace on Solaris worked indeed.
 It did a lot of harm to the ecosystem, turning programs into giant
messes because people did not want to use the primitives that were
available to them for fear of inefficiency, and jumping through hoops
to work around it at the expense of maintainability.
 Switching to Linux and its efficient fork() was a relief.

 Multicall binaries have costs, mostly maintainability costs.
Switching from a multiple binaries model to a multicall binary model
because the tooling is making the multiple binaries model unusably
expensive is basically moving the burden from the tooling to the
maintainer. Here's a worse tool, make more effort to accommodate it!

 In addition to maintainability costs, multicall binaries also have a
small cost in CPU usage (binary starting time) and RAM usage (larger
mappings, fewer memory optimizations) compared to multiple binaries.
These costs are paid not by the maintainer, but by the users.
Everyone loses.

 Well, no. If having a bunch of execline binaries becomes more expensive
in disk space because of an "upgrade" in binutils, that is a binutils
problem, and the place to fix it is binutils.



In the long run this could also provide a workaround for conflicting
names, cf. old 2016 thread[4], if we'd prefer either running the
appropriate main directly or re-exec'ing into the current binary after
setting argv[0] appropriately for "builtins".


 There have been no conflicts since "import". I do not expect more name
conflicts in the future, and in any case, that is not an issue that
multicall binaries can solve any better than multiple binaries. These
are completely orthogonal things.



(I assume you wouldn't like the idea of not installing the individual
commands, but that'd become a possibility as well. I'm personally a bit
uncomfortable having something in $PATH for 'if' and other commands that
have historically been shell builtins, but have a different usage for
execline...)


 You're not the only one who is uncomfortable with it, but it's really a
perception thing. There has never been a problem caused by it. Shells
don't get confused. External tools don't get confused. On this 

Re: s6 instanced services are "forgotten" after s6-rc-update

2023-01-31 Thread Laurent Bercot




I can provide an strace of s6-rc-update if needed. Looking into it, it
seems s6-rc-update "uncritically" unlinks the live instance/ and instances/
folders and replaces them with brand-new copies from the compiled database.


 I can confirm that this happens and that it was an oversight; I'm now
in the process of fixing it (which will involve a few changes to s6
ending up in a major update, I'm afraid).

 A question I have is: what should s6-rc-update do when the template has
changed? The template will obviously be changed in the new service, but
should the old instances stay alive, with the old template? My natural
inclinaton is to say yes; if the user wants the service restarted they
can say so explicitly in the conversion file. But maybe there are better
alternatives I haven't thought about.

--
 Laurent



Re: s6 instanced services are "forgotten" after s6-rc-update

2023-01-28 Thread Laurent Bercot




After having an instanced service definition for s6-rc, subsequent
calls to s6-rc-update seem to clobber the instance/ and instances/
subfolders in a way that keeps the instances running, but makes it
impossible to control them.

After s6-rc-update, I get error messages like:

fatal: unable to open /run/service/agetties/instance/.s6-svscan/lock:
No such file or directory
fatal: unable to control /run/service/agetties/instance/tty1: No such
file or directory


 Hmm, that's weird. Are you sure you're using the latest s6-rc version?
Only 0.5.3.3 will correctly manage instances.

 If you are, I'll try to reproduce the issue to understand what's going
on.


--
 Laurent



[announce] skarnet.org January 2023 release

2023-01-14 Thread Laurent Bercot



 Hello,

 New versions of the skarnet.org packages are available. This release
is overdue, sorry for the delay - but finally, happy new year everyone!

 skalibs' strerr_* functions and macros, meant to provide shortcuts for
error message composition and output, have been rewritten; they're no
longer split between strerr.h and strerr2.h, but are all gathered in
strerr.h - the skalibs/strerr2.h header is now deprecated.
 This is released as a major version upgrade to skalibs because some
hardly ever used strerr macros have been outright removed; and the
deprecation of strerr2.h also counts as an API change. However, unless
you were using the deleted strerr macros (highly unlikely, as there
was no reason to, which is why they're being deleted in the first
place), your software should still build as is with the new skalibs, maybe
with warnings.

 The rest of the skarnet.org software stack has undergone at least a
release bump, in order to build with the new skalibs with no warnings.
Most packages also include several bugfixes, so upgrading the whole
stack is recommended.

 The new version of s6 includes a feature that has often been asked
for: an implementation of dynamically instanced services. Six new
commands allow you to create and manage dynamic instances of a given
service directory template, parameterized by an argument you give
to the run script.
 It also comes with a few quality-of-life changes, such as s6-log
line prefixing, as well as a good number of minor bugfixes.

 The "s6-test" program, formerly in s6-portable-utils, has migrated
to the execline package, where it is named "eltest". It still exists
in s6-portable-utils, but is deprecated and will be removed in a
future release.

 The new versions are the following:

skalibs-2.13.0.0              (major)
nsss-0.2.0.2                  (release)
utmps-0.1.2.1                 (release)
execline-2.9.1.0              (minor)
s6-2.11.2.0                   (minor)
s6-rc-0.5.3.3                 (release)
s6-linux-init-1.0.8.1         (release)
s6-portable-utils-2.2.5.1     (release)
s6-linux-utils-2.6.0.1        (release)
s6-dns-2.3.5.5                (release)
s6-networking-2.5.1.2         (release)
mdevd-0.1.6.1                 (release)
smtpd-starttls-proxy-0.0.1.2  (release)
bcnm-0.0.1.6                  (release)
dnsfunnel-0.0.1.5             (release)

 Details of some of these package changes follow.


* skalibs-2.13.0.0
  ----------------

 - Bugfixes.
 - New functions: buffer_timed_put, buffer_timed_puts, for synchronous
writes to a file descriptor with a time limit.
 - strerr2.h deprecated. strerr.h entirely revamped. Every existing
strerr interface is now a variable argument macro around the new
strerr_warnv, strerr_warnvsys, strerr_diev and strerr_dievsys functions,
which just print arrays of strings to stderr. This reduces the amount
of adhocness in the strerr code considerably, allows calls without an
upper bound on the number of strings, and should save some bytes in
resulting binaries.
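
 As an illustration of the "variable argument macro around an array
function" pattern described above, here is a minimal sketch. It is not
the skalibs code: msg_concatv, msg_warnv, MSG_COUNT and MSG_WARN are
hypothetical names, and the 512-byte buffer is an arbitrary choice for
the example. The macro packs its arguments into a C99 compound literal
and passes the array and its length to a single function:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: concatenate n strings into out (capacity max),
   truncating if needed, always NUL-terminating.
   Returns the number of bytes written, terminator excluded. */
size_t msg_concatv (char *out, size_t max, char const *const *v, size_t n)
{
  size_t w = 0 ;
  if (!max) return 0 ;
  for (size_t i = 0 ; i < n ; i++)
  {
    size_t len = strlen(v[i]) ;
    if (len > max - 1 - w) len = max - 1 - w ;  /* truncate rather than overflow */
    memcpy(out + w, v[i], len) ;
    w += len ;
  }
  out[w] = 0 ;
  return w ;
}

/* Hypothetical array-based primitive: print the strings to stderr as one line. */
void msg_warnv (char const *const *v, size_t n)
{
  char buf[512] ;
  msg_concatv(buf, sizeof buf, v, n) ;
  fprintf(stderr, "%s\n", buf) ;
}

/* The variadic front-end: any number of string arguments, no upper bound
   hardcoded in the macro itself. */
#define MSG_COUNT(...) (sizeof((char const *const []){ __VA_ARGS__ }) / sizeof(char const *))
#define MSG_WARN(...) msg_warnv((char const *const []){ __VA_ARGS__ }, MSG_COUNT(__VA_ARGS__))
```

 A call such as MSG_WARN("prog: ", "warning: ", "unable to foo") then
expands to one function call on a three-element array.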

 https://skarnet.org/software/skalibs/
 git://git.skarnet.org/skalibs


* execline-2.9.1.0
  ----------------

 - Bugfixes.
 - New program: eltest. This is the program formerly available in
s6-portable-utils as "s6-test", that has changed packages and been
renamed.
It's a quasi-implementation of the POSIX "test" utility, that was too
useful in execline scripts to be off in a separate package. (Quasi
because the exact spec is bad.) It understands -v, for testing the
existence of a variable, and =~, for regular expression matching.

 https://skarnet.org/software/execline/
 git://git.skarnet.org/execline


* s6-2.11.2.0
  -----------

 - Bugfixes.
 - The one-second service restart delay can now only be skipped when the
service is ready. This prevents CPU hogging when a heavy service takes
a long time to start and fails before reaching readiness.
 - The name of the service is now passed as the first argument to ./run
and as the third argument to ./finish.
 - s6-log now understands a new directive: p. "pfoobar:" means that the
current log line will be prepended with the "foobar: " prefix. This
allows service differentiation in downstream log processing, which was
an often requested feature.
 - New commands available: s6-instance-maker, s6-instance-create,
s6-instance-delete, s6-instance-control, s6-instance-status,
s6-instance-list. They allow you to manage supervised sets of services
created from the same templated service directory with only a parameter
(the name of the instance) changing.

 https://skarnet.org/software/s6/
 git://git.skarnet.org/s6


* s6-portable-utils-2.2.5.1
  -------------------------

 - s6-test is now deprecated, replaced with the eltest program in the
execline package.

 https://skarnet.org/software/s6-portable-utils/
 git://git.skarnet.org/s6-portable-utils


 Enjoy,
 Bug-reports welcome as always.

--
 Laurent



Re: skabus: more related software

2023-01-06 Thread Laurent Bercot

This is a complete message bus implementation https://codeberg.org/maandree/bus
Perhaps it can be reused?
Or at least mentioned under "Similar work" here 
https://skarnet.org/software/skabus/


 I wouldn't say "complete", because depending on your definition of
"bus" it's unfortunately not possible to implement one without a daemon on
Unix - but yeah it's doing pubsub, similarly to s6's libftrig (but
more efficiently via shmem).
 Added to the skabus "similar work" section, thanks for mentioning it!

--
 Laurent



Re: [PATCH] Document skalibs/siovec.h header

2023-01-04 Thread Laurent Bercot



 Thanks! Merged with some rewrites where the description wasn't accurate.
 Also wrote some doc for siovec_search() which is the one that required
actual effort to come up with :P

 It allowed me to spot and fix a small bug, too.

 The release is coming soon, but I still need to document, test, and
polish a new s6 feature, so it will be a few more days, sorry about
that.

--
 Laurent



Re: (u)intN_bfmt macros use (u)intN0_fmt_base instead of (u)intN_fmt_base

2022-11-20 Thread Laurent Bercot

In src/header/bits-template,

line 22:
#define uint@BITS@_bfmt(s, b) uint@BITS@0_fmt_base(s, (b), 2)

and line 45:
#define int@BITS@_bfmt(s, b) int@BITS@0_fmt_base(s, (b), 2)

shouldn't have zeros


 Good catch, thanks! Fixed in current git.

--
 Laurent



Re: Reading s6-rc database without root for completion

2022-11-10 Thread Laurent Bercot

I'd like for the user to be able to complete `sudo s6-rc -u change
some...` from a non-root terminal, but trying to use the output of
`s6-rc-db list services` fails as it can't take a lock in
/run/s6-rc/compiled.


 Hm, that's an oversight on my part. Reading the database should be
possible by normal users, but the lock is currently taken O_RDWR
(because the locking primitive is the same for reading and for writing), so
it fails. I will fix that.

 Until then, sure, read your info from the source directory, but be
aware it may not be in sync with the current live database.

 Thanks for the report!

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-12 Thread Laurent Bercot

To clarify, I'm referring to the ->target member (in srv) or the ->exchange 
member (in mx).

Are those not the same as the input format for skadns_send?


 When parsed by s6dns_message_parse_answer_srv() and
s6dns_message_parse_answer_mx(), the domains are obtained from the
packet via s6dns_message_get_domain(), which calls s6dns_domain_decode().

 In other words, when you obtain a s6dns_message_rr_srv_t or a
s6dns_message_rr_mx_t, the domains in these structures are in string
format. (Because usually they're destined to be returned to the
application and displayed, not used in another packet right away.)
 So if you want to reuse these domains for another skadns_send()
query, you need to re-encode them first via s6dns_domain_encode().
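
 To show the difference between the two formats, here is a standalone
sketch of the string-to-packet conversion. It does not use the s6-dns
API: domain_encode_sketch is a hypothetical function written for this
example, following the RFC 1035 wire format (each label preceded by its
length, terminated by a zero byte). In real code you would call
s6dns_domain_encode() as said above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert "example.org" (string format) into
   \7example\3org\0 (packet format). A trailing dot is accepted.
   Returns the encoded length, or 0 on error (empty label, label longer
   than 63 bytes, or output buffer too small). */
size_t domain_encode_sketch (char const *s, unsigned char *out, size_t max)
{
  size_t w = 0 ;
  while (*s)
  {
    char const *dot = strchr(s, '.') ;
    size_t len = dot ? (size_t)(dot - s) : strlen(s) ;
    /* keep one byte in reserve for the final zero-length root label */
    if (!len || len > 63 || w + 1 + len >= max) return 0 ;
    out[w++] = (unsigned char)len ;
    memcpy(out + w, s, len) ;
    w += len ;
    s += len ;
    if (*s == '.') s++ ;
  }
  if (w >= max) return 0 ;
  out[w++] = 0 ;
  return w ;
}
```

 Feeding a string-format domain directly into a query means the
resolver sees bytes like 'e', 'x', 'a' where it expects label lengths,
which is why the query cannot succeed without the re-encoding step.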

 Thanks Guillermo for getting to the bottom of this! :)

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-10 Thread Laurent Bercot




However, the OS would still deliver them to skadnsd in a recv() /
recvfrom() call, right? If my reading of the truss outputs is correct,
the HardenedBSD system isn't getting a response at all,


 That's right, which is why my hypothesis of the RD bit filter only
applied to OmniOS, which did get responses but these got ignored
by skadnsd. On HardenedBSD, 18 queries getting no answers from the
caches is absolutely a different problem.



 and whatever
error happens with the program running on the OmniOS system, if any,
does not involve the network


 It involves the relevance test:
 
https://github.com/skarnet/s6-dns/blob/master/src/libs6dns/s6dns_engine.c#L32

 This function is called on every incoming message that is a potential
response. If it returns 0, the message is deemed irrelevant to the
current query, and ignored. When you see a recv() (or recvfrom()) from
a UDP socket, but no answer is reported to the client and the socket is
still polled until it times out, it means that the relevant() test
failed.


 Until tonight, the "h.rd != (q[2] & 1)" test, i.e. "is the rd bit of
the response different from the rd bit of the query", was performed
outside of the "strict" guard. This made some responses be ignored as
malformed, because it's the cache not following the RFC; it is quite
possible that it's what happened on OmniOS here.
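
 The change can be sketched as follows. This is a simplified stand-in,
not the actual relevant() function from s6dns_engine.c:
answer_is_relevant is a hypothetical name, and it only looks at the
first three header bytes as laid out in RFC 1035 (2 bytes of ID, then a
flags byte whose top bit is QR and whose lowest bit is RD).

```c
/* Hypothetical, reduced version of a relevance test.
   q and a point to the raw DNS headers of the query and of the
   candidate answer; both must be at least 3 bytes long. */
int answer_is_relevant (unsigned char const *q, unsigned char const *a, int strict)
{
  if (q[0] != a[0] || q[1] != a[1]) return 0 ;  /* query IDs must match */
  if (!(a[2] & 0x80)) return 0 ;                /* QR bit: must be a response */
  /* RD mismatch is only disqualifying in strict mode */
  if (strict && ((q[2] ^ a[2]) & 1)) return 0 ;
  return 1 ;
}
```

 With strict set to 0, an answer from a cache that forgot to copy the
RD bit is accepted; with strict nonzero it is ignored, which is the old
behaviour.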



 (I can't tell if skadnsd is delivering
all received answers to the client).


 After the first one which is a connection/synchronization marker,
a write() to the async pipe to the client (10 on HardenedBSD, 9 on
OmniOS) is an answer or a sequence of answers. (skadnsd buffers the
answers into a textmessage_sender, i.e. a bufalloc, which is flushed
at the next ppoll() invocation.) Writes of length 7 are failures
(4 bytes length, 2 bytes query id, 1 byte errno); writes of length
14 are 2 reports of failure, you can see it in the string. 28 is
4 failures; 95 and 140 are likely 1 success (length, query id, 0
for success, then the response packet); 279 is likely two successes.
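
 If you want to check that arithmetic on a captured write(), the framing
described above can be walked with a few lines of C. This is a sketch
under assumptions, not skadns code: count_answers and struct
answer_count are hypothetical, and the 4-byte length is taken to be a
big-endian count of the bytes that follow it (query id, status byte,
then the packet if any).

```c
#include <stddef.h>
#include <stdint.h>

struct answer_count
{
  unsigned int successes ;
  unsigned int failures ;
} ;

/* Hypothetical helper: walk a buffer made of records
   [4 bytes length][2 bytes query id][1 byte status][optional packet]
   and count how many carry status 0 (success) or nonzero (an errno).
   Returns 1 if the buffer is a whole number of records, 0 otherwise. */
int count_answers (unsigned char const *buf, size_t len, struct answer_count *out)
{
  size_t pos = 0 ;
  out->successes = 0 ;
  out->failures = 0 ;
  while (pos < len)
  {
    uint32_t n ;
    if (len - pos < 4) return 0 ;
    n = ((uint32_t)buf[pos] << 24) | ((uint32_t)buf[pos+1] << 16)
      | ((uint32_t)buf[pos+2] << 8) | (uint32_t)buf[pos+3] ;
    pos += 4 ;
    if (n < 3 || n > len - pos) return 0 ;   /* need at least id + status */
    if (buf[pos + 2]) out->failures++ ; else out->successes++ ;
    pos += n ;
  }
  return 1 ;
}
```

 A 14-byte write made of two 7-byte records comes out as 2 failures and
0 successes, matching the reading of the trace given above.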

 At the end of the traces, we get EOF on 0 while there are still a
lot of sockets being polled. That's the client exiting - or at least
closing the skadns connection - while some queries are in-flight.
The bro math checks out, it definitely looks like all received
answers, positive and negative, have been delivered.



I feel that packet capture tools like tcpdump(1) or OmniOS' snoop(8)
would be better suited for answering the questions that have been
raised so far (malformed packets, ignored responses, lack of
responses, etc.).


 strace has an option to print full strings. truss should have a
similar option (if its display can be trusted...) You're right that
packet capture tools would be good to use in this situation, but since
I personally loathe using them, I don't want to ask other people to
use them, and I can work with what we have. On HardenedBSD at least,
the traces are readable.



 Also, aren't 18 outstanding queries in a short
amount of time from one single host, like, a lot? Couldn't Shaw's
caches think that they are being DoS'ed :P ?


 That's definitely possible, and I would say likely, but I don't want
to lay the blame on others before making sure we're in the clear. :)

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-10 Thread Laurent Bercot

Anyway. Pre-update `/package/web/s6-dns/command/s6-dnsip[46] 
perihelion.ultradian.club` returns the correct response on both machines, even 
if run after doing the SRV and MX lookups.


 Wilder and wilder. Can you test s6-dnsip[46]-filter?
{ echo domain1.org ; echo domain2.org ; ... } | s6-dnsip4-filter
These do A and AAAA queries, but via skadns. If skadnsd is the culprit,
the -filter programs should fail.



(side note: I'm realizing that my program makes duplicate queries. This 
shouldn't impact the accuracy of the responses, but it does mean the caches 
could be blocking me or something, but not blocking me when I use 
/package/web/s6-dns/command/s6-dnsip[46].)


 Could be. We're trying to build a simple test case that fails. If our
simple test cases all pass and your program fails, the cause may be in
the way your program is spamming the cache - but you'd have to ask the
cache administrators about querying policies to test that hypothesis.

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-10 Thread Laurent Bercot



 s6dns_engine filters answers that do not seem relevant to in-flight
queries. That includes malformed answers or ones that do not follow
RFC 1035.
 I was made aware (thanks, Ermine) that some caches fail to set the
RD bit in their responses to queries containing the RD bit; these
answers were ignored.
 I just pushed a workaround to the s6-dns git, to only perform the
RD check on answers when a "strict" flag is given, which it's not
in any of the command-line wrappers or in skadnsd.

 Can you please try with the latest s6-dns git and see if the answers
you're getting on OmniOS are accepted this time?

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-10 Thread Laurent Bercot




On OmniOS, all the DNS queries (apparently 58) received a response. On
HardenedBSD, only the first 4 queries received a response, the next 18
timed out. They were retried 4 additional times, as expected, again
timing out without receiving a response.


 The fd of the async pipe to the client isn't the same in both outputs:
it's 9 on OmniOS and 10 on HardenedBSD, which means the client uses one
more fd on HardenedBSD for some reason. (Does OmniOS support signalfd()?
That would explain it.)

 On HardenedBSD, 4 queries received responses, that were properly
reported to the client. The others were pending and retried with longer
timeouts, but only 6 of them reported a full timeout to the client.
The client exited while 12 queries were technically still in flight.

 On OmniOS, I can't even make sense of some of the strings, typically
in the async responses to the client. What is the endianness of this
machine? A network byte order 32-bit number equal to 3 seems to be
encoded as { 0, 0, 3, 0 }, which doesn't look right. (I did check my
uint32_bswap() primitive.) If the client isn't complaining very loudly
when it receives such strings, it means the strings are correct and the
truss tool displays them incorrectly, which doesn't help me diagnose
what's going on.

 In any case the problems look unrelated to skadnsd and come from the
interaction between the s6-dns library and the caches: either the
packets are correct and the caches are not sending the responses they
should, and that's not an s6-dns problem, or the packets are malformed
and that's why the servers are ignoring them, and I need to fix that.
 Amelia, could you do some tests (with the same caches) from s6-dns
command-line clients such as s6-dnsip4? That will bypass the skadns
layer, and will be easier to trace and understand. Thanks :)

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-06 Thread Laurent Bercot

Neither of those conditions actually apply - my network is up and my resolver 
is responding (albeit slowly - it takes about a second). I get the expected 
response on the first batch of queries I fire off, but then the second batch 
gets ENETUNREACH. This happens every time I run my program (albeit on special 
snowflake illumos; I have not tried on other OSes).


 If you think s6-dns is behaving incorrectly, please pastebin a strace
(or local equivalent) of skadnsd somewhere, so we can check what it is
doing.

--
 Laurent



Re: [s6-dns] is there a particular reason skadns_packet would return NULL errno ENETUNREACH?

2022-10-06 Thread Laurent Bercot




i source spelunked and the story is that, if the error is coming from 
s6dns_engine_prepare, dt->protostate exceeds or equals 4. I chased that struct 
member around a few times and I couldn't figure out what it means to s6dns.


 dt->protostate is used for two things:

 - in UDP mode, to track how many times the query has been sent to the
whole list of caches and all of them have failed to answer within a
given timeout. The timeout increases for each round.

 - in TCP mode, to track how many bytes of the query have been written
and how many bytes of the answer have been received (a congested
network may result in short writes or reads).

 The error you got indeed happens when you're in UDP mode (the starting
default for every query), dt->protostate has reached 4 and
s6dns_engine_prepare() returns 0 ENETUNREACH, which
s6dns_engine_timeout() stores into dt->status and skadnsd then sends
back to your client.

 What it means is that your query was sent in succession to every
cache listed in dt->servers (most likely, the list of "nameserver"
entries in your /etc/resolv.conf, unless you overrode it with the
DNSCACHEIP environment variable), and every one of them failed to
answer within 1 second, then within 3 seconds, then within 11
seconds, then within 45 seconds. That sounds like either your
nameserver list is bad, or your own network is down; and s6-dns reports
this as ENETUNREACH.
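
 In code, the UDP side of that logic boils down to something like the
sketch below. It is a simplification written for this explanation, not
the s6dns_engine source: round_timeouts and next_round_timeout are
hypothetical names, and only the 1/3/11/45 second values and the
ENETUNREACH result are taken from the description above.

```c
#include <errno.h>

/* Per-round timeouts in seconds, indexed by the number of rounds
   that have already failed (what dt->protostate counts in UDP mode). */
static unsigned int const round_timeouts[4] = { 1, 3, 11, 45 } ;

/* Hypothetical helper: returns 1 and stores the timeout for the next
   round, or returns 0 with errno set to ENETUNREACH once every round
   has been tried without any cache answering. */
int next_round_timeout (unsigned int protostate, unsigned int *seconds)
{
  if (protostate >= 4)
  {
    errno = ENETUNREACH ;
    return 0 ;
  }
  *seconds = round_timeouts[protostate] ;
  return 1 ;
}
```

 So seeing ENETUNREACH does not mean a network error was reported by
the kernel; it only means the last round expired with no usable answer.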

--
 Laurent


