Re: s6-rc - odd warn logging and a best practices question

2015-08-28 Thread Colin Booth
On Fri, Aug 21, 2015 at 2:11 AM, Laurent Bercot ska-skaw...@skarnet.org wrote:
  Wow. Is it a mount -o remount, or a umount followed by a mount ?
  If a -o remount has this effect on file handles, then it's probably
 worth reporting to the kernel guys, because it's insane.

  Even if the script does something nonsensical such as remounting
 everything read-only, which hardly makes any sense for a tmpfs,
 this is not normal behaviour: when I remount a partition in read-only
 mode, and there are still open descriptors for writing, the mount()
 call fails with EBUSY; it does not silently invalidate all the writing
 descriptors!

First reboot in a while so I spent some time tracking this down. It
was caused by some really cute interactions between a few of the
Debian single-user mode system prep scripts. checkfs-bootclean.sh is
safe to run against tmpfs mounts right up until you run bootmisc.sh,
which removes the flag files that the clean_all function uses to
identify a tmpfs. So that's been fixed.


  Last time I looked at a mainstream distro's boot cycle, i.e. almost
 10 years ago, it was already unnecessarily complex and convoluted; and
 Debian was far from the worst. I doubt it has become simpler since.

It probably doesn't help that I'm working against the hardest target
too: laptops. Thankfully, the only place where I really need to
interact with the sysvinit stuff is in the collection of oneshots that
are emulating the single-user portion of the bootcycle. I did find a
script in there that will halt an s6-init system if you run it. That
was fun. It's the only place that I found that actually cares about
what init you're running under. In the case where you have sysvinit
but no initctl control pipe (such as can happen if you mount a new
/run over the old one) it recreates that and then fires off SIGUSR1 at
whatever happens to be init at the time.

The only things left to fix are some file permissions and mounts that
the aforementioned script fixes up, and that ACPI sleep handler
weirdness that I mentioned earlier. Plus, you know, not running a
pre-alpha rc system ;)

  systemd will probably make scripting simpler, by moving a lot of the
 complexity into the C code. Which is obviously the worst possible
 solution.

Probably. I almost want to build out a systemd machine to see what the
early boot land looks like. Depending on what the system prep stuff
looks like it might be easier to gut. Like I said though, almost.

Cheers!



-- 
If the doors of perception were cleansed every thing would appear to
man as it is, infinite. For man has closed himself up, till he sees
all things thru' narrow chinks of his cavern.
  --  William Blake


Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Colin Booth
On Thu, Aug 20, 2015 at 1:16 PM, Colin Booth cathe...@gmail.com wrote:
 By the way, I've found a maybe-bug that, if real, is pretty severe.
 `s6-rc -d change all ; some stuff ; s6-rc -u change all' has caused my
 s6-init + s6-rc testbed system to remove the control pipe for my pid 1
 s6-svscan. I need to make sure it wasn't something I did between
 things, and to make sure it wasn't mucked up handling in various
 scripts that I was running. I'm at work right now so I can't test it
 out, but sometime in the next day or so I should have the cycles to
 test it out.

Not a bug in s6-rc or s6 but in some Debian script somewhere. Some
single-user script appears to re-mount all mount points, which has the
net result of causing all file handles into tmpfs mounts to go stale.
That's what's breaking s6-svscan. Once I isolate it, I'll see if I can
avoid calling that script, and if I have to I'll see about moving its
execution somewhere safe.

I am learning way more about the complexities of the distro boot cycle
than I'd ever expected to this week.

Cheers!




Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Colin Booth
On Thu, Aug 20, 2015 at 2:35 AM, Laurent Bercot ska-skaw...@skarnet.org wrote:

  I can't grep the word "addition" in my current git, either s6 or s6-rc.
 Are you sure it's not a message you wrote? Can you please give me the
 exact line you're running and the exact output you're getting?
  Thanks,

Ugh, it was something I'd hacked into s6-svc early on in the life of
s6-rc to track down some issue I was having with something. I never
committed it and sort of assumed that the next git pull I made would
have complained and forced me to back it out. Apparently git merge got
smarter about unstaged non-conflicts recently.

Mystery solved! I'll reply to the other stuff in the other mail fork.




Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Colin Booth
On Thu, Aug 20, 2015 at 1:57 AM, Laurent Bercot ska-skaw...@skarnet.org wrote:
  Just don't have a notification-fd file. s6-rc will assume your daemon
 is ready as soon as the run script is started. It may spam you with a
 warning on high verbosity levels, but that's it. :)

Yeah, this is for the special case where you have a daemon that
doesn't do readiness notification but also has a non-trivial amount of
initialization work before it starts. For most things, the
oneshot/longrun split discussed below is best, but sometimes you need to
run that initialization every time (data validators are the most
obvious example).

  If your daemon doesn't support readiness notification, I'd generally
 advise not to pretend it does: even if daemon availability is fast,
 the scheduler can always screw you. So yes, if at all possible, having
 the init in a oneshot and the daemon in a longrun depending on the
 init oneshot is the best way to go, without declaring a notification-fd
 for the longrun. If it's not possible, foregrounding the init, then
 sending a blank notification message, then execing the daemon, is
 probably the least ugly way to proceed.
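A minimal shell sketch of the oneshot/longrun split described above, written as s6-rc source definitions. The service names (mydaemon-init, mydaemon), the paths, and the helper function are all hypothetical, and the plain `dependencies` file is assumed to be the format your s6-rc-compile accepts; check its documentation for your version.

```shell
#!/bin/sh
# make_split_services SRCDIR
# Writes two hypothetical s6-rc source definitions under SRCDIR:
#   mydaemon-init  oneshot that does the initialization work
#   mydaemon       longrun that depends on mydaemon-init
make_split_services() {
    src=$1
    mkdir -p "$src/mydaemon-init" "$src/mydaemon" || return 1

    # Oneshot: "up" holds the command line run once when the service
    # is brought up (placeholder path).
    echo oneshot >"$src/mydaemon-init/type"
    echo '/usr/local/libexec/mydaemon-init' >"$src/mydaemon-init/up"

    # Longrun: the supervised daemon. No notification-fd file is written,
    # so readiness is not claimed on the daemon's behalf.
    echo longrun >"$src/mydaemon/type"
    echo mydaemon-init >"$src/mydaemon/dependencies"
    cat >"$src/mydaemon/run" <<'EOF'
#!/bin/sh
exec /usr/local/sbin/mydaemon --foreground
EOF
    chmod 0755 "$src/mydaemon/run"
}

# Usage (hypothetical path): make_split_services /etc/s6-rc/source
```

With this layout the ordering comes from the dependency alone: the longrun is only started once the oneshot has completed successfully.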

I was using the readiness signal to enforce the timing between udev's
heavyweight system prep scripts and everything that depends on udev.
Starting udev itself is trivial, and I'm pretty sure that udev doesn't
need to be guaranteed running for other things to start; it just
needs to be running for the preparation steps. Hence the dance: run the
sysvinit udev script, stop udev immediately afterwards, then start it
again supervised. Breaking it out into a pair of atomic services is a
lot more elegant. Why didn't I think of that until last night when I've
been experimenting with this since Monday? Dunno.

  I'm surprised that systemd-udevd doesn't provide notification: that's
 one of the least bad reasons for daemons to integrate with systemd.
 Doesn't it use sd_notify() ?

It does provide notification, but only if you're running under
systemd. At least according to the sd_notify() docs. I'll see about
faking up the environment so sd_notify() is happy and report back.

 Also, it'll be nice to have s6-rc-update, I've been rebooting... a lot.


  No need to reboot:

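  s6-rc -da change    # bring down every active service
  for i in `ls -1 $live/servicedirs` ; do rm $scandir/$i ; done
  s6-svscanctl -an $scandir    # rescan, and kill supervisors whose entries are gone
  rm -rf `dirname $live`/`s6-linkname $live`    # remove the directory $live points to
  rm $live    # remove the live symlink itself
  s6-rc-init -l $live -c $newcompiled $scandir
  s6-rc -u change $everythingbundle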

  That's more or less what s6-rc-update will do, of course with
 optimizations to avoid restarting everything.

Actually, the more I think about it, the less s6-rc-update will help
me avoid reboots in the short term since part of what I need to get
back is a pristine post-boot environment.

  Power management on Linux laptops is high-level demonology, and mere
 mortals should not dabble in it, lest their souls be consumed. I had a
 friend who tried and came back shaking and drooling... it took him a
 long while to recover. Fortunately, there's *almost* no permanent
 damage to his mind.

HA! I'm pretty sure the failure is in some acpi policy handling glue
code that isn't getting set right. The init.d/acpid script isn't
terribly complicated, I simply need to capture the system state before
and after the init script is run.

Cheers!



Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Colin Booth
On Thu, Aug 20, 2015 at 10:24 AM, Laurent Bercot
ska-skaw...@skarnet.org wrote:

  Oh, the protocol is complicated too. If I start to implement it,
 there's no stopping, and I'll be running behind systemd every time
 they add something to the protocol, which is exactly what I don't
 want to do.

Sure. And I bet that listening for any message on the socket isn't
good enough since things might be chattery.

  You can enforce a non-race by synchronizing both processes, i.e.
 making the notification listener notify the notification sender
 that it is ready to receive a message. I'm not even joking.
 Notifiception is a thing with the wonderful systemd APIs.

NOW we're talking!

  I see. You could pull those out of the set of services managed by s6-rc
 and just run them sequentially at boot time, until s6-rc-update is out.

Yeah, but then you get into that question of what you do with oneshots
that depend on longruns which are required for initialization... Like
I said, it's a bit of a mess, but no more of a mess than what someone
doing early boot optimization faces under any other init. Once I've
sorted out all the timing issues (and I think I'm close) it should be
fine.

By the way, I've found a maybe-bug that, if real, is pretty severe.
`s6-rc -d change all ; some stuff ; s6-rc -u change all' has caused my
s6-init + s6-rc testbed system to remove the control pipe for my pid 1
s6-svscan. I need to make sure it wasn't something I did between
things, and to make sure it wasn't mucked up handling in various
scripts that I was running. I'm at work right now so I can't test it
out, but sometime in the next day or so I should have the cycles to
test it out.

Cheers!



Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Laurent Bercot

On 20/08/2015 16:43, Colin Booth wrote:

Yeah, this is for the special case where you have a daemon that
doesn't do readiness notification but also has a non-trivial amount of
initialization work before it starts. For most things, the
oneshot/longrun split discussed below is best, but sometimes you need to
run that initialization every time (data validators are the most
obvious example).


 In that case, yes,
 if { init } if { notification } daemon is probably the best. It
represents service readiness almost correctly, if service includes
the initialization.
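For readers who prefer shell to execline, the `if { init } if { notification } daemon` chain can be sketched roughly as below. This assumes the service directory has a notification-fd file containing 3; the function name and the commands passed to it are placeholders, not anything shipped with s6.

```shell
#!/bin/sh
# run_service INIT_CMD DAEMON [ARGS...]
# Runs INIT_CMD in the foreground, writes a newline to fd 3 (assumed to be
# the notification fd), then execs the daemon with that fd closed.
run_service() {
    init_cmd=$1
    shift

    # 1. Foreground initialization. If it fails, stop here so that no
    #    readiness is ever signalled.
    "$init_cmd" || return 1

    # 2. The "blank" notification: a single newline on the notification fd.
    echo >&3

    # 3. Become the daemon. Closing fd 3 keeps it from leaking into it.
    exec "$@" 3>&-
}

# Usage inside a run script (placeholder paths):
#   run_service /usr/local/libexec/validate-data /usr/local/sbin/mydaemon --foreground
```

As noted above, this only approximates readiness: the notification fires when initialization has finished, not when the daemon is actually serving.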



It does provide notification, but only if you're running under
systemd. At least according to the sd_notify() docs. I'll see about
faking up the environment so sd_notify() is happy and report back.


 systemd's notification API is a pain. It forces you to have a daemon
listening on a Unix socket. So basically you'd have to have a
notification receiver service, communicating with the supervisors -
which eventually makes it a lot simpler to integrate everything into
a single binary.
 This API was made to make systemd look like the only possible design
for a service manager. That's political design to the utmost, and I
hate that with a passion.

 I have a wrapper to make things work the other way (i.e. using
s6-like daemons under systemd), but a wrapper that would actually
understand sd_notify() notifications would be much more painful to
write.


Actually, the more I think about it, the less s6-rc-update will help
me avoid reboots in the short term since part of what I need to get
back is a pristine post-boot environment.


 What do you have in that post-boot environment that would be different
from what you have after shutting down all your s6-rc services and
wiping the live directory ?

--
 Laurent



Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Laurent Bercot

On 20/08/2015 10:57, Laurent Bercot wrote:

s6-svc: warning: /run/s6/rc/scandir/s6rc-fdholder/notification-fdpost
addition of notification-fd

  Looks like a missing/wrong string terminator. Thanks for the report,
I'll look for it.


 I can't grep the word "addition" in my current git, either s6 or s6-rc.
Are you sure it's not a message you wrote? Can you please give me the
exact line you're running and the exact output you're getting?
 Thanks,

--
 Laurent



Re: s6-rc - odd warn logging and a best practices question

2015-08-20 Thread Colin Booth
On Thu, Aug 20, 2015 at 8:44 AM, Laurent Bercot ska-skaw...@skarnet.org wrote:
  In that case, yes,
  if { init } if { notification } daemon is probably the best. It
 represents service readiness almost correctly, if service includes
 the initialization.

Cool. Not the most elegant but good to know I was on the right track.

 It does provide notification, but only if you're running under
 systemd. At least according to the sd_notify() docs. I'll see about
 faking up the environment so sd_notify() is happy and report back.


  systemd's notification API is a pain. It forces you to have a daemon
 listening on a Unix socket. So basically you'd have to have a
 notification receiver service, communicating with the supervisors -
 which eventually makes it a lot simpler to integrate everything into
 a single binary.
  This API was made to make systemd look like the only possible design
 for a service manager. That's political design to the utmost, and I
 hate that with a passion.

I think only the socket part is fancy systemd-centric design, so
presumably a stupid subscript that takes socket messages and emits
s6-ftrig events could do the reverse of sdnotify_wrapper. I'm thinking
something like s6-ipcserver-socketbinder execing into a backgrounded
puller to s6-ftrig-notify chain. The puller would be something like
s6-ftrig-wait but for generic file descriptors instead of fifo dirs
(this probably exists and if not should be reasonably easy), and
s6-ftrig-notify would handle the actual readiness alarm.

The API is definitely more complicated than the s6 notification one,
but it doesn't seem insurmountable. My solution is a bit racy, though
I'd hope a socket puller would start faster than a daemon, scheduler
whims or no.
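The parsing half of such a puller might look like the sketch below. It assumes something else (not shown) has bound the socket named in NOTIFY_SOCKET and feeds each received message to stdin, and that readiness is forwarded as a newline on fd 3; the function name is hypothetical, and the final write is a stand-in for whatever mechanism actually raises the readiness event. sd_notify() messages are newline-separated VARIABLE=VALUE assignments, with READY=1 marking startup completion.

```shell
#!/bin/sh
# notify_filter
# Reads sd_notify-style state lines on stdin. On the first READY=1 it
# writes a newline to fd 3 and returns 0; if stdin ends first it returns 1.
notify_filter() {
    # The "|| [ -n ... ]" part handles a final line without a trailing
    # newline, which is how a bare "READY=1" message typically arrives.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            READY=1)
                echo >&3
                return 0
                ;;
        esac
        # Anything else (STATUS=..., MAINPID=..., etc.) is ignored.
    done
    return 1
}

# Usage (hypothetical): some-socket-reader | notify_filter 3>&"$notification_fd"
```

Matching only READY=1 also addresses the chattiness concern: other messages on the socket do not count as readiness.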


  What do you have in that post-boot environment that would be different
 from what you have after shutting down all your s6-rc services and
 wiping the live directory ?

Adjustments to modules, locale and hostname setting, re-seeding the
random device. Basically everything that happens in the single-user
boot stage on distro systems. For example, the udev init script does a
lot of work that can't easily be un-done without a reboot.

Cheers!
