Re: Improving Shepherd

2018-02-15 Thread Mark H Weaver
Jelle Licht  writes:

> tl;dr: I cannot seem to block signals from being handled by guile in
> some way, which to me seems a prerequisite for using signalfd-based
> signal handling. My uneducated guess is that guile needs to support a
> way to set signal masks for all threads in order to deal with this.

Does POSIX provide a way to set the signal mask for another thread?
Last time I looked, I couldn't find one.

I've long desired to get rid of the signal thread in Guile, and instead
arrange for each signal to be delivered directly to the thread that's
supposed to receive it.  Guile's 'sigaction' has long allowed the user
to specify which thread should receive each kind of signal, although
POSIX doesn't support this.

I want to do this for a couple of reasons.  One is to avoid spawning
threads unless the user asks for it, to avoid possible safety issues
with fork.  Another reason is that I'd like to arrange for long-running
system calls to be reliably interrupted when a signal is received.

The main thing I've been stuck on is that I haven't found a way to set
the signal mask on other threads.

  Mark
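
On the POSIX side, pthread_sigmask only changes the mask of the calling
thread, and a newly created thread starts with a copy of its creator's
mask. The usual workaround is therefore to block the signals before any
other thread exists. Below is a minimal C sketch of that inheritance. It is
not Guile's code, and the function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Runs in the new thread: record whether SIGCHLD is blocked there. */
static void *report_mask(void *arg)
{
    sigset_t current;

    /* With a NULL new set, pthread_sigmask only queries the mask. */
    pthread_sigmask(SIG_BLOCK, NULL, &current);
    *(int *) arg = sigismember(&current, SIGCHLD);
    return NULL;
}

/* Block SIGCHLD in the calling thread, then start a thread.
   Returns 1 if the new thread started with SIGCHLD blocked. */
int child_thread_inherits_mask(void)
{
    sigset_t set, old;
    pthread_t tid;
    int blocked = -1;
    int rc;

    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);

    /* This affects the calling thread only ... */
    rc = pthread_sigmask(SIG_BLOCK, &set, &old);
    assert(rc == 0);

    /* ... but threads created from now on start with a copy of it. */
    rc = pthread_create(&tid, NULL, report_mask, &blocked);
    assert(rc == 0);
    pthread_join(tid, NULL);

    pthread_sigmask(SIG_SETMASK, &old, NULL);   /* restore */
    return blocked;
}
```

A thread that already exists is not affected by a later call, which is the
gap described above.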



Re: Improving Shepherd

2018-02-15 Thread Jelle Licht


Ludovic Courtès writes:

> Heya,
>
> Jelle Licht skribis:
>
>> Good news: signalfd seems to work as far as I can see. I am not quite sure
>> how to make it work consistently with guile ports yet though.
>
> Good!  What do you mean by “work with guile ports” though?

It seems that I am running into problems with the way guile handles
signals at the moment. As far as I understood the good people of #guile on
freenode, guile handles signals with a separate thread that actually
makes sure signal handling is done at the 'right' time. As such, it
seems that there is no easy way to set the mask of blocked signals for
all guile threads.

My approach was to wrap `pthread_sigmask' (initially `sigprocmask') in
combination with a call to `signalfd', but it seems that "my" guile thread
only receives the signal about two-thirds of the time. This only happens
when triggering the signal via 'external' means, such as the kill command.
Using the `raise' function from within my guile repl/program always
reliably triggered events coming in on my signalfd-based port.

Without being able to block all relevant signals via `pthread_sigmask'
in the other guile threads, it seems very difficult to reliably use
signalfd-based ports to handle signals. Some (ugly) code at [1]
demonstrates this: run the guile script, find the pid of the guile
process via `pgrep', and then send that process a SIGCHLD signal via
`kill -17'. You should still see the signal handler for the supposedly
blocked signal be triggered.

tl;dr: I cannot seem to block signals from being handled by guile in
some way, which to me seems a prerequisite for using signalfd-based
signal handling. My uneducated guess is that guile needs to support a
way to set signal masks for all threads in order to deal with this.
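
For reference, the sequence described above looks roughly like this at the
C level. This is a sketch for a single-threaded process, not the script at
[1], and the function name is hypothetical. In a multi-threaded process,
`raise' targets the calling thread, whereas a signal sent with kill may be
delivered to any thread that has it unblocked, which would be consistent
with the behaviour reported here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Block SIGCHLD, route it to a signalfd, raise it, and read it back.
   Returns the signal number read from the descriptor, or -1. */
int read_sigchld_from_signalfd(void)
{
    sigset_t mask, old;
    struct signalfd_siginfo info;
    int result = -1;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &mask, &old) != 0)
        return -1;

    int fd = signalfd(-1, &mask, SFD_CLOEXEC);
    assert(fd >= 0);

    /* The signal is blocked, so it stays pending instead of running
       a handler; the descriptor becomes readable. */
    raise(SIGCHLD);

    if (read(fd, &info, sizeof info) == (ssize_t) sizeof info)
        result = (int) info.ssi_signo;

    close(fd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```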



>> To make use of signalfd, one normally masks signals so that these can be
>> handled via signalfd instead of the default signal handlers; any forked
>> process starts out with the same signal mask, so we would need to make
>> sure to reset the signal mask for spawned processes.
>
> Right, we could do that in ‘exec-command’, which is the central place
> for fork+exec.

Right, this does not seem as difficult as I initially thought. If the
earlier things I mentioned are resolved/worked around, this should be
easy to implement.
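
A small C sketch of the point about spawned processes: the child inherits
the blocked SIGCHLD unless it clears its mask before exec. The function
name is hypothetical, and where exactly ‘exec-command’ would do the reset
is left open.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork with SIGCHLD blocked. If reset_in_child is nonzero, the child
   clears its mask first, as it would right before exec. The child exits
   with 1 if SIGCHLD is still blocked and 0 otherwise; that exit status
   is returned (-1 on error). */
int child_sees_sigchld_blocked(int reset_in_child)
{
    sigset_t mask, old, empty, current;
    int status = 0;
    int result = -1;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &old);

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        if (reset_in_child) {
            sigemptyset(&empty);
            sigprocmask(SIG_SETMASK, &empty, NULL);
        }
        sigprocmask(SIG_BLOCK, NULL, &current);   /* query only */
        _exit(sigismember(&current, SIGCHLD) ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```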


> Well, let us know what to do next, then!  :-)
>
> Ludo’.


-Jelle

[1]: https://paste.debian.net/1010454/



Re: Improving Shepherd

2018-02-15 Thread Andy Wingo
On Wed 14 Feb 2018 14:10, l...@gnu.org (Ludovic Courtès) writes:

> Christopher Lemmer Webber  skribis:
>
>> Ludovic Courtès writes:
>>
>>> Hopefully it’s nothing serious: Fibers doesn’t rely on anything
>>> architecture-specific.
>>
>> I think it relies on epoll currently?  But I think there should be no
>> reason other architectures couldn't also be supported.
>
> Ooh good point, that may rule out GNU/Hurd.  :-/

You can always replace the epoll module :)

Andy



Re: Improving Shepherd

2018-02-14 Thread Ludovic Courtès
Heya,

Jelle Licht  skribis:

> Good news: signalfd seems to work as far as I can see. I am not quite sure
> how to make it work consistently with guile ports yet though. 

Good!  What do you mean by “work with guile ports” though?

> To make use of signalfd, one normally masks signals so that these can be
> handled via signalfd instead of the default signal handlers; any forked
> process starts out with the same signal mask, so we would need to make
> sure to reset the signal mask for spawned processes.

Right, we could do that in ‘exec-command’, which is the central place
for fork+exec.

Well, let us know what to do next, then!  :-)

Ludo’.



Re: Improving Shepherd

2018-02-14 Thread Ludovic Courtès
Christopher Lemmer Webber  skribis:

> Ludovic Courtès writes:
>
>> Hopefully it’s nothing serious: Fibers doesn’t rely on anything
>> architecture-specific.
>
> I think it relies on epoll currently?  But I think there should be no
> reason other architectures couldn't also be supported.

Ooh good point, that may rule out GNU/Hurd.  :-/

Ludo’.



Re: Improving Shepherd

2018-02-10 Thread Jelle Licht
Hey all,

2018-02-05 14:08 GMT+01:00 Ludovic Courtès :

> Hello!
>
> [...]
>
> Currently shepherd monitors SIGCHLD, and it’s not supposed to miss
> those; in some cases it might handle them later than you’d expect, which
> means that in the meantime you see a zombie process, but otherwise it
> seems to work.
>
> ISTR you reported an issue when using ‘shepherd --daemonize’, right?
> Perhaps the issue is limited to that mode?
>

Playing around with signalfd(2) for a bit, it seems that implementations
are allowed to coalesce several 'pending' instances of the same signal. In
the case of SIGCHLD, this means the parent process might never be properly
informed of *multiple* signals being received around the same time. Could
it have something to do with this problem as well?
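
The usual way to cope with that is to treat one SIGCHLD as "at least one
child changed state" and to call waitpid in a loop until nothing is left.
A C sketch under that assumption follows; the function name is
hypothetical and this is not what shepherd currently does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that exit at once, then collect them. Each wake-up
   from the signalfd may stand for several exits, so every wake-up drains
   waitpid until no more children are ready. Returns the number of
   children reaped. */
int spawn_and_reap(int n)
{
    sigset_t mask, old;
    struct signalfd_siginfo info;
    int reaped = 0;

    signal(SIGCHLD, SIG_DFL);   /* make sure children stay waitable */
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &old);

    int fd = signalfd(-1, &mask, SFD_CLOEXEC);
    assert(fd >= 0);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        assert(pid >= 0);
        if (pid == 0)
            _exit(0);
    }

    while (reaped < n) {
        /* Blocks until at least one SIGCHLD is pending. */
        if (read(fd, &info, sizeof info) != (ssize_t) sizeof info)
            break;
        /* One SIGCHLD, possibly many dead children: reap them all. */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            reaped++;
    }

    close(fd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

The number of reads can be smaller than n; only the waitpid loop gives an
accurate count.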

>
> > Concurrency/parallelism - I think Jelle was planning to work on this,
> > but I might be wrong about that. Maybe I volunteered? We're keen to
> > see Shepherd starting services in parallel, where possible. This will
> > require some changes to the way we start/stop services (because at the
> > moment we just send a "start" signal to a single service to start it,
> > which makes it hard to be parallel), and will require us to actually
> > build some sort of real dependency resolution. Longer-term our goal
> > should be to bring fibers into Shepherd, but Efraim mentioned that
> > fibers doesn't compile on ARM at the moment, so we'll have to get that
> > working first at least.
>
> I’d really like to see that happen.  I’ve become more familiar with
> Fibers, and I think it’ll be perfect for the Shepherd (and we’ll fix the
> ARM build issue, no doubt.)
>
> One thing I’d like to do is to handle SIGCHLD via signalfd(2) instead of
> an actual signal handler like we do now.  That would make it easy to
> have signal handling part of the main event loop and thus, it would
> integrate well with Fibers.
>
> It seems that signalfd(2) is Linux-only though, which is a bummer.  The
> solution might be to get over it and have it implemented on GNU/Hurd…
> (I saw this discussion:
> ; I
> suspect it’s within reach.)
>

Good news: signalfd seems to work as far as I can see. I am not quite sure
how to make it work consistently with guile ports yet though.

To make use of signalfd, one normally masks signals so that these can be
handled via signalfd instead of the default signal handlers; any forked
process starts out with the same signal mask, so we would need to make
sure to reset the signal mask for spawned processes.

>
> [...]
>
> Ludo’.
>
>
Jelle


Re: Improving Shepherd

2018-02-09 Thread Ludovic Courtès
Hey!

Danny Milosavljevic  skribis:

> On Mon, 05 Feb 2018 21:49:08 +1100
> Carlo Zancanaro  wrote:
>
>> User services - Alex has already sent a patch to the list to allow 
>> generating user services from the Guix side. The idea is to 
>> generate a Shepherd config file, allowing a user to invoke 
>> shepherd manually to start their services.
>
>> A further extension to 
>> this would be to have something like systemd's "user sessions", 
>> where the pid 1 Shepherd automatically starts a user's services 
>> when they log in.
>
> I assume that means "starts a user's shepherd when they log in".
>
> elogind already emits a signal on dbus which tells you when a user logged in
>
> return sd_bus_emit_signal(
> u->manager->bus,
> "/org/freedesktop/login1",
> "org.freedesktop.login1.Manager",
> new_user ? "UserNew" : "UserRemoved",
> "uo", (uint32_t) u->uid, p);

I think there’s no Guile D-Bus client though.  Another yak to shave…

> Also, a directory /run/user/ appears - which alternatively can be
> monitored by inotify or something.
>
> So the system shepherd could have a shepherd service which does
>
>   while (1) {
>     wait until /run/user/ appears
>     vfork
>       if child: setuid, exec user shepherd, _exit
>       if parent: wait until child dies
>   }
>
> We better be sure that no one else can create directories in /run/user .
>
> In non-pseudocode, both "wait until /run/user/ appears" and
> "wait until child dies" would have to be in the same call,
> maybe epoll or something.

Yes, inotify (ISTR there *are* inotify bindings for Guile somewhere.)

> Maybe call the service shepherd-nursery-service or something, like a star
> nursery :)

:-)

Ludo’.



Re: Improving Shepherd

2018-02-05 Thread Carlo Zancanaro

Hey Danny,

On Mon, Feb 05 2018, Danny Milosavljevic wrote:

> I assume that means "starts a user's shepherd when they log in".


Either that, or run the services itself. In either case, what you 
have sent is very helpful!


> User namespaces just present a different set of names to your process
> (via VFS) so it looks like a chroot basically.
> It does nothing for processes except fake their ids and limit your
> overview of them.
>
> You probably want process groups (see setsid(2)) or maybe containers.


Okay. I've been trying to read about containers/cgroups/namespaces 
and I think my mind has just blurred them all into the same thing. 
I'll read up about process groups. Thanks!


Carlo




Re: Improving Shepherd

2018-02-05 Thread Danny Milosavljevic
Hi Carlo,

On Mon, 05 Feb 2018 21:49:08 +1100
Carlo Zancanaro  wrote:

> User services - Alex has already sent a patch to the list to allow 
> generating user services from the Guix side. The idea is to 
> generate a Shepherd config file, allowing a user to invoke 
> shepherd manually to start their services.

> A further extension to 
> this would be to have something like systemd's "user sessions", 
> where the pid 1 Shepherd automatically starts a user's services 
> when they log in.

I assume that means "starts a user's shepherd when they log in".

elogind already emits a signal on dbus which tells you when a user logged in

return sd_bus_emit_signal(
u->manager->bus,
"/org/freedesktop/login1",
"org.freedesktop.login1.Manager",
new_user ? "UserNew" : "UserRemoved",
"uo", (uint32_t) u->uid, p);

Also, a directory /run/user/ appears - which alternatively can be
monitored by inotify or something.

So the system shepherd could have a shepherd service which does

  while (1) {
    wait until /run/user/ appears
    vfork
      if child: setuid, exec user shepherd, _exit
      if parent: wait until child dies
  }

We better be sure that no one else can create directories in /run/user .

In non-pseudocode, both "wait until /run/user/ appears" and
"wait until child dies" would have to be in the same call,
maybe epoll or something.
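
One way to get both waits into a single call is to poll an inotify
descriptor together with a signalfd for SIGCHLD. The C sketch below
assumes exactly that; a scratch directory under /tmp stands in for
/run/user, and every name in it is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/signalfd.h>
#include <sys/stat.h>
#include <unistd.h>

enum event { EV_TIMEOUT = 0, EV_DIR_CREATED = 1, EV_CHILD = 2 };

/* Wait in one poll call for either a new entry in the watched directory
   (inotify_fd) or a SIGCHLD (signal_fd). */
enum event wait_for_event(int inotify_fd, int signal_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = inotify_fd, .events = POLLIN },
        { .fd = signal_fd,  .events = POLLIN },
    };

    if (poll(fds, 2, timeout_ms) <= 0)
        return EV_TIMEOUT;
    if (fds[0].revents & POLLIN)
        return EV_DIR_CREATED;
    if (fds[1].revents & POLLIN)
        return EV_CHILD;
    return EV_TIMEOUT;
}

/* Demo: watch a scratch directory, create a subdirectory in it, and
   report which event poll saw. */
enum event demo_watch(void)
{
    char dir[] = "/tmp/run-user-XXXXXX";
    char sub[64];
    sigset_t mask, old;

    if (mkdtemp(dir) == NULL)
        return EV_TIMEOUT;

    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &old);

    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    int ifd = inotify_init1(IN_CLOEXEC);
    assert(sfd >= 0 && ifd >= 0);
    inotify_add_watch(ifd, dir, IN_CREATE | IN_ONLYDIR);

    snprintf(sub, sizeof sub, "%s/1000", dir);
    mkdir(sub, 0700);                 /* stands in for a user logging in */

    enum event ev = wait_for_event(ifd, sfd, 1000);

    rmdir(sub);
    rmdir(dir);
    close(ifd);
    close(sfd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return ev;
}
```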

Maybe call the service shepherd-nursery-service or something, like a star
nursery :)

> Child process control - this is my personal frustration, where 
> Shepherd loses track of processes that fork away (e.g. "emacs 
> --daemon"). I barely know anything about Linux process management, 
> but from my reading this can be solved through Linux namespaces 
> (if user namespaces are available). Could someone who knows more 
> about this let me know if that's a productive direction for me to 
> investigate? Or tell me a better way to go about it?

User namespaces just present a different set of names to your process
(via VFS) so it looks like a chroot basically.
It does nothing for processes except fake their ids and limit your
overview of them.

You probably want process groups (see setsid(2)) or maybe containers.
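
A C sketch of the process-group idea: the service is started in its own
session, so the supervisor can later signal the whole group by passing a
negative pid to kill. The function name is hypothetical. Note that a
program which calls setsid itself while daemonizing would leave this
group again, so this alone may not cover every case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a "service" in its own session and process group, then stop the
   whole group at once. Returns 1 if the child led its own group and was
   terminated by the group-wide SIGTERM, 0 otherwise. */
int run_and_kill_group(void)
{
    int ready[2];
    char byte;
    int status = 0;

    if (pipe(ready) != 0)
        return 0;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        close(ready[0]);
        /* New session, new process group; then tell the parent. */
        if (setsid() < 0 || write(ready[1], "x", 1) != 1)
            _exit(1);
        for (;;)
            pause();                  /* stand-in for the real service */
    }

    close(ready[1]);
    /* Wait until the child has called setsid. */
    int synced = (read(ready[0], &byte, 1) == 1);
    close(ready[0]);

    int own_group = synced && getpgid(pid) == pid;

    /* A negative pid addresses every process in that group. */
    kill(own_group ? -pid : pid, SIGTERM);
    waitpid(pid, &status, 0);

    return own_group && WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM;
}
```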



Re: Improving Shepherd

2018-02-05 Thread Carlo Zancanaro

Hey Ludo,

On Mon, Feb 05 2018, Ludovic Courtès wrote:
>> User services - Alex has already sent a patch to the list to allow
>> generating user services from the Guix side. The idea is to generate a
>> Shepherd config file, allowing a user to invoke shepherd manually to
>> start their services. A further extension to this would be to have
>> something like systemd's "user sessions", where the pid 1 Shepherd
>> automatically starts a user's services when they log in.
>
> After replying to Alex’ message, I realized that we could just as well
> have a separate “guix service” or similar tool to take care of this?
>
> This needs more thought (and perhaps taking a look at systemd user
> sessions, which I’m not familiar with), but I think Alex’ approach is a
> good starting point.


We were thinking it might work like this:
- services->package constructs a package which places a file in 
the profile containing the necessary references
- pid 1 shepherd listens to elogind login/logout events, and 
starts the services when necessary


Admittedly this isn't the nicest way for it to work, but it might 
be a good starting point.


There were some discussions on the list a while ago about how to 
have `guix environment` automatically start services, too, so I 
wonder what overlap there could be there. Although maybe 
environment services (in containers) have more in common with 
system services than user services.



>> Child process control - this is my personal frustration, where
>> Shepherd loses track of processes that fork away (e.g. "emacs
>> --daemon"). I barely know anything about Linux process management, but
>> from my reading this can be solved through Linux namespaces (if user
>> namespaces are available). Could someone who knows more about this let
>> me know if that's a productive direction for me to investigate? Or
>> tell me a better way to go about it?
>
> Currently shepherd monitors SIGCHLD, and it’s not supposed to miss
> those; in some cases it might handle them later than you’d expect, which
> means that in the meantime you see a zombie process, but otherwise it
> seems to work.
>
> ISTR you reported an issue when using ‘shepherd --daemonize’, right?
> Perhaps the issue is limited to that mode?


I no longer use the daemonize function. My user shepherd runs "in 
the foreground" (it's started when my X session starts), so it's 
not that. Jelle fixed the problem I was having by delaying the 
SIGCHLD handler registration until it's needed. It is still buggy 
if a process is started before the daemonize command is given to 
root service, though.


If you try running "emacs --daemon" with 
"make-forkexec-constructor" (and #:pid-file, and put something in 
your emacs config to make it write out the pid) you should be able 
to reproduce what I am seeing. If you kill emacs (or if it 
crashes) then shepherd continues to report that it is started and 
running. When I look at htop's output I can also see that my emacs 
process is not a child of my shepherd process.


I would like to add a --daemon/--daemonize command line argument 
to shepherd instead of the current "send the root service a 
daemonize message". I think the use cases of turning it into a 
daemon later are limited, and it just gives you an additional way 
of shooting yourself in the foot.


>> Concurrency/parallelism - I think Jelle was planning to work on this,
>> but I might be wrong about that. Maybe I volunteered? We're keen to
>> see Shepherd starting services in parallel, where possible. This will
>> require some changes to the way we start/stop services (because at the
>> moment we just send a "start" signal to a single service to start it,
>> which makes it hard to be parallel), and will require us to actually
>> build some sort of real dependency resolution. Longer-term our goal
>> should be to bring fibers into Shepherd, but Efraim mentioned that
>> fibers doesn't compile on ARM at the moment, so we'll have to get that
>> working first at least.
>
> I’d really like to see that happen.  I’ve become more familiar with
> Fibers, and I think it’ll be perfect for the Shepherd (and we’ll fix the
> ARM build issue, no doubt.)


I'm not going to put much time/effort into this until we have 
fibers building on ARM. I think these changes are likely to break 
shepherd's config API, too. In particular, with higher levels of 
concurrency I want to move the mutable state out of  
objects.


> It seems that signalfd(2) is Linux-only though, which is a bummer.  The
> solution might be to get over it and have it implemented on GNU/Hurd…
> (I saw this discussion:
> ; I
> suspect it’s within reach.)


Failing that, could we have our signal handlers just convert the 
signal to a message in our event loop? I have a very rudimentary 
understanding of signal handling, but I assume we could have our 
main event loop just reading things off of two channels: one of 
signal events, one of fd events.
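
This is essentially the classic "self-pipe" technique: the handler only
writes the signal number into a pipe, and the event loop treats the read
end like any other descriptor. A portable C sketch follows (pipe2 is the
only Linux/BSD-specific call); the function names are hypothetical and
not part of Shepherd.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

static int signal_pipe[2] = { -1, -1 };

/* Handler: do the minimum, just forward the signal number as one byte. */
static void forward_signal(int signo)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char) signo;
    ssize_t ignored = write(signal_pipe[1], &byte, 1);

    (void) ignored;
    errno = saved_errno;
}

/* Create the pipe and route `signo` into it. Returns the read end, which
   an event loop can poll next to its other descriptors, or -1. */
int signal_channel_open(int signo)
{
    struct sigaction sa;

    assert(signo > 0 && signo < 256);   /* must fit in one byte */
    if (pipe2(signal_pipe, O_CLOEXEC | O_NONBLOCK) != 0)
        return -1;

    sa.sa_handler = forward_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;
    return signal_pipe[0];
}

/* Read one pending "signal message"; returns its number, or -1 if the
   channel is currently empty. */
int signal_channel_next(int fd)
{
    unsigned char byte;

    return read(fd, &byte, 1) == 1 ? (int) byte : -1;
}
```

Because the pipe is non-blocking, a burst of signals cannot stall the
handler; at worst some bytes are dropped, so the loop should still reap
with waitpid until nothing is left.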


This would mean that Shepherd could decide the best way to 

Re: Improving Shepherd

2018-02-05 Thread Ludovic Courtès
Hello!

Carlo Zancanaro  skribis:

> A few people came to join me on Friday to think about Shepherd. Thanks
> Alex, Efraim, and Jelle.

Thanks for summarizing!  I was hoping to chime in as well but that did
not happen.

> User services - Alex has already sent a patch to the list to allow
> generating user services from the Guix side. The idea is to generate a
> Shepherd config file, allowing a user to invoke shepherd manually to
> start their services. A further extension to this would be to have
> something like systemd's "user sessions", where the pid 1 Shepherd
> automatically starts a user's services when they log in.

After replying to Alex’ message, I realized that we could just as well
have a separate “guix service” or similar tool to take care of this?

This needs more thought (and perhaps taking a look at systemd user
sessions, which I’m not familiar with), but I think Alex’ approach is a
good starting point.

> Child process control - this is my personal frustration, where
> Shepherd loses track of processes that fork away (e.g. "emacs
> --daemon"). I barely know anything about Linux process management, but
> from my reading this can be solved through Linux namespaces (if user
> namespaces are available). Could someone who knows more about this let
> me know if that's a productive direction for me to investigate? Or
> tell me a better way to go about it?

Currently shepherd monitors SIGCHLD, and it’s not supposed to miss
those; in some cases it might handle them later than you’d expect, which
means that in the meantime you see a zombie process, but otherwise it
seems to work.

ISTR you reported an issue when using ‘shepherd --daemonize’, right?
Perhaps the issue is limited to that mode?

> Concurrency/parallelism - I think Jelle was planning to work on this,
> but I might be wrong about that. Maybe I volunteered? We're keen to
> see Shepherd starting services in parallel, where possible. This will
> require some changes to the way we start/stop services (because at the
> moment we just send a "start" signal to a single service to start it,
> which makes it hard to be parallel), and will require us to actually
> build some sort of real dependency resolution. Longer-term our goal
> should be to bring fibers into Shepherd, but Efraim mentioned that
> fibers doesn't compile on ARM at the moment, so we'll have to get that
> working first at least.

I’d really like to see that happen.  I’ve become more familiar with
Fibers, and I think it’ll be perfect for the Shepherd (and we’ll fix the
ARM build issue, no doubt.)

One thing I’d like to do is to handle SIGCHLD via signalfd(2) instead of
an actual signal handler like we do now.  That would make it easy to
have signal handling part of the main event loop and thus, it would
integrate well with Fibers.

It seems that signalfd(2) is Linux-only though, which is a bummer.  The
solution might be to get over it and have it implemented on GNU/Hurd…
(I saw this discussion:
; I
suspect it’s within reach.)

> I mentioned an idea to the guys on Friday about how Shepherd should
> treat enabled/disabled services. I've thought about it some more, and
> I think it might work. The general idea is that Shepherd would always
> try to run an enabled service, and it would leave a disabled service
> as-is (unless it's needed to start another service). So it would kind
> of work like this:
> - if stopped and enabled: try to start service
> - if started and enabled: monitor, and restart service if it fails
> - if retrying too often: disable this service, and all which depend on
> it
> - else: only start if another enabled service depends on this one
>
> This would mean that Shepherd could decide the best way to start/stop
> services, including doing so in parallel if possible.

Sounds good.  That’s annoyed most of us already, so if you get that
fixed, you’ll make a lot of people happy.  :-)

Ludo’.



Re: Improving Shepherd

2018-02-05 Thread Carlo Zancanaro
A few people came to join me on Friday to think about Shepherd. 
Thanks Alex, Efraim, and Jelle.


We talked about a few different things that we'd like to achieve 
with Shepherd. The most significant and achievable things were, I 
think: user services, child process control, and 
concurrency/parallelism.


User services - Alex has already sent a patch to the list to allow 
generating user services from the Guix side. The idea is to 
generate a Shepherd config file, allowing a user to invoke 
shepherd manually to start their services. A further extension to 
this would be to have something like systemd's "user sessions", 
where the pid 1 Shepherd automatically starts a user's services 
when they log in.


Child process control - this is my personal frustration, where 
Shepherd loses track of processes that fork away (e.g. "emacs 
--daemon"). I barely know anything about Linux process management, 
but from my reading this can be solved through Linux namespaces 
(if user namespaces are available). Could someone who knows more 
about this let me know if that's a productive direction for me to 
investigate? Or tell me a better way to go about it?


Concurrency/parallelism - I think Jelle was planning to work on 
this, but I might be wrong about that. Maybe I volunteered? We're 
keen to see Shepherd starting services in parallel, where 
possible. This will require some changes to the way we start/stop 
services (because at the moment we just send a "start" signal to a 
single service to start it, which makes it hard to be parallel), 
and will require us to actually build some sort of real dependency 
resolution. Longer-term our goal should be to bring fibers into 
Shepherd, but Efraim mentioned that fibers doesn't compile on ARM 
at the moment, so we'll have to get that working first at least.
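
To make "real dependency resolution" a bit more concrete, here is a small
C sketch that groups services into rounds: everything in one round has all
of its requirements in earlier rounds, so the members of a round could be
started in parallel. The matrix representation and all names are
assumptions for illustration only; Shepherd's own data structures look
different.

```c
#include <assert.h>

#define MAX_SERVICES 16

/* deps[i][j] != 0 means service i requires service j.
   Fills wave[i] with the earliest round in which service i can start
   (0 = no requirements). Returns the number of rounds, or -1 if the
   requirements contain a cycle. */
int start_waves(int n, int deps[][MAX_SERVICES], int wave[])
{
    int placed = 0, rounds = 0;

    assert(n >= 0 && n <= MAX_SERVICES);
    for (int i = 0; i < n; i++)
        wave[i] = -1;                 /* -1 = not scheduled yet */

    while (placed < n) {
        int progress = 0;

        for (int i = 0; i < n; i++) {
            if (wave[i] != -1)
                continue;
            int ready = 1;
            for (int j = 0; j < n; j++)
                /* A requirement must sit in an EARLIER round. */
                if (deps[i][j] && (wave[j] == -1 || wave[j] == rounds))
                    ready = 0;
            if (ready) {
                wave[i] = rounds;
                progress++;
            }
        }
        if (progress == 0)
            return -1;                /* cycle: nothing can start */
        placed += progress;
        rounds++;
    }
    return rounds;
}
```

With four services where 2 requires 0 and 1, and 3 requires 2, services 0
and 1 land in the first round and can start together.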


I mentioned an idea to the guys on Friday about how Shepherd 
should treat enabled/disabled services. I've thought about it some 
more, and I think it might work. The general idea is that Shepherd 
would always try to run an enabled service, and it would leave a 
disabled service as-is (unless it's needed to start another 
service). So it would kind of work like this:

- if stopped and enabled: try to start service
- if started and enabled: monitor, and restart service if it fails
- if retrying too often: disable this service, and all which depend on it
- else: only start if another enabled service depends on this one

This would mean that Shepherd could decide the best way to 
start/stop services, including doing so in parallel if possible.
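
The rules above can be read as a small decision function. The C sketch
below is one possible reading; the struct fields, the action names and the
restart threshold are all hypothetical, and the order in which the rules
are checked is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum action { DO_NOTHING, START, MONITOR, DISABLE_WITH_DEPENDENTS };

struct service_state {
    bool enabled;
    bool running;
    bool needed_by_enabled;   /* another enabled service depends on it */
    int  recent_restarts;
};

/* Hypothetical threshold for "retrying too often". */
#define MAX_RECENT_RESTARTS 5

enum action next_action(const struct service_state *s)
{
    assert(s != NULL);

    if (s->enabled) {
        if (s->recent_restarts > MAX_RECENT_RESTARTS)
            return DISABLE_WITH_DEPENDENTS;
        /* Enabled services are always kept running. */
        return s->running ? MONITOR : START;
    }

    /* Disabled services are left as-is unless something needs them. */
    if (!s->running && s->needed_by_enabled)
        return START;
    return DO_NOTHING;
}
```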


So, there are our ideas! Any thoughts, or words of wisdom? 
Feedback is welcome.


Carlo

On Mon, Jan 29 2018, Carlo Zancanaro wrote:
> I'm keen to do some work on shepherd. Partially this is driven by
> me using it to manage my user session and having it not always
> work right, and partially this is driven by me grepping the code
> for "FIXME" (which was slightly overwhelming). If anyone is keen
> to chat about it on Friday, please find me! I have some ideas
> about things I'd like to do, but I don't really have any idea what
> I'm doing. Any help/advice/encouragement you can give me will be
> appreciated!
>
> Carlo




Re: Improving Shepherd

2018-01-29 Thread Jelle Licht
2018-01-29 22:14 GMT+01:00 Carlo Zancanaro :

> I'm keen to do some work on shepherd. Partially this is driven by me using
> it to manage my user session and having it not always work right, and
> partially this is driven by me grepping the code for "FIXME" (which was
> slightly overwhelming). If anyone is keen to chat about it on Friday,
> please find me! I have some ideas about things I'd like to do, but I don't
> really have any idea what I'm doing. Any help/advice/encouragement you can
> give me will be appreciated!
>

Count me in! I am currently not using GNU Shepherd for my user session yet,
but would like to collaborate on a future direction that makes it
easier to use.
I'll only be there after/around lunch though ;-).

>
> Carlo
>
- Jelle


Improving Shepherd

2018-01-29 Thread Carlo Zancanaro
I'm keen to do some work on shepherd. Partially this is driven by 
me using it to manage my user session and having it not always 
work right, and partially this is driven by me grepping the code 
for "FIXME" (which was slightly overwhelming). If anyone is keen 
to chat about it on Friday, please find me! I have some ideas 
about things I'd like to do, but I don't really have any idea what 
I'm doing. Any help/advice/encouragement you can give me will be 
appreciated!


Carlo

