Re: data structures for use in signal handlers

2021-02-09 Thread Eric Wong
Bruno Haible  wrote:
> Eric Wong wrote:
> > > How would setting up a pipe or eventfd help with a communication
> > > problem between threads in the same process? I thought these were
> > > for communication between processes.
> > 
> > The "self-pipe trick" is a common idiom in Unix for signal
> > handling; especially when pselect/ppoll aren't available.
> > I seem to recall learning of it reading something by djb.

> What I understand from [1] is that if a program uses signals AND
> select()/poll(), then it is useful to map signals into events that
> can be received by select()/poll(). The self-pipe trick is a way
> to do that.

Right.

> Maybe I didn't describe accurately the situation I am asking for.
> I'm not attempting to transform a program to an event-based engine.
> I'm looking to let signal handlers (SIGTERM, SIGSEGV, SIGWINCH, etc.)
> read and traverse data structures that the main program has prepared
> (and that contain information about files to delete, memory areas,
> window lines to redraw etc.).

No worries, I figured pipes/sockets weren't an options given
your mention of libsigsegv in the original post.

One possible solution is to spawn a dedicated thread which does
nothing but run an event loop to handle signals sequentially
without reentrancy.  Then, signal handlers are guaranteed to run
alone and existing thread-safe structures (e.g. rculfhash) and
locks would be able to work properly.

It all depends on whether you can spawn a dedicated thread for
that.  FWIW, liburcu uses background threads internally, too.


> [1] http://osiris.978.org/~alex/safesignalhandling.html



Re: data structures for use in signal handlers

2021-02-09 Thread Ben Pfaff
On Tue, Feb 9, 2021 at 6:07 PM Bruno Haible  wrote:
> Maybe I didn't describe accurately the situation I am asking for.
> I'm not attempting to transform a program to an event-based engine.
> I'm looking to let signal handlers (SIGTERM, SIGSEGV, SIGWINCH, etc.)
> read and traverse data structures that the main program has prepared
> (and that contain information about files to delete, memory areas,
> window lines to redraw etc.).

This is a situation that I've always worked hard to avoid. So little can
be safely done in a signal handler that it's worthwhile to try not to do
more than the minimum possible. Have you already concluded that
you must do significant work in a signal handler? I would not want to
try to, say, redraw lines in a window in a signal handler, because the
functions that I would need to call to do that would not be prepared
to run inside such a handler: I don't think ncurses or GTK+, for
example, would be happy at all about it.



Re: data structures for use in signal handlers

2021-02-09 Thread Paul Eggert

On 2/9/21 5:21 PM, Eric Wong wrote:

I know the (C)Ruby VM uses it internally (or eventfd on Linux);
I think CPython's GVL does, too.


GNU Make does too. It's where I learned of the trick.



Re: data structures for use in signal handlers

2021-02-09 Thread Bruno Haible
Eric Wong wrote:
> > How would setting up a pipe or eventfd help with a communication
> > problem between threads in the same process? I thought these were
> > for communication between processes.
> 
> The "self-pipe trick" is a common idiom in Unix for signal
> handling; especially when pselect/ppoll aren't available.
> I seem to recall learning of it reading something by djb.
> 
> It's perfectly valid to use pipes in single-threaded or
> multi-threaded, or multi-process situations.  POSIX gives
> well-defined semantics w.r.t. atomicity.

Atomicity alone is not an argument; the compiler and OS have
sufficient primitives (CAS instructions, barriers, spin-locks, locks)
to make things atomic between threads.

What I understand from [1] is that if a program uses signals AND
select()/poll(), then it is useful to map signals into events that
can be received by select()/poll(). The self-pipe trick is a way
to do that.

Maybe I didn't describe accurately the situation I am asking for.
I'm not attempting to transform a program to an event-based engine.
I'm looking to let signal handlers (SIGTERM, SIGSEGV, SIGWINCH, etc.)
read and traverse data structures that the main program has prepared
(and that contain information about files to delete, memory areas,
window lines to redraw etc.).

Bruno

[1] http://osiris.978.org/~alex/safesignalhandling.html




Re: data structures for use in signal handlers

2021-02-09 Thread Ben Pfaff
On Tue, Feb 9, 2021 at 5:01 PM Bruno Haible  wrote:
>
> Ben Pfaff wrote:
> > Building an RCU implementation isn't necessarily difficult (I have done it,
> > but the implementation isn't suitable for gnulib).
> >
> > There is a liburcu that is under LGPL v2.1: https://liburcu.org/
>
> Thanks for the pointers, Ben. Yes, RCU is the technique I had in mind.
> So: "can it actually be made to work?" -> Yes.
>
> But the implementation of liburcu uses extra signals and a helper thread of
> its own (or even a helper thread per CPU).
>
> I have to rephrase the question: can it be made to work in a straightforward
> (not necessarily scalability-optimized) way?

Signals are not needed. I haven't studied liburcu (it wasn't suitable for my
userspace use case because it only worked with GCC), so I am not sure
why it needs signals.

The userspace RCU implementation that I built in OVS (which is free but
Apache licensed) does use a thread, but not signals. The nice thing about
using a thread is that you always have something freeing memory that is
known to no longer be in use.

But I can imagine implementations that don't even use an extra thread.
For example, each thread could keep its own local list of callbacks to be
called after the next grace period. Then, whenever a thread quiesces, it
goes through its own list and executes all of the callbacks that now can
be. It could take a long time for some callbacks to be called, though,
especially if a thread goes to sleep and thus won't call any of its own
until it wakes up again.



Re: data structures for use in signal handlers

2021-02-09 Thread Eric Wong
Bruno Haible  wrote:
> Hi Eric,
> 
> Eric Wong wrote:
> > I would've recommended just using a pipe, socket or eventfd;
> 
> How would setting up a pipe or eventfd help with a communication
> problem between threads in the same process? I thought these were
> for communication between processes.

The "self-pipe trick" is a common idiom in Unix for signal
handling; especially when pselect/ppoll aren't available.
I seem to recall learning of it reading something by djb.

It's perfectly valid to use pipes in single-threaded or
multi-threaded, or multi-process situations.  POSIX gives
well-defined semantics w.r.t. atomicity.

I know the (C)Ruby VM uses it internally (or eventfd on Linux);
I think CPython's GVL does, too.

My gnulib-using cmogstored uses a self-pipe when epoll_pwait
isn't available (Though it could probably use EVFILT_SIGNAL on
*BSD).



Re: data structures for use in signal handlers

2021-02-09 Thread Bruno Haible
Hi Eric,

Eric Wong wrote:
> I would've recommended just using a pipe, socket or eventfd;

How would setting up a pipe or eventfd help with a communication
problem between threads in the same process? I thought these were
for communication between processes.

Bruno




Re: data structures for use in signal handlers

2021-02-09 Thread Bruno Haible
Ben Pfaff wrote:
> On Sun, Feb 7, 2021 at 2:57 PM Bruno Haible  wrote:
> > (2) Let the signal handler work only on immutable copies of the data
> > structure. Whenever the other code manipulates the data structure,
> > it creates an immutable copy of it, for the signal handler to use.
> > Use an 'asyncsafe-spin' lock through which the signal handler tells
> > the other threads to not free that immutable copy while it running.
> >
> > This is tricky; can it actually be made to work?
> >
> > Btw, there is an obvious requirement: the technique should use 
> > malloc/
> > free for memory management, and should not have memory leaks.
> > Algorithms that assume a garbage collected memory management are not
> > suitable here.
> 
> This makes me think of read-copy-update aka RCU:
> https://en.wikipedia.org/wiki/Read-copy-update
> https://lwn.net/Articles/262464/
> 
> In RCU, code that updates the data structure takes a lock, creates and
> modifies a copy, and then installs a new pointer to the data structure,
> which is otherwise immutable. Readers always access the data structure
> through a pointer. Whichever pointer they happen to see when they
> read the pointer remains available until they're done with it.
> 
> Using RCU is pretty straightforward once you've done a little of it, but
> it takes some reading to understand all of its concepts. It's probably best
> for me not to try to explain it all, because I'll surely get some parts of it
> wrong.
> 
> Building an RCU implementation isn't necessarily difficult (I have done it,
> but the implementation isn't suitable for gnulib).
> 
> There is a liburcu that is under LGPL v2.1: https://liburcu.org/

Thanks for the pointers, Ben. Yes, RCU is the technique I had in mind.
So: "can it actually be made to work?" -> Yes.

But the implementation of liburcu uses extra signals and a helper thread of
its own (or even a helper thread per CPU). 

I have to rephrase the question: can it be made to work in a straightforward
(not necessarily scalability-optimized) way?

Bruno




Re: data structures for use in signal handlers

2021-02-08 Thread Eric Wong
Bruno Haible  wrote:
> Hi Marc, Ben,
> 
> I need your expertise regarding data structures.
> 
> Assume a program has a signal handler, which needs to share some data
> structure with the rest of the program.
> 
> There are two cases:
> 
> (A) Assume that the program is single-threaded.
> Then it is sufficient to use 'volatile' variables for communication
> between the handler and the rest of the program.
> 
> (B) Assume that the program is multi-threaded.
> Then 'volatile' is not sufficient, because the signal handler and some
> other threads may be running at the same time.
> 
> As far as I can see, there are three possibilities to write such a
> signal handler:

> (2) Let the signal handler work only on immutable copies of the data
> structure. Whenever the other code manipulates the data structure,
> it creates an immutable copy of it, for the signal handler to use.
> Use an 'asyncsafe-spin' lock through which the signal handler tells
> the other threads to not free that immutable copy while it running.
> 
> This is tricky; can it actually be made to work?

Maybe, it sounds like RCU as Ben mentioned.  I've never used RCU
inside signal handlers, though.

> Btw, there is an obvious requirement: the technique should use malloc/
> free for memory management, and should not have memory leaks.
> Algorithms that assume a garbage collected memory management are not
> suitable here.

liburcu uses intrusive data structures (via container_of) like
the Linux kernel does, so memory can be pre-allocated at startup
before a signal handler runs.

> (3) Use lock-free algorithms. What lock-free algorithms can you propose?
> What I want is
>   (a) a list,

I haven't used it for signal handlers, but wfcqueue from liburcu
might be a possibility, here.

>   (b) a balanced binary tree,

That seems way tricky.  I've never gone beyond atomic ops (xchg,
cmpxchg, add/sub/or, etc...) and the usual pipe/eventfd/socket
stuff.

> and the other code can malloc/free/add/insert in the data structure,
> whereas the signal handler should only be allowed to iterate and
> search an element within the data structure.
> 
> What algorithms can you recommend for this purpose?

> This would be useful for gnulib in general. The balanced binary tree would
> also help making GNU libsigsegv multithread-safe.

I would've recommended just using a pipe, socket or eventfd;
but I guess that's not an option for your particular structure
or libsigsegv?



Re: data structures for use in signal handlers

2021-02-07 Thread Ben Pfaff
On Sun, Feb 7, 2021 at 2:57 PM Bruno Haible  wrote:
> (2) Let the signal handler work only on immutable copies of the data
> structure. Whenever the other code manipulates the data structure,
> it creates an immutable copy of it, for the signal handler to use.
> Use an 'asyncsafe-spin' lock through which the signal handler tells
> the other threads to not free that immutable copy while it running.
>
> This is tricky; can it actually be made to work?
>
> Btw, there is an obvious requirement: the technique should use malloc/
> free for memory management, and should not have memory leaks.
> Algorithms that assume a garbage collected memory management are not
> suitable here.

This makes me think of read-copy-update aka RCU:
https://en.wikipedia.org/wiki/Read-copy-update
https://lwn.net/Articles/262464/

In RCU, code that updates the data structure takes a lock, creates and
modifies a copy, and then installs a new pointer to the data structure,
which is otherwise immutable. Readers always access the data structure
through a pointer. Whichever pointer they happen to see when they
read the pointer remains available until they're done with it.

Using RCU is pretty straightforward once you've done a little of it, but
it takes some reading to understand all of its concepts. It's probably best
for me not to try to explain it all, because I'll surely get some parts of it
wrong.

Building an RCU implementation isn't necessarily difficult (I have done it,
but the implementation isn't suitable for gnulib).

There is a liburcu that is under LGPL v2.1: https://liburcu.org/