> Date: Sat, 2 Jun 2018 12:04:16 +0200
> From: Martin Pieuchot <[email protected]>
>
> On 01/06/18(Fri) 19:13, Mark Kettenis wrote:
> > The diff below changes our futex implementation to allow futexes that
> > are shared between processes. Such futexes are a little bit tricky
> > since there is no guarantee that a futex is mapped at the same address
> > in each process. To solve this issue I adopted the strategy chosen by
> > FreeBSD. Shared memory objects have a uvm object associated with
> > them. So instead of using the virtual address I can use the offset
> > into that object. Together with a pointer to the object itself,
> > that should form a unique ID for the futex.
>
> Nice.
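
Just to spell out the resulting identity, here is the struct futex from
the diff below with the two cases annotated (this is just the struct
from the diff; only the shared/private annotations in the comments are
added for illustration):

	struct futex {
		LIST_ENTRY(futex)	 ft_list;	/* list of all futexes */
		TAILQ_HEAD(, proc)	 ft_threads;	/* sleeping queue */
		struct uvm_object	*ft_obj;	/* shared: backing uvm object */
		voff_t			 ft_offset;	/* shared: offset into that object */
		uint32_t		*ft_uaddr;	/* private: userspace address */
		pid_t			 ft_pid;	/* private: process identifier */
		unsigned int		 ft_refcnt;	/* # of references */
	};

A futex in shared memory is identified by (ft_obj, ft_offset), which is
the same in every process that maps the object, no matter at which
virtual address; a private futex keeps the old (ft_uaddr, ft_pid)
identity.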
>
> > In order to establish that we're dealing with shared memory we need to
> > look up the mapping corresponding to the userspace address of the
> > futex. That involves locking the map. If we are worried about the
> > additional overhead we could introduce shared versions of the futex
> > operations and skip the lookup for non-shared (process-private) futexes.
> >
> > These futexes could be used to implement support for the POSIX Thread
> > Process-Shared Synchronization option, i.e. the PTHREAD_PROCESS_SHARED
> > attribute for mutexes. But I need these to make libxshmfence work on
> > OpenBSD, which is in turn needed to get DRI3 support working.
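
For what it's worth, the userland side of that would just be the
standard POSIX pattern. A rough sketch (shared_mutex_create() and the
shm name handling are made up for illustration; none of this works
until the pthread bits are actually implemented on top of these
futexes):

	#include <sys/mman.h>
	#include <fcntl.h>
	#include <pthread.h>
	#include <unistd.h>

	/*
	 * Illustration only: create a PTHREAD_PROCESS_SHARED mutex in
	 * POSIX shared memory so that any process mapping the same
	 * object can lock it.
	 */
	pthread_mutex_t *
	shared_mutex_create(const char *name)
	{
		pthread_mutexattr_t attr;
		pthread_mutex_t *m;
		int fd;

		fd = shm_open(name, O_RDWR | O_CREAT, 0600);
		if (fd == -1)
			return (NULL);
		if (ftruncate(fd, sizeof(*m)) == -1) {
			close(fd);
			return (NULL);
		}
		m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
		close(fd);
		if (m == MAP_FAILED)
			return (NULL);

		pthread_mutexattr_init(&attr);
		pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
		pthread_mutex_init(m, &attr);
		pthread_mutexattr_destroy(&attr);

		return (m);
	}

A second process would shm_open() and mmap() the same object and use
the mutex directly, without initializing it again.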
> >
> > Making DRI3 support work is somewhat important. We're currently using
> > DRI2 but Linux has moved to DRI3 and some of the DRI2-specific
> > codepaths in X are starting to rot. DRI3 is also supposed to be more
> > secure than DRI2. With DRI2, applications that can guess a 32-bit
> > handle can in principle access graphics buffers from other
> > applications. With DRI3, graphics buffers are shared using file
> > descriptor passing, which means that only processes that were passed
> > the file descriptor can access the graphics buffer.
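
(For anyone unfamiliar with the mechanism: "file descriptor passing"
here is plain SCM_RIGHTS over a unix(4) socket, roughly as sketched
below; send_fd() is only an illustration, not anything from the X or
DRI code.)

	#include <sys/socket.h>
	#include <sys/uio.h>
	#include <string.h>

	/*
	 * Sketch of SCM_RIGHTS fd passing over a unix(4) socket, the
	 * primitive DRI3 relies on to hand a buffer to another
	 * process. Error handling trimmed.
	 */
	int
	send_fd(int sock, int fd)
	{
		struct msghdr msg;
		struct cmsghdr *cmsg;
		union {
			struct cmsghdr hdr;
			unsigned char buf[CMSG_SPACE(sizeof(int))];
		} cmsgbuf;
		struct iovec iov;
		char c = 0;

		memset(&msg, 0, sizeof(msg));
		iov.iov_base = &c;
		iov.iov_len = 1;
		msg.msg_iov = &iov;
		msg.msg_iovlen = 1;
		msg.msg_control = &cmsgbuf.buf;
		msg.msg_controllen = sizeof(cmsgbuf.buf);

		cmsg = CMSG_FIRSTHDR(&msg);
		cmsg->cmsg_len = CMSG_LEN(sizeof(int));
		cmsg->cmsg_level = SOL_SOCKET;
		cmsg->cmsg_type = SCM_RIGHTS;
		memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

		return (sendmsg(sock, &msg, 0) == -1 ? -1 : 0);
	}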
> >
> > Another feature of DRI3 is that graphics buffers can be shared between
> > graphics devices. The idea there is that a GPU can render into a
> > buffer which is then shared with the Intel CPU graphics, which can then
> > display it on the screen. But my initial implementation won't support
> > that.
> >
> > Thoughts? Especially on whether we want to distinguish between
> > process-shared and process-private futexes!
>
> I'd prefer if we could make the distinction. Would it be enough to pass
> a flag to futex_get()?

More or less. The question is what we should do if someone asks
for a process-shared futex in memory that isn't shared. Should we
automatically create a process-private futex? Or should this result
in an error?

I suspect that people expect PTHREAD_PROCESS_SHARED mutexes to work
even if they're not created in shared memory. So a failure to create
a process-shared futex should probably fall through into the
process-private case.

Turns out Linux already has a FUTEX_PRIVATE_FLAG and the
FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE and FUTEX_REQUEUE_PRIVATE
operations. So I'll just stick to that API.
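
To make that concrete, handling the private flag in futex_get() could
look roughly like the fragment below. FT_PRIVATE is a made-up name for
the kernel-side flag and this is only a sketch against the diff further
down, not tested code:

	struct uvm_object *obj = NULL;
	voff_t offset = 0;

	/*
	 * Hypothetical FT_PRIVATE flag, mirroring the Linux
	 * FUTEX_*_PRIVATE operations: the caller promises the futex
	 * lives in non-shared memory, so the map lookup can be
	 * skipped entirely.
	 */
	if ((flag & FT_PRIVATE) == 0) {
		vm_map_lock_read(map);
		if (uvm_map_lookup_entry(map, (vaddr_t)uaddr, &entry) &&
		    entry->object.uvm_obj != NULL &&
		    entry->inheritance == MAP_INHERIT_SHARE) {
			obj = entry->object.uvm_obj;
			offset = entry->offset +
			    ((vaddr_t)uaddr - entry->start);
		}
		vm_map_unlock_read(map);
	}

	/*
	 * If obj is still NULL at this point, either the caller asked
	 * for a private futex or the memory simply isn't shared; both
	 * cases fall through to the (uaddr, pid) identity.
	 */
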
> > Index: kern/sys_futex.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/sys_futex.c,v
> > retrieving revision 1.7
> > diff -u -p -r1.7 sys_futex.c
> > --- kern/sys_futex.c 24 Apr 2018 17:19:35 -0000 1.7
> > +++ kern/sys_futex.c 1 Jun 2018 15:50:17 -0000
> > @@ -31,6 +31,8 @@
> > #include <sys/ktrace.h>
> > #endif
> >
> > +#include <uvm/uvm_extern.h>
> > +
> > /*
> > * Atomicity is only needed on MULTIPROCESSOR kernels. Fall back on
> > * copyin(9) until non-MULTIPROCESSOR architectures have a copyin32(9)
> > @@ -46,6 +48,8 @@
> > struct futex {
> > LIST_ENTRY(futex) ft_list; /* list of all futexes */
> > TAILQ_HEAD(, proc) ft_threads; /* sleeping queue */
> > + struct uvm_object *ft_obj;
> > + voff_t ft_offset;
> > uint32_t *ft_uaddr; /* userspace address */
> > pid_t ft_pid; /* process identifier */
> > unsigned int ft_refcnt; /* # of references */
> > @@ -130,8 +134,22 @@ struct futex *
> > futex_get(uint32_t *uaddr, int flag)
> > {
> > struct proc *p = curproc;
> > + vm_map_t map = &p->p_vmspace->vm_map;
> > + vm_map_entry_t entry;
> > + struct uvm_object *obj = NULL;
> > + voff_t offset = 0;
> > struct futex *f;
> >
> > + vm_map_lock_read(map);
> > + if (uvm_map_lookup_entry(map, (vaddr_t)uaddr, &entry)) {
> > + if (entry->object.uvm_obj &&
> > + entry->inheritance == MAP_INHERIT_SHARE) {
> > + obj = entry->object.uvm_obj;
> > + offset = entry->offset + ((vaddr_t)uaddr - entry->start);
> > + }
> > + }
> > + vm_map_unlock_read(map);
> > +
> > rw_assert_wrlock(&ftlock);
> >
> > LIST_FOREACH(f, &ftlist, ft_list) {
> > @@ -139,6 +157,10 @@ futex_get(uint32_t *uaddr, int flag)
> > f->ft_refcnt++;
> > break;
> > }
> > + if (obj != NULL && f->ft_obj == obj && f->ft_offset == offset) {
> > + f->ft_refcnt++;
> > + break;
> > + }
> > }
> >
> > if ((f == NULL) && (flag & FT_CREATE)) {
> > @@ -146,10 +168,15 @@ futex_get(uint32_t *uaddr, int flag)
> > * We rely on the rwlock to ensure that no other thread
> > * create the same futex.
> > */
> > - f = pool_get(&ftpool, PR_WAITOK);
> > + f = pool_get(&ftpool, PR_WAITOK | PR_ZERO);
> > TAILQ_INIT(&f->ft_threads);
> > - f->ft_uaddr = uaddr;
> > - f->ft_pid = p->p_p->ps_pid;
> > + if (obj) {
> > + f->ft_obj = obj;
> > + f->ft_offset = offset;
> > + } else {
> > + f->ft_uaddr = uaddr;
> > + f->ft_pid = p->p_p->ps_pid;
> > + }
> > f->ft_refcnt = 1;
> > LIST_INSERT_HEAD(&ftlist, f, ft_list);
> > }
> >
>