Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-03-02 Thread Ted Unangst
On 2020-03-02, Lauri Tirkkonen wrote:

> Thanks for the input, and ping - is there still something about this
> diff that I should fix?

I'm kinda busy, but I should be able to look into it eventually.



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-03-01 Thread Lauri Tirkkonen
On Mon, Feb 24 2020 15:33:35 -0500, Ted Unangst wrote:
> Martin Pieuchot wrote:
> > On 24/02/20(Mon) 11:29, Lauri Tirkkonen wrote:
> > > On Mon, Feb 24 2020 10:24:53 +0100, Martin Pieuchot wrote:
> > > > On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote:
> > > > > I was working on a make jobserver implementation that uses POSIX
> > > > > semaphores as job tokens instead of a complicated socket-based 
> > > > > approach.
> > > > > Initially I used named semaphores, which work fine, except if child
> > > > > processes with less privileges need to also open the named semaphore
> > > > > (eg. 'make build' as root executing 'su build -c make'). For that 
> > > > > reason
> > > > > I wanted to use an unnamed semaphore (sem_init()) which is stored in 
> > > > > shm
> > > > > -- that way I could leave the shm fd open and pass it to children.
> > > > > 
> > > > > But unfortunately, sem_t is currently just a pointer to the opaque
> > > > > struct __sem, and sem_int() always calloc()s the storage for the 
> > > > > struct.
> > > > 
> > > > That's by design.
> > > 
> > > Ok - could you elaborate what the design is?
> > 
> > If the size of a descriptor change, because some fields are added and/or
> > removed, it doesn't matter for the application because it only manipulates
> > pointers.  That means we can change the data types without creating an ABI
> > break.
> 
> I think we are approaching the point where we can settle on fixed sized types
> now. If we want to be cautious, we can add a reserved padding field, too. But
> there are some edge cases which I think can be removed by eliminating the
> dynamic allocation paths.
> 
> > > See the followup patch -- sharing the semaphore between processes does
> > > work with it.
> > 
> > Well ignoring the `pshared' argument is questionable.  Why don't you
> > remove the "#if notyet" and start playing with the existing code and
> > try to figure out if something is missing for your use case?
> 
> I'm not sure the code in notyet will work. It was based on a misunderstanding
> I had of the requirements. Returning control of the sem_t placement to the
> application is the right approach.

Thanks for the input, and ping - is there still something about this
diff that I should fix?

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-24 Thread Ted Unangst
Martin Pieuchot wrote:
> On 24/02/20(Mon) 11:29, Lauri Tirkkonen wrote:
> > On Mon, Feb 24 2020 10:24:53 +0100, Martin Pieuchot wrote:
> > > On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote:
> > > > I was working on a make jobserver implementation that uses POSIX
> > > > semaphores as job tokens instead of a complicated socket-based approach.
> > > > Initially I used named semaphores, which work fine, except if child
> > > > processes with less privileges need to also open the named semaphore
> > > > (eg. 'make build' as root executing 'su build -c make'). For that reason
> > > > I wanted to use an unnamed semaphore (sem_init()) which is stored in shm
> > > > -- that way I could leave the shm fd open and pass it to children.
> > > > 
> > > > But unfortunately, sem_t is currently just a pointer to the opaque
> > > > struct __sem, and sem_int() always calloc()s the storage for the struct.
> > > 
> > > That's by design.
> > 
> > Ok - could you elaborate what the design is?
> 
> If the size of a descriptor change, because some fields are added and/or
> removed, it doesn't matter for the application because it only manipulates
> pointers.  That means we can change the data types without creating an ABI
> break.

I think we are approaching the point where we can settle on fixed sized types
now. If we want to be cautious, we can add a reserved padding field, too. But
there are some edge cases which I think can be removed by eliminating the
dynamic allocation paths.

> > See the followup patch -- sharing the semaphore between processes does
> > work with it.
> 
> Well ignoring the `pshared' argument is questionable.  Why don't you
> remove the "#if notyet" and start playing with the existing code and
> try to figure out if something is missing for your use case?

I'm not sure the code in notyet will work. It was based on a misunderstanding
I had of the requirements. Returning control of the sem_t placement to the
application is the right approach.



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-24 Thread Lauri Tirkkonen
On Mon, Feb 24 2020 10:42:22 +0100, Martin Pieuchot wrote:
> > Yes, that's what I'm trying to do. Yes, I've seen the current
> > implementation -- that's why I started this thread, in an attempt to
> > make them supported. :)
> > 
> > See the followup patch -- sharing the semaphore between processes does
> > work with it.
> 
> Well ignoring the `pshared' argument is questionable.

It does feel strange, but a fully userspace implementation of semaphores
(ie. using only mmap MAP_SHARED files for communication with other
thread/processes) doesn't really have a meaningful distinction whether
sharing happens between threads or processes. The other semaphore
implementations I glanced at all had some kind of 'kernel semaphore'.

I suppose a userspace implementation like this could store the pid of
the process allowed to access the semaphore in struct __sem if
pshared=0, and check that on every access, but I'm honestly not sure
what value it brings for a userspace library to essentially prevent
access to userspace memory if the application went through the effort to
share that memory between processes anyway.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-24 Thread Lauri Tirkkonen
On Mon, Feb 24 2020 10:42:22 +0100, Martin Pieuchot wrote:
> On 24/02/20(Mon) 11:29, Lauri Tirkkonen wrote:
> > On Mon, Feb 24 2020 10:24:53 +0100, Martin Pieuchot wrote:
> > > On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote:
> > > > I was working on a make jobserver implementation that uses POSIX
> > > > semaphores as job tokens instead of a complicated socket-based approach.
> > > > Initially I used named semaphores, which work fine, except if child
> > > > processes with less privileges need to also open the named semaphore
> > > > (eg. 'make build' as root executing 'su build -c make'). For that reason
> > > > I wanted to use an unnamed semaphore (sem_init()) which is stored in shm
> > > > -- that way I could leave the shm fd open and pass it to children.
> > > > 
> > > > But unfortunately, sem_t is currently just a pointer to the opaque
> > > > struct __sem, and sem_int() always calloc()s the storage for the struct.
> > > 
> > > That's by design.
> > 
> > Ok - could you elaborate what the design is?
> 
> If the size of a descriptor change, because some fields are added and/or
> removed, it doesn't matter for the application because it only manipulates
> pointers.  That means we can change the data types without creating an ABI
> break.

I understand this, that's why I bumped the major and build-tested it.

What I don't understand is how an application could ever share a
semaphore with another process without sharing the userspace structure
that contains the important bits. As I mentioned, NetBSD works around
that in the pshared case by stuffing a kernel semaphore identifier into
the sem_t* returned by sem_init -- essentially, that identifier becomes
the only important bit and no storage is allocated, but from the
application point of view, sem_t must still be shared with another
process through shm otherwise (only now it's only a pointer that is used
to hold an integer value).

> > > > This means the application cannot control where unnamed semaphores are
> > > > stored, so I can't put it in shm.
> > > 
> > > Are you trying to use semaphore shared between process?  Did you called
> > > sem_init() with pshared=1?  Have you seen that the current implementation
> > > doesn't support them?
> > 
> > Yes, that's what I'm trying to do. Yes, I've seen the current
> > implementation -- that's why I started this thread, in an attempt to
> > make them supported. :)
> > 
> > See the followup patch -- sharing the semaphore between processes does
> > work with it.
> 
> Well ignoring the `pshared' argument is questionable.  Why don't you
> remove the "#if notyet" and start playing with the existing code and
> try to figure out if something is missing for your use case?

I did read the existing code behind the #ifdef notyet, and even found a
commit from 2013 disabling the code (by adding the #ifdef) "until it
really works". It seems to me that time has not come.

I don't understand how the pshared case under #ifdef notyet was even
supposed to work. It seems to just turn an unnamed semaphore into a
randomly-named semaphore, but provides no way for the application to
share it with another process; if the application puts sem_t in shm,
only a pointer is now in shm -- the struct __sem is not. Named
semaphores at least can be opened by other processes if they know the
name, but randomly-named semaphores that are unlinked after creation can
certainly not.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-24 Thread Martin Pieuchot
On 24/02/20(Mon) 11:29, Lauri Tirkkonen wrote:
> On Mon, Feb 24 2020 10:24:53 +0100, Martin Pieuchot wrote:
> > On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote:
> > > I was working on a make jobserver implementation that uses POSIX
> > > semaphores as job tokens instead of a complicated socket-based approach.
> > > Initially I used named semaphores, which work fine, except if child
> > > processes with less privileges need to also open the named semaphore
> > > (eg. 'make build' as root executing 'su build -c make'). For that reason
> > > I wanted to use an unnamed semaphore (sem_init()) which is stored in shm
> > > -- that way I could leave the shm fd open and pass it to children.
> > > 
> > > But unfortunately, sem_t is currently just a pointer to the opaque
> > > struct __sem, and sem_int() always calloc()s the storage for the struct.
> > 
> > That's by design.
> 
> Ok - could you elaborate what the design is?

If the size of a descriptor change, because some fields are added and/or
removed, it doesn't matter for the application because it only manipulates
pointers.  That means we can change the data types without creating an ABI
break.

> > > This means the application cannot control where unnamed semaphores are
> > > stored, so I can't put it in shm.
> > 
> > Are you trying to use semaphore shared between process?  Did you called
> > sem_init() with pshared=1?  Have you seen that the current implementation
> > doesn't support them?
> 
> Yes, that's what I'm trying to do. Yes, I've seen the current
> implementation -- that's why I started this thread, in an attempt to
> make them supported. :)
> 
> See the followup patch -- sharing the semaphore between processes does
> work with it.

Well ignoring the `pshared' argument is questionable.  Why don't you
remove the "#if notyet" and start playing with the existing code and
try to figure out if something is missing for your use case?



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-24 Thread Lauri Tirkkonen
On Mon, Feb 24 2020 10:24:53 +0100, Martin Pieuchot wrote:
> On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote:
> > I was working on a make jobserver implementation that uses POSIX
> > semaphores as job tokens instead of a complicated socket-based approach.
> > Initially I used named semaphores, which work fine, except if child
> > processes with less privileges need to also open the named semaphore
> > (eg. 'make build' as root executing 'su build -c make'). For that reason
> > I wanted to use an unnamed semaphore (sem_init()) which is stored in shm
> > -- that way I could leave the shm fd open and pass it to children.
> > 
> > But unfortunately, sem_t is currently just a pointer to the opaque
> > struct __sem, and sem_int() always calloc()s the storage for the struct.
> 
> That's by design.

Ok - could you elaborate what the design is?

> > This means the application cannot control where unnamed semaphores are
> > stored, so I can't put it in shm.
> 
> Are you trying to use semaphore shared between process?  Did you called
> sem_init() with pshared=1?  Have you seen that the current implementation
> doesn't support them?

Yes, that's what I'm trying to do. Yes, I've seen the current
implementation -- that's why I started this thread, in an attempt to
make them supported. :)

See the followup patch -- sharing the semaphore between processes does
work with it.

-- 
Lauri Tirkkonen | lotheac @ IRCnet



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-24 Thread Martin Pieuchot
On 23/02/20(Sun) 14:48, Lauri Tirkkonen wrote:
> I was working on a make jobserver implementation that uses POSIX
> semaphores as job tokens instead of a complicated socket-based approach.
> Initially I used named semaphores, which work fine, except if child
> processes with less privileges need to also open the named semaphore
> (eg. 'make build' as root executing 'su build -c make'). For that reason
> I wanted to use an unnamed semaphore (sem_init()) which is stored in shm
> -- that way I could leave the shm fd open and pass it to children.
> 
> But unfortunately, sem_t is currently just a pointer to the opaque
> struct __sem, and sem_int() always calloc()s the storage for the struct.

That's by design.

> This means the application cannot control where unnamed semaphores are
> stored, so I can't put it in shm.

Are you trying to use semaphore shared between process?  Did you called
sem_init() with pshared=1?  Have you seen that the current implementation
doesn't support them?



Re: librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-23 Thread Lauri Tirkkonen
On Sun, Feb 23 2020 14:48:36 +0200, Lauri Tirkkonen wrote:
> So, diff below makes struct __sem non-opaque and removes the indirect
> allocations, so that the application is required to provide storage and
> can therefore control where it's stored (which could be eg. shm).

followup diff that makes sem_init() ignore pshared arg -- it doesn't
need to care, if the application puts the semaphore in shm. I tested an
unnamed shm_init() semaphore in shm_mkstemp() -created shm with this,
passing the fd to a child process and mmap()ing in the child process;
sharing a semaphore this way does seem to work.

diff --git a/lib/librthread/rthread_sem.c b/lib/librthread/rthread_sem.c
index 984e5fb0047..441843d87ca 100644
--- a/lib/librthread/rthread_sem.c
+++ b/lib/librthread/rthread_sem.c
@@ -105,41 +105,6 @@ sem_init(sem_t *sem, int pshared, unsigned int value)
return (-1);
}
 
-   if (pshared) {
-   errno = EPERM;
-   return (-1);
-#ifdef notyet
-   char name[SEM_RANDOM_NAME_LEN];
-   sem_t *sempshared;
-   int i;
-
-   for (;;) {
-   for (i = 0; i < SEM_RANDOM_NAME_LEN - 1; i++)
-   name[i] = arc4random_uniform(255) + 1;
-   name[SEM_RANDOM_NAME_LEN - 1] = '\0';
-   sempshared = sem_open(name, O_CREAT | O_EXCL, 0, value);
-   if (sempshared != SEM_FAILED)
-   break;
-   if (errno == EEXIST)
-   continue;
-   if (errno != EPERM)
-   errno = ENOSPC;
-   return (-1);
-   }
-
-   /* unnamed semaphore should not be opened twice */
-   if (sem_unlink(name) == -1) {
-   sem_close(sempshared);
-   errno = ENOSPC;
-   return (-1);
-   }
-
-   *semp = *sempshared;
-   free(sempshared);
-   return (0);
-#endif
-   }
-
bzero(sem, sizeof(*sem));
sem->value = value;
 
diff --git a/lib/librthread/rthread_sem_compat.c 
b/lib/librthread/rthread_sem_compat.c
index 91c888cd49b..8dc863a04d3 100644
--- a/lib/librthread/rthread_sem_compat.c
+++ b/lib/librthread/rthread_sem_compat.c
@@ -116,41 +116,6 @@ sem_init(sem_t *sem, int pshared, unsigned int value)
return (-1);
}
 
-   if (pshared) {
-   errno = EPERM;
-   return (-1);
-#ifdef notyet
-   char name[SEM_RANDOM_NAME_LEN];
-   sem_t *sempshared;
-   int i;
-
-   for (;;) {
-   for (i = 0; i < SEM_RANDOM_NAME_LEN - 1; i++)
-   name[i] = arc4random_uniform(255) + 1;
-   name[SEM_RANDOM_NAME_LEN - 1] = '\0';
-   sempshared = sem_open(name, O_CREAT | O_EXCL, 0, value);
-   if (sempshared != SEM_FAILED)
-   break;
-   if (errno == EEXIST)
-   continue;
-   if (errno != EPERM)
-   errno = ENOSPC;
-   return (-1);
-   }
-
-   /* unnamed semaphore should not be opened twice */
-   if (sem_unlink(name) == -1) {
-   sem_close(sempshared);
-   errno = ENOSPC;
-   return (-1);
-   }
-
-   *semp = *sempshared;
-   free(sempshared);
-   return (0);
-#endif
-   }
-
bzero(sem, sizeof(*sem));
sem->lock = _SPINLOCK_UNLOCKED;
sem->value = value;

-- 
Lauri Tirkkonen | lotheac @ IRCnet



librthread sem_t opaqueness, storage & unnamed semaphore sharing

2020-02-23 Thread Lauri Tirkkonen
I was working on a make jobserver implementation that uses POSIX
semaphores as job tokens instead of a complicated socket-based approach.
Initially I used named semaphores, which work fine, except if child
processes with less privileges need to also open the named semaphore
(eg. 'make build' as root executing 'su build -c make'). For that reason
I wanted to use an unnamed semaphore (sem_init()) which is stored in shm
-- that way I could leave the shm fd open and pass it to children.

But unfortunately, sem_t is currently just a pointer to the opaque
struct __sem, and sem_int() always calloc()s the storage for the struct.
This means the application cannot control where unnamed semaphores are
stored, so I can't put it in shm.

I might be missing something here, but I don't see why sem_t is not just
typedef'd to a non-opaque struct. The equivalent structs have
definitions in semaphore.h on FreeBSD and glibc (though glibc doesn't
expose the members in this definition, it seems to be provided just for
sizing). NetBSD uses an opaque struct which is allocated, *except* in
the sem_init() pshared==1 case, where it instead returns a kernel
semaphore identifier cast into a sem_t*; I find that to be somewhat
convoluted.

So, diff below makes struct __sem non-opaque and removes the indirect
allocations, so that the application is required to provide storage and
can therefore control where it's stored (which could be eg. shm).

After bootstrap (installing new header, building and installing the new
library) this survived base and xenocara 'make build' on amd64.

---
 include/semaphore.h | 11 +++--
 lib/libc/include/thread_private.h   |  7 ---
 lib/librthread/rthread.h|  4 +-
 lib/librthread/rthread_sem.c| 70 +
 lib/librthread/rthread_sem_compat.c | 68 +---
 lib/librthread/shlib_version|  4 +-
 6 files changed, 53 insertions(+), 111 deletions(-)

diff --git a/include/semaphore.h b/include/semaphore.h
index 21dc1a25bed..98471644357 100644
--- a/include/semaphore.h
+++ b/include/semaphore.h
@@ -38,10 +38,15 @@
 #define _SEMAPHORE_H_
 
 #include 
+#include 
 
-/* Opaque type definition. */
-struct __sem;
-typedef struct __sem *sem_t;
+struct __sem {
+   _atomic_lock_t lock;
+   volatile int waitcount;
+   volatile int value;
+   int shared;
+};
+typedef struct __sem sem_t;
 struct timespec;
 
 #define SEM_FAILED  ((sem_t *)0)
diff --git a/lib/libc/include/thread_private.h 
b/lib/libc/include/thread_private.h
index 98dfaa63223..f39094e0274 100644
--- a/lib/libc/include/thread_private.h
+++ b/lib/libc/include/thread_private.h
@@ -273,13 +273,6 @@ __END_HIDDEN_DECLS
 
 #define_SPINLOCK_UNLOCKED _ATOMIC_LOCK_UNLOCKED
 
-struct __sem {
-   _atomic_lock_t lock;
-   volatile int waitcount;
-   volatile int value;
-   int shared;
-};
-
 TAILQ_HEAD(pthread_queue, pthread);
 
 #ifdef FUTEX
diff --git a/lib/librthread/rthread.h b/lib/librthread/rthread.h
index 1af4509a9a0..89e39c9e0b6 100644
--- a/lib/librthread/rthread.h
+++ b/lib/librthread/rthread.h
@@ -79,8 +79,8 @@ struct pthread_spinlock {
(((size) + (_thread_pagesize - 1)) & ~(_thread_pagesize - 1))
 
 __BEGIN_HIDDEN_DECLS
-int_sem_wait(sem_t, int, const struct timespec *, int *);
-int_sem_post(sem_t);
+int_sem_wait(sem_t *, int, const struct timespec *, int *);
+int_sem_post(sem_t *);
 
 void   _rthread_init(void);
 struct stack *_rthread_alloc_stack(pthread_t);
diff --git a/lib/librthread/rthread_sem.c b/lib/librthread/rthread_sem.c
index bd96769dc39..984e5fb0047 100644
--- a/lib/librthread/rthread_sem.c
+++ b/lib/librthread/rthread_sem.c
@@ -55,7 +55,7 @@
  * Internal implementation of semaphores
  */
 int
-_sem_wait(sem_t sem, int can_eintr, const struct timespec *abstime,
+_sem_wait(sem_t *sem, int can_eintr, const struct timespec *abstime,
 int *delayed_cancel)
 {
unsigned int val;
@@ -86,7 +86,7 @@ _sem_wait(sem_t sem, int can_eintr, const struct timespec 
*abstime,
 
 /* always increment count */
 int
-_sem_post(sem_t sem)
+_sem_post(sem_t *sem)
 {
membar_exit_before_atomic();
atomic_inc_int(>value);
@@ -98,10 +98,8 @@ _sem_post(sem_t sem)
  * exported semaphores
  */
 int
-sem_init(sem_t *semp, int pshared, unsigned int value)
+sem_init(sem_t *sem, int pshared, unsigned int value)
 {
-   sem_t sem;
-
if (value > SEM_VALUE_MAX) {
errno = EINVAL;
return (-1);
@@ -142,26 +140,19 @@ sem_init(sem_t *semp, int pshared, unsigned int value)
 #endif
}
 
-   sem = calloc(1, sizeof(*sem));
-   if (!sem) {
-   errno = ENOSPC;
-   return (-1);
-   }
+   bzero(sem, sizeof(*sem));
sem->value = value;
-   *semp = sem;
 
return (0);
 }
 
 int
-sem_destroy(sem_t *semp)
+sem_destroy(sem_t *sem)
 {
-   sem_t sem;
-
if (!_threads_ready) /* for SEM_MMAP_SIZE */