Re: light weight user level semaphores

2001-04-23 Thread Alexander Viro



On 24 Apr 2001, David Wagner wrote:

> Linus Torvalds  wrote:
> >Ehh.. I will bet you $10 USD that if libc allocates the next file
> >descriptor on the first "malloc()" in user space (in order to use the
> >semaphores for mm protection), programs _will_ break.
> >
> >You want to take the bet?
> 
> Good point.  Speaking of which:
>   ioctl(fd, UIOCATTACHSEMA, ...);
> seems to act like dup(fd) if fd was opened on "/dev/usemaclone"
> (see drivers/sgi/char/usema.c).  According to usema(7), this is
> intended to help libraries implement semaphores.
> 
> Is this a bad coding?

Yes. Not to mention side effects, it's just plain ugly. Anyone who invents
identifiers of _that_ level of ugliness should be forced to read them
aloud for a week or so, until somebody will shoot him out of mercy.
Out of curiosity: who was the author? It looks unusually nasty, even for
SGI.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread David Wagner

Linus Torvalds  wrote:
>Ehh.. I will bet you $10 USD that if libc allocates the next file
>descriptor on the first "malloc()" in user space (in order to use the
>semaphores for mm protection), programs _will_ break.
>
>You want to take the bet?

Good point.  Speaking of which:
  ioctl(fd, UIOCATTACHSEMA, ...);
seems to act like dup(fd) if fd was opened on "/dev/usemaclone"
(see drivers/sgi/char/usema.c).  According to usema(7), this is
intended to help libraries implement semaphores.

Is this a bad coding?  Should the kernel really support an ioctl()
that can silently allocate the next file descriptor?  This seems
like asking for trouble.  Or, maybe I just misunderstood something.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 09:46:17AM -0700, Linus Torvalds wrote:
> > libc is entitled to, and most definitely does exactly that. Take a look at
> > things like gethostent, getpwent etc etc.
> 
> Ehh.. I will bet you $10 USD that if libc allocates the next file
> descriptor on the first "malloc()" in user space (in order to use the
> semaphores for mm protection), programs _will_ break.

But we would not open the semaphore on malloc() but instead in
the init functions of the libc. So the semaphore will be already
allocated. May be dup2()ed to some very high range
(INT_MAX-__GLIBC_MALLOC_SEM_FD) and the original fd closed.

So this will be no real problem. That's why I don't like lazy
init: May be you cannot init anymore, if you come to and
condition, where you would need it.

Also init/fini are usally very slow operations and as many things
as possible are burdend onto their shoulders.

Semaphores tend to be structures living very long (at least in
all code I've written and seen so far) so I see no point in
defering their initialization.

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag 
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread David Howells

Alon Ziv <[EMAIL PROTECTED]> wrote:
> Obviously... since they're handles, not FDs...
> [BTW, are you using Windows' idea of storing the objects in process space,
> in a page that's inaccessible to the app itself, and passing pointers into
> this page as the handles?]

No... I grab a page in kernel space and use it as an array. One problem is
that if an exit occurs, I have to be able to discard all attached objects
after the process's VM has been cleaned up (ie: what if it gets swapped
out?). Plus, mmap can clobber existing mappings, MapViewOfFile can't.

> So what if they aren't files?

Small structures private to my Win32 module.

> I'm afraid I'm not following your logic in this; I believe most Win32 attrs
> can be mapped to more generic abstractions which should be able to exist at
> 'struct file' level.

"Most"...

It'd mean adding extra fields into struct file (and possibly struct inode)
just for the use of this module (which probably wouldn't be accepted).

> (And even if not, a Win32 file handle could just hold two pointers---

No. the extra data has to be accessible from CreateFile (potentially running
in other processes), and this'd mean it'd have to go speculatively searching
all Win32 handle tables currently in use.

> And breaks _completely_ with the existing scheme :-/

So what? This is for a WINE accelerator/Win32 module only. There's already
been an argument over making the whole lot available as general Linux
functionality, but most people said that it'd be a bad idea because it'd not
be portable.

> Huh? Where did you get this?
> Looking at my copy of MSDN (July '00), the PulseEvent remarks more-or-less
> suggest an implementation like
> SetEvent(e)
> ResetEvent(e)

Consider the following:

WAITER 1WAITER 2WAITER 3WAKER
wait-on-event   wait-on-event   wait-on-event
sleep   sleep   sleep
PulseEvent
set-event
wake(WAITER 1)
wake(WAITER 2)
wake(WAITER 3)
reset-event
wakewakewake
what-happened?  what-happened?  what-happened?
nothing!nothing!nothing!
sleep   sleep   sleep

All three waiters should wake up with a note that the event triggered, but
they don't. Plus a fourth waiter who begins to wait on the event after the
set-event is issue probably shouldn't wake up.

> I wonder if it's possible to add _just_ this to poll()...

No... there's no way to pass this to poll (or select).

Better to add a WaitForMultipleObjects() syscall and have that call
do_select() with a flag.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread Jeff Garzik

Linus Torvalds wrote:
> Trust me. You're arguing for clearly broken behaviour. malloc() and
> friends MUST NOT open file descriptors. It _will_ break programs that
> rely on traditional and documented features.

Indeed; STDIN_FILENO and friends are constants...

-- 
Jeff Garzik  | The difference between America and England is that
Building 1024| the English think 100 miles is a long distance and
MandrakeSoft | the Americans think 100 years is a long time.
 |  (random fortune)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread Alon Ziv

From: "David Howells" <[EMAIL PROTECTED]>
> David Woodhouse <[EMAIL PROTECTED]> wrote:
> > [EMAIL PROTECTED] said:
> > >  [BTW, another solution is to truly support opaque "handles" to kernel
> > > objects; I believe David Howells is already working on something like
> > > this for Wine?
>
> Yes. However, it uses a different system call set to use them. They
translate
> to small object structures internally.
>

Obviously... since they're handles, not FDs...
[BTW, are you using Windows' idea of storing the objects in process space,
in a
page that's inaccessible to the app itself, and passing pointers into this
page
as the handles?]

> > > The poll interface can be trivially extended to support
> > > waiting on those...]
>
> No, they aren't files. I did not want to use "files" because this would
incur
> a fairly major penalty for each object:
>
So what if they aren't files?
If you look at (e.g.) AIX's poll(), it allows you to put SysV semaphore IDs
in
pollfd structures. (Actually they do even more--- they have an extended
pollfd
struct; but even without it, just putting a handle instead of FD and a
special
event code in a normal pollfd should suffice...)

> struct file + struct dentry + struct inode
>
> Which would mean that Win32 File objects would require two of each, one
set to
> hold the extra Win32 attributes and one set for the actual Linux file.
>

I'm afraid I'm not following your logic in this; I believe most Win32 attrs
can
be mapped to more generic abstractions which should be able to exist at
'struct
file' level.  (And even if not, a Win32 file handle could just hold two
pointers---
one to the 'struct file', and one to the extra attrs...)

> The way I've chosen uses somewhat less memory and should be faster.
>

And breaks _completely_ with the existing scheme :-/

> > ISTR it wasn't quite trivial to do it that way - it would require the
> > addition of an extra argument to the fops->poll() method.
>
> Yes, the PulseEvent operation demands that all processes currently waiting
on
> the event should be woken, but that no processes attaching immediately
> afterward get triggered.
>

Huh? Where did you get this?
Looking at my copy of MSDN (July '00), the PulseEvent remarks more-or-less
suggest
an implementation like
SetEvent(e)
ResetEvent(e)
I don't see any mention of 'currently waiting' vs 'new' waiters. (Besides, I
doubt MS
tries to solve this in the SMP case...)

> Oh... and WaitForMultipleObjects also has a "wait for all" option.

Yes, this is a valid point...  I wonder if it's possible to add _just_ this
to
poll()...

-az


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread David Howells

David Woodhouse <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] said:
> >  [BTW, another solution is to truly support opaque "handles" to kernel
> > objects; I believe David Howells is already working on something like
> > this for Wine?

Yes. However, it uses a different system call set to use them. They translate
to small object structures internally.

> > The poll interface can be trivially extended to support
> > waiting on those...]

No, they aren't files. I did not want to use "files" because this would incur
a fairly major penalty for each object:

struct file + struct dentry + struct inode

Which would mean that Win32 File objects would require two of each, one set to
hold the extra Win32 attributes and one set for the actual Linux file.

The way I've chosen uses somewhat less memory and should be faster.

> ISTR it wasn't quite trivial to do it that way - it would require the 
> addition of an extra argument to the fops->poll() method.

Yes, the PulseEvent operation demands that all processes currently waiting on
the event should be woken, but that no processes attaching immediately
afterward get triggered.

This means that the PulseEvent handler has to be able to notify all the
processes currently waiting on the queue and only those processes. I got it to
do this by marking the waiter records each process links into the queue.

Oh... and WaitForMultipleObjects also has a "wait for all" option.

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread David Howells

David Woodhouse [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] said:
   [BTW, another solution is to truly support opaque handles to kernel
  objects; I believe David Howells is already working on something like
  this for Wine?

Yes. However, it uses a different system call set to use them. They translate
to small object structures internally.

  The poll interface can be trivially extended to support
  waiting on those...]

No, they aren't files. I did not want to use files because this would incur
a fairly major penalty for each object:

struct file + struct dentry + struct inode

Which would mean that Win32 File objects would require two of each, one set to
hold the extra Win32 attributes and one set for the actual Linux file.

The way I've chosen uses somewhat less memory and should be faster.

 ISTR it wasn't quite trivial to do it that way - it would require the 
 addition of an extra argument to the fops-poll() method.

Yes, the PulseEvent operation demands that all processes currently waiting on
the event should be woken, but that no processes attaching immediately
afterward get triggered.

This means that the PulseEvent handler has to be able to notify all the
processes currently waiting on the queue and only those processes. I got it to
do this by marking the waiter records each process links into the queue.

Oh... and WaitForMultipleObjects also has a wait for all option.

David
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread Alon Ziv

From: David Howells [EMAIL PROTECTED]
 David Woodhouse [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] said:
[BTW, another solution is to truly support opaque handles to kernel
   objects; I believe David Howells is already working on something like
   this for Wine?

 Yes. However, it uses a different system call set to use them. They
translate
 to small object structures internally.


Obviously... since they're handles, not FDs...
[BTW, are you using Windows' idea of storing the objects in process space,
in a
page that's inaccessible to the app itself, and passing pointers into this
page
as the handles?]

   The poll interface can be trivially extended to support
   waiting on those...]

 No, they aren't files. I did not want to use files because this would
incur
 a fairly major penalty for each object:

So what if they aren't files?
If you look at (e.g.) AIX's poll(), it allows you to put SysV semaphore IDs
in
pollfd structures. (Actually they do even more--- they have an extended
pollfd
struct; but even without it, just putting a handle instead of FD and a
special
event code in a normal pollfd should suffice...)

 struct file + struct dentry + struct inode

 Which would mean that Win32 File objects would require two of each, one
set to
 hold the extra Win32 attributes and one set for the actual Linux file.


I'm afraid I'm not following your logic in this; I believe most Win32 attrs
can
be mapped to more generic abstractions which should be able to exist at
'struct
file' level.  (And even if not, a Win32 file handle could just hold two
pointers---
one to the 'struct file', and one to the extra attrs...)

 The way I've chosen uses somewhat less memory and should be faster.


And breaks _completely_ with the existing scheme :-/

  ISTR it wasn't quite trivial to do it that way - it would require the
  addition of an extra argument to the fops-poll() method.

 Yes, the PulseEvent operation demands that all processes currently waiting
on
 the event should be woken, but that no processes attaching immediately
 afterward get triggered.


Huh? Where did you get this?
Looking at my copy of MSDN (July '00), the PulseEvent remarks more-or-less
suggest
an implementation like
SetEvent(e)
ResetEvent(e)
I don't see any mention of 'currently waiting' vs 'new' waiters. (Besides, I
doubt MS
tries to solve this in the SMP case...)

 Oh... and WaitForMultipleObjects also has a wait for all option.

Yes, this is a valid point...  I wonder if it's possible to add _just_ this
to
poll()...

-az


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread David Howells

Alon Ziv [EMAIL PROTECTED] wrote:
 Obviously... since they're handles, not FDs...
 [BTW, are you using Windows' idea of storing the objects in process space,
 in a page that's inaccessible to the app itself, and passing pointers into
 this page as the handles?]

No... I grab a page in kernel space and use it as an array. One problem is
that if an exit occurs, I have to be able to discard all attached objects
after the process's VM has been cleaned up (ie: what if it gets swapped
out?). Plus, mmap can clobber existing mappings, MapViewOfFile can't.

 So what if they aren't files?

Small structures private to my Win32 module.

 I'm afraid I'm not following your logic in this; I believe most Win32 attrs
 can be mapped to more generic abstractions which should be able to exist at
 'struct file' level.

Most...

It'd mean adding extra fields into struct file (and possibly struct inode)
just for the use of this module (which probably wouldn't be accepted).

 (And even if not, a Win32 file handle could just hold two pointers---

No. the extra data has to be accessible from CreateFile (potentially running
in other processes), and this'd mean it'd have to go speculatively searching
all Win32 handle tables currently in use.

 And breaks _completely_ with the existing scheme :-/

So what? This is for a WINE accelerator/Win32 module only. There's already
been an argument over making the whole lot available as general Linux
functionality, but most people said that it'd be a bad idea because it'd not
be portable.

 Huh? Where did you get this?
 Looking at my copy of MSDN (July '00), the PulseEvent remarks more-or-less
 suggest an implementation like
 SetEvent(e)
 ResetEvent(e)

Consider the following:

WAITER 1WAITER 2WAITER 3WAKER
wait-on-event   wait-on-event   wait-on-event
sleep   sleep   sleep
PulseEvent
set-event
wake(WAITER 1)
wake(WAITER 2)
wake(WAITER 3)
reset-event
wakewakewake
what-happened?  what-happened?  what-happened?
nothing!nothing!nothing!
sleep   sleep   sleep

All three waiters should wake up with a note that the event triggered, but
they don't. Plus a fourth waiter who begins to wait on the event after the
set-event is issue probably shouldn't wake up.

 I wonder if it's possible to add _just_ this to poll()...

No... there's no way to pass this to poll (or select).

Better to add a WaitForMultipleObjects() syscall and have that call
do_select() with a flag.

David
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread David Wagner

Linus Torvalds  wrote:
Ehh.. I will bet you $10 USD that if libc allocates the next file
descriptor on the first malloc() in user space (in order to use the
semaphores for mm protection), programs _will_ break.

You want to take the bet?

Good point.  Speaking of which:
  ioctl(fd, UIOCATTACHSEMA, ...);
seems to act like dup(fd) if fd was opened on /dev/usemaclone
(see drivers/sgi/char/usema.c).  According to usema(7), this is
intended to help libraries implement semaphores.

Is this a bad coding?  Should the kernel really support an ioctl()
that can silently allocate the next file descriptor?  This seems
like asking for trouble.  Or, maybe I just misunderstood something.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-23 Thread Alexander Viro



On 24 Apr 2001, David Wagner wrote:

 Linus Torvalds  wrote:
 Ehh.. I will bet you $10 USD that if libc allocates the next file
 descriptor on the first malloc() in user space (in order to use the
 semaphores for mm protection), programs _will_ break.
 
 You want to take the bet?
 
 Good point.  Speaking of which:
   ioctl(fd, UIOCATTACHSEMA, ...);
 seems to act like dup(fd) if fd was opened on /dev/usemaclone
 (see drivers/sgi/char/usema.c).  According to usema(7), this is
 intended to help libraries implement semaphores.
 
 Is this a bad coding?

Yes. Not to mention side effects, it's just plain ugly. Anyone who invents
identifiers of _that_ level of ugliness should be forced to read them
aloud for a week or so, until somebody will shoot him out of mercy.
Out of curiosity: who was the author? It looks unusually nasty, even for
SGI.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread Alon Ziv

Oh, I don't argue about that. (Well, almost--- see below...)
It's just that we need _some_ method for getting over the silly POSIX
FD-handling restrictions...  And the negative-FDs may be the solution.

(Note I said we 'can' declare other semantics; not 'should'.  So these
FDs can still be normal ones, just at the other end of the numbering
range...)

My misgivings are:
* There's no way to integrate other signalling mechanisms; e.g., we may
  wish for a 'wake-all-waiters' signaller, or for a 'timed-wait' that
  arrives via an FD and not as a signal
* a pipe is a more-or-less good semaphore; it may be too heavyweight,
  as it's forced to pass useless [in this case] info, and we can't
  control its wakeup order [although POSIX doesn't seem to require this]

[ Actually, I once had an idea of binding signals into an FD, so they can be
  'read' out of it...  with that, an alarm() is a 'timed-wait' waitable by
  poll() :-) ]

-az

- Original Message -
From: "Alexander Viro" <[EMAIL PROTECTED]>
To: "Alon Ziv" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>; "Alan Cox" <[EMAIL PROTECTED]>
Sent: Sunday, April 22, 2001 16:31
Subject: Re: light weight user level semaphores


>
>
> On Sun, 22 Apr 2001, Alon Ziv wrote:
>
> > Well, that's the reason for my small-negative-integer semaphore-FD
idea...
> > (It won't support select() easily, but poll() is prob'ly good enough)
> > Still, there is the problem of read()/write()/etc. semantics; sure, we
can
> > declare that 'negative FDs' have their own semantics which just happen
to
> > include poll(), but it sure looks like a kludge...
>
> You _still_ don't get it. The question is not "how to add magic kernel
> objects that would look like descriptors and support a binch of
> ioctls, allowing to do semaphores", it's "do we need semaphores
> to be kernel-level objects". Implementation with pipes allows to avoid
> the magic crap - they are real, normal pipes - nothing special from
> the kernel POV. read(), write(), etc. are nothing but reading and writing
> for pipes.
>
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread Alexander Viro



On Sun, 22 Apr 2001, Alon Ziv wrote:

> Well, that's the reason for my small-negative-integer semaphore-FD idea...
> (It won't support select() easily, but poll() is prob'ly good enough)
> Still, there is the problem of read()/write()/etc. semantics; sure, we can
> declare that 'negative FDs' have their own semantics which just happen to
> include poll(), but it sure looks like a kludge...

You _still_ don't get it. The question is not "how to add magic kernel
objects that would look like descriptors and support a binch of
ioctls, allowing to do semaphores", it's "do we need semaphores
to be kernel-level objects". Implementation with pipes allows to avoid
the magic crap - they are real, normal pipes - nothing special from
the kernel POV. read(), write(), etc. are nothing but reading and writing
for pipes.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread Alon Ziv

Well, that's the reason for my small-negative-integer semaphore-FD idea...
(It won't support select() easily, but poll() is prob'ly good enough)
Still, there is the problem of read()/write()/etc. semantics; sure, we can
declare that 'negative FDs' have their own semantics which just happen to
include poll(), but it sure looks like a kludge...

-az

- Original Message -
From: "Alan Cox" <[EMAIL PROTECTED]>
To: "Alon Ziv" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, April 22, 2001 14:44
Subject: Re: light weight user level semaphores


> > All of this FD allocation stuff is truly distrurbing.
> > This appears to be the one place where Win32 got it (almost) right---
> > quite about every kernel object looks to userland just like an opaque
> > handle, and the same operations apply to all of them.
>
> Unix got this right, then AT broke it in System III. One very good
reason
> for pipe based semaphore stuff is precisely that it works in
poll/select/SIGIO
>
> Alan
>
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread David Woodhouse


[EMAIL PROTECTED] said:
>  [BTW, another solution is to truly support opaque "handles" to kernel
> objects; I believe David Howells is already working on something like
> this for Wine? The poll interface can be trivially extended to support
> waiting on those...]

ISTR it wasn't quite trivial to do it that way - it would require the 
addition of an extra argument to the fops->poll() method.

David?

--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread David Woodhouse


[EMAIL PROTECTED] said:
  [BTW, another solution is to truly support opaque "handles" to kernel
 objects; I believe David Howells is already working on something like
 this for Wine? The poll interface can be trivially extended to support
 waiting on those...]

ISTR it wasn't quite trivial to do it that way - it would require the 
addition of an extra argument to the fops-poll() method.

David?

--
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread Alon Ziv

Well, that's the reason for my small-negative-integer semaphore-FD idea...
(It won't support select() easily, but poll() is prob'ly good enough)
Still, there is the problem of read()/write()/etc. semantics; sure, we can
declare that 'negative FDs' have their own semantics which just happen to
include poll(), but it sure looks like a kludge...

-az

- Original Message -
From: "Alan Cox" [EMAIL PROTECTED]
To: "Alon Ziv" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Sunday, April 22, 2001 14:44
Subject: Re: light weight user level semaphores


  All of this FD allocation stuff is truly distrurbing.
  This appears to be the one place where Win32 got it (almost) right---
  quite about every kernel object looks to userland just like an opaque
  handle, and the same operations apply to all of them.

 Unix got this right, then ATT broke it in System III. One very good
reason
 for pipe based semaphore stuff is precisely that it works in
poll/select/SIGIO

 Alan




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread Alexander Viro



On Sun, 22 Apr 2001, Alon Ziv wrote:

 Well, that's the reason for my small-negative-integer semaphore-FD idea...
 (It won't support select() easily, but poll() is prob'ly good enough)
 Still, there is the problem of read()/write()/etc. semantics; sure, we can
 declare that 'negative FDs' have their own semantics which just happen to
 include poll(), but it sure looks like a kludge...

You _still_ don't get it. The question is not "how to add magic kernel
objects that would look like descriptors and support a binch of
ioctls, allowing to do semaphores", it's "do we need semaphores
to be kernel-level objects". Implementation with pipes allows to avoid
the magic crap - they are real, normal pipes - nothing special from
the kernel POV. read(), write(), etc. are nothing but reading and writing
for pipes.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-22 Thread Alon Ziv

Oh, I don't argue about that. (Well, almost--- see below...)
It's just that we need _some_ method for getting over the silly POSIX
FD-handling restrictions...  And the negative-FDs may be the solution.

(Note I said we 'can' declare other semantics; not 'should'.  So these
FDs can still be normal ones, just at the other end of the numbering
range...)

My misgivings are:
* There's no way to integrate other signalling mechanisms; e.g., we may
  wish for a 'wake-all-waiters' signaller, or for a 'timed-wait' that
  arrives via an FD and not as a signal
* a pipe is a more-or-less good semaphore; it may be too heavyweight,
  as it's forced to pass useless [in this case] info, and we can't
  control its wakeup order [although POSIX doesn't seem to require this]

[ Actually, I once had an idea of binding signals into an FD, so they can be
  'read' out of it...  with that, an alarm() is a 'timed-wait' waitable by
  poll() :-) ]

-az

- Original Message -
From: "Alexander Viro" [EMAIL PROTECTED]
To: "Alon Ziv" [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; "Alan Cox" [EMAIL PROTECTED]
Sent: Sunday, April 22, 2001 16:31
Subject: Re: light weight user level semaphores




 On Sun, 22 Apr 2001, Alon Ziv wrote:

  Well, that's the reason for my small-negative-integer semaphore-FD
idea...
  (It won't support select() easily, but poll() is prob'ly good enough)
  Still, there is the problem of read()/write()/etc. semantics; sure, we
can
  declare that 'negative FDs' have their own semantics which just happen
to
  include poll(), but it sure looks like a kludge...

 You _still_ don't get it. The question is not "how to add magic kernel
 objects that would look like descriptors and support a binch of
 ioctls, allowing to do semaphores", it's "do we need semaphores
 to be kernel-level objects". Implementation with pipes allows to avoid
 the magic crap - they are real, normal pipes - nothing special from
 the kernel POV. read(), write(), etc. are nothing but reading and writing
 for pipes.




-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-21 Thread Olaf Titz

> This is guaranteed behaviour of UNIX. You get file handles in order, or
> you don't get them at all.

You get the _next free_ file handle in order. What if your program
assumes they are all contiguous, and it is called by some other
program which forgot about FD_CLOEXEC and has some higher fds still
open? (xdm did this for ten years with its listening socket, just to
name a well-known example. So every program which asssumes contiguous
fd allocations would fail if started from an xdm session.)

If your program makes assumptions on its environment which are not
guaranteed it's broken.

What _is_ guaranteed is that after consecutive allocations of fds like
  for (i=0; i fd[j] if (i > j and fd[i]!=-1 and fd[j]!=-1).
What is absolutely nowhere guaranteed is that
fd[i+1] = fd[i]+1.
It is not possible to guarantee this since any fd may be already open
before main() starts.

Of course you can guarantee that the fds are available like this:
  for (i=getdtablesize(); i>=0; --i)
close(i);
and not calling library functions which may open fds.

>   pid = fork();
>   if (!pid) {
>   close(0);
>   close(1);
>   dup(pipe[0]);   /* input pipe */
>   dup(pipe[1]);   /* output pipe */
>   execve("child");
>   exit(1);
>   }
>
> The above is absolutely _standard_ behaviour. It's required to work.

The reason why it works is that (a) the target fds are 0 and 1, and
(b) you close them explicitly. For less trivial uses, there is always
dup2().

> And btw, it's _still_ required to work even if there happens to be a
> "malloc()" in between the close() and the dup() calls.

I wouldn't count on that. It's clearly not required to work if there's
a getpwnam() in between. (I already had my share of problems with
syslog() in exactly this situation.)

Do we need a list of library functions which may open fds, like the
infamous "list of functions which may move or purge memory" on the Mac
(which grew longer with every OS release and Inside Mac supplement
issue)? Do we need to know for each library routine how it is
implemented?

> Trust me. You're arguing for clearly broken behaviour. malloc() and
> friends MUST NOT open file descriptors. It _will_ break programs that
> rely on traditional and documented features.

Traditional and documented is, in my view, the description as of the
open(2) man page:

   When the
   call is successful, the file descriptor returned  will  be
   the lowest file descriptor not currently open for the pro­
   cess.

which of course is exactly how it is implemented in the kernel.

Olaf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-21 Thread Olaf Titz

 This is guaranteed behaviour of UNIX. You get file handles in order, or
 you don't get them at all.

You get the _next free_ file handle in order. What if your program
assumes they are all contiguous, and it is called by some other
program which forgot about FD_CLOEXEC and has some higher fds still
open? (xdm did this for ten years with its listening socket, just to
name a well-known example. So every program which asssumes contiguous
fd allocations would fail if started from an xdm session.)

If your program makes assumptions on its environment which are not
guaranteed it's broken.

What _is_ guaranteed is that after consecutive allocations of fds like
  for (i=0; in; ++i)
fd[i]=open(...);
the following property holds:
fd[i]  fd[j] if (i  j and fd[i]!=-1 and fd[j]!=-1).
What is absolutely nowhere guaranteed is that
fd[i+1] = fd[i]+1.
It is not possible to guarantee this since any fd may be already open
before main() starts.

Of course you can guarantee that the fds are available like this:
  for (i=getdtablesize(); i=0; --i)
close(i);
and not calling library functions which may open fds.

   pid = fork();
   if (!pid) {
   close(0);
   close(1);
   dup(pipe[0]);   /* input pipe */
   dup(pipe[1]);   /* output pipe */
   execve("child");
   exit(1);
   }

 The above is absolutely _standard_ behaviour. It's required to work.

The reason why it works is that (a) the target fds are 0 and 1, and
(b) you close them explicitly. For less trivial uses, there is always
dup2().

 And btw, it's _still_ required to work even if there happens to be a
 "malloc()" in between the close() and the dup() calls.

I wouldn't count on that. It's clearly not required to work if there's
a getpwnam() in between. (I already had my share of problems with
syslog() in exactly this situation.)

Do we need a list of library functions which may open fds, like the
infamous "list of functions which may move or purge memory" on the Mac
(which grew longer with every OS release and Inside Mac supplement
issue)? Do we need to know for each library routine how it is
implemented?

 Trust me. You're arguing for clearly broken behaviour. malloc() and
 friends MUST NOT open file descriptors. It _will_ break programs that
 rely on traditional and documented features.

Traditional and documented is, in my view, the description as of the
open(2) man page:

   When the
   call is successful, the file descriptor returned  will  be
   the lowest file descriptor not currently open for the pro
   cess.

which of course is exactly how it is implemented in the kernel.

Olaf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Linus Torvalds

In article <[EMAIL PROTECTED]>,
Olaf Titz  <[EMAIL PROTECTED]> wrote:
>> Ehh.. I will bet you $10 USD that if libc allocates the next file
>> descriptor on the first "malloc()" in user space (in order to use the
>> semaphores for mm protection), programs _will_ break.
>
>Of course, but this is a result from sloppy coding.

ABSOLUTELY NOT!

This is guaranteed behaviour of UNIX. You get file handles in order, or
you don't get them at all.

Sure, some library functions are allowed to use up file handles. But
most sure as hell are NOT.

>In general, open()
>can just return anything and about the only case where you can even
>think of ignoring its result is this:
> close(0); close(1); close(2);
> open("/dev/null", O_RDWR); dup(0); dup(0);

Which is quite common to do.

Imagine a server that starts up another process, which does exactly
something like the above: the _usual_ execve() case looks something like

pid = fork();
if (!pid) {
close(0);
close(1);
dup(pipe[0]);   /* input pipe */
dup(pipe[1]);   /* output pipe */
execve("child");
exit(1);
}

The above is absolutely _standard_ behaviour. It's required to work.

And btw, it's _still_ required to work even if there happens to be a
"malloc()" in between the close() and the dup() calls.

Trust me. You're arguing for clearly broken behaviour. malloc() and
friends MUST NOT open file descriptors. It _will_ break programs that
rely on traditional and documented features.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Olaf Titz

> Optimization use in select: If all "interesting" file id's are known
> to be below "n", then only the first "n" bits in a FD_ISSET need to
> be examined. As soon as the bits are scattered, it takes MUCH longer
> to check for activity

That's an optimization, not a correctness issue.

> for (i = 0; i <= MAX_LOG && i < argc; i++) {
> sprintf(str,"/tmp/%s",pnames[i]);
> mkfifo(str,0600);   /* assume it exists */
> inlogfd[i] = open(str,O_RDONLY | O_NDELAY);
> FD_SET(inlogfd[i],_files);
> }

This works regardless of what the open() returns. What does not work
is using MAX_LOG (assuming it is constant) later in the following form:

> while (select(MAX_LOG,,NULL,NULL,NULL) >= 0) {

I see no way around computing the maximum of the inlogfd[i] values +1.
(Which can of course be done just after the opens above. Note that the
last opened fd _is_ guaranteed to get the highest number; FD_SET is
one of the library routines where you can be pretty confident they
don't open fds...)

Btw. there are two problems even assuming you do get contiguous fds:
- an off by one error in the case of argc > MAX_LOG, the first
  argument of select() is maximum fd _plus one_
- from an optimization POV it is highly advisable to take only the
  real maximum anyway.

Olaf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Jesse Pollard

Olaf Titz <[EMAIL PROTECTED]>:
> > Ehh.. I will bet you $10 USD that if libc allocates the next file
> > descriptor on the first "malloc()" in user space (in order to use the
> > semaphores for mm protection), programs _will_ break.
> 
> Of course, but this is a result from sloppy coding. In general, open()
> can just return anything and about the only case where you can even
> think of ignoring its result is this:
>  close(0); close(1); close(2);
>  open("/dev/null", O_RDWR); dup(0); dup(0);
> (which is even not clean for other reasons).
> 
> I can't imagine depending on the "fact" that the first fd I open is 3,
> the next is 4, etc. And what if the routine in question is not
> malloc() but e.g. getpwuid()? Both are just arbitrary library
> functions, and one of them clearly does open file descriptors,
> depending on their implementation.
> 
> What would the reason[1] be for wanting contiguous fd space anyway?
> 
> Olaf
> 
> [1] apart from not having understood how poll() works of course.

Optimization use in select: If all "interesting" file id's are known
to be below "n", then only the first "n" bits in a FD_ISSET need to
be examined. As soon as the bits are scattered, it takes MUCH longer
to check for activity

It may not be the "best" way, but what I tend to do is:

 Umm - this is snipped from a multiplexed logger using FIFOs for
 and indeterminate amount of data from differet utilities sending
 text buffers (normally one line at a time but could be more).

static void fd_init(argc,argv)
int argc;   /* number of parameters */
char**argv; /* parameter list   */
{
int i,j;/* scratch counters */
static char str[50];

pnames = argv;
FD_ZERO(_files); /* init all file descriptor sets*/

for (i = 0; i <= MAX_LOG && i < argc; i++) {
sprintf(str,"/tmp/%s",pnames[i]);
mkfifo(str,0600);   /* assume it exists */
inlogfd[i] = open(str,O_RDONLY | O_NDELAY);
FD_SET(inlogfd[i],_files);
}
used = i;
}


Then I can scan for any activity by:

do {
while (select(MAX_LOG,,NULL,NULL,NULL) >= 0) {
for(i = 0; i <= used; i++) {
if (FD_ISSET(inlogfd[i],)) {
r=ioctl(inlogfd[i],FIONREAD,);
while (n > 0) {
r = (n > BUF_MAX - 1) ? BUF_MAX - 1: n;
read(inlogfd[i],buf,r);
printbuf(pnames[i],r);
n -= r;
}
}
}
active = in_files;
}
} while (errno == EINTR);

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Olaf Titz

> Ehh.. I will bet you $10 USD that if libc allocates the next file
> descriptor on the first "malloc()" in user space (in order to use the
> semaphores for mm protection), programs _will_ break.

Of course, but this is a result from sloppy coding. In general, open()
can just return anything and about the only case where you can even
think of ignoring its result is this:
 close(0); close(1); close(2);
 open("/dev/null", O_RDWR); dup(0); dup(0);
(which is even not clean for other reasons).

I can't imagine depending on the "fact" that the first fd I open is 3,
the next is 4, etc. And what if the routine in question is not
malloc() but e.g. getpwuid()? Both are just arbitrary library
functions, and one of them clearly does open file descriptors,
depending on their implementation.

What would the reason[1] be for wanting contiguous fd space anyway?

Olaf

[1] apart from not having understood how poll() works of course.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Olaf Titz

 Ehh.. I will bet you $10 USD that if libc allocates the next file
 descriptor on the first "malloc()" in user space (in order to use the
 semaphores for mm protection), programs _will_ break.

Of course, but this is a result from sloppy coding. In general, open()
can just return anything and about the only case where you can even
think of ignoring its result is this:
 close(0); close(1); close(2);
 open("/dev/null", O_RDWR); dup(0); dup(0);
(which is even not clean for other reasons).

I can't imagine depending on the "fact" that the first fd I open is 3,
the next is 4, etc. And what if the routine in question is not
malloc() but e.g. getpwuid()? Both are just arbitrary library
functions, and one of them clearly does open file descriptors,
depending on their implementation.

What would the reason[1] be for wanting contiguous fd space anyway?

Olaf

[1] apart from not having understood how poll() works of course.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Jesse Pollard

Olaf Titz [EMAIL PROTECTED]:
  Ehh.. I will bet you $10 USD that if libc allocates the next file
  descriptor on the first "malloc()" in user space (in order to use the
  semaphores for mm protection), programs _will_ break.
 
 Of course, but this is a result from sloppy coding. In general, open()
 can just return anything and about the only case where you can even
 think of ignoring its result is this:
  close(0); close(1); close(2);
  open("/dev/null", O_RDWR); dup(0); dup(0);
 (which is even not clean for other reasons).
 
 I can't imagine depending on the "fact" that the first fd I open is 3,
 the next is 4, etc. And what if the routine in question is not
 malloc() but e.g. getpwuid()? Both are just arbitrary library
 functions, and one of them clearly does open file descriptors,
 depending on their implementation.
 
 What would the reason[1] be for wanting contiguous fd space anyway?
 
 Olaf
 
 [1] apart from not having understood how poll() works of course.

Optimization use in select: If all "interesting" file id's are known
to be below "n", then only the first "n" bits in a FD_ISSET need to
be examined. As soon as the bits are scattered, it takes MUCH longer
to check for activity

It may not be the "best" way, but what I tend to do is:

 Umm - this is snipped from a multiplexed logger using FIFOs for
 and indeterminate amount of data from differet utilities sending
 text buffers (normally one line at a time but could be more).

static void fd_init(argc,argv)
int argc;   /* number of parameters */
char**argv; /* parameter list   */
{
int i,j;/* scratch counters */
static char str[50];

pnames = argv;
FD_ZERO(in_files); /* init all file descriptor sets*/

for (i = 0; i = MAX_LOG  i  argc; i++) {
sprintf(str,"/tmp/%s",pnames[i]);
mkfifo(str,0600);   /* assume it exists */
inlogfd[i] = open(str,O_RDONLY | O_NDELAY);
FD_SET(inlogfd[i],in_files);
}
used = i;
}


Then I can scan for any activity by:

do {
while (select(MAX_LOG,active,NULL,NULL,NULL) = 0) {
for(i = 0; i = used; i++) {
if (FD_ISSET(inlogfd[i],active)) {
r=ioctl(inlogfd[i],FIONREAD,n);
while (n  0) {
r = (n  BUF_MAX - 1) ? BUF_MAX - 1: n;
read(inlogfd[i],buf,r);
printbuf(pnames[i],r);
n -= r;
}
}
}
active = in_files;
}
} while (errno == EINTR);

-
Jesse I Pollard, II
Email: [EMAIL PROTECTED]

Any opinions expressed are solely my own.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-20 Thread Linus Torvalds

In article [EMAIL PROTECTED],
Olaf Titz  [EMAIL PROTECTED] wrote:
 Ehh.. I will bet you $10 USD that if libc allocates the next file
 descriptor on the first "malloc()" in user space (in order to use the
 semaphores for mm protection), programs _will_ break.

Of course, but this is a result from sloppy coding.

ABSOLUTELY NOT!

This is guaranteed behaviour of UNIX. You get file handles in order, or
you don't get them at all.

Sure, some library functions are allowed to use up file handles. But
most sure as hell are NOT.

In general, open()
can just return anything and about the only case where you can even
think of ignoring its result is this:
 close(0); close(1); close(2);
 open("/dev/null", O_RDWR); dup(0); dup(0);

Which is quite common to do.

Imagine a server that starts up another process, which does exactly
something like the above: the _usual_ execve() case looks something like

pid = fork();
if (!pid) {
close(0);
close(1);
dup(pipe[0]);   /* input pipe */
dup(pipe[1]);   /* output pipe */
execve("child");
exit(1);
}

The above is absolutely _standard_ behaviour. It's required to work.

And btw, it's _still_ required to work even if there happens to be a
"malloc()" in between the close() and the dup() calls.

Trust me. You're arguing for clearly broken behaviour. malloc() and
friends MUST NOT open file descriptors. It _will_ break programs that
rely on traditional and documented features.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alexander Viro <[EMAIL PROTECTED]> writes:

> > If the new interface can be useful for anything it must allow to
> > implement process-shared POSIX mutexes.
> 
> Pardon me the bluntness, but... Why?

Because otherwise there is no reason to even waste a second with this.
At least for me and everybody else who has interest in portable solutions.

I don't care how it's implemented.  Look at the code example I posted.
If you can provide an implementation which can implement anonymous
inter-process mutexes then ring again.  Until then I'll wait.  If you
implement something else I couldn't care less since it's useless for
me.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On 19 Apr 2001, Ulrich Drepper wrote:

> Linus Torvalds <[EMAIL PROTECTED]> writes:
> 
> > I'm not interested in re-creating the idiocies of Sys IPC.
> 
> I'm not talking about sysv semaphores (couldn't care less).  And you
> haven't read any of the mails with examples I sent.
> 
> If the new interface can be useful for anything it must allow to
> implement process-shared POSIX mutexes.

Pardon me the bluntness, but... Why?
* on _any_ UNIX we can implement semaphore (object that has Dijkstra's
P and V operations, whatever) shared by processes that have access to pipe.
In a portable way. That's the part of pipe semantics that had been there
since way before v6. Pre-sysv, pre-POSIX, etc. When named pipes appeared
the same semantics had been carried to them. Agreed so far?
* if we have shared memory _and_ some implementation of semaphores
we can (on architectures that allow atomic_dec() and atomic_inc()) produce
semaphores that work via memory access in uncontended case and use slow
semaphores to handle contention side of the business. Nothing UNIX-specific
here.
* such objects _are_ useful. They are reasonably portable and
if they fit the task at hand and are cheaper than POSIX mutexes - that's
all rationale one could need for using them.

Sure, the variant I've posted was intra-process only, simply because it
uses normal pipes. Implementation with named pipes is also trivial -
when you map the shared area, allocate private one of the corresponding
size and keep descriptors there. End of story.

AFAICS mechanism is portable enough (and even on the architectures that
do not allow atomic userland operations we can survive - just fall back
to "slow" ones via read()/write() on pipes).  And excuse me, but when
one writes an application code the question is not "how to make it use
POSIX semaphores", it's "how to get the serialization I need in a
portable way".

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> I'm not interested in re-creating the idiocies of Sys IPC.

I'm not talking about sysv semaphores (couldn't care less).  And you
haven't read any of the mails with examples I sent.

If the new interface can be useful for anything it must allow to
implement process-shared POSIX mutexes.  The user-level representation
of these mutexes are simple variables which in the case of
inter-process mutexes are placed in shared memory.  These variables
must be usable with the normal pthread_mutex_lock() functions and
perform whatever is needed.

Whether the pthread_mutex_init() function for shared mutexes is doing
a lot more work and allocates even more memory, I don't care.  The
standard certainly permits this and every pthread_mutex_init() must
have a pthread_mutex_destroy() which allows allocating and freeing
resources (no file descriptor, though).  So, yes, your FS_create
syscall can allocate something.

But the question is what handle to put in the pthread_mutex_t variable
so the different processes can use the mutex.  It cannot be a file
descriptor since it's not shared between processes.  It cannot be a
pointer to some other place in the virtual memory since the place
pointed to might not be (and probably isn't if FS_create is allocating
something in the process setting up the mutex).  You could put some
magic cookie in the pthread_mutex_t object the kernel can then use.


So, instead of repeating over and over again the same old story, fill
in the gaps here:


  int
  pthread_mutex_init (pthread_mutex_t *mutex,
  const pthread_mutexattr_t *mutex_attr)
  {
if (mutex_attr != NULL && mutex_attr->__pshared != 0)
  {
... FILL IN HERE ...
  }
else
  ...intra-process mutex, uninteresting here...
  }

  int
  pthread_mutex_lock (pthread_mutex_t *mutex)
  {
if (mutex_attr != NULL && mutex_attr->__pshared != 0)
  {
... FILL IN HERE ...
  }
else
  ...intra-process mutex, uninteresting here...
  }

  int
  pthread_mutex_destroy (pthread_mutex_t *mutex)
  {
if (mutex_attr != NULL && mutex_attr->__pshared != 0)
  {
... FILL IN HERE ...
  }
else
  ...intra-process mutex, uninteresting here...
  }


These functions must work with something like this:

~ cons.c ~~
#include 
#include 
#include 
#include 
#include 

int
main (int argc, char *argv[])
{
  char tmpl[] = "/tmp/fooXX";
  int fd = mkstemp (tmpl);
  pthread_mutexattr_t attr;
  pthread_mutex_t *m1;
  pthread_mutex_t *m2;
  void *addr;
  volatile int *i;

  pthread_mutexattr_init ();
  pthread_mutexattr_setpshared (, PTHREAD_PROCESS_SHARED);

  ftruncate (fd, 2 * sizeof (*m1) + sizeof (int));
  addr = mmap (NULL, sizeof (*m1), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  m1 = addr;
  m2 = m1 + 1;
  i = (int *) (m2 + 1);
  *i = 0;

  pthread_mutex_init (m1, );
  pthread_mutex_lock (m1);

  pthread_mutex_init (m2, );
  pthread_mutex_lock (m2);

  if (fork () == 0)
{
  char buf[10];
  snprintf (buf, sizeof buf, "%d", fd);
  execl ("./prod", "prod", buf, NULL);
}

  while (1)
{
  pthread_mutex_lock (m1);
  printf ("*i = %d\n", *i);
  pthread_mutex_unlock (m2);
}

  return 0;
}
~~prod.c ~~
#include 
#include 
#include 
#include 
#include 

int
main (int argc, char *argv[])
{
  int fd = atoi (argv[1]);
  void *addr;
  pthread_mutex_t *m1;
  pthread_mutex_t *m2;
  volatile int *i;

  addr = mmap (NULL, sizeof (*m1), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  m1 = addr;
  m2 = m1 + 1;
  i = (int *) (m2 + 1);

  while (1)
{
  ++*i;
  pthread_mutex_unlock (m1);
  pthread_mutex_lock (m2);
}

  return 0;
}
~~~

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Rogier Wolff

Alan Cox wrote:
> > > libc is entitled to, and most definitely does exactly that. Take a look at
> > > things like gethostent, getpwent etc etc.
> > 
> > Ehh.. I will bet you $10 USD that if libc allocates the next file
> > descriptor on the first "malloc()" in user space (in order to use the
> > semaphores for mm protection), programs _will_ break.
> > 
> > You want to take the bet?
> 
> Its not normally a good idea to take a Linus bet, but this time Im obviously
> missing something. fd0-2 will be passed in (and if not then shit already
> happens - see old bugtraq on the matter for setuid apps, glibc bugs)
> 
> So the C library gets fd 3
> My first fopen gets fd 4.

Code may 
close (0);
close (1);
close (2); 
... 
malloc (); 

/* Now open our controlling TTY/ stdin .. */ 
fd = open (... ) ;

After taking care of this (*), problem I find the fd trick WAY more
appealing than Linus' magic numbers. With file descriptors we have a
"small integer which can be validated quickly". We also have storage
for a private pointer somewhere in the fd structure. 

If people are TOO afraid of breaking something, creating a new set of
small integers handled similarly as "fds" would do fine. (Maybe here
we'd allocate just a few, and reallocate when neccesary).

Roger. 

(*) I bet that 
get_sem_fd () 
{
int rv;
int fd; 
fd = get_fd ();
if (fd < 5) {
rv = get_sem_fd ();
close(fd);
fd = rv; 
  } 
return fd; 
}

will not break much. (UGLY coding. Don't tell me.)

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots. 
* There are also old, bald pilots. 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On 19 Apr 2001, Ulrich Drepper wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
> > > I fail to see how this works across processes.
> >
> > It's up to FS_create() to create whatever shared mapping is needed.
>
> No, the point is that FS_create is *not* the one creating the shared
> mapping.  The user is explicitly doing this her/himself.

No.

Who creates the shared mapping is _irrelevant_, because it ends up being
entirely a function of what the chosen interface is.

For example, quote often you want semaphores for threading purposes only,
and then you don't need a shared mapping at all. So you'd use the proper
interfaces for that, and for that, your "thread_semaphore()" function
would just do a malloc() and initialize the memory to zero. Doing a mmap
or something like that would just be stupid, because you're protecting
only one VM space anyway.

In other cases, you may need to have process-wide semaphores, and you'd
use "process_semaphore(char *ID)" or something, which actually does a
mmap() on a shared file. Or you'd have "fork_semaphore()" that creates a
semaphore that is valid across forks, not not valid across execve's and
cannot be passed around.

So normally the user does NOT create the shared mapping himself. Normally
you'd just use the "proper interface" for your needs, nothing more.

Sure, you can have the option of saying "I've created this shared memory
region, please make it use the generic semaphore engine code", but quite
frankly I think that is a BAD IDEA. Why? Because it won't work portably
across architectures anyway. You don't know what the requirements of the
architecture are, so it should be done by a nice "semaphore library". NOT
by the user.

Remember: these semaphores are NOT a new SysV bogosity. These semaphores
are a new interface, with sane performance and sane design. And you can
have multiple external interfaces to the same "semaphore engine".

I'm not interested in re-creating the idiocies of Sys IPC.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

> Are you sure, you can implement SMP-safe, atomic operations (which you need
> for all up()/down() in user space) WITHOUT using privileged
> instructions on ALL archs Linux supports?

You don't need to. For some architectures the semaphore code would always call
into the kernel. For those that allow fast locks in userspace it won't. The
API is the thing, and the public exposure would I assume be pthreads 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> > I fail to see how this works across processes.
> 
> It's up to FS_create() to create whatever shared mapping is needed.

No, the point is that FS_create is *not* the one creating the shared
mapping.  The user is explicitly doing this her/himself.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Ingo Oeser wrote:

> On Thu, Apr 19, 2001 at 09:11:56AM -0700, Linus Torvalds wrote:
> > No, this is NOT what the UNIX dogmas are all about.
> >
> > When UNIX says "everything is a file", it really means that "everything is
> > a stream of bytes". Things like magic operations on file desciptors are
> > _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
> > semantics of file descriptors is NOT Unix dogma at all, it is a horrible
> > corruption of the original UNIX cleanlyness.
>
> Right. And on semaphores, this stream is exactly 0 bytes long.
> This is perfectly normal and can be handled by all applications
> I'm aware of.

It's perfectly normal, but it does NOT conform to the idea "everything is
a file".

The fact that there are other ugly examples (ioctls and special files)
does not mean that adding a new one is a good idea.

When people say "everything is a file", they mean that it can be _used_ as
a file, not that it can passably return a valid error code.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Ingo Oeser wrote:
>
> Are you sure, you can implement SMP-safe, atomic operations (which you need
> for all up()/down() in user space) WITHOUT using privileged
> instructions on ALL archs Linux supports?

Why do you care?

Sure, there are broken architectures out there. They'd need system calls.
They'd be slow. That's THEIR problem.

No sane architecture has this limitation.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On 19 Apr 2001, Ulrich Drepper wrote:

> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
> > Looks good to me. Anybody want to try this out and test some benchmarks?
>
> I fail to see how this works across processes.

It's up to FS_create() to create whatever shared mapping is needed.

For threads, you don't need anything special.

For fork()'d helper stuff, you'd use MAP_ANON | MAP_SHARED.

For execve(), you need shm shared memory or MAP_SHARED on a file.

It all depends on your needs.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 09:11:56AM -0700, Linus Torvalds wrote:
> No, this is NOT what the UNIX dogmas are all about.
> 
> When UNIX says "everything is a file", it really means that "everything is
> a stream of bytes". Things like magic operations on file desciptors are
> _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
> semantics of file descriptors is NOT Unix dogma at all, it is a horrible
> corruption of the original UNIX cleanlyness.

Right. And on semaphores, this stream is exactly 0 bytes long.
This is perfectly normal and can be handled by all applications
I'm aware of.

My idea violates nothing here.

> Please don't excuse "semaphore file descriptors" with the "everything is a
> file" mantra. It is not at ALL applicable.
> 
> The "everything is a file" mantra is to make pipe etc meaningful -
> processes don't have to worry about whether the fd they have is from a
> file open, a pipe() system call, opening a special block device, or a
> socket()+connect() thing. They can just read and write. THAT is what UNIX
> is all about.
 
Right. And with my approach read() and write() with a buffer
pointer != NULL would either yield an return value of "0" or
-1 and set errno=EINVAL ("object not suitable for reading/writing").
Anyway they should return IMMIDIATELY in these cases.

We already have these special semantics with devices. Look at
/dev/sgX for an example how we pass even structured data via
normal read/write (instead of "stream of bytes").

> And this is obviously NOT true of a "magic file descriptors for
> semaphores". You can't pass it off as stdin to another process and expect
> anything useful from it unless the other process _knows_ it is a special
> semaphore thing and does mmap magic or something.

see above. NOTHING special about this idea. No magic handling
involved, unless the user of the fd knows what it is. For other
users it will be just a normal fd with normal operations, since
the special case is hidden well enough. 

This is even WAY simpler as all that tty-crap and similar
devices, which read/write very dependend on their actual ioctl
configuration.

But since stupid POSIX forbids using fds for semaphores
(according to Ulrich Drepper), this nice, simple and
non-intrusive solution is out.

Instead we should go with several new syscalls, user space
dependencies, strange error handling and yet-to-discuss
semantics.

Everybody else byt you would have been kicked out by the core
people for suggesting this ;-)

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag 
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Ingo Oeser <[EMAIL PROTECTED]> writes:

> Are you sure, you can implement SMP-safe, atomic operations (which you need
> for all up()/down() in user space) WITHOUT using privileged
> instructions on ALL archs Linux supports?

Which processors have no such instructions but are SMP-capable?

> How do we do this on nccNUMA machines later? How on clusters[1]?

Clusters are not my problem.  They require additional software.  And
NUMA machines maybe be requiring a certain sequence in which the
operations must be performed and the hardware should take care of the
rest.


I don't really care what the final implementation will be like.  For
UP and SMP machines I definitely want to have as much as possible at
user-level.  If you need a special libpthread for NUMA machines, so be
it.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> > I don't want nor need file permissions.  A program would look like this:
> 
> Your example opens/mmaps so has file permissions. Which is what I was asking

There are no permissions on the mutex object.  It is the shared memory
which counts.  If you would implement the global mutexes as
independent objects in the filesystem hierarchy you would somehow
magically make the permissions match those of the object containing
the memory representation of the global semaphore.


   fd = open("somefile", O_CREAT|O_TRUNC, 0666)
   addr=mmap(fd)
   // assume attr is for a global mutex
   pthread_mutex_init((pthread_mutex_t*)addr, )
   fchmod(fd, 0600)
   fchown(fd, someuser, somegroup)

If pthread_mutex_attr() is allocating some kind of file, how do you
determine the permissions?  How are they changed if the permissions to
the file change?

The kernel representation of the mutex must not be disassociated from
the shared memory region.

Even if you all think very little about Solaris, look at the kernel
interface for semaphores.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 12:26:03PM -0700, Ulrich Drepper wrote:
> In any case all kinds of user-level operations are possible as well
> and all the schemes suggested for dealing with the common case without
> syscalls can be applied here as well.

Are you sure, you can implement SMP-safe, atomic operations (which you need
for all up()/down() in user space) WITHOUT using privileged
instructions on ALL archs Linux supports?

How do we do this on nccNUMA machines later? How on clusters[1]?

On what I can see in asm-*/atomic.h this is not possible, but I
probably miss sth. here ;-)

I didn't know that POSIX forbids using fds to implement a
semaphore. That's VERY bad.

Learning new APIs always means making a lot of mistakes and doing
this while we write production code, since nobody likes to pay for
experiments.

And I still see no point on speeding of creation and contention,
since these should be rare cases and the application overusing
these should be punished HARD.

Maybe someone can enlighten my on these aspects.

Regards

Ingo Oeser

[1] Ok, people already use other than Unix mechanisms for this
   stuff on massive parallel computing. So this might not be an
   issue. Only for libc internal sema4s
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag 
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

> I don't want nor need file permissions.  A program would look like this:

Your example opens/mmaps so has file permissions. Which is what I was asking

> The shared mem segment can be retrieved in whatever way.  The mutex in
> this case is anonymous.  Everybody who has access to the shared mem
> *must* have access to the mutex.

We agree 8)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> mknod foo p. Or use sockets (although AF_UNIX sockets are higher latency)
> Thats why I suggested using flock - its name based. Whether you mkstemp()
> stuff and pass it around isnt something I care about
> 
> Files give you permissions for free too

I don't want nor need file permissions.  A program would look like this:


  process 1:


   fd = open("somefile")
   addr = mmap(fd);
   
   pthread_mutexattr_init();
   pthread_mutexattr_setpshared(, PTHREAD_PROCESS_SHARED);

   pthread_mutex_init ((pthread_mutex_t *) addr, );

   pthread_mutex_lock ((pthread_mutex_t *) addr);

   pthread_mutex_destroy((pthread_mutex_t *) addr);

  process 2:

   fd = open("somefile")
   addr = mmap(fd);

   pthread_mutex_lock ((pthread_mutex_t *) addr);


The shared mem segment can be retrieved in whatever way.  The mutex in
this case is anonymous.  Everybody who has access to the shared mem
*must* have access to the mutex.


For semaphores it looks similarly.  First the anonymous case:

 process 1:


   fd = open("somefile")
   addr = mmap(fd);

   sem_init ((sem_t *) addr, 1, 10);// 10 is arbitrary

   sem_wait ((sem_t *) addr);

   sem_destroy((sem_t *) addr);


  process 2:

   fd = open("somefile")
   addr = mmap(fd);

   sem_wait ((sem_t *) addr);

Note that POSIX semaphores could be implemented with global POSIX
mutexes.


Finally, named semaphores:

   semp = sem_open("somefile", O_CREAT|O_EXCL, 0600)

   sem_wait (semp);

   sem_close(semp);
   sem_unlink(semp);


This is the only semaphore kind which maps nicely to a pipe or socket.
All the others don't.  And even for named semaphores it is best to
have a separate name space like the shmfs.

> So you have unix file permissions on them ?

See above.  Permissions are only allowed for named semaphores.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox <[EMAIL PROTECTED]> writes:

> > can libraries use fast semaphores behind the back of the user? They might
> > well want to use the semaphores exactly for things like memory allocator
> > locking etc. But libc certainly cant use fd's behind peoples backs.
> 
> libc is entitled to, and most definitely does exactly that. Take a look at
> things like gethostent, getpwent etc etc.

You are mixing two completely different things.

Functions like gethostent() and catopen() are explicitly allowed to be
implemented using file descriptors.  If this is allowed the standard
contains appropriate wording.

Other functions like setlocale() do use file descriptors, yes, but
these are not kept.  Before the function returns they are closed.
This can cause disruptions in other threads which find descriptors not
allocated sequentially but this has to be taken into account.  Rules
for multi-threaded applications are different.  A single-threaded
application will not see such a difference.

Now, the standards do not allow POSIX mutexes to be implemented using
file descriptors.  The same is true for unnamed POSIX semaphores.  So
Linus is right, though for a different reason than he thought.

The situation is a bit different for named POSIX semaphores.  These
can be implemented using file descriptors.  But they don't have to and
IMO they shouldn't.  A memory reference based semaphore implementation
would allow a named semaphore to be implemented using

  fd = open (name)
  addr = mmap (..fd..)
  close (fd)
  sem_syscall (addr)

i.e., it can be mapped to a memory reference again.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

> I fail to see how this works across processes.  How can you generate a
> file descriptor for this pipe in a second process which simply shares
> some memory with the first one?  The first process is passive: no file
> descriptor passing must be necessary.

mknod foo p. Or use sockets (although AF_UNIX sockets are higher latency)
Thats why I suggested using flock - its name based. Whether you mkstemp()
stuff and pass it around isnt something I care about

Files give you permissions for free too

> Note that semaphores need not always be shared between processes.
> This is a property the user has to choose.  So the implementation can
> be easier in the normal intra-process case.

So you have unix file permissions on them ?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

> Looks good to me. Anybody want to try this out and test some benchmarks?

I fail to see how this works across processes.  How can you generate a
file descriptor for this pipe in a second process which simply shares
some memory with the first one?  The first process is passive: no file
descriptor passing must be necessary.

How these things are working elsewhere is that a memory address
(probably a physical address) is used as a token.  The semaphore
object is placed in the memory shared by the processes and the virtual
address is passed in the syscall.

Note that semaphores need not always be shared between processes.
This is a property the user has to choose.  So the implementation can
be easier in the normal intra-process case.

In any case all kinds of user-level operations are possible as well
and all the schemes suggested for dealing with the common case without
syscalls can be applied here as well.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Olaf Titz

> problems: just _how_ high woul dyou move it? Would it potentially disturb
> an application that opens thousands of files, and knows that they get
> consecutive file descriptors? Which is _legal_ and well-defined in UNIX.

Only if you close them before. The process may have been started with
arbitrary fds open.

> say "if you use fast semaphores, they use file descriptors and you should
> no longer depend on consecutive fd's".

Which you cannot anyway. Already some library routines can open fds
although they don't explicitly say so and don't have to in all
implementations, like openlog() or all the get*by*() stuff (or even
dlopen()), so you are never sure to know which or how many FDs you
actually have open.

Olaf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On Thu, 19 Apr 2001, Linus Torvalds wrote:

> 
> 
> On Thu, 19 Apr 2001, Alexander Viro wrote:
> >
> > Ehh... Non-lazy variant is just read() and write() as down_failed() and
> > up_wakeup() Lazy... How about
> 
> Looks good to me. Anybody want to try this out and test some benchmarks?

Ugh. It doesn't look good for me. s/MAX_INT/MAX_INT>>1/ or we will
get into trouble on anything that goes into spin_and_lose. Window is
pretty narrow (notice that lost_it is OK - we only need to worry
about somebody coming in after winner drives Lock from 1 to 0
and before it gets it from 0 to MAX_INT), but we can get into serious
trouble if schedule() will hit that window.

MAX_INT/2 should be enough to deal with that, AFAICS.

However, I would _really_ like to get that code reviewed from the memory
access ordering POV. Warning: right now I'm half-asleep, so the thing can
very well be completely bogus in that area. Extra eyes would be certainly
welcome.

Al

PS: ->Lock should be set to 1 when we initialize semaphore. Destroying
semaphore should do
if (sem->Lock > 1) {
close(sem->writer);
close(sem->reader);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alexander Viro wrote:
>
> Ehh... Non-lazy variant is just read() and write() as down_failed() and
> up_wakeup() Lazy... How about

Looks good to me. Anybody want to try this out and test some benchmarks?

There may be problems with large numbers of semaphores, but hopefully that
won't be an issue. And the ability to select/poll on these things might
come in handy for various implementation issues (ie locks with timeouts
etc).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On Thu, 19 Apr 2001, Linus Torvalds wrote:

> 
> 
> On Thu, 19 Apr 2001, Alexander Viro wrote:
> >
> > I certainly agree that introducing ioctl() in _any_ API is a shootable
> > offense. However, I wonder whether we really need any kernel changes
> > at all.
> 
> I'd certainly be interested in seeing the pipe-based approach. Especially
> if you make the pipe allocation lazy. That isn'tr trivial (it needs to be
> done right with both up_failed() and down_failed() trying to allocate the
> pipe on contention and using an atomic cmpxchg-style setting if none
> existed before). It has the BIG advantage of working on old kernels, so
> that you don't need to have backwards compatibility cruft in the
> libraries.

Ehh... Non-lazy variant is just read() and write() as down_failed() and
up_wakeup() Lazy... How about

if (Lock <= 1)
goto must_open;
opened:
/* as in non-lazy case */


must_open:
pipe(fd);
lock decl Lock
jg lost_it  /* Already seriously positive - clean up and go */
jl spin_and_lose
/* Lock went from 1 to 0 - go ahead */
reader = fd[0];
writer = fd[1];
Lock = MAX_INT;
goto opened;
spin_and_lose:
/* Won't take long - another guy got to do 3 memory writes */
while (Lock <= 0)
;
lost_it:
lock incl Lock
close(fd[0]);
close(fd[1]);
goto opened;

Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

> > libc is entitled to, and most definitely does exactly that. Take a look at
> > things like gethostent, getpwent etc etc.
> 
> Ehh.. I will bet you $10 USD that if libc allocates the next file
> descriptor on the first "malloc()" in user space (in order to use the
> semaphores for mm protection), programs _will_ break.
> 
> You want to take the bet?

Its not normally a good idea to take a Linus bet, but this time Im obviously
missing something. fd0-2 will be passed in (and if not then shit already
happens - see old bugtraq on the matter for setuid apps, glibc bugs)

So the C library gets fd 3
My first fopen gets fd 4.

That can already happen and isnt new. Several profiling libraries on Unix have
precisely this effect already. They dynamic link/loader will also open file
handles to do mmaps although generally you wont see those as they are closed
again after mapping. 

Internationalisation code in glibc will also open and map tables during startup


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Abramo Bagnara

Linus Torvalds wrote:
> 
> On Thu, 19 Apr 2001, Abramo Bagnara wrote:
> >
> > > [ Using file descriptors ]
> >
> > This would also permit:
> > - to have poll()
> > - to use mmap() to obtain the userspace area
> >
> > It would become something very near to sacred Unix dogmas ;-)
> 
> No, this is NOT what the UNIX dogmas are all about.
> 
> When UNIX says "everything is a file", it really means that "everything is
> a stream of bytes". Things like magic operations on file desciptors are
> _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
> semantics of file descriptors is NOT Unix dogma at all, it is a horrible
> corruption of the original UNIX cleanlyness.

Nice outpouring indeed, it seems taken from L'Ouvre au Noir by
Marguerite Yourcenar ;-)))

You're perfectly right but the file descriptor solution appeared to me a
nice way to work around the Unix limitation to have poll(2) working only
on file descriptor.

Said this, I've no doubt that a better poll-like syscall would solve all
that in a more elegant way.

You understand that sometime we've no other choice that to design
workarounds to minimize needed changes (and then often to maximize
acceptance probability).

OTOH you may always decide to do things in the elegant way, you've such
a responsibility for linux kernel.

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alan Cox wrote:
> > can libraries use fast semaphores behind the back of the user? They might
> > well want to use the semaphores exactly for things like memory allocator
> > locking etc. But libc certainly cant use fd's behind peoples backs.
>
> libc is entitled to, and most definitely does exactly that. Take a look at
> things like gethostent, getpwent etc etc.

Ehh.. I will bet you $10 USD that if libc allocates the next file
descriptor on the first "malloc()" in user space (in order to use the
semaphores for mm protection), programs _will_ break.

You want to take the bet?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Abramo Bagnara wrote:
>
> > [ Using file descriptors ]
>
> This would also permit:
> - to have poll()
> - to use mmap() to obtain the userspace area
>
> It would become something very near to sacred Unix dogmas ;-)

No, this is NOT what the UNIX dogmas are all about.

When UNIX says "everything is a file", it really means that "everything is
a stream of bytes". Things like magic operations on file desciptors are
_anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
semantics of file descriptors is NOT Unix dogma at all, it is a horrible
corruption of the original UNIX cleanlyness.

Please don't excuse "semaphore file descriptors" with the "everything is a
file" mantra. It is not at ALL applicable.

The "everything is a file" mantra is to make pipe etc meaningful -
processes don't have to worry about whether the fd they have is from a
file open, a pipe() system call, opening a special block device, or a
socket()+connect() thing. They can just read and write. THAT is what UNIX
is all about.

And this is obviously NOT true of a "magic file descriptors for
semaphores". You can't pass it off as stdin to another process and expect
anything useful from it unless the other process _knows_ it is a special
semaphore thing and does mmap magic or something.

The greatness of UNIX comes from "everything is a stream of bytes". That's
something that almost nobody got right before UNIX. Remember VMS
structured files? Did anybody ever realize what an absolutely _idiotic_
crock the NT "CopyFile()" thing is for the same reason?

Don't confuse that with "everything should be a file descriptor". The two
have nothing to do with each other.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alon Ziv wrote:
>
> * the userspace struct was just a signed count and a file handle.

The main reason I wanted to avoid a filehandle is just because it's
another name space that people already use, and that people know what the
semantics are for (ie "open()" is _defined_ to return the "lowest
available file descriptor", and people depend on that).

So if you use a file handle, you'd need to do magic - open it, and then
use dup2() to move it up high, or something. Which has its own set of
problems: just _how_ high woul dyou move it? Would it potentially disturb
an application that opens thousands of files, and knows that they get
consecutive file descriptors? Which is _legal_ and well-defined in UNIX.

However, I'm not married to the secure hash version - you could certainly
use another name-space, and something more akin to file descriptors. You
should be aware of issues like the above, though. Maybe it would be ok to
say "if you use fast semaphores, they use file descriptors and you should
no longer depend on consecutive fd's".

But note how that might make it really nasty for things like libraries:
can libraries use fast semaphores behind the back of the user? They might
well want to use the semaphores exactly for things like memory allocator
locking etc. But libc certainly cant use fd's behind peoples backs.

So personally, I actually think that you must _not_ use file descriptors.
But that doesn't mean that you couldn't have a more "file-desciptor-like"
approach.

Side note: the design _should_ allow for "lazy initialization". In
particular, it should be ok for FS_create() to not actually do a system
call at all, but just initialize the count and set a "uninitialized" flag.
And then the actual initialization would be done at "FS_down()" time, and
only if contention happens.

Why? Note that there are many cases where contention simply _cannot_
happen. The classic one is a thread-safe library that is used both by
threaded applications and by single-threaded ones, where the
single-threaded one would never actually trigger contention.

For these kinds of reasons it would actually be best to make try to
abstract the interfaces (notably the system call interface) as much as
possible, so that you can change the implementation inside the kernel
without having to recompile applications that use it. So the sanest
implementation might be one where

 - FS_create is a system call that just gets a 128-byte area and an ID.
 - the contention cases are plain system calls with no user-mode part to
   them at all.

This allows people to modify the behaviour of the semaphores later,
_without_ having any real coupling between user-mode expectations and
kernel implementation.

For example, if the user-mode library actually does a physical "open()" or
plays games with file descriptors itself, we will -always- be stuck with
the fd approach, and we can never fix it. But if you have opaque system
calls, you mist start out with a system call that internally just does the
equivalent of the "open a file descriptor and hide it in the semaphore",
and later on the thing can be changed to do whatever else without the user
program ever even realizing..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread George Talbot

On Tue, 17 Apr 2001 12:48:48 -0700 (PDT) Linus Torvalds wrote:

[deletia]

> /*
>  * a fast semaphore is a 128-byte opaque thing,
>  * aligned on a 128-byte boundary. This is partly
>  * to minimize false sharing in the L1 (we assume
>  * that 128-byte cache-lines are going to be fairly
>  * common), but also to allow the kernel to hide
>  * data there
>  */
> struct fast_semaphore {
> unsigned int opaque[32];
> } __attribute__((aligned, 64));
> 
> struct fast_semaphore *FS_create(char *ID);
> int FS_down(struct fast_semaphore *, unsigned long timeout);
> void FS_up(struct fast_semaphore *);

[deletia]

These are counting semaphores, right?  If so, would this make sense?

struct fast_semaphore *FS_create(char *ID, int initial_count);

[FS_down/FS_up the same]

int FS_get_count(struct fast_semaphore *);

The FS_get_count() is less useful, but being able to pass the initial
count to the semaphore is pretty important.

--George

begin:vcard 
n:Talbot;George
tel;fax:732-615-4526
tel;work:732-615-5099
x-mozilla-html:FALSE
org:Lucent, Inc.;Inter-Networking Systems
adr:;;480 Red Hill Road, Building 1;Middletown;NJ;07748;USA
version:2.1
email;internet:[EMAIL PROTECTED]
title:Senior Software Engineer
x-mozilla-cpt:;0
fn:George Talbot
end:vcard



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

> My lightweight-semaphores were actually even simpler in userspace:
> * the userspace struct was just a signed count and a file handle.
> * Uncontended case is exactly like Linus' version (i.e., down() is decl +
> js, up() is incl()).
> * The contention syscall was (in my implementation) an ioctl on the FH; the
> FH was a special one, from a private syscall (although with the new VFS I'd
> have written it as just another specialized FS, or even referred into the
> SysVsem FS).

Which raises an even more interesting question. Suppose your semaphore function
wanst a magic file system but was flock on a standard file ? The contention
overhead is rather less nice than Linus proposal but it ought 8) to work 
without any kernel patches

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Abramo Bagnara

Alexander Viro wrote:
> 
> I suspect that simple pipe with would be sufficient to handle contention
> case - nothing fancy needed (read when you need to block, write upon up()
> when you have contenders)
> 
> Would something along the lines of (inline as needed, etc.)
> 
> down:
> lock decl count
> js __down_failed
> down_done:
> ret
> 
> up:
> lock incl count
> jle __up_waking
> up_done:
> ret
> 
> __down_failed:
> call down_failed
> jmp down_done
> __up_waking:
> call up_waking
> jmp up_done
> 
> down_failed()
> {
> read(pipe_fd, , 1);
> }
> 
> up_waking()
> {
> write(pipe_fd, , 1);
> }
> 
> be enough?

There is something wonderful in this simple solution.

However I've a few doubts:
- choice policy for thread to wake is not selectable
- we separate shared memory area from file descriptor
- the implementation of down_try has neither been discussed nor
excluded, but I don't see how to implement it

The implementation of a specific filesystem seems to me more flexyble.

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 10:20:48AM +0200, Alon Ziv wrote:
> My lightweight-semaphores were actually even simpler in userspace:
> * the userspace struct was just a signed count and a file handle.
> * Uncontended case is exactly like Linus' version (i.e., down() is decl +
> js, up() is incl()).
> * The contention syscall was (in my implementation) an ioctl on the FH; the
> FH was a special one, from a private syscall (although with the new VFS I'd
> have written it as just another specialized FS, or even referred into the
> SysVsem FS).

This is roughly the way I would prefer it. 

But I would dedicate a whole page to this struct, since this is
the granularity we can decide sharing on. This also has the
advantage, that we can include a lot of debugging info into this
page, too. Some people would like to know current contenders,
up/down ratio per second and contender etc.

Why? We have the infrastructure and all the semantics already in
place and it is well known to the programmers. We know how we
inherit this stuff, what will happen on process termination and
so on.

I thought about this myself a lot, but just didn't like the idea
to trust user space for up/down. I thought about abusing read() and
write() for down() and up(). Just doing it partially in user
space would be an significant speedup, once you got it right.

Maybe we can even combine both of it like this:

Then user space can do:

   /* open or create sema4 with normal open semantics */
   fd=open("/dev/sema4/myone");
   sema4=mmap(NULL,getpagesize(),,,fd,0);

   /* up */
   atomic_inc_and_test_for_zero(sema4) && ioctl(fd,WAKE_SLEEPERS,NULL);

   /* down */
   atomic_dec_and_test_negative(sema4) && ioctl(fd,SLEEP_NOW,NULL);

or 
   /* open or create sema4 with normal open semantics */
   fd=open("/dev/sema4/myone");

   /* up */
   write(fd, NULL,0); /* do the atomic stuff and wakeup in kernel */

   /* this might be stupid, but COULD be implemented */
   /* add 4 items to counter */
   write(fd, NULL, 4); 

   /* down */
   read(sama4, NULL, 0);

We could even do trylock() by default, if we open O_NONBLOCK. Or
we could do trylock sometimes using select() and poll(). This
also makes it easy to add it to existing select() loops like
Motif.

This differences could even be hidden by the libc. IIRC there are
some archs, which cannot do atomic operations without privileged
instructions, which is not acceptable in user space. Also there
are archs, which are not cache coherent (think NUMA) and where
flushing these caches to the other CPUs is privileged. Last but
not least there are clusters with process migration. 

My twofold approach would solve all these problems rather simply.

It would be a libc decision on what to use now. And the libc
knows enough about the application to handle all these cases.

The only thing we still need, is what we do if a contender or
waiter ist killed. Should we send SIGPIPE? Should we simply wake
all the waiters?

And we are not creating a new namespace again, but just use the
standard UN*X one: File name space.

Hopes this "fit into namespace" solution will be considered,
because I don't like to have a new linux-only API with completely
new semantics and things to care in wrappers, even if you don't
use this stuff. 

I also don't like the "kill me if I do a mistake"
that Linus proposed in the "bad_sem" label.

Comments? Flames? Overengineered?

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag 
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On Thu, 19 Apr 2001, Abramo Bagnara wrote:

> Alon Ziv wrote:
> > 
> > Hmm...
> > I already started (long ago, and abandoned since due to lack of time :-( )
> > down another path; I'd like to resurrect it...
> > 
> > My lightweight-semaphores were actually even simpler in userspace:
> > * the userspace struct was just a signed count and a file handle.
> > * Uncontended case is exactly like Linus' version (i.e., down() is decl +
> > js, up() is incl()).
> > * The contention syscall was (in my implementation) an ioctl on the FH; the
> > FH was a special one, from a private syscall (although with the new VFS I'd
> > have written it as just another specialized FS, or even referred into the
> > SysVsem FS).
> > 
> > So, there is no chance for user corruption of kernel data (as it just ain't
> > there...); and the contended-case cost is probably equivalent (VFS cost vs.
> > validation).
> 
> This would also permit:
> - to have poll()
> - to use mmap() to obtain the userspace area
> 
> It would become something very near to sacred Unix dogmas ;-)

I suspect that simple pipe with would be sufficient to handle contention
case - nothing fancy needed (read when you need to block, write upon up()
when you have contenders)

Would something along the lines of (inline as needed, etc.)

down:
lock decl count
js __down_failed
down_done:
ret

up:
lock incl count
jle __up_waking
up_done:
ret

__down_failed:
call down_failed
jmp down_done
__up_waking:
call up_waking
jmp up_done

down_failed()
{
read(pipe_fd, , 1);
}

up_waking()
{
write(pipe_fd, , 1);
}

be enough?
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Abramo Bagnara

Alon Ziv wrote:
> 
> Hmm...
> I already started (long ago, and abandoned since due to lack of time :-( )
> down another path; I'd like to resurrect it...
> 
> My lightweight-semaphores were actually even simpler in userspace:
> * the userspace struct was just a signed count and a file handle.
> * Uncontended case is exactly like Linus' version (i.e., down() is decl +
> js, up() is incl()).
> * The contention syscall was (in my implementation) an ioctl on the FH; the
> FH was a special one, from a private syscall (although with the new VFS I'd
> have written it as just another specialized FS, or even referred into the
> SysVsem FS).
> 
> So, there is no chance for user corruption of kernel data (as it just ain't
> there...); and the contended-case cost is probably equivalent (VFS cost vs.
> validation).

This would also permit:
- to have poll()
- to use mmap() to obtain the userspace area

It would become something very near to sacred Unix dogmas ;-)

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alon Ziv

Hmm...
I already started (long ago, and abandoned since due to lack of time :-( )
down another path; I'd like to resurrect it...

My lightweight-semaphores were actually even simpler in userspace:
* the userspace struct was just a signed count and a file handle.
* Uncontended case is exactly like Linus' version (i.e., down() is decl +
js, up() is incl()).
* The contention syscall was (in my implementation) an ioctl on the FH; the
FH was a special one, from a private syscall (although with the new VFS I'd
have written it as just another specialized FS, or even referred into the
SysVsem FS).

So, there is no chance for user corruption of kernel data (as it just ain't
there...); and the contended-case cost is probably equivalent (VFS cost vs.
validation).

Hope I inspired someone...

-az

- Original Message -
From: "Ulrich Drepper" <[EMAIL PROTECTED]>
To: "Linus Torvalds" <[EMAIL PROTECTED]>
Cc: "Mike Kravetz" <[EMAIL PROTECTED]>; "Kernel Mailing List"
<[EMAIL PROTECTED]>
Sent: Wednesday, April 18, 2001 21:35
Subject: Re: light weight user level semaphores


> Linus Torvalds <[EMAIL PROTECTED]> writes:
>
> Sounds good so far.  Some comments.
>
> >  - FS_create is responsible for allocating a shared memory region
> >at "FS_create()" time.
>
> This is not so great.  The POSIX shared semaphores require that an
> pthread_mutex_t object placed in a shared memory region can be
> initialized to work across process boundaries.  I.e., the FS_create
> function would actually be FS_init.  There is no problem with the
> kernel or the helper code at user level allocating more storage (for
> the waitlist of whatever) but it must not be necessary for the user to
> know about them and place them in share memory themselves.
>
> The situation for non-shared (i.e. intra-process) semaphores are
> easier.  What I didn't understand is your remark about fork.  The
> semaphores should be cloned.  Unless the shared flag is set there
> should be no sharing among processes.
>
>
> The rest seems OK.  Thanks,
>
> --
> ---.  ,-.   1325 Chesapeake Terrace
> Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
> Red Hat  `--' drepper at redhat.com   `
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alon Ziv

Hmm...
I already started (long ago, and abandoned since due to lack of time :-( )
down another path; I'd like to resurrect it...

My lightweight-semaphores were actually even simpler in userspace:
* the userspace struct was just a signed count and a file handle.
* Uncontended case is exactly like Linus' version (i.e., down() is decl +
js, up() is incl()).
* The contention syscall was (in my implementation) an ioctl on the FH; the
FH was a special one, from a private syscall (although with the new VFS I'd
have written it as just another specialized FS, or even referred into the
SysVsem FS).

So, there is no chance for user corruption of kernel data (as it just ain't
there...); and the contended-case cost is probably equivalent (VFS cost vs.
validation).

Hope I inspired someone...

-az

- Original Message -
From: "Ulrich Drepper" [EMAIL PROTECTED]
To: "Linus Torvalds" [EMAIL PROTECTED]
Cc: "Mike Kravetz" [EMAIL PROTECTED]; "Kernel Mailing List"
[EMAIL PROTECTED]
Sent: Wednesday, April 18, 2001 21:35
Subject: Re: light weight user level semaphores


 Linus Torvalds [EMAIL PROTECTED] writes:

 Sounds good so far.  Some comments.

   - FS_create is responsible for allocating a shared memory region
 at "FS_create()" time.

 This is not so great.  The POSIX shared semaphores require that an
 pthread_mutex_t object placed in a shared memory region can be
 initialized to work across process boundaries.  I.e., the FS_create
 function would actually be FS_init.  There is no problem with the
 kernel or the helper code at user level allocating more storage (for
 the waitlist of whatever) but it must not be necessary for the user to
 know about them and place them in share memory themselves.

 The situation for non-shared (i.e. intra-process) semaphores are
 easier.  What I didn't understand is your remark about fork.  The
 semaphores should be cloned.  Unless the shared flag is set there
 should be no sharing among processes.


 The rest seems OK.  Thanks,

 --
 ---.  ,-.   1325 Chesapeake Terrace
 Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
 Red Hat  `--' drepper at redhat.com   `
 -
 To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Abramo Bagnara

Alon Ziv wrote:
 
 Hmm...
 I already started (long ago, and abandoned since due to lack of time :-( )
 down another path; I'd like to resurrect it...
 
 My lightweight-semaphores were actually even simpler in userspace:
 * the userspace struct was just a signed count and a file handle.
 * Uncontended case is exactly like Linus' version (i.e., down() is decl +
 js, up() is incl()).
 * The contention syscall was (in my implementation) an ioctl on the FH; the
 FH was a special one, from a private syscall (although with the new VFS I'd
 have written it as just another specialized FS, or even referred into the
 SysVsem FS).
 
 So, there is no chance for user corruption of kernel data (as it just ain't
 there...); and the contended-case cost is probably equivalent (VFS cost vs.
 validation).

This would also permit:
- to have poll()
- to use mmap() to obtain the userspace area

It would become something very near to sacred Unix dogmas ;-)

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 10:20:48AM +0200, Alon Ziv wrote:
 My lightweight-semaphores were actually even simpler in userspace:
 * the userspace struct was just a signed count and a file handle.
 * Uncontended case is exactly like Linus' version (i.e., down() is decl +
 js, up() is incl()).
 * The contention syscall was (in my implementation) an ioctl on the FH; the
 FH was a special one, from a private syscall (although with the new VFS I'd
 have written it as just another specialized FS, or even referred into the
 SysVsem FS).

This is roughly the way I would prefer it. 

But I would dedicate a whole page to this struct, since this is
the granularity we can decide sharing on. This also has the
advantage, that we can include a lot of debugging info into this
page, too. Some people would like to know current contenders,
up/down ratio per second and contender etc.

Why? We have the infrastructure and all the semantics already in
place and it is well known to the programmers. We know how we
inherit this stuff, what will happen on process termination and
so on.

I thought about this myself a lot, but just didn't like the idea
to trust user space for up/down. I thought about abusing read() and
write() for down() and up(). Just doing it partially in user
space would be an significant speedup, once you got it right.

Maybe we can even combine both of it like this:

Then user space can do:

   /* open or create sema4 with normal open semantics */
   fd=open("/dev/sema4/myone");
   sema4=mmap(NULL,getpagesize(),,,fd,0);

   /* up */
   atomic_inc_and_test_for_zero(sema4)  ioctl(fd,WAKE_SLEEPERS,NULL);

   /* down */
   atomic_dec_and_test_negative(sema4)  ioctl(fd,SLEEP_NOW,NULL);

or 
   /* open or create sema4 with normal open semantics */
   fd=open("/dev/sema4/myone");

   /* up */
   write(fd, NULL,0); /* do the atomic stuff and wakeup in kernel */

   /* this might be stupid, but COULD be implemented */
   /* add 4 items to counter */
   write(fd, NULL, 4); 

   /* down */
   read(sama4, NULL, 0);

We could even do trylock() by default, if we open O_NONBLOCK. Or
we could do trylock sometimes using select() and poll(). This
also makes it easy to add it to existing select() loops like
Motif.

This differences could even be hidden by the libc. IIRC there are
some archs, which cannot do atomic operations without privileged
instructions, which is not acceptable in user space. Also there
are archs, which are not cache coherent (think NUMA) and where
flushing these caches to the other CPUs is privileged. Last but
not least there are clusters with process migration. 

My twofold approach would solve all these problems rather simply.

It would be a libc decision on what to use now. And the libc
knows enough about the application to handle all these cases.

The only thing we still need, is what we do if a contender or
waiter ist killed. Should we send SIGPIPE? Should we simply wake
all the waiters?

And we are not creating a new namespace again, but just use the
standard UN*X one: File name space.

Hopes this "fit into namespace" solution will be considered,
because I don't like to have a new linux-only API with completely
new semantics and things to care in wrappers, even if you don't
use this stuff. 

I also don't like the "kill me if I do a mistake"
that Linus proposed in the "bad_sem" label.

Comments? Flames? Overengineered?

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag http://www.tu-chemnitz.de/linux/tag
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On Thu, 19 Apr 2001, Abramo Bagnara wrote:

 Alon Ziv wrote:
  
  Hmm...
  I already started (long ago, and abandoned since due to lack of time :-( )
  down another path; I'd like to resurrect it...
  
  My lightweight-semaphores were actually even simpler in userspace:
  * the userspace struct was just a signed count and a file handle.
  * Uncontended case is exactly like Linus' version (i.e., down() is decl +
  js, up() is incl()).
  * The contention syscall was (in my implementation) an ioctl on the FH; the
  FH was a special one, from a private syscall (although with the new VFS I'd
  have written it as just another specialized FS, or even referred into the
  SysVsem FS).
  
  So, there is no chance for user corruption of kernel data (as it just ain't
  there...); and the contended-case cost is probably equivalent (VFS cost vs.
  validation).
 
 This would also permit:
 - to have poll()
 - to use mmap() to obtain the userspace area
 
 It would become something very near to sacred Unix dogmas ;-)

I suspect that simple pipe with would be sufficient to handle contention
case - nothing fancy needed (read when you need to block, write upon up()
when you have contenders)

Would something along the lines of (inline as needed, etc.)

down:
lock decl count
js __down_failed
down_done:
ret

up:
lock incl count
jle __up_waking
up_done:
ret

__down_failed:
call down_failed
jmp down_done
__up_waking:
call up_waking
jmp up_done

down_failed()
{
read(pipe_fd, dummy, 1);
}

up_waking()
{
write(pipe_fd, dummy, 1);
}

be enough?
Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Abramo Bagnara

Alexander Viro wrote:
 
 I suspect that simple pipe with would be sufficient to handle contention
 case - nothing fancy needed (read when you need to block, write upon up()
 when you have contenders)
 
 Would something along the lines of (inline as needed, etc.)
 
 down:
 lock decl count
 js __down_failed
 down_done:
 ret
 
 up:
 lock incl count
 jle __up_waking
 up_done:
 ret
 
 __down_failed:
 call down_failed
 jmp down_done
 __up_waking:
 call up_waking
 jmp up_done
 
 down_failed()
 {
 read(pipe_fd, dummy, 1);
 }
 
 up_waking()
 {
 write(pipe_fd, dummy, 1);
 }
 
 be enough?

There is something wonderful in this simple solution.

However I've a few doubts:
- choice policy for thread to wake is not selectable
- we separate shared memory area from file descriptor
- the implementation of down_try has neither been discussed nor
excluded, but I don't see how to implement it

The implementation of a specific filesystem seems to me more flexyble.

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

 My lightweight-semaphores were actually even simpler in userspace:
 * the userspace struct was just a signed count and a file handle.
 * Uncontended case is exactly like Linus' version (i.e., down() is decl +
 js, up() is incl()).
 * The contention syscall was (in my implementation) an ioctl on the FH; the
 FH was a special one, from a private syscall (although with the new VFS I'd
 have written it as just another specialized FS, or even referred into the
 SysVsem FS).

Which raises an even more interesting question. Suppose your semaphore function
wanst a magic file system but was flock on a standard file ? The contention
overhead is rather less nice than Linus proposal but it ought 8) to work 
without any kernel patches

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread George Talbot

On Tue, 17 Apr 2001 12:48:48 -0700 (PDT) Linus Torvalds wrote:

[deletia]

 /*
  * a fast semaphore is a 128-byte opaque thing,
  * aligned on a 128-byte boundary. This is partly
  * to minimize false sharing in the L1 (we assume
  * that 128-byte cache-lines are going to be fairly
  * common), but also to allow the kernel to hide
  * data there
  */
 struct fast_semaphore {
 unsigned int opaque[32];
 } __attribute__((aligned, 64));
 
 struct fast_semaphore *FS_create(char *ID);
 int FS_down(struct fast_semaphore *, unsigned long timeout);
 void FS_up(struct fast_semaphore *);

[deletia]

These are counting semaphores, right?  If so, would this make sense?

struct fast_semaphore *FS_create(char *ID, int initial_count);

[FS_down/FS_up the same]

int FS_get_count(struct fast_semaphore *);

The FS_get_count() is less useful, but being able to pass the initial
count to the semaphore is pretty important.

--George

begin:vcard 
n:Talbot;George
tel;fax:732-615-4526
tel;work:732-615-5099
x-mozilla-html:FALSE
org:Lucent, Inc.;Inter-Networking Systems
adr:;;480 Red Hill Road, Building 1;Middletown;NJ;07748;USA
version:2.1
email;internet:[EMAIL PROTECTED]
title:Senior Software Engineer
x-mozilla-cpt:;0
fn:George Talbot
end:vcard



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alon Ziv wrote:

 * the userspace struct was just a signed count and a file handle.

The main reason I wanted to avoid a filehandle is just because it's
another name space that people already use, and that people know what the
semantics are for (ie "open()" is _defined_ to return the "lowest
available file descriptor", and people depend on that).

So if you use a file handle, you'd need to do magic - open it, and then
use dup2() to move it up high, or something. Which has its own set of
problems: just _how_ high woul dyou move it? Would it potentially disturb
an application that opens thousands of files, and knows that they get
consecutive file descriptors? Which is _legal_ and well-defined in UNIX.

However, I'm not married to the secure hash version - you could certainly
use another name-space, and something more akin to file descriptors. You
should be aware of issues like the above, though. Maybe it would be ok to
say "if you use fast semaphores, they use file descriptors and you should
no longer depend on consecutive fd's".

But note how that might make it really nasty for things like libraries:
can libraries use fast semaphores behind the back of the user? They might
well want to use the semaphores exactly for things like memory allocator
locking etc. But libc certainly cant use fd's behind peoples backs.

So personally, I actually think that you must _not_ use file descriptors.
But that doesn't mean that you couldn't have a more "file-desciptor-like"
approach.

Side note: the design _should_ allow for "lazy initialization". In
particular, it should be ok for FS_create() to not actually do a system
call at all, but just initialize the count and set a "uninitialized" flag.
And then the actual initialization would be done at "FS_down()" time, and
only if contention happens.

Why? Note that there are many cases where contention simply _cannot_
happen. The classic one is a thread-safe library that is used both by
threaded applications and by single-threaded ones, where the
single-threaded one would never actually trigger contention.

For these kinds of reasons it would actually be best to make try to
abstract the interfaces (notably the system call interface) as much as
possible, so that you can change the implementation inside the kernel
without having to recompile applications that use it. So the sanest
implementation might be one where

 - FS_create is a system call that just gets a 128-byte area and an ID.
 - the contention cases are plain system calls with no user-mode part to
   them at all.

This allows people to modify the behaviour of the semaphores later,
_without_ having any real coupling between user-mode expectations and
kernel implementation.

For example, if the user-mode library actually does a physical "open()" or
plays games with file descriptors itself, we will -always- be stuck with
the fd approach, and we can never fix it. But if you have opaque system
calls, you mist start out with a system call that internally just does the
equivalent of the "open a file descriptor and hide it in the semaphore",
and later on the thing can be changed to do whatever else without the user
program ever even realizing..

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Abramo Bagnara wrote:

  [ Using file descriptors ]

 This would also permit:
 - to have poll()
 - to use mmap() to obtain the userspace area

 It would become something very near to sacred Unix dogmas ;-)

No, this is NOT what the UNIX dogmas are all about.

When UNIX says "everything is a file", it really means that "everything is
a stream of bytes". Things like magic operations on file desciptors are
_anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
semantics of file descriptors is NOT Unix dogma at all, it is a horrible
corruption of the original UNIX cleanlyness.

Please don't excuse "semaphore file descriptors" with the "everything is a
file" mantra. It is not at ALL applicable.

The "everything is a file" mantra is to make pipe etc meaningful -
processes don't have to worry about whether the fd they have is from a
file open, a pipe() system call, opening a special block device, or a
socket()+connect() thing. They can just read and write. THAT is what UNIX
is all about.

And this is obviously NOT true of a "magic file descriptors for
semaphores". You can't pass it off as stdin to another process and expect
anything useful from it unless the other process _knows_ it is a special
semaphore thing and does mmap magic or something.

The greatness of UNIX comes from "everything is a stream of bytes". That's
something that almost nobody got right before UNIX. Remember VMS
structured files? Did anybody ever realize what an absolutely _idiotic_
crock the NT "CopyFile()" thing is for the same reason?

Don't confuse that with "everything should be a file descriptor". The two
have nothing to do with each other.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alan Cox wrote:
  can libraries use fast semaphores behind the back of the user? They might
  well want to use the semaphores exactly for things like memory allocator
  locking etc. But libc certainly cant use fd's behind peoples backs.

 libc is entitled to, and most definitely does exactly that. Take a look at
 things like gethostent, getpwent etc etc.

Ehh.. I will bet you $10 USD that if libc allocates the next file
descriptor on the first "malloc()" in user space (in order to use the
semaphores for mm protection), programs _will_ break.

You want to take the bet?

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Abramo Bagnara

Linus Torvalds wrote:
 
 On Thu, 19 Apr 2001, Abramo Bagnara wrote:
 
   [ Using file descriptors ]
 
  This would also permit:
  - to have poll()
  - to use mmap() to obtain the userspace area
 
  It would become something very near to sacred Unix dogmas ;-)
 
 No, this is NOT what the UNIX dogmas are all about.
 
 When UNIX says "everything is a file", it really means that "everything is
 a stream of bytes". Things like magic operations on file desciptors are
 _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
 semantics of file descriptors is NOT Unix dogma at all, it is a horrible
 corruption of the original UNIX cleanlyness.

Nice outpouring indeed, it seems taken from L'Ouvre au Noir by
Marguerite Yourcenar ;-)))

You're perfectly right but the file descriptor solution appeared to me a
nice way to work around the Unix limitation to have poll(2) working only
on file descriptor.

Said this, I've no doubt that a better poll-like syscall would solve all
that in a more elegant way.

You understand that sometime we've no other choice that to design
workarounds to minimize needed changes (and then often to maximize
acceptance probability).

OTOH you may always decide to do things in the elegant way, you've such
a responsibility for linux kernel.

-- 
Abramo Bagnara   mailto:[EMAIL PROTECTED]

Opera Unica  Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project   http://www.alsa-project.org
It sounds good!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

  libc is entitled to, and most definitely does exactly that. Take a look at
  things like gethostent, getpwent etc etc.
 
 Ehh.. I will bet you $10 USD that if libc allocates the next file
 descriptor on the first "malloc()" in user space (in order to use the
 semaphores for mm protection), programs _will_ break.
 
 You want to take the bet?

Its not normally a good idea to take a Linus bet, but this time Im obviously
missing something. fd0-2 will be passed in (and if not then shit already
happens - see old bugtraq on the matter for setuid apps, glibc bugs)

So the C library gets fd 3
My first fopen gets fd 4.

That can already happen and isnt new. Several profiling libraries on Unix have
precisely this effect already. They dynamic link/loader will also open file
handles to do mmaps although generally you wont see those as they are closed
again after mapping. 

Internationalisation code in glibc will also open and map tables during startup


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On Thu, 19 Apr 2001, Linus Torvalds wrote:

 
 
 On Thu, 19 Apr 2001, Alexander Viro wrote:
 
  I certainly agree that introducing ioctl() in _any_ API is a shootable
  offense. However, I wonder whether we really need any kernel changes
  at all.
 
 I'd certainly be interested in seeing the pipe-based approach. Especially
 if you make the pipe allocation lazy. That isn'tr trivial (it needs to be
 done right with both up_failed() and down_failed() trying to allocate the
 pipe on contention and using an atomic cmpxchg-style setting if none
 existed before). It has the BIG advantage of working on old kernels, so
 that you don't need to have backwards compatibility cruft in the
 libraries.

Ehh... Non-lazy variant is just read() and write() as down_failed() and
up_wakeup() Lazy... How about

if (Lock = 1)
goto must_open;
opened:
/* as in non-lazy case */


must_open:
pipe(fd);
lock decl Lock
jg lost_it  /* Already seriously positive - clean up and go */
jl spin_and_lose
/* Lock went from 1 to 0 - go ahead */
reader = fd[0];
writer = fd[1];
Lock = MAX_INT;
goto opened;
spin_and_lose:
/* Won't take long - another guy got to do 3 memory writes */
while (Lock = 0)
;
lost_it:
lock incl Lock
close(fd[0]);
close(fd[1]);
goto opened;

Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Alexander Viro wrote:

 Ehh... Non-lazy variant is just read() and write() as down_failed() and
 up_wakeup() Lazy... How about

Looks good to me. Anybody want to try this out and test some benchmarks?

There may be problems with large numbers of semaphores, but hopefully that
won't be an issue. And the ability to select/poll on these things might
come in handy for various implementation issues (ie locks with timeouts
etc).

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On Thu, 19 Apr 2001, Linus Torvalds wrote:

 
 
 On Thu, 19 Apr 2001, Alexander Viro wrote:
 
  Ehh... Non-lazy variant is just read() and write() as down_failed() and
  up_wakeup() Lazy... How about
 
 Looks good to me. Anybody want to try this out and test some benchmarks?

Ugh. It doesn't look good for me. s/MAX_INT/MAX_INT1/ or we will
get into trouble on anything that goes into spin_and_lose. Window is
pretty narrow (notice that lost_it is OK - we only need to worry
about somebody coming in after winner drives Lock from 1 to 0
and before it gets it from 0 to MAX_INT), but we can get into serious
trouble if schedule() will hit that window.

MAX_INT/2 should be enough to deal with that, AFAICS.

However, I would _really_ like to get that code reviewed from the memory
access ordering POV. Warning: right now I'm half-asleep, so the thing can
very well be completely bogus in that area. Extra eyes would be certainly
welcome.

Al

PS: -Lock should be set to 1 when we initialize semaphore. Destroying
semaphore should do
if (sem-Lock  1) {
close(sem-writer);
close(sem-reader);
}

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Olaf Titz

 problems: just _how_ high woul dyou move it? Would it potentially disturb
 an application that opens thousands of files, and knows that they get
 consecutive file descriptors? Which is _legal_ and well-defined in UNIX.

Only if you close them before. The process may have been started with
arbitrary fds open.

 say "if you use fast semaphores, they use file descriptors and you should
 no longer depend on consecutive fd's".

Which you cannot anyway. Already some library routines can open fds
although they don't explicitly say so and don't have to in all
implementations, like openlog() or all the get*by*() stuff (or even
dlopen()), so you are never sure to know which or how many FDs you
actually have open.

Olaf
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds [EMAIL PROTECTED] writes:

 Looks good to me. Anybody want to try this out and test some benchmarks?

I fail to see how this works across processes.  How can you generate a
file descriptor for this pipe in a second process which simply shares
some memory with the first one?  The first process is passive: no file
descriptor passing must be necessary.

How these things are working elsewhere is that a memory address
(probably a physical address) is used as a token.  The semaphore
object is placed in the memory shared by the processes and the virtual
address is passed in the syscall.

Note that semaphores need not always be shared between processes.
This is a property the user has to choose.  So the implementation can
be easier in the normal intra-process case.

In any case all kinds of user-level operations are possible as well
and all the schemes suggested for dealing with the common case without
syscalls can be applied here as well.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox [EMAIL PROTECTED] writes:

  can libraries use fast semaphores behind the back of the user? They might
  well want to use the semaphores exactly for things like memory allocator
  locking etc. But libc certainly cant use fd's behind peoples backs.
 
 libc is entitled to, and most definitely does exactly that. Take a look at
 things like gethostent, getpwent etc etc.

You are mixing two completely different things.

Functions like gethostent() and catopen() are explicitly allowed to be
implemented using file descriptors.  If this is allowed the standard
contains appropriate wording.

Other functions like setlocale() do use file descriptors, yes, but
these are not kept.  Before the function returns they are closed.
This can cause disruptions in other threads which find descriptors not
allocated sequentially but this has to be taken into account.  Rules
for multi-threaded applications are different.  A single-threaded
application will not see such a difference.

Now, the standards do not allow POSIX mutexes to be implemented using
file descriptors.  The same is true for unnamed POSIX semaphores.  So
Linus is right, though for a different reason than he thought.

The situation is a bit different for named POSIX semaphores.  These
can be implemented using file descriptors.  But they don't have to and
IMO they shouldn't.  A memory reference based semaphore implementation
would allow a named semaphore to be implemented using

  fd = open (name)
  addr = mmap (..fd..)
  close (fd)
  sem_syscall (addr)

i.e., it can be mapped to a memory reference again.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

 I fail to see how this works across processes.  How can you generate a
 file descriptor for this pipe in a second process which simply shares
 some memory with the first one?  The first process is passive: no file
 descriptor passing must be necessary.

mknod foo p. Or use sockets (although AF_UNIX sockets are higher latency)
Thats why I suggested using flock - its name based. Whether you mkstemp()
stuff and pass it around isnt something I care about

Files give you permissions for free too

 Note that semaphores need not always be shared between processes.
 This is a property the user has to choose.  So the implementation can
 be easier in the normal intra-process case.

So you have unix file permissions on them ?


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox [EMAIL PROTECTED] writes:

 mknod foo p. Or use sockets (although AF_UNIX sockets are higher latency)
 Thats why I suggested using flock - its name based. Whether you mkstemp()
 stuff and pass it around isnt something I care about
 
 Files give you permissions for free too

I don't want nor need file permissions.  A program would look like this:


  process 1:


   fd = open("somefile")
   addr = mmap(fd);
   
   pthread_mutexattr_init(attr);
   pthread_mutexattr_setpshared(attr, PTHREAD_PROCESS_SHARED);

   pthread_mutex_init ((pthread_mutex_t *) addr, attr);

   pthread_mutex_lock ((pthread_mutex_t *) addr);

   pthread_mutex_destroy((pthread_mutex_t *) addr);

  process 2:

   fd = open("somefile")
   addr = mmap(fd);

   pthread_mutex_lock ((pthread_mutex_t *) addr);


The shared mem segment can be retrieved in whatever way.  The mutex in
this case is anonymous.  Everybody who has access to the shared mem
*must* have access to the mutex.


For semaphores it looks similarly.  First the anonymous case:

 process 1:


   fd = open("somefile")
   addr = mmap(fd);

   sem_init ((sem_t *) addr, 1, 10);// 10 is arbitrary

   sem_wait ((sem_t *) addr);

   sem_destroy((sem_t *) addr);


  process 2:

   fd = open("somefile")
   addr = mmap(fd);

   sem_wait ((sem_t *) addr);

Note that POSIX semaphores could be implemented with global POSIX
mutexes.


Finally, named semaphores:

   semp = sem_open("somefile", O_CREAT|O_EXCL, 0600)

   sem_wait (semp);

   sem_close(semp);
   sem_unlink(semp);


This is the only semaphore kind which maps nicely to a pipe or socket.
All the others don't.  And even for named semaphores it is best to
have a separate name space like the shmfs.

 So you have unix file permissions on them ?

See above.  Permissions are only allowed for named semaphores.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

 I don't want nor need file permissions.  A program would look like this:

Your example opens/mmaps so has file permissions. Which is what I was asking

 The shared mem segment can be retrieved in whatever way.  The mutex in
 this case is anonymous.  Everybody who has access to the shared mem
 *must* have access to the mutex.

We agree 8)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 12:26:03PM -0700, Ulrich Drepper wrote:
 In any case all kinds of user-level operations are possible as well
 and all the schemes suggested for dealing with the common case without
 syscalls can be applied here as well.

Are you sure, you can implement SMP-safe, atomic operations (which you need
for all up()/down() in user space) WITHOUT using privileged
instructions on ALL archs Linux supports?

How do we do this on nccNUMA machines later? How on clusters[1]?

On what I can see in asm-*/atomic.h this is not possible, but I
probably miss sth. here ;-)

I didn't know that POSIX forbids using fds to implement a
semaphore. That's VERY bad.

Learning new APIs always means making a lot of mistakes and doing
this while we write production code, since nobody likes to pay for
experiments.

And I still see no point on speeding of creation and contention,
since these should be rare cases and the application overusing
these should be punished HARD.

Maybe someone can enlighten my on these aspects.

Regards

Ingo Oeser

[1] Ok, people already use other than Unix mechanisms for this
   stuff on massive parallel computing. So this might not be an
   issue. Only for libc internal sema4s
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag http://www.tu-chemnitz.de/linux/tag
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alan Cox [EMAIL PROTECTED] writes:

  I don't want nor need file permissions.  A program would look like this:
 
 Your example opens/mmaps so has file permissions. Which is what I was asking

There are no permissions on the mutex object.  It is the shared memory
which counts.  If you would implement the global mutexes as
independent objects in the filesystem hierarchy you would somehow
magically make the permissions match those of the object containing
the memory representation of the global semaphore.


   fd = open("somefile", O_CREAT|O_TRUNC, 0666)
   addr=mmap(fd)
   // assume attr is for a global mutex
   pthread_mutex_init((pthread_mutex_t*)addr, attr)
   fchmod(fd, 0600)
   fchown(fd, someuser, somegroup)

If pthread_mutex_attr() is allocating some kind of file, how do you
determine the permissions?  How are they changed if the permissions to
the file change?

The kernel representation of the mutex must not be disassociated from
the shared memory region.

Even if you all think very little about Solaris, look at the kernel
interface for semaphores.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Ingo Oeser [EMAIL PROTECTED] writes:

 Are you sure, you can implement SMP-safe, atomic operations (which you need
 for all up()/down() in user space) WITHOUT using privileged
 instructions on ALL archs Linux supports?

Which processors have no such instructions but are SMP-capable?

 How do we do this on nccNUMA machines later? How on clusters[1]?

Clusters are not my problem.  They require additional software.  And
NUMA machines maybe be requiring a certain sequence in which the
operations must be performed and the hardware should take care of the
rest.


I don't really care what the final implementation will be like.  For
UP and SMP machines I definitely want to have as much as possible at
user-level.  If you need a special libpthread for NUMA machines, so be
it.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ingo Oeser

On Thu, Apr 19, 2001 at 09:11:56AM -0700, Linus Torvalds wrote:
 No, this is NOT what the UNIX dogmas are all about.
 
 When UNIX says "everything is a file", it really means that "everything is
 a stream of bytes". Things like magic operations on file desciptors are
 _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
 semantics of file descriptors is NOT Unix dogma at all, it is a horrible
 corruption of the original UNIX cleanlyness.

Right. And on semaphores, this stream is exactly 0 bytes long.
This is perfectly normal and can be handled by all applications
I'm aware of.

My idea violates nothing here.

 Please don't excuse "semaphore file descriptors" with the "everything is a
 file" mantra. It is not at ALL applicable.
 
 The "everything is a file" mantra is to make pipe etc meaningful -
 processes don't have to worry about whether the fd they have is from a
 file open, a pipe() system call, opening a special block device, or a
 socket()+connect() thing. They can just read and write. THAT is what UNIX
 is all about.
 
Right. And with my approach read() and write() with a buffer
pointer != NULL would either yield an return value of "0" or
-1 and set errno=EINVAL ("object not suitable for reading/writing").
Anyway they should return IMMIDIATELY in these cases.

We already have these special semantics with devices. Look at
/dev/sgX for an example how we pass even structured data via
normal read/write (instead of "stream of bytes").

 And this is obviously NOT true of a "magic file descriptors for
 semaphores". You can't pass it off as stdin to another process and expect
 anything useful from it unless the other process _knows_ it is a special
 semaphore thing and does mmap magic or something.

see above. NOTHING special about this idea. No magic handling
involved, unless the user of the fd knows what it is. For other
users it will be just a normal fd with normal operations, since
the special case is hidden well enough. 

This is even WAY simpler as all that tty-crap and similar
devices, which read/write very dependend on their actual ioctl
configuration.

But since stupid POSIX forbids using fds for semaphores
(according to Ulrich Drepper), this nice, simple and
non-intrusive solution is out.

Instead we should go with several new syscalls, user space
dependencies, strange error handling and yet-to-discuss
semantics.

Everybody else byt you would have been kicked out by the core
people for suggesting this ;-)

Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag http://www.tu-chemnitz.de/linux/tag
  been there and had much fun   
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On 19 Apr 2001, Ulrich Drepper wrote:

 Linus Torvalds [EMAIL PROTECTED] writes:

  Looks good to me. Anybody want to try this out and test some benchmarks?

 I fail to see how this works across processes.

It's up to FS_create() to create whatever shared mapping is needed.

For threads, you don't need anything special.

For fork()'d helper stuff, you'd use MAP_ANON | MAP_SHARED.

For execve(), you need shm shared memory or MAP_SHARED on a file.

It all depends on your needs.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Ingo Oeser wrote:

 Are you sure, you can implement SMP-safe, atomic operations (which you need
 for all up()/down() in user space) WITHOUT using privileged
 instructions on ALL archs Linux supports?

Why do you care?

Sure, there are broken architectures out there. They'd need system calls.
They'd be slow. That's THEIR problem.

No sane architecture has this limitation.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On Thu, 19 Apr 2001, Ingo Oeser wrote:

 On Thu, Apr 19, 2001 at 09:11:56AM -0700, Linus Torvalds wrote:
  No, this is NOT what the UNIX dogmas are all about.
 
  When UNIX says "everything is a file", it really means that "everything is
  a stream of bytes". Things like magic operations on file desciptors are
  _anathema_ to UNIX. ioctl() is the worst wart of UNIX. Having magic
  semantics of file descriptors is NOT Unix dogma at all, it is a horrible
  corruption of the original UNIX cleanlyness.

 Right. And on semaphores, this stream is exactly 0 bytes long.
 This is perfectly normal and can be handled by all applications
 I'm aware of.

It's perfectly normal, but it does NOT conform to the idea "everything is
a file".

The fact that there are other ugly examples (ioctls and special files)
does not mean that adding a new one is a good idea.

When people say "everything is a file", they mean that it can be _used_ as
a file, not that it can passably return a valid error code.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds [EMAIL PROTECTED] writes:

  I fail to see how this works across processes.
 
 It's up to FS_create() to create whatever shared mapping is needed.

No, the point is that FS_create is *not* the one creating the shared
mapping.  The user is explicitly doing this her/himself.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alan Cox

 Are you sure, you can implement SMP-safe, atomic operations (which you need
 for all up()/down() in user space) WITHOUT using privileged
 instructions on ALL archs Linux supports?

You don't need to. For some architectures the semaphore code would always call
into the kernel. For those that allow fast locks in userspace it won't. The
API is the thing, and the public exposure would I assume be pthreads 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Linus Torvalds



On 19 Apr 2001, Ulrich Drepper wrote:
 Linus Torvalds [EMAIL PROTECTED] writes:

   I fail to see how this works across processes.
 
  It's up to FS_create() to create whatever shared mapping is needed.

 No, the point is that FS_create is *not* the one creating the shared
 mapping.  The user is explicitly doing this her/himself.

No.

Who creates the shared mapping is _irrelevant_, because it ends up being
entirely a function of what the chosen interface is.

For example, quote often you want semaphores for threading purposes only,
and then you don't need a shared mapping at all. So you'd use the proper
interfaces for that, and for that, your "thread_semaphore()" function
would just do a malloc() and initialize the memory to zero. Doing a mmap
or something like that would just be stupid, because you're protecting
only one VM space anyway.

In other cases, you may need to have process-wide semaphores, and you'd
use "process_semaphore(char *ID)" or something, which actually does a
mmap() on a shared file. Or you'd have "fork_semaphore()" that creates a
semaphore that is valid across forks, not not valid across execve's and
cannot be passed around.

So normally the user does NOT create the shared mapping himself. Normally
you'd just use the "proper interface" for your needs, nothing more.

Sure, you can have the option of saying "I've created this shared memory
region, please make it use the generic semaphore engine code", but quite
frankly I think that is a BAD IDEA. Why? Because it won't work portably
across architectures anyway. You don't know what the requirements of the
architecture are, so it should be done by a nice "semaphore library". NOT
by the user.

Remember: these semaphores are NOT a new SysV bogosity. These semaphores
are a new interface, with sane performance and sane design. And you can
have multiple external interfaces to the same "semaphore engine".

I'm not interested in re-creating the idiocies of Sys IPC.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Rogier Wolff

Alan Cox wrote:
   libc is entitled to, and most definitely does exactly that. Take a look at
   things like gethostent, getpwent etc etc.
  
  Ehh.. I will bet you $10 USD that if libc allocates the next file
  descriptor on the first "malloc()" in user space (in order to use the
  semaphores for mm protection), programs _will_ break.
  
  You want to take the bet?
 
 Its not normally a good idea to take a Linus bet, but this time Im obviously
 missing something. fd0-2 will be passed in (and if not then shit already
 happens - see old bugtraq on the matter for setuid apps, glibc bugs)
 
 So the C library gets fd 3
 My first fopen gets fd 4.

Code may 
close (0);
close (1);
close (2); 
... 
malloc (); 

/* Now open our controlling TTY/ stdin .. */ 
fd = open (... ) ;

After taking care of this (*), problem I find the fd trick WAY more
appealing than Linus' magic numbers. With file descriptors we have a
"small integer which can be validated quickly". We also have storage
for a private pointer somewhere in the fd structure. 

If people are TOO afraid of breaking something, creating a new set of
small integers handled similarly as "fds" would do fine. (Maybe here
we'd allocate just a few, and reallocate when neccesary).

Roger. 

(*) I bet that 
get_sem_fd () 
{
int rv;
int fd; 
fd = get_fd ();
if (fd  5) {
rv = get_sem_fd ();
close(fd);
fd = rv; 
  } 
return fd; 
}

will not break much. (UGLY coding. Don't tell me.)

-- 
** [EMAIL PROTECTED] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots. 
* There are also old, bald pilots. 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Linus Torvalds [EMAIL PROTECTED] writes:

 I'm not interested in re-creating the idiocies of Sys IPC.

I'm not talking about sysv semaphores (couldn't care less).  And you
haven't read any of the mails with examples I sent.

If the new interface can be useful for anything it must allow to
implement process-shared POSIX mutexes.  The user-level representation
of these mutexes are simple variables which in the case of
inter-process mutexes are placed in shared memory.  These variables
must be usable with the normal pthread_mutex_lock() functions and
perform whatever is needed.

Whether the pthread_mutex_init() function for shared mutexes is doing
a lot more work and allocates even more memory, I don't care.  The
standard certainly permits this and every pthread_mutex_init() must
have a pthread_mutex_destroy() which allows allocating and freeing
resources (no file descriptor, though).  So, yes, your FS_create
syscall can allocate something.

But the question is what handle to put in the pthread_mutex_t variable
so the different processes can use the mutex.  It cannot be a file
descriptor since it's not shared between processes.  It cannot be a
pointer to some other place in the virtual memory since the place
pointed to might not be (and probably isn't if FS_create is allocating
something in the process setting up the mutex).  You could put some
magic cookie in the pthread_mutex_t object the kernel can then use.


So, instead of repeating over and over again the same old story, fill
in the gaps here:


  int
  pthread_mutex_init (pthread_mutex_t *mutex,
  const pthread_mutexattr_t *mutex_attr)
  {
if (mutex_attr != NULL  mutex_attr-__pshared != 0)
  {
... FILL IN HERE ...
  }
else
  ...intra-process mutex, uninteresting here...
  }

  int
  pthread_mutex_lock (pthread_mutex_t *mutex)
  {
if (mutex_attr != NULL  mutex_attr-__pshared != 0)
  {
... FILL IN HERE ...
  }
else
  ...intra-process mutex, uninteresting here...
  }

  int
  pthread_mutex_destroy (pthread_mutex_t *mutex)
  {
if (mutex_attr != NULL  mutex_attr-__pshared != 0)
  {
... FILL IN HERE ...
  }
else
  ...intra-process mutex, uninteresting here...
  }


These functions must work with something like this:

~ cons.c ~~
#include pthread.h
#include stdio.h
#include stdlib.h
#include unistd.h
#include sys/mman.h

int
main (int argc, char *argv[])
{
  char tmpl[] = "/tmp/fooXX";
  int fd = mkstemp (tmpl);
  pthread_mutexattr_t attr;
  pthread_mutex_t *m1;
  pthread_mutex_t *m2;
  void *addr;
  volatile int *i;

  pthread_mutexattr_init (attr);
  pthread_mutexattr_setpshared (attr, PTHREAD_PROCESS_SHARED);

  ftruncate (fd, 2 * sizeof (*m1) + sizeof (int));
  addr = mmap (NULL, sizeof (*m1), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  m1 = addr;
  m2 = m1 + 1;
  i = (int *) (m2 + 1);
  *i = 0;

  pthread_mutex_init (m1, attr);
  pthread_mutex_lock (m1);

  pthread_mutex_init (m2, attr);
  pthread_mutex_lock (m2);

  if (fork () == 0)
{
  char buf[10];
  snprintf (buf, sizeof buf, "%d", fd);
  execl ("./prod", "prod", buf, NULL);
}

  while (1)
{
  pthread_mutex_lock (m1);
  printf ("*i = %d\n", *i);
  pthread_mutex_unlock (m2);
}

  return 0;
}
~~prod.c ~~
#include pthread.h
#include stdio.h
#include stdlib.h
#include unistd.h
#include sys/mman.h

int
main (int argc, char *argv[])
{
  int fd = atoi (argv[1]);
  void *addr;
  pthread_mutex_t *m1;
  pthread_mutex_t *m2;
  volatile int *i;

  addr = mmap (NULL, sizeof (*m1), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
  m1 = addr;
  m2 = m1 + 1;
  i = (int *) (m2 + 1);

  while (1)
{
  ++*i;
  pthread_mutex_unlock (m1);
  pthread_mutex_lock (m2);
}

  return 0;
}
~~~

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Alexander Viro



On 19 Apr 2001, Ulrich Drepper wrote:

 Linus Torvalds [EMAIL PROTECTED] writes:
 
  I'm not interested in re-creating the idiocies of Sys IPC.
 
 I'm not talking about sysv semaphores (couldn't care less).  And you
 haven't read any of the mails with examples I sent.
 
 If the new interface can be useful for anything it must allow to
 implement process-shared POSIX mutexes.

Pardon me the bluntness, but... Why?
* on _any_ UNIX we can implement semaphore (object that has Dijkstra's
P and V operations, whatever) shared by processes that have access to pipe.
In a portable way. That's the part of pipe semantics that had been there
since way before v6. Pre-sysv, pre-POSIX, etc. When named pipes appeared
the same semantics had been carried to them. Agreed so far?
* if we have shared memory _and_ some implementation of semaphores
we can (on architectures that allow atomic_dec() and atomic_inc()) produce
semaphores that work via memory access in uncontended case and use slow
semaphores to handle contention side of the business. Nothing UNIX-specific
here.
* such objects _are_ useful. They are reasonably portable and
if they fit the task at hand and are cheaper than POSIX mutexes - that's
all rationale one could need for using them.

Sure, the variant I've posted was intra-process only, simply because it
uses normal pipes. Implementation with named pipes is also trivial -
when you map the shared area, allocate private one of the corresponding
size and keep descriptors there. End of story.

AFAICS mechanism is portable enough (and even on the architectures that
do not allow atomic userland operations we can survive - just fall back
to "slow" ones via read()/write() on pipes).  And excuse me, but when
one writes an application code the question is not "how to make it use
POSIX semaphores", it's "how to get the serialization I need in a
portable way".

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-19 Thread Ulrich Drepper

Alexander Viro [EMAIL PROTECTED] writes:

  If the new interface can be useful for anything it must allow to
  implement process-shared POSIX mutexes.
 
 Pardon me the bluntness, but... Why?

Because otherwise there is no reason to even waste a second with this.
At least for me and everybody else who has interest in portable solutions.

I don't care how it's implemented.  Look at the code example I posted.
If you can provide an implementation which can implement anonymous
inter-process mutexes then ring again.  Until then I'll wait.  If you
implement something else I couldn't care less since it's useless for
me.

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: light weight user level semaphores

2001-04-18 Thread Ulrich Drepper

Linus Torvalds <[EMAIL PROTECTED]> writes:

Sounds good so far.  Some comments.

>  - FS_create is responsible for allocating a shared memory region
>at "FS_create()" time.

This is not so great.  The POSIX shared semaphores require that an
pthread_mutex_t object placed in a shared memory region can be
initialized to work across process boundaries.  I.e., the FS_create
function would actually be FS_init.  There is no problem with the
kernel or the helper code at user level allocating more storage (for
the waitlist of whatever) but it must not be necessary for the user to
know about them and place them in share memory themselves.

The situation for non-shared (i.e. intra-process) semaphores are
easier.  What I didn't understand is your remark about fork.  The
semaphores should be cloned.  Unless the shared flag is set there
should be no sharing among processes.


The rest seems OK.  Thanks,

-- 
---.  ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \,---'   \  Sunnyvale, CA 94089 USA
Red Hat  `--' drepper at redhat.com   `
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



  1   2   >