Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-30 Thread Denis Vlasenko
On Thursday 08 March 2007 18:28, Linus Torvalds wrote:
> The sad part is that there really is no reason why the BSD crowd couldn't 
> have done recvmsg() as an "extended read with per-system call flags", 
> which would have made things like O_NONBLOCK etc unnecessary, because you 
> could do it just with MSG_DONTWAIT..

Wait a second here... O_NONBLOCK is not just unnecessary - it's buggy!

Try to do nonblocking read from stdin (fd #0) -
* setting O_NONBLOCK with fcntl will set it for all other processes
  which has the same stdin!
* trying to reset O_NONBLOCK after the read doesn't help (think kill -9)
* duping fd #0 doesn't help because O_NONBLOCK is not per-fd,
  it's shared just like filepos.

I really like that trick with recvmsg + MSG_DONTWAIT instead.
--
vda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-30 Thread Denis Vlasenko
On Thursday 08 March 2007 18:28, Linus Torvalds wrote:
 The sad part is that there really is no reason why the BSD crowd couldn't 
 have done recvmsg() as an extended read with per-system call flags, 
 which would have made things like O_NONBLOCK etc unnecessary, because you 
 could do it just with MSG_DONTWAIT..

Wait a second here... O_NONBLOCK is not just unnecessary - it's buggy!

Try to do nonblocking read from stdin (fd #0) -
* setting O_NONBLOCK with fcntl will set it for all other processes
  which has the same stdin!
* trying to reset O_NONBLOCK after the read doesn't help (think kill -9)
* duping fd #0 doesn't help because O_NONBLOCK is not per-fd,
  it's shared just like filepos.

I really like that trick with recvmsg + MSG_DONTWAIT instead.
--
vda
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-09 Thread Davide Libenzi
On Thu, 8 Mar 2007, David M. Lloyd wrote:

> On Wed, 2007-03-07 at 17:21 -0800, Davide Libenzi wrote:
> > int signalfd_dequeue(int fd, siginfo_t *info, long timeo);
> > 
> > The "fd" parameter must ba a signalfd file descriptor. The "info" parameter
> > is a pointer to the siginfo that will receive the dequeued signal, and
> > "timeo" is a timeout in milliseconds, or -1 for infinite.
> > The signalfd_dequeue function returns 0 if successfull.
> 
> Does this support non-blocking mode?  It doesn't seem to at my level of
> understanding anyway.  If I use this with EPOLLET for example, I'd
> expect to get a single EPOLLIN when a signal arrives, which would
> indicate to me that I must call signalfd_dequeue() in a loop until I get
> EAGAIN in order to be sure I've consumed all the outstanding signals so
> that the edge-triggered notification can be "re-armed".
> 
> Make sense?

Yes, the new version can use the O_NONBLOCK flag and read(2) will return 
EAGAIN.
Will post a newer version later, together with timerfd ...



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-09 Thread Kent Overstreet

On 3/8/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:

Which is why you introduced a new system call, but that leads to all the
problems with the file descriptor no longer being *usable*.

Think scripts. It's easy to do reads in perl scripts, and parse the
output. In contrast, making perl use a new system call is quite
challenging.


If you're going to allow reads from the fd, then it makes sense to let processes
get the fd by opening a file; if you've got to call siginfo_create()
or what have
you, the fact that you can do reads isn't terribly useful to perl or
what have you.

Inotify even makes you get events with read() today; but you have to use system
calls to get the fd and to send events (inotify_add_watch, inotify_rm_watch),
which is something I've never understood.

It would, to me, be very cool if everything that was a stream of bytes
really was
a file; this is something I've always thought plan 9 got very right. Epoll and
inotify could work this way, and I'm sure there's others I'm forgetting.

If people are interested in this sort of thing, I'll start working on
a patch. Are there
any thoughts on where these sorts of pseudodevices should be these days?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-09 Thread Kent Overstreet

On 3/8/07, Linus Torvalds [EMAIL PROTECTED] wrote:

Which is why you introduced a new system call, but that leads to all the
problems with the file descriptor no longer being *usable*.

Think scripts. It's easy to do reads in perl scripts, and parse the
output. In contrast, making perl use a new system call is quite
challenging.


If you're going to allow reads from the fd, then it makes sense to let processes
get the fd by opening a file; if you've got to call siginfo_create()
or what have
you, the fact that you can do reads isn't terribly useful to perl or
what have you.

Inotify even makes you get events with read() today; but you have to use system
calls to get the fd and to send events (inotify_add_watch, inotify_rm_watch),
which is something I've never understood.

It would, to me, be very cool if everything that was a stream of bytes
really was
a file; this is something I've always thought plan 9 got very right. Epoll and
inotify could work this way, and I'm sure there's others I'm forgetting.

If people are interested in this sort of thing, I'll start working on
a patch. Are there
any thoughts on where these sorts of pseudodevices should be these days?
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-09 Thread Davide Libenzi
On Thu, 8 Mar 2007, David M. Lloyd wrote:

 On Wed, 2007-03-07 at 17:21 -0800, Davide Libenzi wrote:
  int signalfd_dequeue(int fd, siginfo_t *info, long timeo);
  
  The fd parameter must ba a signalfd file descriptor. The info parameter
  is a pointer to the siginfo that will receive the dequeued signal, and
  timeo is a timeout in milliseconds, or -1 for infinite.
  The signalfd_dequeue function returns 0 if successfull.
 
 Does this support non-blocking mode?  It doesn't seem to at my level of
 understanding anyway.  If I use this with EPOLLET for example, I'd
 expect to get a single EPOLLIN when a signal arrives, which would
 indicate to me that I must call signalfd_dequeue() in a loop until I get
 EAGAIN in order to be sure I've consumed all the outstanding signals so
 that the edge-triggered notification can be re-armed.
 
 Make sense?

Yes, the new version can use the O_NONBLOCK flag and read(2) will return 
EAGAIN.
Will post a newer version later, together with timerfd ...



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Fri, 9 Mar 2007, Oleg Nesterov wrote:

> Ugh. Still can't understand, probably I missed something or misread this 
> patch.
> 
> If we shift signalfd_notify() from 
> specific_send_sig_info/__group_send_sig_info
> to send_signal(), we have the same "list_empty()" fastpath if no signalfds are
> attached to the sighand. The difference is that we don't count sig_ignored()
> signals, which looks right to me.

Now I shifted the check into send_signal(), since signalfd now uses the 
standard dequeue_signal, and hence compete with standard signal delivery 
against the queue. This was initial Linus design.




- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Oleg Nesterov
On 03/08, Davide Libenzi wrote:
>
> On Fri, 9 Mar 2007, Oleg Nesterov wrote:
> 
> > > Logic is, if it's not an RT signal, queue only one, otherwise multiple.
> > > The bit on the ->pending mask is clealer only when the queue slot becomes 
> > > empty.
> > 
> > Yes, I see what the code does, but I don't undestand why. For example, 
> > SIGCHLD was
> > delivered to the process _and_ handled several times, then 
> > sys_signalfd_dequeue()
> > comes and finds only one siginfo. Isn't this strange?
> 
> That's the same logic the kernel folows for non-RT signals. 

Yes, specific_send_sig_info/__group_send_sig_info "ignores" the non-RT signal

- if we already have a pending one (blocked, or not yet handled)

- the only reason is backward compatibility

signalfd:

- if the same signr was not fetched via sys_signalfd_dequeue()

- the reason is 

Suppose we are doing a simple application which just logs all signals
which were sent to this process. Now, we can miss the signal if the
logging blocks.

Think about sub-threads. We can send the same non-RT signal to T1, T2, T3
via specific_send_sig_info(). All 3 signals will be delivered, but signalfd
(which is process wide) will see only the first.

> > > The two trasports can rely on different masks. The signalfd_notify() does 
> > > not even go in signalfd_deliver() if no signalfds are attached to the 
> > > sighand.
> > 
> > Sorry, I don't understand. The masks are different, yes, but ->sighand is 
> > the
> > same? How this can make any difference for "if no signalfds are attached" ?
> 
> The list_empty() che would not make you fall inside signalfd_deliver(), 
> hence the fast path really lives up to its name ;)

Ugh. Still can't understand, probably I missed something or misread this patch.

If we shift signalfd_notify() from specific_send_sig_info/__group_send_sig_info
to send_signal(), we have the same "list_empty()" fastpath if no signalfds are
attached to the sighand. The difference is that we don't count sig_ignored()
signals, which looks right to me.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Jeremy Fitzhardinge wrote:
> 
> The difficulty is that there are 4 different formats of signal structure
> you could get: (traditional|siginfo) x (32bit|64bit).

See my suggestion of a fixed-format (and much cleaned up) pseudo-siginfo 
thing earlier in this thread, and also actually mentioned as a "good to 
do" in my original email from 2003 that did the original patch ;)

> (Hey, can you send signals by writing into the signalfd?  Very plan9...)

It's an obvious extension. Whether there is any real point to it or not, I 
dunno. After all, you could just send the signal instead.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Jeremy Fitzhardinge
Linus Torvalds wrote:
> So I think you should get rid of signalfd_dequeue(), and just replace it 
> with a "read()" function.
>   

The difficulty is that there are 4 different formats of signal structure
you could get: (traditional|siginfo) x (32bit|64bit).

What happens if you're a 32 bit process, you fork and exec a 64bit
process who inherits the signalfd, and they start reading?  What format
of signal structure do they get?  What do you get?  What if you start
doing partial reads?

I think signalfd_dequeue() is warty, but read() has has a number of
details to sort out.

(Hey, can you send signals by writing into the signalfd?  Very plan9...)

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Fri, 9 Mar 2007, Oleg Nesterov wrote:

> > Logic is, if it's not an RT signal, queue only one, otherwise multiple.
> > The bit on the ->pending mask is clealer only when the queue slot becomes 
> > empty.
> 
> Yes, I see what the code does, but I don't undestand why. For example, 
> SIGCHLD was
> delivered to the process _and_ handled several times, then 
> sys_signalfd_dequeue()
> comes and finds only one siginfo. Isn't this strange?

That's the same logic the kernel folows for non-RT signals. 


> > The two trasports can rely on different masks. The signalfd_notify() does 
> > not even go in signalfd_deliver() if no signalfds are attached to the 
> > sighand.
> 
> Sorry, I don't understand. The masks are different, yes, but ->sighand is the
> same? How this can make any difference for "if no signalfds are attached" ?

The list_empty() che would not make you fall inside signalfd_deliver(), 
hence the fast path really lives up to its name ;)


> Also. A malicious user can eat all memory, 
> signalfd_deliver()->kmem_cache_alloc()
> doesn't check any limits.

I'll make that use the std dequeu_signal, so everything is handled in there.



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Fri, 9 Mar 2007, Oleg Nesterov wrote:
> > 
> > Also. A malicious user can eat all memory, 
> > signalfd_deliver()->kmem_cache_alloc()
> > doesn't check any limits.
> 
> This, btw, is one reason I *really* think signalfd() should just use the 
> same old signal queue, and not try to make its own.
> 
> Signal queueing and unqueuing simply isn't that simple. 

Alrighty. Back to the dequeue_signal ...



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Fri, 9 Mar 2007, Oleg Nesterov wrote:
> 
> Also. A malicious user can eat all memory, 
> signalfd_deliver()->kmem_cache_alloc()
> doesn't check any limits.

This, btw, is one reason I *really* think signalfd() should just use the 
same old signal queue, and not try to make its own.

Signal queueing and unqueuing simply isn't that simple. 

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Oleg Nesterov
On 03/08, Davide Libenzi wrote:
>
> On Thu, 8 Mar 2007, Oleg Nesterov wrote:
> 
> > Davide Libenzi wrote:
> > >
> > > +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct 
> > > siginfo *info)
> > > +{
> > > + int nsig = 0;
> > > + struct list_head *pos;
> > > + struct signalfd_ctx *ctx;
> > > + struct signalfd_sq *sq;
> > > +
> > > + list_for_each(pos, >sfdlist) {
> > > + ctx = list_entry(pos, struct signalfd_ctx, lnk);
> > > + /*
> > > +  * We use a negative signal value as a way to broadcast that the
> > > +  * sighand has been orphaned, so that we can notify all the
> > > +  * listeners about this.
> > > +  */
> > > + if (sig < 0)
> > > + __wake_up_locked(>wqh, TASK_UNINTERRUPTIBLE | 
> > > TASK_INTERRUPTIBLE);
> > > + else if (sigismember(>sigmask, sig) &&
> > > +  (sig >= SIGRTMIN || !sigismember(>pending, sig))) 
> > > {
> > > + sigaddset(>pending, sig);
> > 
> > I don't understand the "(sig >= SIGRTMIN || !sigismember(>pending, 
> > sig))"
> > check. This mimics the LEGACY_QUEUE() check, but seems strange. The signal 
> > may
> > be pending in ctx->pending just because it was not signalfd_fetchsig()ed, 
> > yes?
> 
> Logic is, if it's not an RT signal, queue only one, otherwise multiple.
> The bit on the ->pending mask is clealer only when the queue slot becomes 
> empty.

Yes, I see what the code does, but I don't undestand why. For example, SIGCHLD 
was
delivered to the process _and_ handled several times, then 
sys_signalfd_dequeue()
comes and finds only one siginfo. Isn't this strange?

> > > @@ -780,6 +785,11 @@
> > >   BUG_ON(!irqs_disabled());
> > >   assert_spin_locked(>sighand->siglock);
> > >  
> > > + /*
> > > +  * Deliver the signal to listening signalfds ...
> > > +  */
> > > + signalfd_notify(t->sighand, sig, info);
> > > +
> > >   /* Short-circuit ignored signals.  */
> > >   if (sig_ignored(t, sig))
> > >   goto out;
> > > @@ -968,6 +978,11 @@
> > >   assert_spin_locked(>sighand->siglock);
> > >   handle_stop_signal(sig, p);
> > >  
> > > + /*
> > > +  * Deliver the signal to listening signalfds ...
> > > +  */
> > > + signalfd_notify(p->sighand, sig, info);
> > > +
> > >   /* Short-circuit ignored signals.  */
> > >   if (sig_ignored(p, sig))
> > >   return ret;
> > 
> > It is strange that we are doing signalfd_notify() even if the signal is 
> > ignored.
> > Isn't it better to shift signalfd_notify() into send_signal() ? This way we 
> > do
> > not need the special check in signalfd_deliver() above.
> 
> The two trasports can rely on different masks. The signalfd_notify() does 
> not even go in signalfd_deliver() if no signalfds are attached to the 
> sighand.

Sorry, I don't understand. The masks are different, yes, but ->sighand is the
same? How this can make any difference for "if no signalfds are attached" ?

OK. What is the purpose of signalfd? Should it record the signals which were
sent to the process, or only those which were delivered? The latter looks
more natural for me. But inn any case, I don't see a reason to check ->pending
mask in signalfd_deliver().

BTW, sys_signalfd_dequeue() re-queues signalfd_sq if copy_siginfo_to_user()
fails. Why not -EFAULT?

Also. A malicious user can eat all memory, 
signalfd_deliver()->kmem_cache_alloc()
doesn't check any limits.

> > Also, this patch doesn't take send_sigqueue/send_group_sigqueue into 
> > account.
> 
> I added that too. but I noticed something strange, dunno if intentional or 
> not. In send_sigqueue and send_group_sigqueue, the check for the 
> timer-special and the ignored is inverted. This lead to two different 
> behaviours. Is there a reason for that?

I was wondering about that too.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Marko Macek

Linus Torvalds wrote:


On Thu, 8 Mar 2007, Davide Libenzi wrote:
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.

No?


I think it's perfectly fine if you make it "struct siginfo" (even though I 
think it's a singularly ugly struct). It's just that then you'd have to 
make your read() know whether it's a compat-read or not, which you really 
can't.


Which is why you introduced a new system call, but that leads to all the 
problems with the file descriptor no longer being *usable*.


Think scripts. It's easy to do reads in perl scripts, and parse the 
output. In contrast, making perl use a new system call is quite 
challenging.


Probably, but someone will have to add the 'signalfd' system call anyway.

And *that* is why "everything is a stream of bytes" is so important. You 
don't know where the file descriptor has been, or who uses it. Special 
system calls for special file descriptors are just *wrong*.


After all, that's why we'd have a signalfd() in the first place: exactly 
so that you do *not* have to use special system calls, but can just pass 
it on to any event waiting mechanism like select, poll, epoll. The same is 
just *even*more*true* when it comes to reading the data!


The problem with read() returning arbitrary unstructured data is that 
there is hard to do versioning/extensibility, since the userspace can't 
specify the requested/expected format. The only way it could be done is 
by the (nbytes) parameter to read() which is not very nice (and useless 
for scripts).


This is the same problem that makes sysfs/procfs fragile unless the
file format is very well specified for extensibility (and it's easy to
f-it up, since there seems to be little consistency there... using
the XML horror would actually be an improvement). Breaking sysfs/procfs
might be acceptable once every few years, but signal handling will be
part of every application event loop and there is no room for breaking 
anything.


(although, one could to the versioning the ugly way by creating the new 
'signalfd' syscall instead).


I'd say: make read() return the signal number (probably as 4-byte int, 
in network byte order), but for everything else, use the system call.


Mark

P.S. I'm currently worried if the fact that FUTEX_FD is being deprecated 
is a problem :)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Oleg Nesterov wrote:

> Davide Libenzi wrote:
> >
> > +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct 
> > siginfo *info)
> > +{
> > +   int nsig = 0;
> > +   struct list_head *pos;
> > +   struct signalfd_ctx *ctx;
> > +   struct signalfd_sq *sq;
> > +
> > +   list_for_each(pos, >sfdlist) {
> > +   ctx = list_entry(pos, struct signalfd_ctx, lnk);
> > +   /*
> > +* We use a negative signal value as a way to broadcast that the
> > +* sighand has been orphaned, so that we can notify all the
> > +* listeners about this.
> > +*/
> > +   if (sig < 0)
> > +   __wake_up_locked(>wqh, TASK_UNINTERRUPTIBLE | 
> > TASK_INTERRUPTIBLE);
> > +   else if (sigismember(>sigmask, sig) &&
> > +(sig >= SIGRTMIN || !sigismember(>pending, sig))) 
> > {
> > +   sigaddset(>pending, sig);
> 
> I don't understand the "(sig >= SIGRTMIN || !sigismember(>pending, sig))"
> check. This mimics the LEGACY_QUEUE() check, but seems strange. The signal may
> be pending in ctx->pending just because it was not signalfd_fetchsig()ed, yes?

Logic is, if it's not an RT signal, queue only one, otherwise multiple.
The bit on the ->pending mask is clealer only when the queue slot becomes empty.



> Please also see below.
> 
> > +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t 
> > sizemask)
> > +{
> >
> > [...snip...]
> >
> > +   } else {
> > +   error = -EBADF;
> > +   file = fget(ufd);
> > +   if (!file)
> > +   goto err_exit;
> > +   ctx = file->private_data;
> > +   error = -EINVAL;
> > +   if (file->f_op != _fops) {
> > +   fput(file);
> > +   goto err_exit;
> > +   }
> > +   spin_lock_irq(>sighand->siglock);
> > +   ctx->sigmask = sigmask;
> > +   spin_unlock_irq(>sighand->siglock);
> > +   wake_up(>wqh);
> 
> Can't this race with sys_signalfd_dequeue() which use lockless 
> __add_wait_queue()?
> Looks like we should do __wake_up_locked() under ctx->sighand->siglock.

Yes, good catch. Fixed.



> > --- linux-2.6.20.ep2.orig/kernel/signal.c   2007-03-07 15:55:43.0 
> > -0800
> > +++ linux-2.6.20.ep2/kernel/signal.c2007-03-07 15:59:01.0 
> > -0800
> >
> > [...snip...]
> >
> > @@ -780,6 +785,11 @@
> > BUG_ON(!irqs_disabled());
> > assert_spin_locked(>sighand->siglock);
> >  
> > +   /*
> > +* Deliver the signal to listening signalfds ...
> > +*/
> > +   signalfd_notify(t->sighand, sig, info);
> > +
> > /* Short-circuit ignored signals.  */
> > if (sig_ignored(t, sig))
> > goto out;
> > @@ -968,6 +978,11 @@
> > assert_spin_locked(>sighand->siglock);
> > handle_stop_signal(sig, p);
> >  
> > +   /*
> > +* Deliver the signal to listening signalfds ...
> > +*/
> > +   signalfd_notify(p->sighand, sig, info);
> > +
> > /* Short-circuit ignored signals.  */
> > if (sig_ignored(p, sig))
> > return ret;
> 
> It is strange that we are doing signalfd_notify() even if the signal is 
> ignored.
> Isn't it better to shift signalfd_notify() into send_signal() ? This way we do
> not need the special check in signalfd_deliver() above.

The two trasports can rely on different masks. The signalfd_notify() does 
not even go in signalfd_deliver() if no signalfds are attached to the 
sighand.



> Also, this patch doesn't take send_sigqueue/send_group_sigqueue into account.

I added that too. but I noticed something strange, dunno if intentional or 
not. In send_sigqueue and send_group_sigqueue, the check for the 
timer-special and the ignored is inverted. This lead to two different 
behaviours. Is there a reason for that?




- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Marko Macek

Linus Torvalds wrote:


On Thu, 8 Mar 2007, Davide Libenzi wrote:
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.

No?


I think it's perfectly fine if you make it "struct siginfo" (even though I 
think it's a singularly ugly struct). It's just that then you'd have to 
make your read() know whether it's a compat-read or not, which you really 
can't.


Which is why you introduced a new system call, but that leads to all the 
problems with the file descriptor no longer being *usable*.


Think scripts. It's easy to do reads in perl scripts, and parse the 
output. In contrast, making perl use a new system call is quite 
challenging.


Probably, but someone will have to add the 'signalfd' system call anyway.

And *that* is why "everything is a stream of bytes" is so important. You 
don't know where the file descriptor has been, or who uses it. Special 
system calls for special file descriptors are just *wrong*.


After all, that's why we'd have a signalfd() in the first place: exactly 
so that you do *not* have to use special system calls, but can just pass 
it on to any event waiting mechanism like select, poll, epoll. The same is 
just *even*more*true* when it comes to reading the data!


The problem with read() returning arbitrary unstructured data is that 
there is hard to do versioning/extensibility, since the userspace can't 
specify the requested/expected format. The only way it could be done is 
by the (nbytes) parameter to read() which is not very nice (and useless 
for scripts).


This is the same problem that makes sysfs/procfs fragile unless the
file format is very well specified for extensibility (and it's easy to
f-it up, since there seems to be little consistency there... using
the XML horror would actually be an improvement). Breaking sysfs/procfs
might be acceptable once every few years, but signal handling will be
part of every application event loop and there is no room for breaking 
anything.


(although, one could to the versioning the ugly way by creating the new 
'signalfd' syscall instead).


I'd say: make read() return the signal number (probably as 4-byte int, 
in network byte order), but for everything else, use the system call.


Mark

P.S. I'm currently worried if the fact that FUTEX_FD is being deprecated 
is a problem :)


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Michael K. Edwards

On 3/8/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:

So anybody who would "go with the Berkeley crowd" really shows a lot of
bad taste, I'm afraid. The Berkeley crowd really messed up here, and it's
so long ago that I don't think there is any point in us trying to fix it
any more.


Well, they did invent the socket, which sucks less than a lot of other
things.  You would prefer STREAMS, perhaps?  :-)  My point is that
this is a message-oriented problem, not a stream-oriented problem, and
read() just isn't a very good interface for passing structured
messages around.  I agree that the details of recvmsg() ancillary data
are fairly grotty (optimization based on micro-benchmarks, as usual,
back in the PDP/VAX days), but the concept isn't that bad; you let
netlink sockets into the kernel, didn't you?


(But if somebody makes recvmgs a general VFS interface and makes it just
work for everything, I'd probably take the patch as a cleanup. I really
think it should have been a "struct file_operations" thing rather than
being a socket-only thing.. But since you couldn't portably use it
anyway, the thing is pretty moot)


sendmsg()/recvmsg() to a file makes perfect sense, if you put the
equivalent of an llseek tuple into ancillary data.  And it's also IMHO
the right way to do file AIO -- open up a netlink socket to the entity
that's scheduling the IOs for a given filesystem, use the SCM_RIGHTS
mechanism to do indirect opens (your credentials could come over an
AF_UNIX socket from a completely separate "open server" process), and
multiplex all the AIO to that filesystem across the netlink socket.

If adding this to VFS is something you would seriously consider, I'll
do the work.  But I will need some coaching to work around my relative
inexperience with the Linux core code and my lack of taste.  :-)

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Oleg Nesterov
Davide Libenzi wrote:
>
> +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct siginfo 
> *info)
> +{
> + int nsig = 0;
> + struct list_head *pos;
> + struct signalfd_ctx *ctx;
> + struct signalfd_sq *sq;
> +
> + list_for_each(pos, >sfdlist) {
> + ctx = list_entry(pos, struct signalfd_ctx, lnk);
> + /*
> +  * We use a negative signal value as a way to broadcast that the
> +  * sighand has been orphaned, so that we can notify all the
> +  * listeners about this.
> +  */
> + if (sig < 0)
> + __wake_up_locked(>wqh, TASK_UNINTERRUPTIBLE | 
> TASK_INTERRUPTIBLE);
> + else if (sigismember(>sigmask, sig) &&
> +  (sig >= SIGRTMIN || !sigismember(>pending, sig))) 
> {
> + sigaddset(>pending, sig);

I don't understand the "(sig >= SIGRTMIN || !sigismember(>pending, sig))"
check. This mimics the LEGACY_QUEUE() check, but seems strange. The signal may
be pending in ctx->pending just because it was not signalfd_fetchsig()ed, yes?

Please also see below.

> +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t 
> sizemask)
> +{
>
> [...snip...]
>
> + } else {
> + error = -EBADF;
> + file = fget(ufd);
> + if (!file)
> + goto err_exit;
> + ctx = file->private_data;
> + error = -EINVAL;
> + if (file->f_op != _fops) {
> + fput(file);
> + goto err_exit;
> + }
> + spin_lock_irq(>sighand->siglock);
> + ctx->sigmask = sigmask;
> + spin_unlock_irq(>sighand->siglock);
> + wake_up(>wqh);

Can't this race with sys_signalfd_dequeue() which use lockless 
__add_wait_queue()?
Looks like we should do __wake_up_locked() under ctx->sighand->siglock.

> --- linux-2.6.20.ep2.orig/kernel/signal.c 2007-03-07 15:55:43.0 
> -0800
> +++ linux-2.6.20.ep2/kernel/signal.c  2007-03-07 15:59:01.0 -0800
>
> [...snip...]
>
> @@ -780,6 +785,11 @@
>   BUG_ON(!irqs_disabled());
>   assert_spin_locked(>sighand->siglock);
>  
> + /*
> +  * Deliver the signal to listening signalfds ...
> +  */
> + signalfd_notify(t->sighand, sig, info);
> +
>   /* Short-circuit ignored signals.  */
>   if (sig_ignored(t, sig))
>   goto out;
> @@ -968,6 +978,11 @@
>   assert_spin_locked(>sighand->siglock);
>   handle_stop_signal(sig, p);
>  
> + /*
> +  * Deliver the signal to listening signalfds ...
> +  */
> + signalfd_notify(p->sighand, sig, info);
> +
>   /* Short-circuit ignored signals.  */
>   if (sig_ignored(p, sig))
>   return ret;

It is strange that we are doing signalfd_notify() even if the signal is ignored.
Isn't it better to shift signalfd_notify() into send_signal() ? This way we do
not need the special check in signalfd_deliver() above.

Also, this patch doesn't take send_sigqueue/send_group_sigqueue into account.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Avi Kivity wrote:

> Davide Libenzi wrote:
> > So, to cut it short, I can do the pseudo-siginfo read(2), but I don't like
> > it too much (little, actually). The siginfo, as bad as it is, is a standard
> > used in many POSIX APIs (hence even in kernel), and IMO if we want to send
> > that back, a struct siginfo should be.
> >   
> 
> You can send the data in the 32/64 neutral format, and have glibc convert it
> to a siginfo, and everybody's happy.

Except Uli :)
The problem with the siginfo is that it is a brain-dead structure with 
unions and data dependent on types (and pointers). Pretty fugly.



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Avi Kivity

Davide Libenzi wrote:
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.
  


You can send the data in the 32/64 neutral format, and have glibc 
convert it to a siginfo, and everybody's happy.



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Thu, 8 Mar 2007, Davide Libenzi wrote:
> >
> > So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
> > like it too much (little, actually). The siginfo, as bad as it is, is a 
> > standard used in many POSIX APIs (hence even in kernel), and IMO if we 
> > want to send that back, a struct siginfo should be.
> > No?
> 
> I think it's perfectly fine if you make it "struct siginfo" (even though I 
> think it's a singularly ugly struct). It's just that then you'd have to 
> make your read() know whether it's a compat-read or not, which you really 
> can't.
> 
> Which is why you introduced a new system call, but that leads to all the 
> problems with the file descriptor no longer being *usable*.
> 
> Think scripts. It's easy to do reads in perl scripts, and parse the 
> output. In contrast, making perl use a new system call is quite 
> challenging.
> 
> And *that* is why "everything is a stream of bytes" is so important. You 
> don't know where the file descriptor has been, or who uses it. Special 
> system calls for special file descriptors are just *wrong*.
> 
> After all, that's why we'd have a signalfd() in the first place: exactly 
> so that you do *not* have to use special system calls, but can just pass 
> it on to any event waiting mechanism like select, poll, epoll. The same is 
> just *even*more*true* when it comes to reading the data!

"Cheeseburger it is!" ;)
I'll revert back to read(2) with pseudo-siginfo and O_NONBLOCK handling...



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Davide Libenzi wrote:
>
> So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
> like it too much (little, actually). The siginfo, as bad as it is, is a 
> standard used in many POSIX APIs (hence even in kernel), and IMO if we 
> want to send that back, a struct siginfo should be.
> No?

I think it's perfectly fine if you make it "struct siginfo" (even though I 
think it's a singularly ugly struct). It's just that then you'd have to 
make your read() know whether it's a compat-read or not, which you really 
can't.

Which is why you introduced a new system call, but that leads to all the 
problems with the file descriptor no longer being *usable*.

Think scripts. It's easy to do reads in perl scripts, and parse the 
output. In contrast, making perl use a new system call is quite 
challenging.

And *that* is why "everything is a stream of bytes" is so important. You 
don't know where the file descriptor has been, or who uses it. Special 
system calls for special file descriptors are just *wrong*.

After all, that's why we'd have a signalfd() in the first place: exactly 
so that you do *not* have to use special system calls, but can just pass 
it on to any event waiting mechanism like select, poll, epoll. The same is 
just *even*more*true* when it comes to reading the data!

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

> 
> 
> On Thu, 8 Mar 2007, Davide Libenzi wrote:
> > 
> > The reason for the special function, was not to provide a non-blocking 
> > behaviour with zero timeout (that just a side effect), but to read the 
> > siginfo. I was all about using read(2) (and v1 used it), but when you have 
> > to transfer complex structures over it, it becomes hell. How do you 
> > cleanly compat over a f_op->read callback for example?
> 
> I agree that it gets a bit "interesting", and one option might be that the 
> "read()" interface just gets the signal number and the minimal siginfo 
> information, which is, after all, what 99% of all apps actually care
> about.
> 
> But "siginfo_t" is really a *horrible* structure. Nobody sane should ever 
> use siginfo_t, and the designer of that thing was so high on LSD that it's 
> not even funny. Re-using fields in a union? Values that depend on other 
> bits in the thing in random manners? 
> 
> In other words, I bet that we could just make it a *lot* better by making 
> the read structure be:
> 
>  - 16 4-byte fields (fixed 64-byte packet), where each field is an 
>uint32_t (we could even do it in network byte order if we care and if 
>you want to just pipe the information from one machine to another, but 
>that sounds a bit excessive ;)
> 
>  - Just put the fields people actually use at fixed offsets: si_signo, 
>si_errno, si_pid, si_uid, si_band, si_fd.
> 
>  - that still leaves room for the other cases if anybody ever wants them 
>(but I doubt it - things like si_addr are really only useful for 
>synchronous signals that are actually done as *signals*, since you 
>cannot defer a SIGBUS/SIGSEGV/SIGILL *anyway*).
> 
> So I bet 99% of users actually just want si_signo, while some small subset 
> might want the SIGCHLD info and some of the special cases (eg we migth 
> want to add si_addr as a 64-bit thing just because the USB stack sends a 
> SI_ASYNCIO thing for completed URB's, so a libusb might want it, but 
> that's probably the only such user).
> 
> And it would be *cleaner* than the mess that is siginfo_t..
> 
> (I realize that siginfo_t is ugly because it built up over time, using the 
> same buffer for many different things. I'm just saying that we can 
> probably do better by *not* using it, and just laying things out in a 
> cleaner manner to begin with, which also solves any compatibility issues)

I can do that, no problem. But isn't it better to recognize that this kind 
of data just can't be shipped through a non compat-able function?
Like, for example, the current trend to say "just use u64 for a pointer, 
it'll be fine". I remeber, a long time ago when the i386 architecture came 
out, to say "Wow! 4GB is gonna last *forever*!", let's use u32 for 
pointers. Well, forever is almost here in my watches. And all the 
userspace code using APIs assuming to cleanly store a pointer in a u32 
will have to be re-factored.
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.
No?



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Michael K. Edwards wrote:
> 
> Make it a netlink socket and fetch your structures using recvmsg().
> siginfo_t belongs in ancillary data.

Gaah. That interface is horrible.

> The UNIX philosophy is "everything's a file".  The Berkeley philosophy
> is "everything's a socket, except for files, which are feeble
> mini-sockets".  I'd go with the Berkeley crowd here.

No, the berkeley crowd is totally out to lunch.

I might agree with you *if* you could actually do "recvmsg()" on arbitrary 
file descriptors, but you cannot. 

We could fix that in Linux, of course, but the fact is, "recvmsg()" is 
*not* a superset of "read()". In general, it's a *subset*, exactly because 
very few file descriptors support it.

So the normal way to read from a file descriptor (and the *only* way in 
any generic select loop) is to use "read()". That's the only thing that 
works for everything. And we shouldn't break that.

The sad part is that there really is no reason why the BSD crowd couldn't 
have done recvmsg() as an "extended read with per-system call flags", 
which would have made things like O_NONBLOCK etc unnecessary, because you 
could do it just with MSG_DONTWAIT..

So anybody who would "go with the Berkeley crowd" really shows a lot of 
bad taste, I'm afraid. The Berkeley crowd really messed up here, and it's 
so long ago that I don't think there is any point in us trying to fix it 
any more.

(But if somebody makes recvmgs a general VFS interface and makes it just 
work for everything, I'd probably take the patch as a cleanup. I really 
think it should have been a "struct file_operations" thing rather than 
being a socket-only thing.. But since you couldn't portably use it 
anyway, the thing is pretty moot)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Davide Libenzi wrote:
> 
> The reason for the special function, was not to provide a non-blocking 
> behaviour with zero timeout (that just a side effect), but to read the 
> siginfo. I was all about using read(2) (and v1 used it), but when you have 
> to transfer complex structures over it, it becomes hell. How do you 
> cleanly compat over a f_op->read callback for example?

I agree that it gets a bit "interesting", and one option might be that the 
"read()" interface just gets the signal number and the minimal siginfo 
information, which is, after all, what 99% of all apps actually care
about.

But "siginfo_t" is really a *horrible* structure. Nobody sane should ever 
use siginfo_t, and the designer of that thing was so high on LSD that it's 
not even funny. Re-using fields in a union? Values that depend on other 
bits in the thing in random manners? 

In other words, I bet that we could just make it a *lot* better by making 
the read structure be:

 - 16 4-byte fields (fixed 64-byte packet), where each field is an 
   uint32_t (we could even do it in network byte order if we care and if 
   you want to just pipe the information from one machine to another, but 
   that sounds a bit excessive ;)

 - Just put the fields people actually use at fixed offsets: si_signo, 
   si_errno, si_pid, si_uid, si_band, si_fd.

 - that still leaves room for the other cases if anybody ever wants them 
   (but I doubt it - things like si_addr are really only useful for 
   synchronous signals that are actually done as *signals*, since you 
   cannot defer a SIGBUS/SIGSEGV/SIGILL *anyway*).

So I bet 99% of users actually just want si_signo, while some small subset 
might want the SIGCHLD info and some of the special cases (eg we migth 
want to add si_addr as a 64-bit thing just because the USB stack sends a 
SI_ASYNCIO thing for completed URB's, so a libusb might want it, but 
that's probably the only such user).

And it would be *cleaner* than the mess that is siginfo_t..

(I realize that siginfo_t is ugly because it built up over time, using the 
same buffer for many different things. I'm just saying that we can 
probably do better by *not* using it, and just laying things out in a 
cleaner manner to begin with, which also solves any compatibility issues)

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Michael K. Edwards

On 3/8/07, Davide Libenzi  wrote:

The reason for the special function, was not to provide a non-blocking
behaviour with zero timeout (that just a side effect), but to read the
siginfo. I was all about using read(2) (and v1 used it), but when you have
to transfer complex structures over it, it becomes hell. How do you
cleanly compat over a f_op->read callback for example?


Make it a netlink socket and fetch your structures using recvmsg().
siginfo_t belongs in ancillary data.

The UNIX philosophy is "everything's a file".  The Berkeley philosophy
is "everything's a socket, except for files, which are feeble
mini-sockets".  I'd go with the Berkeley crowd here.

Cheers,
- Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

> You missed David's worry, I think.
> 
> Not only is POLLIN potentially an edge event (depending on the interface 
> you use to fetch it), but even as a level-triggered one you generally want 
> to read as much as possible per POLLIN event, and go back to the event 
> loop only when you get EAGAIN.
> 
> So that's in addition to the read/signal race with other 
> threads/processes.
> 
> You solved it by having a separate system call, but since it's a regular 
> file descriptor, why have a new system call at all, and not just make it 
> be a "read()"? In which case you definitely want O_NONBLOCK support.
> 
> The UNIX philosophy is often quoted as "everything is a file", but that 
> really means "everything is a stream of bytes".
> 
> In Windows, you have 15 different versions of "read()" with sockets and 
> files and pipes all having strange special cases and special system calls. 
> That's not the UNIX way. It should be just a "read()", and then people can 
> use general libraries and treat all sources the same.
> 
> For example, the main select/poll/epoll loop may be the one doing all the 
> reading, and then pass off "full buffers only" to the individual per-fd 
> "action routines". And that kind of model really very fundamentally wants 
> an fd to be an fd to be an fd - not "some file descriptors need 
> 'read_from_sigfd()', and some file descriptors need 'read()', and some 
> file descriptors need 'recvmsg()'" etc.
> 
> So I think you should get rid of signalfd_dequeue(), and just replace it 
> with a "read()" function.

The reason for the special function, was not to provide a non-blocking 
behaviour with zero timeout (that just a side effect), but to read the 
siginfo. I was all about using read(2) (and v1 used it), but when you have 
to transfer complex structures over it, it becomes hell. How do you 
cleanly compat over a f_op->read callback for example?



- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Davide Libenzi wrote:
> 
> This patch, if you get a POLLIN, you have a signal to read for sure (well, 
> unless you another thread/task reads it before you - but that's just 
> somthing you have to take care). There is not explicit check for 
> O_NONBLOCK now, but a zero timeout would do exactly the same thing.

You missed David's worry, I think.

Not only is POLLIN potentially an edge event (depending on the interface 
you use to fetch it), but even as a level-triggered one you generally want 
to read as much as possible per POLLIN event, and go back to the event 
loop only when you get EAGAIN.

So that's in addition to the read/signal race with other 
threads/processes.

You solved it by having a separate system call, but since it's a regular 
file descriptor, why have a new system call at all, and not just make it 
be a "read()"? In which case you definitely want O_NONBLOCK support.

The UNIX philosophy is often quoted as "everything is a file", but that 
really means "everything is a stream of bytes".

In Windows, you have 15 different versions of "read()" with sockets and 
files and pipes all having strange special cases and special system calls. 
That's not the UNIX way. It should be just a "read()", and then people can 
use general libraries and treat all sources the same.

For example, the main select/poll/epoll loop may be the one doing all the 
reading, and then pass off "full buffers only" to the individual per-fd 
"action routines". And that kind of model really very fundamentally wants 
an fd to be an fd to be an fd - not "some file descriptors need 
'read_from_sigfd()', and some file descriptors need 'read()', and some 
file descriptors need 'recvmsg()'" etc.

So I think you should get rid of signalfd_dequeue(), and just replace it 
with a "read()" function.

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, David M. Lloyd wrote:

> On Wed, 2007-03-07 at 17:21 -0800, Davide Libenzi wrote:
> > int signalfd_dequeue(int fd, siginfo_t *info, long timeo);
> > 
> > The "fd" parameter must ba a signalfd file descriptor. The "info" parameter
> > is a pointer to the siginfo that will receive the dequeued signal, and
> > "timeo" is a timeout in milliseconds, or -1 for infinite.
> > The signalfd_dequeue function returns 0 if successfull.
> 
> Does this support non-blocking mode?  It doesn't seem to at my level of
> understanding anyway.  If I use this with EPOLLET for example, I'd
> expect to get a single EPOLLIN when a signal arrives, which would
> indicate to me that I must call signalfd_dequeue() in a loop until I get
> EAGAIN in order to be sure I've consumed all the outstanding signals so
> that the edge-triggered notification can be "re-armed".

This patch, if you get a POLLIN, you have a signal to read for sure (well, 
unless you another thread/task reads it before you - but that's just 
somthing you have to take care). There is not explicit check for 
O_NONBLOCK now, but a zero timeout would do exactly the same thing.


- Davide


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread David M. Lloyd
On Wed, 2007-03-07 at 17:21 -0800, Davide Libenzi wrote:
> int signalfd_dequeue(int fd, siginfo_t *info, long timeo);
> 
> The "fd" parameter must ba a signalfd file descriptor. The "info" parameter
> is a pointer to the siginfo that will receive the dequeued signal, and
> "timeo" is a timeout in milliseconds, or -1 for infinite.
> The signalfd_dequeue function returns 0 if successfull.

Does this support non-blocking mode?  It doesn't seem to at my level of
understanding anyway.  If I use this with EPOLLET for example, I'd
expect to get a single EPOLLIN when a signal arrives, which would
indicate to me that I must call signalfd_dequeue() in a loop until I get
EAGAIN in order to be sure I've consumed all the outstanding signals so
that the edge-triggered notification can be "re-armed".

Make sense?

- DML

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, David M. Lloyd wrote:

 On Wed, 2007-03-07 at 17:21 -0800, Davide Libenzi wrote:
  int signalfd_dequeue(int fd, siginfo_t *info, long timeo);
  
  The fd parameter must ba a signalfd file descriptor. The info parameter
  is a pointer to the siginfo that will receive the dequeued signal, and
  timeo is a timeout in milliseconds, or -1 for infinite.
  The signalfd_dequeue function returns 0 if successfull.
 
 Does this support non-blocking mode?  It doesn't seem to at my level of
 understanding anyway.  If I use this with EPOLLET for example, I'd
 expect to get a single EPOLLIN when a signal arrives, which would
 indicate to me that I must call signalfd_dequeue() in a loop until I get
 EAGAIN in order to be sure I've consumed all the outstanding signals so
 that the edge-triggered notification can be re-armed.

This patch, if you get a POLLIN, you have a signal to read for sure (well, 
unless you another thread/task reads it before you - but that's just 
somthing you have to take care). There is not explicit check for 
O_NONBLOCK now, but a zero timeout would do exactly the same thing.


- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Davide Libenzi wrote:
 
 This patch, if you get a POLLIN, you have a signal to read for sure (well, 
 unless you another thread/task reads it before you - but that's just 
 somthing you have to take care). There is not explicit check for 
 O_NONBLOCK now, but a zero timeout would do exactly the same thing.

You missed David's worry, I think.

Not only is POLLIN potentially an edge event (depending on the interface 
you use to fetch it), but even as a level-triggered one you generally want 
to read as much as possible per POLLIN event, and go back to the event 
loop only when you get EAGAIN.

So that's in addition to the read/signal race with other 
threads/processes.

You solved it by having a separate system call, but since it's a regular 
file descriptor, why have a new system call at all, and not just make it 
be a read()? In which case you definitely want O_NONBLOCK support.

The UNIX philosophy is often quoted as everything is a file, but that 
really means everything is a stream of bytes.

In Windows, you have 15 different versions of read() with sockets and 
files and pipes all having strange special cases and special system calls. 
That's not the UNIX way. It should be just a read(), and then people can 
use general libraries and treat all sources the same.

For example, the main select/poll/epoll loop may be the one doing all the 
reading, and then pass off full buffers only to the individual per-fd 
action routines. And that kind of model really very fundamentally wants 
an fd to be an fd to be an fd - not some file descriptors need 
'read_from_sigfd()', and some file descriptors need 'read()', and some 
file descriptors need 'recvmsg()' etc.

So I think you should get rid of signalfd_dequeue(), and just replace it 
with a read() function.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

 You missed David's worry, I think.
 
 Not only is POLLIN potentially an edge event (depending on the interface 
 you use to fetch it), but even as a level-triggered one you generally want 
 to read as much as possible per POLLIN event, and go back to the event 
 loop only when you get EAGAIN.
 
 So that's in addition to the read/signal race with other 
 threads/processes.
 
 You solved it by having a separate system call, but since it's a regular 
 file descriptor, why have a new system call at all, and not just make it 
 be a read()? In which case you definitely want O_NONBLOCK support.
 
 The UNIX philosophy is often quoted as everything is a file, but that 
 really means everything is a stream of bytes.
 
 In Windows, you have 15 different versions of read() with sockets and 
 files and pipes all having strange special cases and special system calls. 
 That's not the UNIX way. It should be just a read(), and then people can 
 use general libraries and treat all sources the same.
 
 For example, the main select/poll/epoll loop may be the one doing all the 
 reading, and then pass off full buffers only to the individual per-fd 
 action routines. And that kind of model really very fundamentally wants 
 an fd to be an fd to be an fd - not some file descriptors need 
 'read_from_sigfd()', and some file descriptors need 'read()', and some 
 file descriptors need 'recvmsg()' etc.
 
 So I think you should get rid of signalfd_dequeue(), and just replace it 
 with a read() function.

The reason for the special function, was not to provide a non-blocking 
behaviour with zero timeout (that just a side effect), but to read the 
siginfo. I was all about using read(2) (and v1 used it), but when you have 
to transfer complex structures over it, it becomes hell. How do you 
cleanly compat over a f_op-read callback for example?



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Michael K. Edwards

On 3/8/07, Davide Libenzi davidel@xmailserver.org wrote:

The reason for the special function, was not to provide a non-blocking
behaviour with zero timeout (that just a side effect), but to read the
siginfo. I was all about using read(2) (and v1 used it), but when you have
to transfer complex structures over it, it becomes hell. How do you
cleanly compat over a f_op-read callback for example?


Make it a netlink socket and fetch your structures using recvmsg().
siginfo_t belongs in ancillary data.

The UNIX philosophy is everything's a file.  The Berkeley philosophy
is everything's a socket, except for files, which are feeble
mini-sockets.  I'd go with the Berkeley crowd here.

Cheers,
- Michael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Davide Libenzi wrote:
 
 The reason for the special function, was not to provide a non-blocking 
 behaviour with zero timeout (that just a side effect), but to read the 
 siginfo. I was all about using read(2) (and v1 used it), but when you have 
 to transfer complex structures over it, it becomes hell. How do you 
 cleanly compat over a f_op-read callback for example?

I agree that it gets a bit interesting, and one option might be that the 
read() interface just gets the signal number and the minimal siginfo 
information, which is, after all, what 99% of all apps actually care
about.

But siginfo_t is really a *horrible* structure. Nobody sane should ever 
use siginfo_t, and the designer of that thing was so high on LSD that it's 
not even funny. Re-using fields in a union? Values that depend on other 
bits in the thing in random manners? 

In other words, I bet that we could just make it a *lot* better by making 
the read structure be:

 - 16 4-byte fields (fixed 64-byte packet), where each field is an 
   uint32_t (we could even do it in network byte order if we care and if 
   you want to just pipe the information from one machine to another, but 
   that sounds a bit excessive ;)

 - Just put the fields people actually use at fixed offsets: si_signo, 
   si_errno, si_pid, si_uid, si_band, si_fd.

 - that still leaves room for the other cases if anybody ever wants them 
   (but I doubt it - things like si_addr are really only useful for 
   synchronous signals that are actually done as *signals*, since you 
   cannot defer a SIGBUS/SIGSEGV/SIGILL *anyway*).

So I bet 99% of users actually just want si_signo, while some small subset 
might want the SIGCHLD info and some of the special cases (eg we migth 
want to add si_addr as a 64-bit thing just because the USB stack sends a 
SI_ASYNCIO thing for completed URB's, so a libusb might want it, but 
that's probably the only such user).

And it would be *cleaner* than the mess that is siginfo_t..

(I realize that siginfo_t is ugly because it built up over time, using the 
same buffer for many different things. I'm just saying that we can 
probably do better by *not* using it, and just laying things out in a 
cleaner manner to begin with, which also solves any compatibility issues)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Michael K. Edwards wrote:
 
 Make it a netlink socket and fetch your structures using recvmsg().
 siginfo_t belongs in ancillary data.

Gaah. That interface is horrible.

 The UNIX philosophy is everything's a file.  The Berkeley philosophy
 is everything's a socket, except for files, which are feeble
 mini-sockets.  I'd go with the Berkeley crowd here.

No, the berkeley crowd is totally out to lunch.

I might agree with you *if* you could actually do recvmsg() on arbitrary 
file descriptors, but you cannot. 

We could fix that in Linux, of course, but the fact is, recvmsg() is 
*not* a superset of read(). In general, it's a *subset*, exactly because 
very few file descriptors support it.

So the normal way to read from a file descriptor (and the *only* way in 
any generic select loop) is to use read(). That's the only thing that 
works for everything. And we shouldn't break that.

The sad part is that there really is no reason why the BSD crowd couldn't 
have done recvmsg() as an extended read with per-system call flags, 
which would have made things like O_NONBLOCK etc unnecessary, because you 
could do it just with MSG_DONTWAIT..

So anybody who would go with the Berkeley crowd really shows a lot of 
bad taste, I'm afraid. The Berkeley crowd really messed up here, and it's 
so long ago that I don't think there is any point in us trying to fix it 
any more.

(But if somebody makes recvmgs a general VFS interface and makes it just 
work for everything, I'd probably take the patch as a cleanup. I really 
think it should have been a struct file_operations thing rather than 
being a socket-only thing.. But since you couldn't portably use it 
anyway, the thing is pretty moot)

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

 
 
 On Thu, 8 Mar 2007, Davide Libenzi wrote:
  
  The reason for the special function, was not to provide a non-blocking 
  behaviour with zero timeout (that just a side effect), but to read the 
  siginfo. I was all about using read(2) (and v1 used it), but when you have 
  to transfer complex structures over it, it becomes hell. How do you 
  cleanly compat over a f_op-read callback for example?
 
 I agree that it gets a bit interesting, and one option might be that the 
 read() interface just gets the signal number and the minimal siginfo 
 information, which is, after all, what 99% of all apps actually care
 about.
 
 But siginfo_t is really a *horrible* structure. Nobody sane should ever 
 use siginfo_t, and the designer of that thing was so high on LSD that it's 
 not even funny. Re-using fields in a union? Values that depend on other 
 bits in the thing in random manners? 
 
 In other words, I bet that we could just make it a *lot* better by making 
 the read structure be:
 
  - 16 4-byte fields (fixed 64-byte packet), where each field is an 
uint32_t (we could even do it in network byte order if we care and if 
you want to just pipe the information from one machine to another, but 
that sounds a bit excessive ;)
 
  - Just put the fields people actually use at fixed offsets: si_signo, 
si_errno, si_pid, si_uid, si_band, si_fd.
 
  - that still leaves room for the other cases if anybody ever wants them 
(but I doubt it - things like si_addr are really only useful for 
synchronous signals that are actually done as *signals*, since you 
cannot defer a SIGBUS/SIGSEGV/SIGILL *anyway*).
 
 So I bet 99% of users actually just want si_signo, while some small subset 
 might want the SIGCHLD info and some of the special cases (eg we migth 
 want to add si_addr as a 64-bit thing just because the USB stack sends a 
 SI_ASYNCIO thing for completed URB's, so a libusb might want it, but 
 that's probably the only such user).
 
 And it would be *cleaner* than the mess that is siginfo_t..
 
 (I realize that siginfo_t is ugly because it built up over time, using the 
 same buffer for many different things. I'm just saying that we can 
 probably do better by *not* using it, and just laying things out in a 
 cleaner manner to begin with, which also solves any compatibility issues)

I can do that, no problem. But isn't it better to recognize that this kind 
of data just can't be shipped through a non compat-able function?
Like, for example, the current trend to say just use u64 for a pointer, 
it'll be fine. I remeber, a long time ago when the i386 architecture came 
out, to say Wow! 4GB is gonna last *forever*!, let's use u32 for 
pointers. Well, forever is almost here in my watches. And all the 
userspace code using APIs assuming to cleanly store a pointer in a u32 
will have to be re-factored.
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.
No?



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Davide Libenzi wrote:

 So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
 like it too much (little, actually). The siginfo, as bad as it is, is a 
 standard used in many POSIX APIs (hence even in kernel), and IMO if we 
 want to send that back, a struct siginfo should be.
 No?

I think it's perfectly fine if you make it struct siginfo (even though I 
think it's a singularly ugly struct). It's just that then you'd have to 
make your read() know whether it's a compat-read or not, which you really 
can't.

Which is why you introduced a new system call, but that leads to all the 
problems with the file descriptor no longer being *usable*.

Think scripts. It's easy to do reads in perl scripts, and parse the 
output. In contrast, making perl use a new system call is quite 
challenging.

And *that* is why everything is a stream of bytes is so important. You 
don't know where the file descriptor has been, or who uses it. Special 
system calls for special file descriptors are just *wrong*.

After all, that's why we'd have a signalfd() in the first place: exactly 
so that you do *not* have to use special system calls, but can just pass 
it on to any event waiting mechanism like select, poll, epoll. The same is 
just *even*more*true* when it comes to reading the data!

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

 
 
 On Thu, 8 Mar 2007, Davide Libenzi wrote:
 
  So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
  like it too much (little, actually). The siginfo, as bad as it is, is a 
  standard used in many POSIX APIs (hence even in kernel), and IMO if we 
  want to send that back, a struct siginfo should be.
  No?
 
 I think it's perfectly fine if you make it struct siginfo (even though I 
 think it's a singularly ugly struct). It's just that then you'd have to 
 make your read() know whether it's a compat-read or not, which you really 
 can't.
 
 Which is why you introduced a new system call, but that leads to all the 
 problems with the file descriptor no longer being *usable*.
 
 Think scripts. It's easy to do reads in perl scripts, and parse the 
 output. In contrast, making perl use a new system call is quite 
 challenging.
 
 And *that* is why everything is a stream of bytes is so important. You 
 don't know where the file descriptor has been, or who uses it. Special 
 system calls for special file descriptors are just *wrong*.
 
 After all, that's why we'd have a signalfd() in the first place: exactly 
 so that you do *not* have to use special system calls, but can just pass 
 it on to any event waiting mechanism like select, poll, epoll. The same is 
 just *even*more*true* when it comes to reading the data!

Cheeseburger it is! ;)
I'll revert back to read(2) with pseudo-siginfo and O_NONBLOCK handling...



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Avi Kivity

Davide Libenzi wrote:
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.
  


You can send the data in the 32/64 neutral format, and have glibc 
convert it to a siginfo, and everybody's happy.



--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Avi Kivity wrote:

 Davide Libenzi wrote:
  So, to cut it short, I can do the pseudo-siginfo read(2), but I don't like
  it too much (little, actually). The siginfo, as bad as it is, is a standard
  used in many POSIX APIs (hence even in kernel), and IMO if we want to send
  that back, a struct siginfo should be.

 
 You can send the data in the 32/64 neutral format, and have glibc convert it
 to a siginfo, and everybody's happy.

Except Uli :)
The problem with the siginfo is that it is a brain-dead structure with 
unions and data dependent on types (and pointers). Pretty fugly.



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Oleg Nesterov
Davide Libenzi wrote:

 +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct siginfo 
 *info)
 +{
 + int nsig = 0;
 + struct list_head *pos;
 + struct signalfd_ctx *ctx;
 + struct signalfd_sq *sq;
 +
 + list_for_each(pos, sighand-sfdlist) {
 + ctx = list_entry(pos, struct signalfd_ctx, lnk);
 + /*
 +  * We use a negative signal value as a way to broadcast that the
 +  * sighand has been orphaned, so that we can notify all the
 +  * listeners about this.
 +  */
 + if (sig  0)
 + __wake_up_locked(ctx-wqh, TASK_UNINTERRUPTIBLE | 
 TASK_INTERRUPTIBLE);
 + else if (sigismember(ctx-sigmask, sig) 
 +  (sig = SIGRTMIN || !sigismember(ctx-pending, sig))) 
 {
 + sigaddset(ctx-pending, sig);

I don't understand the (sig = SIGRTMIN || !sigismember(ctx-pending, sig))
check. This mimics the LEGACY_QUEUE() check, but seems strange. The signal may
be pending in ctx-pending just because it was not signalfd_fetchsig()ed, yes?

Please also see below.

 +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t 
 sizemask)
 +{

 [...snip...]

 + } else {
 + error = -EBADF;
 + file = fget(ufd);
 + if (!file)
 + goto err_exit;
 + ctx = file-private_data;
 + error = -EINVAL;
 + if (file-f_op != signalfd_fops) {
 + fput(file);
 + goto err_exit;
 + }
 + spin_lock_irq(ctx-sighand-siglock);
 + ctx-sigmask = sigmask;
 + spin_unlock_irq(ctx-sighand-siglock);
 + wake_up(ctx-wqh);

Can't this race with sys_signalfd_dequeue() which use lockless 
__add_wait_queue()?
Looks like we should do __wake_up_locked() under ctx-sighand-siglock.

 --- linux-2.6.20.ep2.orig/kernel/signal.c 2007-03-07 15:55:43.0 
 -0800
 +++ linux-2.6.20.ep2/kernel/signal.c  2007-03-07 15:59:01.0 -0800

 [...snip...]

 @@ -780,6 +785,11 @@
   BUG_ON(!irqs_disabled());
   assert_spin_locked(t-sighand-siglock);
  
 + /*
 +  * Deliver the signal to listening signalfds ...
 +  */
 + signalfd_notify(t-sighand, sig, info);
 +
   /* Short-circuit ignored signals.  */
   if (sig_ignored(t, sig))
   goto out;
 @@ -968,6 +978,11 @@
   assert_spin_locked(p-sighand-siglock);
   handle_stop_signal(sig, p);
  
 + /*
 +  * Deliver the signal to listening signalfds ...
 +  */
 + signalfd_notify(p-sighand, sig, info);
 +
   /* Short-circuit ignored signals.  */
   if (sig_ignored(p, sig))
   return ret;

It is strange that we are doing signalfd_notify() even if the signal is ignored.
Isn't it better to shift signalfd_notify() into send_signal() ? This way we do
not need the special check in signalfd_deliver() above.

Also, this patch doesn't take send_sigqueue/send_group_sigqueue into account.

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Michael K. Edwards

On 3/8/07, Linus Torvalds [EMAIL PROTECTED] wrote:

So anybody who would go with the Berkeley crowd really shows a lot of
bad taste, I'm afraid. The Berkeley crowd really messed up here, and it's
so long ago that I don't think there is any point in us trying to fix it
any more.


Well, they did invent the socket, which sucks less than a lot of other
things.  You would prefer STREAMS, perhaps?  :-)  My point is that
this is a message-oriented problem, not a stream-oriented problem, and
read() just isn't a very good interface for passing structured
messages around.  I agree that the details of recvmsg() ancillary data
are fairly grotty (optimization based on micro-benchmarks, as usual,
back in the PDP/VAX days), but the concept isn't that bad; you let
netlink sockets into the kernel, didn't you?


(But if somebody makes recvmgs a general VFS interface and makes it just
work for everything, I'd probably take the patch as a cleanup. I really
think it should have been a struct file_operations thing rather than
being a socket-only thing.. But since you couldn't portably use it
anyway, the thing is pretty moot)


sendmsg()/recvmsg() to a file makes perfect sense, if you put the
equivalent of an llseek tuple into ancillary data.  And it's also IMHO
the right way to do file AIO -- open up a netlink socket to the entity
that's scheduling the IOs for a given filesystem, use the SCM_RIGHTS
mechanism to do indirect opens (your credentials could come over an
AF_UNIX socket from a completely separate open server process), and
multiplex all the AIO to that filesystem across the netlink socket.

If adding this to VFS is something you would seriously consider, I'll
do the work.  But I will need some coaching to work around my relative
inexperience with the Linux core code and my lack of taste.  :-)

Cheers,
- Michael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Marko Macek

Linus Torvalds wrote:


On Thu, 8 Mar 2007, Davide Libenzi wrote:
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.

No?


I think it's perfectly fine if you make it struct siginfo (even though I 
think it's a singularly ugly struct). It's just that then you'd have to 
make your read() know whether it's a compat-read or not, which you really 
can't.


Which is why you introduced a new system call, but that leads to all the 
problems with the file descriptor no longer being *usable*.


Think scripts. It's easy to do reads in perl scripts, and parse the 
output. In contrast, making perl use a new system call is quite 
challenging.


Probably, but someone will have to add the 'signalfd' system call anyway.

And *that* is why everything is a stream of bytes is so important. You 
don't know where the file descriptor has been, or who uses it. Special 
system calls for special file descriptors are just *wrong*.


After all, that's why we'd have a signalfd() in the first place: exactly 
so that you do *not* have to use special system calls, but can just pass 
it on to any event waiting mechanism like select, poll, epoll. The same is 
just *even*more*true* when it comes to reading the data!


The problem with read() returning arbitrary unstructured data is that 
there is hard to do versioning/extensibility, since the userspace can't 
specify the requested/expected format. The only way it could be done is 
by the (nbytes) parameter to read() which is not very nice (and useless 
for scripts).


This is the same problem that makes sysfs/procfs fragile unless the
file format is very well specified for extensibility (and it's easy to
f-it up, since there seems to be little consistency there... using
the XML horror would actually be an improvement). Breaking sysfs/procfs
might be acceptable once every few years, but signal handling will be
part of every application event loop and there is no room for breaking 
anything.


(although, one could to the versioning the ugly way by creating the new 
'signalfd' syscall instead).


I'd say: make read() return the signal number (probably as 4-byte int, 
in network byte order), but for everything else, use the system call.


Mark

P.S. I'm currently worried if the fact that FUTEX_FD is being deprecated 
is a problem :)


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Oleg Nesterov wrote:

 Davide Libenzi wrote:
 
  +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct 
  siginfo *info)
  +{
  +   int nsig = 0;
  +   struct list_head *pos;
  +   struct signalfd_ctx *ctx;
  +   struct signalfd_sq *sq;
  +
  +   list_for_each(pos, sighand-sfdlist) {
  +   ctx = list_entry(pos, struct signalfd_ctx, lnk);
  +   /*
  +* We use a negative signal value as a way to broadcast that the
  +* sighand has been orphaned, so that we can notify all the
  +* listeners about this.
  +*/
  +   if (sig  0)
  +   __wake_up_locked(ctx-wqh, TASK_UNINTERRUPTIBLE | 
  TASK_INTERRUPTIBLE);
  +   else if (sigismember(ctx-sigmask, sig) 
  +(sig = SIGRTMIN || !sigismember(ctx-pending, sig))) 
  {
  +   sigaddset(ctx-pending, sig);
 
 I don't understand the (sig = SIGRTMIN || !sigismember(ctx-pending, sig))
 check. This mimics the LEGACY_QUEUE() check, but seems strange. The signal may
 be pending in ctx-pending just because it was not signalfd_fetchsig()ed, yes?

Logic is, if it's not an RT signal, queue only one, otherwise multiple.
The bit on the -pending mask is clealer only when the queue slot becomes empty.



 Please also see below.
 
  +asmlinkage long sys_signalfd(int ufd, sigset_t __user *user_mask, size_t 
  sizemask)
  +{
 
  [...snip...]
 
  +   } else {
  +   error = -EBADF;
  +   file = fget(ufd);
  +   if (!file)
  +   goto err_exit;
  +   ctx = file-private_data;
  +   error = -EINVAL;
  +   if (file-f_op != signalfd_fops) {
  +   fput(file);
  +   goto err_exit;
  +   }
  +   spin_lock_irq(ctx-sighand-siglock);
  +   ctx-sigmask = sigmask;
  +   spin_unlock_irq(ctx-sighand-siglock);
  +   wake_up(ctx-wqh);
 
 Can't this race with sys_signalfd_dequeue() which use lockless 
 __add_wait_queue()?
 Looks like we should do __wake_up_locked() under ctx-sighand-siglock.

Yes, good catch. Fixed.



  --- linux-2.6.20.ep2.orig/kernel/signal.c   2007-03-07 15:55:43.0 
  -0800
  +++ linux-2.6.20.ep2/kernel/signal.c2007-03-07 15:59:01.0 
  -0800
 
  [...snip...]
 
  @@ -780,6 +785,11 @@
  BUG_ON(!irqs_disabled());
  assert_spin_locked(t-sighand-siglock);
   
  +   /*
  +* Deliver the signal to listening signalfds ...
  +*/
  +   signalfd_notify(t-sighand, sig, info);
  +
  /* Short-circuit ignored signals.  */
  if (sig_ignored(t, sig))
  goto out;
  @@ -968,6 +978,11 @@
  assert_spin_locked(p-sighand-siglock);
  handle_stop_signal(sig, p);
   
  +   /*
  +* Deliver the signal to listening signalfds ...
  +*/
  +   signalfd_notify(p-sighand, sig, info);
  +
  /* Short-circuit ignored signals.  */
  if (sig_ignored(p, sig))
  return ret;
 
 It is strange that we are doing signalfd_notify() even if the signal is 
 ignored.
 Isn't it better to shift signalfd_notify() into send_signal() ? This way we do
 not need the special check in signalfd_deliver() above.

The two trasports can rely on different masks. The signalfd_notify() does 
not even go in signalfd_deliver() if no signalfds are attached to the 
sighand.



 Also, this patch doesn't take send_sigqueue/send_group_sigqueue into account.

I added that too. but I noticed something strange, dunno if intentional or 
not. In send_sigqueue and send_group_sigqueue, the check for the 
timer-special and the ignored is inverted. This lead to two different 
behaviours. Is there a reason for that?




- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Marko Macek

Linus Torvalds wrote:


On Thu, 8 Mar 2007, Davide Libenzi wrote:
So, to cut it short, I can do the pseudo-siginfo read(2), but I don't 
like it too much (little, actually). The siginfo, as bad as it is, is a 
standard used in many POSIX APIs (hence even in kernel), and IMO if we 
want to send that back, a struct siginfo should be.

No?


I think it's perfectly fine if you make it struct siginfo (even though I 
think it's a singularly ugly struct). It's just that then you'd have to 
make your read() know whether it's a compat-read or not, which you really 
can't.


Which is why you introduced a new system call, but that leads to all the 
problems with the file descriptor no longer being *usable*.


Think scripts. It's easy to do reads in perl scripts, and parse the 
output. In contrast, making perl use a new system call is quite 
challenging.


Probably, but someone will have to add the 'signalfd' system call anyway.

And *that* is why everything is a stream of bytes is so important. You 
don't know where the file descriptor has been, or who uses it. Special 
system calls for special file descriptors are just *wrong*.


After all, that's why we'd have a signalfd() in the first place: exactly 
so that you do *not* have to use special system calls, but can just pass 
it on to any event waiting mechanism like select, poll, epoll. The same is 
just *even*more*true* when it comes to reading the data!


The problem with read() returning arbitrary unstructured data is that 
there is hard to do versioning/extensibility, since the userspace can't 
specify the requested/expected format. The only way it could be done is 
by the (nbytes) parameter to read() which is not very nice (and useless 
for scripts).


This is the same problem that makes sysfs/procfs fragile unless the
file format is very well specified for extensibility (and it's easy to
f-it up, since there seems to be little consistency there... using
the XML horror would actually be an improvement). Breaking sysfs/procfs
might be acceptable once every few years, but signal handling will be
part of every application event loop and there is no room for breaking 
anything.


(although, one could to the versioning the ugly way by creating the new 
'signalfd' syscall instead).


I'd say: make read() return the signal number (probably as 4-byte int, 
in network byte order), but for everything else, use the system call.


Mark

P.S. I'm currently worried if the fact that FUTEX_FD is being deprecated 
is a problem :)


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Oleg Nesterov
On 03/08, Davide Libenzi wrote:

 On Thu, 8 Mar 2007, Oleg Nesterov wrote:
 
  Davide Libenzi wrote:
  
   +int signalfd_deliver(struct sighand_struct *sighand, int sig, struct 
   siginfo *info)
   +{
   + int nsig = 0;
   + struct list_head *pos;
   + struct signalfd_ctx *ctx;
   + struct signalfd_sq *sq;
   +
   + list_for_each(pos, sighand-sfdlist) {
   + ctx = list_entry(pos, struct signalfd_ctx, lnk);
   + /*
   +  * We use a negative signal value as a way to broadcast that the
   +  * sighand has been orphaned, so that we can notify all the
   +  * listeners about this.
   +  */
   + if (sig  0)
   + __wake_up_locked(ctx-wqh, TASK_UNINTERRUPTIBLE | 
   TASK_INTERRUPTIBLE);
   + else if (sigismember(ctx-sigmask, sig) 
   +  (sig = SIGRTMIN || !sigismember(ctx-pending, sig))) 
   {
   + sigaddset(ctx-pending, sig);
  
  I don't understand the (sig = SIGRTMIN || !sigismember(ctx-pending, 
  sig))
  check. This mimics the LEGACY_QUEUE() check, but seems strange. The signal 
  may
  be pending in ctx-pending just because it was not signalfd_fetchsig()ed, 
  yes?
 
 Logic is, if it's not an RT signal, queue only one, otherwise multiple.
 The bit on the -pending mask is clealer only when the queue slot becomes 
 empty.

Yes, I see what the code does, but I don't undestand why. For example, SIGCHLD 
was
delivered to the process _and_ handled several times, then 
sys_signalfd_dequeue()
comes and finds only one siginfo. Isn't this strange?

   @@ -780,6 +785,11 @@
 BUG_ON(!irqs_disabled());
 assert_spin_locked(t-sighand-siglock);

   + /*
   +  * Deliver the signal to listening signalfds ...
   +  */
   + signalfd_notify(t-sighand, sig, info);
   +
 /* Short-circuit ignored signals.  */
 if (sig_ignored(t, sig))
 goto out;
   @@ -968,6 +978,11 @@
 assert_spin_locked(p-sighand-siglock);
 handle_stop_signal(sig, p);

   + /*
   +  * Deliver the signal to listening signalfds ...
   +  */
   + signalfd_notify(p-sighand, sig, info);
   +
 /* Short-circuit ignored signals.  */
 if (sig_ignored(p, sig))
 return ret;
  
  It is strange that we are doing signalfd_notify() even if the signal is 
  ignored.
  Isn't it better to shift signalfd_notify() into send_signal() ? This way we 
  do
  not need the special check in signalfd_deliver() above.
 
 The two trasports can rely on different masks. The signalfd_notify() does 
 not even go in signalfd_deliver() if no signalfds are attached to the 
 sighand.

Sorry, I don't understand. The masks are different, yes, but -sighand is the
same? How this can make any difference for if no signalfds are attached ?

OK. What is the purpose of signalfd? Should it record the signals which were
sent to the process, or only those which were delivered? The latter looks
more natural for me. But inn any case, I don't see a reason to check -pending
mask in signalfd_deliver().

BTW, sys_signalfd_dequeue() re-queues signalfd_sq if copy_siginfo_to_user()
fails. Why not -EFAULT?

Also. A malicious user can eat all memory, 
signalfd_deliver()-kmem_cache_alloc()
doesn't check any limits.

  Also, this patch doesn't take send_sigqueue/send_group_sigqueue into 
  account.
 
 I added that too. but I noticed something strange, dunno if intentional or 
 not. In send_sigqueue and send_group_sigqueue, the check for the 
 timer-special and the ignored is inverted. This lead to two different 
 behaviours. Is there a reason for that?

I was wondering about that too.

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Fri, 9 Mar 2007, Oleg Nesterov wrote:
 
 Also. A malicious user can eat all memory, 
 signalfd_deliver()-kmem_cache_alloc()
 doesn't check any limits.

This, btw, is one reason I *really* think signalfd() should just use the 
same old signal queue, and not try to make its own.

Signal queueing and unqueuing simply isn't that simple. 

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Thu, 8 Mar 2007, Linus Torvalds wrote:

 
 
 On Fri, 9 Mar 2007, Oleg Nesterov wrote:
  
  Also. A malicious user can eat all memory, 
  signalfd_deliver()-kmem_cache_alloc()
  doesn't check any limits.
 
 This, btw, is one reason I *really* think signalfd() should just use the 
 same old signal queue, and not try to make its own.
 
 Signal queueing and unqueuing simply isn't that simple. 

Alrighty. Back to the dequeue_signal ...



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Fri, 9 Mar 2007, Oleg Nesterov wrote:

  Logic is, if it's not an RT signal, queue only one, otherwise multiple.
  The bit on the -pending mask is clealer only when the queue slot becomes 
  empty.
 
 Yes, I see what the code does, but I don't undestand why. For example, 
 SIGCHLD was
 delivered to the process _and_ handled several times, then 
 sys_signalfd_dequeue()
 comes and finds only one siginfo. Isn't this strange?

That's the same logic the kernel folows for non-RT signals. 


  The two trasports can rely on different masks. The signalfd_notify() does 
  not even go in signalfd_deliver() if no signalfds are attached to the 
  sighand.
 
 Sorry, I don't understand. The masks are different, yes, but -sighand is the
 same? How this can make any difference for if no signalfds are attached ?

The list_empty() che would not make you fall inside signalfd_deliver(), 
hence the fast path really lives up to its name ;)


 Also. A malicious user can eat all memory, 
 signalfd_deliver()-kmem_cache_alloc()
 doesn't check any limits.

I'll make that use the std dequeu_signal, so everything is handled in there.



- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Jeremy Fitzhardinge
Linus Torvalds wrote:
 So I think you should get rid of signalfd_dequeue(), and just replace it 
 with a read() function.
   

The difficulty is that there are 4 different formats of signal structure
you could get: (traditional|siginfo) x (32bit|64bit).

What happens if you're a 32 bit process, you fork and exec a 64bit
process who inherits the signalfd, and they start reading?  What format
of signal structure do they get?  What do you get?  What if you start
doing partial reads?

I think signalfd_dequeue() is warty, but read() has has a number of
details to sort out.

(Hey, can you send signals by writing into the signalfd?  Very plan9...)

J
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Linus Torvalds


On Thu, 8 Mar 2007, Jeremy Fitzhardinge wrote:
 
 The difficulty is that there are 4 different formats of signal structure
 you could get: (traditional|siginfo) x (32bit|64bit).

See my suggestion of a fixed-format (and much cleaned up) pseudo-siginfo 
thing earlier in this thread, and also actually mentioned as a good to 
do in my original email from 2003 that did the original patch ;)

 (Hey, can you send signals by writing into the signalfd?  Very plan9...)

It's an obvious extension. Whether there is any real point to it or not, I 
dunno. After all, you could just send the signal instead.

Linus
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Oleg Nesterov
On 03/08, Davide Libenzi wrote:

 On Fri, 9 Mar 2007, Oleg Nesterov wrote:
 
   Logic is, if it's not an RT signal, queue only one, otherwise multiple.
   The bit on the -pending mask is clealer only when the queue slot becomes 
   empty.
  
  Yes, I see what the code does, but I don't undestand why. For example, 
  SIGCHLD was
  delivered to the process _and_ handled several times, then 
  sys_signalfd_dequeue()
  comes and finds only one siginfo. Isn't this strange?
 
 That's the same logic the kernel folows for non-RT signals. 

Yes, specific_send_sig_info/__group_send_sig_info ignores the non-RT signal

- if we already have a pending one (blocked, or not yet handled)

- the only reason is backward compatibility

signalfd:

- if the same signr was not fetched via sys_signalfd_dequeue()

- the reason is 

Suppose we are doing a simple application which just logs all signals
which were sent to this process. Now, we can miss the signal if the
logging blocks.

Think about sub-threads. We can send the same non-RT signal to T1, T2, T3
via specific_send_sig_info(). All 3 signals will be delivered, but signalfd
(which is process wide) will see only the first.

   The two trasports can rely on different masks. The signalfd_notify() does 
   not even go in signalfd_deliver() if no signalfds are attached to the 
   sighand.
  
  Sorry, I don't understand. The masks are different, yes, but -sighand is 
  the
  same? How this can make any difference for if no signalfds are attached ?
 
 The list_empty() che would not make you fall inside signalfd_deliver(), 
 hence the fast path really lives up to its name ;)

Ugh. Still can't understand, probably I missed something or misread this patch.

If we shift signalfd_notify() from specific_send_sig_info/__group_send_sig_info
to send_signal(), we have the same list_empty() fastpath if no signalfds are
attached to the sighand. The difference is that we don't count sig_ignored()
signals, which looks right to me.

Oleg.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread Davide Libenzi
On Fri, 9 Mar 2007, Oleg Nesterov wrote:

 Ugh. Still can't understand, probably I missed something or misread this 
 patch.
 
 If we shift signalfd_notify() from 
 specific_send_sig_info/__group_send_sig_info
 to send_signal(), we have the same list_empty() fastpath if no signalfds are
 attached to the sighand. The difference is that we don't count sig_ignored()
 signals, which looks right to me.

Now I shifted the check into send_signal(), since signalfd now uses the 
standard dequeue_signal, and hence compete with standard signal delivery 
against the queue. This was initial Linus design.




- Davide


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 2/5] signalfd v2 - signalfd core ...

2007-03-08 Thread David M. Lloyd
On Wed, 2007-03-07 at 17:21 -0800, Davide Libenzi wrote:
 int signalfd_dequeue(int fd, siginfo_t *info, long timeo);
 
 The fd parameter must ba a signalfd file descriptor. The info parameter
 is a pointer to the siginfo that will receive the dequeued signal, and
 timeo is a timeout in milliseconds, or -1 for infinite.
 The signalfd_dequeue function returns 0 if successfull.

Does this support non-blocking mode?  It doesn't seem to at my level of
understanding anyway.  If I use this with EPOLLET for example, I'd
expect to get a single EPOLLIN when a signal arrives, which would
indicate to me that I must call signalfd_dequeue() in a loop until I get
EAGAIN in order to be sure I've consumed all the outstanding signals so
that the edge-triggered notification can be re-armed.

Make sense?

- DML

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/