Re: PATCH: SysV semaphore race vs SIGSTOP

2005-02-05 Thread Ove Kaaven
lør, 05,.02.2005 kl. 09.37 +0100, skrev Manfred Spraul:
> Hi Ove,
> 
> >As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
> >process waiting for a SysV semaphore, where if a process holding a
> >semaphore suspends another process waiting on the semaphore and then
> >releases the semaphore, 
> >
> Your patch looks correct (for 2.4 - 2.6 uses a different approach), but 

Actually I myself don't think the patch I sent is 100% correct. The most
glaring problem is of course that it only handles SIGSTOP and nothing
else, but I wasn't sure how to handle anything else. Also, I've since
found that the "continue" in there made it impossible to attach
debuggers or the like to a process blocked in semop. Finally, it would
occasionally continue to loop after the condition is satisfied, e.g.
when the semop is 0.

So, the if check in my patch is more correct as

+   if (is_stopping(current) && queue.status == 1)
+   /* Could either EINTR out or continue.
+* I suppose EINTR is the robust choice. */
+   queue.status = -EINTR;

but that still doesn't handle all signals. I'm thinking that if the
SIGSTOP problem can be solved simply by returning EINTR (and let the
userspace code deal with that by retrying), then perhaps that can be
done for all signals.

Summarily, my "perfect" patch for 2.4 might simply look like this
(assuming signal_pending does the right thing, which I haven't
verified):

--- ipc/sem.c.original  2005-01-31 18:17:17.0 -0500
+++ ipc/sem.c   2005-02-06 01:52:01.0 -0500
@@ -961,6 +961,9 @@
error = -EIDRM;
goto out_free;
}
+   if (signal_pending(current) && queue.status == 1)
+   queue.status = -EINTR;
+
/*
 * If queue.status == 1 we where woken up and
 * have to retry else we simply return.


As for 2.6, yes, the 2.6 code does seem to be harder to make a simple
patch for. Then again, we're thinking of using futexes instead of
semaphores if a 2.6 kernel is detected, so if futexes don't have this
problem, I guess we can use them to work around this semaphore flaw.

> I'm not certain that it's needed:
> You assume that signals have an immediate effect.

I don't necessarily need it to have an "immediate" effect. Only that
pending signals are taken into consideration when the process returns
from certain blocking system calls dealing with atomic synchronization
primitives such as semaphores. I want to be able to think that
synchronization primitives are there to *protect* me against the effects
of implementation details such as delayed signal delivery, not that they
have their own synchronization issues. After all, the man page for
"semop" states that

* The calling process catches a signal: the value of semzcnt is
decremented and semop() fails, with errno set to EINTR.

and I want to be able to expect this to always actually happen. Because
semop is supposed to be an *atomic* operation, I don't expect it to
*both* catch a signal *and* succeed within the same semop system call
(knowing that the signal *must* have been sent by the time the semaphore
condition was fulfilled). See where I'm coming from?

> Linux ignores that - it delays signal processing under some 
> circumstances. If a syscall can be completed without blocking, then the 
> syscall is handled, regardless of any pending signals. The signal is 
> handled at the syscall return, i.e. after completing the syscall.
> That's not just in SysV semaphore - at least pipes are identical:

Perhaps. But at least read() doesn't claim to be an atomic
synchronization primitive like semop() does. Though I suppose it's
occasionally used that way...

> pipe_read first check if there is data. If there is some, then it 
> returns the data. Signals are only checked if there is no pending data.
> I'm not sure if this is a bug. But if it's one, then far more than just 
> sysv sem must be updated.

I'd probably only concern myself with things that are supposed to be
atomic, really.

Anyway, thanks a lot for answering.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: PATCH: SysV semaphore race vs SIGSTOP

2005-02-05 Thread Ove Kaaven
lør, 05,.02.2005 kl. 09.37 +0100, skrev Manfred Spraul:
 Hi Ove,
 
 As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
 process waiting for a SysV semaphore, where if a process holding a
 semaphore suspends another process waiting on the semaphore and then
 releases the semaphore, 
 
 Your patch looks correct (for 2.4 - 2.6 uses a different approach), but 

Actually I myself don't think the patch I sent is 100% correct. The most
glaring problem is of course that it only handles SIGSTOP and nothing
else, but I wasn't sure how to handle anything else. Also, I've since
found that the continue in there made it impossible to attach
debuggers or the like to a process blocked in semop. Finally, it would
occasionally continue to loop after the condition is satisfied, e.g.
when the semop is 0.

So, the if check in my patch is more correct as

+   if (is_stopping(current)  queue.status == 1)
+   /* Could either EINTR out or continue.
+* I suppose EINTR is the robust choice. */
+   queue.status = -EINTR;

but that still doesn't handle all signals. I'm thinking that if the
SIGSTOP problem can be solved simply by returning EINTR (and let the
userspace code deal with that by retrying), then perhaps that can be
done for all signals.

Summarily, my perfect patch for 2.4 might simply look like this
(assuming signal_pending does the right thing, which I haven't
verified):

--- ipc/sem.c.original  2005-01-31 18:17:17.0 -0500
+++ ipc/sem.c   2005-02-06 01:52:01.0 -0500
@@ -961,6 +961,9 @@
error = -EIDRM;
goto out_free;
}
+   if (signal_pending(current)  queue.status == 1)
+   queue.status = -EINTR;
+
/*
 * If queue.status == 1 we where woken up and
 * have to retry else we simply return.


As for 2.6, yes, the 2.6 code does seem to be harder to make a simple
patch for. Then again, we're thinking of using futexes instead of
semaphores if a 2.6 kernel is detected, so if futexes don't have this
problem, I guess we can use them to work around this semaphore flaw.

 I'm not certain that it's needed:
 You assume that signals have an immediate effect.

I don't necessarily need it to have an immediate effect. Only that
pending signals are taken into consideration when the process returns
from certain blocking system calls dealing with atomic synchronization
primitives such as semaphores. I want to be able to think that
synchronization primitives are there to *protect* me against the effects
of implementation details such as delayed signal delivery, not that they
have their own synchronization issues. After all, the man page for
semop states that

* The calling process catches a signal: the value of semzcnt is
decremented and semop() fails, with errno set to EINTR.

and I want to be able to expect this to always actually happen. Because
semop is supposed to be an *atomic* operation, I don't expect it to
*both* catch a signal *and* succeed within the same semop system call
(knowing that the signal *must* have been sent by the time the semaphore
condition was fulfilled). See where I'm coming from?

 Linux ignores that - it delays signal processing under some 
 circumstances. If a syscall can be completed without blocking, then the 
 syscall is handled, regardless of any pending signals. The signal is 
 handled at the syscall return, i.e. after completing the syscall.
 That's not just in SysV semaphore - at least pipes are identical:

Perhaps. But at least read() doesn't claim to be an atomic
synchronization primitive like semop() does. Though I suppose it's
occasionally used that way...

 pipe_read first check if there is data. If there is some, then it 
 returns the data. Signals are only checked if there is no pending data.
 I'm not sure if this is a bug. But if it's one, then far more than just 
 sysv sem must be updated.

I'd probably only concern myself with things that are supposed to be
atomic, really.

Anyway, thanks a lot for answering.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PATCH: SysV semaphore race vs SIGSTOP

2005-01-31 Thread Ove Kaaven
As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
process waiting for a SysV semaphore, where if a process holding a
semaphore suspends another process waiting on the semaphore and then
releases the semaphore, the suspended process might still get the
semaphore, resulting in a deadlock. My sample test program is at
http://www.ping.uio.no/~ovehk/sembug.c

Now I've written a patch for this which seems to work for me. It is
against 2.4.27, but the semaphore code doesn't seem to change often. And
please Cc me on any questions or comments, since I'm not subscribed.
Here is the patch:

--- ipc/sem.c.original  2005-01-31 18:17:17.0 -0500
+++ ipc/sem.c   2005-01-31 18:17:34.0 -0500
@@ -307,6 +307,18 @@
return result;
 }
 
+static int is_stopping(struct task_struct *t)
+{
+   if (t->state == TASK_STOPPED) {
+   /* shouldn't happen */
+   return 1;
+   }
+   if (sigismember(>pending.signal, SIGSTOP)) {
+   return 1;
+   }
+   return 0;
+}
+
 /* Go through the pending queue for the indicated semaphore
  * looking for tasks that can be completed.
  */
@@ -961,6 +973,12 @@
error = -EIDRM;
goto out_free;
}
+
+   if (is_stopping(current))
+   /* Could either EINTR out or continue.
+* Currently I've chosen to continue */
+   continue;
+
/*
 * If queue.status == 1 we where woken up and
 * have to retry else we simply return.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PATCH: SysV semaphore race vs SIGSTOP

2005-01-31 Thread Ove Kaaven
As I mentioned in an earlier mail, there is a race when SIGSTOP-ing a
process waiting for a SysV semaphore, where if a process holding a
semaphore suspends another process waiting on the semaphore and then
releases the semaphore, the suspended process might still get the
semaphore, resulting in a deadlock. My sample test program is at
http://www.ping.uio.no/~ovehk/sembug.c

Now I've written a patch for this which seems to work for me. It is
against 2.4.27, but the semaphore code doesn't seem to change often. And
please Cc me on any questions or comments, since I'm not subscribed.
Here is the patch:

--- ipc/sem.c.original  2005-01-31 18:17:17.0 -0500
+++ ipc/sem.c   2005-01-31 18:17:34.0 -0500
@@ -307,6 +307,18 @@
return result;
 }
 
+static int is_stopping(struct task_struct *t)
+{
+   if (t-state == TASK_STOPPED) {
+   /* shouldn't happen */
+   return 1;
+   }
+   if (sigismember(t-pending.signal, SIGSTOP)) {
+   return 1;
+   }
+   return 0;
+}
+
 /* Go through the pending queue for the indicated semaphore
  * looking for tasks that can be completed.
  */
@@ -961,6 +973,12 @@
error = -EIDRM;
goto out_free;
}
+
+   if (is_stopping(current))
+   /* Could either EINTR out or continue.
+* Currently I've chosen to continue */
+   continue;
+
/*
 * If queue.status == 1 we where woken up and
 * have to retry else we simply return.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PROBLEM: SysV semaphore race vs SIGSTOP

2005-01-28 Thread Ove Kaaven
There seem to be a race when SIGSTOP-ing a process waiting for a SysV
semaphore. Even if it could not possibly have owned the semaphore when
the signal was sent (because the sender of the signal owned it at the
time), it still occasionally happens that it both stops execution *and*
acquires the semaphore, with a deadlocked application as the result.
This is a problem for some of the high-performance stuff I'm working on.

A sample test program exhibiting the problem is available at
http://www.ping.uio.no/~ovehk/sembug.c

For me, it will show "ACQUIRE FAILED!! DEADLOCK!!" almost every time I
run it. Occasionally it will run fine; if it does for you, just try
again a couple of times.

The kernel I currently use is:

Linux version 2.4.27-1-k7 ([EMAIL PROTECTED]) (gcc
version 3.3.5 (Debian 1:3.3.5-2)) #1 Wed Dec 1 20:12:01 JST 2004

and I run it on a uniprocessor system (AMD Athlon, 1.9GHz) with Debian
"sid" installed.

I'm not a kernel hacker, but from a quick peruse of the 2.4 code, it
didn't seem to me like the semaphore code in the kernel (ipc/sem.c) even
try to handle suspended threads (though I wouldn't know how to do so).
The 2.6 semaphore code looked almost the same to me, too, so it might be
a problem there as well.

Please Cc me on any questions or comments, since I am too wimpy to
subscribe yet.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


PROBLEM: SysV semaphore race vs SIGSTOP

2005-01-28 Thread Ove Kaaven
There seem to be a race when SIGSTOP-ing a process waiting for a SysV
semaphore. Even if it could not possibly have owned the semaphore when
the signal was sent (because the sender of the signal owned it at the
time), it still occasionally happens that it both stops execution *and*
acquires the semaphore, with a deadlocked application as the result.
This is a problem for some of the high-performance stuff I'm working on.

A sample test program exhibiting the problem is available at
http://www.ping.uio.no/~ovehk/sembug.c

For me, it will show ACQUIRE FAILED!! DEADLOCK!! almost every time I
run it. Occasionally it will run fine; if it does for you, just try
again a couple of times.

The kernel I currently use is:

Linux version 2.4.27-1-k7 ([EMAIL PROTECTED]) (gcc
version 3.3.5 (Debian 1:3.3.5-2)) #1 Wed Dec 1 20:12:01 JST 2004

and I run it on a uniprocessor system (AMD Athlon, 1.9GHz) with Debian
sid installed.

I'm not a kernel hacker, but from a quick peruse of the 2.4 code, it
didn't seem to me like the semaphore code in the kernel (ipc/sem.c) even
try to handle suspended threads (though I wouldn't know how to do so).
The 2.6 semaphore code looked almost the same to me, too, so it might be
a problem there as well.

Please Cc me on any questions or comments, since I am too wimpy to
subscribe yet.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/