Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-06-05 Thread Thomas Gleixner
On Wed, 2007-06-06 at 01:00 +0400, Alexey Kuznetsov wrote:
> Hello!
> 
> > We actually need to do something about this, as we might loop for ever
> > there. The robust cleanup code can fail (e.g. due to list corruption)
> > and we would see exit_state != 0 and the OWNER_DIED bit would never be
> > set, so we are stuck in a busy loop.
> 
> Yes...
> 
> It is possible to take read_lock(&tasklist_lock) before:

We really want to avoid the tasklist_lock at all. It's not trivial, but
I think it's doable. I'll send you a patch to test tomorrow morning, when
my brain has recovered from looking at that code :)

tglx



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-06-05 Thread Alexey Kuznetsov
Hello!

> We actually need to do something about this, as we might loop for ever
> there. The robust cleanup code can fail (e.g. due to list corruption)
> and we would see exit_state != 0 and the OWNER_DIED bit would never be
> set, so we are stuck in a busy loop.

Yes...

It is possible to take read_lock(&tasklist_lock) before:

inc_preempt_count();
curval = futex_atomic_cmpxchg_inatomic(uaddr, uval, newval);
dec_preempt_count();

and drop it after lookup_pi_state().

In this case the exiting task will set FUTEX_OWNER_DIED but will
spin in exit_notify(); we will find a valid pi_state and go the slow path,
taking the rtmutex.

Alexey


Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-06-05 Thread Thomas Gleixner
On Tue, 2007-06-05 at 20:48 +0200, Thomas Gleixner wrote:
> > > This does not really explain why you prevent the -ESRCH return value
> > > in the next cycle,
> > 
> > Because the right curval is refetched: it already has the FUTEX_OWNER_DIED bit
> > set and we successfully take the lock.
> 
> Ok, handle_futex_death() is punching the OWNER_DIED bit into the futex
> without the hash bucket lock. We might as well grab the hash bucket lock
right there to avoid this. I'll look for a sane solution.

We actually need to do something about this, as we might loop for ever
there. The robust cleanup code can fail (e.g. due to list corruption)
and we would see exit_state != 0 and the OWNER_DIED bit would never be
set, so we are stuck in a busy loop. I think I have an idea how to fix
this.

tglx




Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-06-05 Thread Thomas Gleixner
On Tue, 2007-06-05 at 21:39 +0400, Alexey Kuznetsov wrote:
> Hello!
> 
> > Hmm, what does "not expected" mean? -ESRCH is returned when the owner
> > task is not found.
> 
> This is not supposed to happen with robust futexes.

Hmm, right.

> > This does not really explain why you prevent the -ESRCH return value
> > in the next cycle,
> 
> Because the right curval is refetched: it already has the FUTEX_OWNER_DIED bit
> set and we successfully take the lock.

Ok, handle_futex_death() is punching the OWNER_DIED bit into the futex
without the hash bucket lock. We might as well grab the hash bucket lock
right there to avoid this. I'll look for a sane solution.

> > The rtmutex code only returns -EDEADLK, when the lock is already held by
> > the task
> 
> This case.

Sorry, I was not clear here: not the user space lock, the rtmutex must
be held or a deadlock situation against another rtmutex must be
detected. There is no way that the exiting code assigns the ownership
of the rtmutex. It solely calls rtmutex_unlock() which makes the highest
priority waiter the _PENDING_ owner, which means the pending owner needs
to acquire it for real. 

> You need to run only tst-robustpi8 in a loop. It should be triggered
> quickly, within a few minutes on 8-way SMP here.

My largest box is a 4-way, and it has been running for hours in a while-true loop.

> If you want, I can insert whatever debugging printks you need
> and run the test here.

I'll fix up some things in the code first and then add a couple of
debug printks to nail this EDEADLK problem.

tglx




Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-06-05 Thread Alexey Kuznetsov
Hello!

> Hmm, what does "not expected" mean? -ESRCH is returned when the owner
> task is not found.

This is not supposed to happen with robust futexes.

glibc aborts (which is correct), or, for a build with debugging disabled,
enters a simulated deadlock (which is confusing).


> lock. Also using uval is wrong.

Yup. You are right.

This means those RETRY messages could be spurious. I must rerun the test.


> This does not really explain why you prevent the -ESRCH return value
> in the next cycle,

Because the right curval is refetched: it already has the FUTEX_OWNER_DIED bit
set and we successfully take the lock.


> No, -EDEADLK is returned from here:
> 
>   ret = rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1);

Of course. It is the only place where ret is set. :-)

> 
> The rtmutex code only returns -EDEADLK, when the lock is already held by
> the task

This case.

> The fix is to remove it and to find the real cause of the problem.
> 
> I've been running the glibc tests for hours w/o tripping over it.

OK.

You need to run only tst-robustpi8 in a loop. It should be triggered
quickly, within a few minutes on 8-way SMP here.

If you want, I can insert whatever debugging printks you need
and run the test here.

Alexey


Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-06-05 Thread Thomas Gleixner
Hi,

On Wed, 2007-05-23 at 15:51 +0400, Alexey Kuznetsov wrote:
> The first chunk: it results in a self-inflicted deadlock inside glibc.
> Sometimes futex_lock_pi returns -ESRCH when it is not expected,
> and glibc enters a for(;;) sleep() loop to simulate a deadlock. This problem
> is quite obvious and I think the patch is right. Though it looks like
> each "if" in futex_lock_pi() got some stupid special-case "else if". :-)

Hmm, what does "not expected" mean? -ESRCH is returned when the owner
task is not found.

> + } else if (ret == -ESRCH) {
> + /* Process could exit right now, so that robust list
> +  * was processed after we got uval. Retry. */
> + pagefault_disable();
> + curval = futex_atomic_cmpxchg_inatomic(uaddr,
> +uval, uval);

When we retrieve uval, we hold the hash bucket lock and we keep it down
to this point, so the robust list processing is stuck on the hash bucket
lock. Also using uval is wrong. The user space variable holds "newval",
which was written there before:

 curval = cmpxchg_futex_value_locked(uaddr, uval, newval); 

So you just go back up to retry_locked or into the fault handling path
(which is unlikely).

This does not really explain why you prevent the -ESRCH return value
in the next cycle, except for the case where we go through the fault
handling path.

> The second chunk: sometimes futex_lock_pi() returns -EDEADLK
> when nobody has the lock. The reason is also obvious (see the comment
> in the patch), but the correct fix is far beyond my comprehension.
> I guess someone already saw this, the chunk:
> 
> if (rt_mutex_trylock(&q.pi_state->pi_mutex))
> ret = 0;
> 
> is obviously from the same opera. But it does not work, because the
> rtmutex is really taken at this point: wake_futex_pi() of the previous
> owner reassigned it to us.

No, -EDEADLK is returned from here:

ret = rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1);

The rtmutex code only returns -EDEADLK, when the lock is already held by
the task or when it runs into a circular rtmutex dependency (e.g. an ABBA
deadlock) during the PI-walk.

Also wake_futex_pi() does not assign the ownership of the rtmutex, it
assigns the ownership of the futex pi state and unlocks the rtmutex,
which sets the pending owner. The pending owner is the highest priority
waiter. The ownership assignment of the rtmutex happens inside the
rtmutex code, not in wake_futex_pi.

The trylock is covering the following situation:

rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1) returns -ETIMEDOUT;

Now it gets blocked on:
spin_lock(q.lock_ptr);

because on the other CPU the futex is unlocked. The rt-mutex code does
not find a waiter and unlocks the rtmutex with no new owner. The hash
bucket lock is released and the task which is blocked on the hash bucket
lock has a chance to grab it.

>  My fix works. But it looks very stupid.
> I would think about removing the shift of ownership in wake_futex_pi()
> and doing all the work in the context of the process taking the lock.

I have no clue why it works. The only explanation is that an existing
deadlock is resolved by the exiting task.

>   if (ret && q.pi_state->owner == curr) {

This happens only, when the rtmutex was unlocked by another task between
the return from the rtmutex code and reacquiring the hash bucket lock.

>   if (rt_mutex_trylock(&q.pi_state->pi_mutex))
>   ret = 0;

If we can get the rtmutex here, we are the owner of the lock and need to
fix up ownership in the user space variable further down.

> + /* Holy crap... Now futex lock returns -EDEADLK
> +  * sometimes, because ownership was passed to us while
> +  * unlock of previous owner. Who wrote this?

I admit that I was one of the authors :)

> +  * Please, fix this correctly. For now:
> +  */

The fix is to remove it and to find the real cause of the problem.

I've been running the glibc tests for hours w/o tripping over it.

tglx




Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-05-23 Thread Alexey Kuznetsov
Hello!

> #2 crash be explained via any of the bugs you fixed? (i.e. memory
> corruption?)

Yes, I found the reason; it is really fixed by taking tasklist_lock.
It happens after a task struct with a not-yet-cleared pi_state_list is freed
and the list of futex_pi_states is corrupted.


Meanwhile... two more bugs were found.

The first chunk: it results in a self-inflicted deadlock inside glibc.
Sometimes futex_lock_pi returns -ESRCH when it is not expected,
and glibc enters a for(;;) sleep() loop to simulate a deadlock. This problem
is quite obvious and I think the patch is right. Though it looks like
each "if" in futex_lock_pi() got some stupid special-case "else if". :-)


The second chunk: sometimes futex_lock_pi() returns -EDEADLK
when nobody has the lock. The reason is also obvious (see the comment
in the patch), but the correct fix is far beyond my comprehension.
I guess someone already saw this, the chunk:

if (rt_mutex_trylock(&q.pi_state->pi_mutex))
ret = 0;

is obviously from the same opera. But it does not work, because the
rtmutex is really taken at this point: wake_futex_pi() of the previous
owner reassigned it to us. My fix works, but it looks very stupid.
I would think about removing the shift of ownership in wake_futex_pi()
and doing all the work in the context of the process taking the lock.

Both bugs show up when running glibc's tst-robustpi8 long enough.

Yes, all this is about pre-May-8 futexes. It seems the updates did not
change anything in the logic, but I am not sure.


--- kernel/futex.c.intermediate 2007-05-23 14:48:27.0 +0400
+++ kernel/futex.c  2007-05-23 14:58:06.0 +0400
@@ -1244,8 +1244,21 @@ static int futex_lock_pi(u32 __user *uad
if (unlikely(curval != uval))
goto retry_locked;
ret = 0;
-   }
-   goto out_unlock_release_sem;
+   } else if (ret == -ESRCH) {
+   /* Process could exit right now, so that robust list
+* was processed after we got uval. Retry. */
+   pagefault_disable();
+   curval = futex_atomic_cmpxchg_inatomic(uaddr,
+  uval, uval);
+   pagefault_enable();
+   if (unlikely(curval == -EFAULT))
+   goto uaddr_faulted;
+   if (unlikely(curval != uval)) {
+   printk("RETRY %x %x %x\n", current->pid, uval, curval);
+   goto retry_locked;
+   }
+   }
+   goto out_unlock_release_sem;
}
 
/*
@@ -1361,6 +1374,22 @@ static int futex_lock_pi(u32 __user *uad
if (ret && q.pi_state->owner == curr) {
+   if (rt_mutex_trylock(&q.pi_state->pi_mutex))
ret = 0;
+   /* Holy crap... Now futex lock returns -EDEADLK
+* sometimes, because ownership was passed to us while
+* unlock of previous owner. Who wrote this?
+* Please, fix this correctly. For now:
+*/
+   if (ret == -EDEADLK) {
+   pagefault_disable();
+   uval = futex_atomic_cmpxchg_inatomic(uaddr,
+0, 0);
+   pagefault_enable();
+   if (uval != -EFAULT &&
+   (uval & FUTEX_TID_MASK) == current->pid) {
+   printk("ALERT1 %x\n", uval);
+   ret = 0;
+   }
+   }
}
/* Unqueue and drop the lock */
unqueue_me_pi(&q, hb);


Re: [RFC][PATCH] muptiple bugs in PI futexes

2007-05-23 Thread Ingo Molnar

* Alexey Kuznetsov <[EMAIL PROTECTED]> wrote:

> Hello!
> 
> 1. New entries can be added to tsk->pi_state_list after task completed
>exit_pi_state_list(). The result is memory leakage and deadlocks.
> 
> 2. handle_mm_fault() is called under spinlock. The result is obvious.
> 
> 3. State machine is broken. Kernel thinks it owns futex after
>it released all the locks. Ergo, it corrupts futex. The result is that
>two processes think they took a futex.
> 
> All the bugs are trivially reproduced when running glibc's tst-robustpi7
> test long enough.
> 
> The patch is not quite good (RFC!), because:
> 
> 1. There is one case, when I did not figure out how to handle
>page fault correctly. I would do it releasing taken rtmutex
>and hb->lock and retrying futex from the very beginning.
>It is quite ugly. Probably, state machine can be fixed somehow.
> 
> 2. Before this patch I had one unexplained oops inside rtmutex
>in plist_del. I did _not_ fix this, but it does not want to reproduce.
>Probably the stronger locking made some race window too narrow.

thanks for the fixes - they look all good and we'll check it in -rt. 
We'll try to find a solution for the remaining problem too. Could your
#2 crash be explained via any of the bugs you fixed? (i.e. memory
corruption?) I'd exclude genuine rtmutex.c breakage for now because 
that's the basis of all locking in -rt - but maybe the futex interfacing 
upsets something ...

Ingo

