Re: [RFC][PATCH] multiple bugs in PI futexes

On Wed, 2007-06-06 at 01:00 +0400, Alexey Kuznetsov wrote:
> Hello!
>
> > We actually need to do something about this, as we might loop forever
> > there. The robust cleanup code can fail (e.g. due to list corruption)
> > and we would see exit_state != 0 and the OWNER_DIED bit would never be
> > set, so we are stuck in a busy loop.
>
> Yes...
>
> It is possible to take read_lock(&tasklist_lock) before:

We really want to avoid the tasklist_lock at all. It's not trivial, but
I think it's doable. I'll send you a patch to test tomorrow morning, when
my brain has recovered from looking at that code :)

	tglx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH] multiple bugs in PI futexes

Hello!

> We actually need to do something about this, as we might loop forever
> there. The robust cleanup code can fail (e.g. due to list corruption)
> and we would see exit_state != 0 and the OWNER_DIED bit would never be
> set, so we are stuck in a busy loop.

Yes...

It is possible to take read_lock(&tasklist_lock) before:

	inc_preempt_count();
	curval = futex_atomic_cmpxchg_inatomic(uaddr, uval, newval);
	dec_preempt_count();

and drop it after lookup_pi_state(). In this case the exiting task will
set FUTEX_OWNER_DIED, but will spin in exit_notify(); we will find a
valid pi_state and take the slow path, acquiring the rtmutex.

Alexey
Re: [RFC][PATCH] multiple bugs in PI futexes

On Tue, 2007-06-05 at 20:48 +0200, Thomas Gleixner wrote:
> > > This does not really explain, why you do prevent the -ESRCH return
> > > value in the next cycle,
> >
> > Because the right curval is refetched, it already has the
> > FUTEX_OWNER_DIED bit set and we successfully take the lock.
>
> Ok, handle_futex_death() is punching the OWNER_DIED bit into the futex
> without the hash bucket lock. We might as well grab the hash bucket lock
> right there to avoid this. I'll look for a sane solution.

We actually need to do something about this, as we might loop forever
there. The robust cleanup code can fail (e.g. due to list corruption)
and we would see exit_state != 0 and the OWNER_DIED bit would never be
set, so we are stuck in a busy loop.

I think I have an idea how to fix this.

	tglx
Re: [RFC][PATCH] multiple bugs in PI futexes

On Tue, 2007-06-05 at 21:39 +0400, Alexey Kuznetsov wrote:
> Hello!
>
> > Hmm, what means not expected ? -ESRCH is returned, when the owner task
> > is not found.
>
> This is not supposed to happen with robust futexes.

Hmm, right.

> > This does not really explain, why you do prevent the -ESRCH return
> > value in the next cycle,
>
> Because the right curval is refetched, it already has the
> FUTEX_OWNER_DIED bit set and we successfully take the lock.

Ok, handle_futex_death() is punching the OWNER_DIED bit into the futex
without the hash bucket lock. We might as well grab the hash bucket lock
right there to avoid this. I'll look for a sane solution.

> > The rtmutex code only returns -EDEADLK, when the lock is already held
> > by the task
>
> This case.

Sorry, I was not clear here: not the user space lock; the rtmutex must be
held, or a deadlock situation against another rtmutex must be detected.
There is no way that the exiting code assigns the ownership of the
rtmutex. It solely calls rt_mutex_unlock(), which makes the highest
priority waiter the _PENDING_ owner, which means the pending owner needs
to acquire it for real.

> You need to run only tst-robustpi8 in a loop. It should be triggered
> quickly, a few minutes on 8-way smp here.

My largest box is a 4-way and it has been running for hours in a
"while true" loop.

> If you want, I can insert some debugging printks, which you need,
> and run the test here.

I'll fix up some things in the code first and then I'll add a couple of
debugs to nail this EDEADLK problem.

	tglx
Re: [RFC][PATCH] multiple bugs in PI futexes

Hello!

> Hmm, what means not expected ? -ESRCH is returned, when the owner task
> is not found.

This is not supposed to happen with robust futexes. glibc aborts (which
is correct), or, for a build with debugging disabled, enters a simulated
deadlock (which is confusing).

> lock. Also using uval is wrong.

Yup. You are right. This means those RETRY messages could be spurious.
I must rerun the test.

> This does not really explain, why you do prevent the -ESRCH return value
> in the next cycle,

Because the right curval is refetched, it already has the FUTEX_OWNER_DIED
bit set and we successfully take the lock.

> No, -EDEADLK is returned from here:
>
>	ret = rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1);

Of course. It is the only place where ret is set. :-)

> The rtmutex code only returns -EDEADLK, when the lock is already held by
> the task

This case.

> The fix is to remove it and to find the real cause of the problem.
>
> I've been running the glibc tests for hours w/o tripping into it.

OK. You need to run only tst-robustpi8 in a loop. It should be triggered
quickly, a few minutes on 8-way smp here.

If you want, I can insert some debugging printks, which you need,
and run the test here.

Alexey
Re: [RFC][PATCH] multiple bugs in PI futexes

Hi,

On Wed, 2007-05-23 at 15:51 +0400, Alexey Kuznetsov wrote:
> The first chunk: results in self-inflicted deadlock inside glibc.
> Sometimes futex_lock_pi returns -ESRCH, when it is not expected
> and glibc enters to for(;;) sleep() to simulate deadlock. This problem
> is quite obvious and I think the patch is right. Though it looks like
> each "if" in futex_lock_pi() got some stupid special case "else if". :-)

Hmm, what means not expected ? -ESRCH is returned, when the owner task
is not found.

> +	} else if (ret == -ESRCH) {
> +		/* Process could exit right now, so that robust list
> +		 * was processed after we got uval. Retry. */
> +		pagefault_disable();
> +		curval = futex_atomic_cmpxchg_inatomic(uaddr,
> +						       uval, uval);

When we retrieve uval, we hold the hash bucket lock and we keep it down
to this point, so the robust list processing is stuck on the hash bucket
lock. Also using uval is wrong. The user space variable holds "newval",
which was written there before:

	curval = cmpxchg_futex_value_locked(uaddr, uval, newval);

So you just go back up to retry_locked or into the fault handling path
(which is unlikely). This does not really explain, why you do prevent
the -ESRCH return value in the next cycle, except for the case, where
we go through the fault handling path.

> The second chunk: sometimes futex_lock_pi() returns -EDEADLK,
> when nobody has the lock. The reason is also obvious (see comment
> in the patch), but the correct fix is far beyond my comprehension.
> I guess someone already saw this, the chunk:
>
>	if (rt_mutex_trylock(&q.pi_state->pi_mutex))
>		ret = 0;
>
> is obviously from the same opera. But it does not work, because the
> rtmutex is really taken at this point: wake_futex_pi() of the previous
> owner reassigned it to us.

No, -EDEADLK is returned from here:

	ret = rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1);

The rtmutex code only returns -EDEADLK, when the lock is already held by
the task or when it runs into a circular rtmutex dependency (e.g. an ABBA
deadlock) during the PI-walk.

Also wake_futex_pi() does not assign the ownership of the rtmutex; it
assigns the ownership of the futex pi state and unlocks the rtmutex,
which sets the pending owner. The pending owner is the highest priority
waiter. The ownership assignment of the rtmutex happens inside the
rtmutex code, not in wake_futex_pi().

The trylock is covering the following situation:

	rt_mutex_timed_lock(&q.pi_state->pi_mutex, to, 1) returns -ETIMEDOUT;

Now it gets blocked on:

	spin_lock(q.lock_ptr);

because on the other CPU the futex is unlocked. The rtmutex code does
not find a waiter and unlocks the rtmutex with no new owner. The hash
bucket lock is released and the task which is blocked on the hash bucket
lock has a chance to grab it.

> My fix works. But it looks very stupid.
> I would think about removal of the shift of ownership in wake_futex_pi()
> and making all the work in the context of the process taking the lock.

I have no clue why it works. The only explanation is, that an existing
deadlock is resolved by the exiting task.

> 	if (ret && q.pi_state->owner == curr) {

This happens only, when the rtmutex was unlocked by another task between
the return from the rtmutex code and reacquiring the hash bucket lock.

> 		if (rt_mutex_trylock(&q.pi_state->pi_mutex))
> 			ret = 0;

If we can get the rtmutex here, we are the owner of the lock and need to
fix up the ownership in the user space variable further down.

> +		/* Holy crap... Now futex lock returns -EDEADLK
> +		 * sometimes, because ownership was passed to us while
> +		 * unlock of previous owner. Who wrote this?

I admit, that I was one of the authors :)

> +		 * Please, fix this correctly. For now:
> +		 */

The fix is to remove it and to find the real cause of the problem.

I've been running the glibc tests for hours w/o tripping into it.

	tglx
Re: [RFC][PATCH] multiple bugs in PI futexes

Hello!

> #2 crash be explained via any of the bugs you fixed? (i.e. memory
> corruption?)

Yes, I found the reason, it is really fixed by taking tasklist_lock.
This happens after a task struct with a not cleared pi_state_list is
freed and the list of futex_pi_state's is corrupted.

Meanwhile... two more bugs were found.

The first chunk: results in self-inflicted deadlock inside glibc.
Sometimes futex_lock_pi returns -ESRCH, when it is not expected
and glibc enters to for(;;) sleep() to simulate deadlock. This problem
is quite obvious and I think the patch is right. Though it looks like
each "if" in futex_lock_pi() got some stupid special case "else if". :-)

The second chunk: sometimes futex_lock_pi() returns -EDEADLK,
when nobody has the lock. The reason is also obvious (see comment
in the patch), but the correct fix is far beyond my comprehension.
I guess someone already saw this, the chunk:

	if (rt_mutex_trylock(&q.pi_state->pi_mutex))
		ret = 0;

is obviously from the same opera. But it does not work, because the
rtmutex is really taken at this point: wake_futex_pi() of the previous
owner reassigned it to us.

My fix works. But it looks very stupid. I would think about removal of
the shift of ownership in wake_futex_pi() and making all the work in
the context of the process taking the lock.

Both bugs show up when running glibc's tst-robustpi8 long enough.

Yes, all this is about pre May 8 futexes. Seems, the updates did not
change anything in the logic, but I am not sure.

--- kernel/futex.c.intermediate	2007-05-23 14:48:27.0 +0400
+++ kernel/futex.c	2007-05-23 14:58:06.0 +0400
@@ -1244,8 +1244,21 @@ static int futex_lock_pi(u32 __user *uad
 			if (unlikely(curval != uval))
 				goto retry_locked;
 			ret = 0;
-		}
-		goto out_unlock_release_sem;
+		} else if (ret == -ESRCH) {
+			/* Process could exit right now, so that robust list
+			 * was processed after we got uval. Retry. */
+			pagefault_disable();
+			curval = futex_atomic_cmpxchg_inatomic(uaddr,
+							       uval, uval);
+			pagefault_enable();
+			if (unlikely(curval == -EFAULT))
+				goto uaddr_faulted;
+			if (unlikely(curval != uval)) {
+				printk("RETRY %x %x %x\n", current->pid, uval, curval);
+				goto retry_locked;
+			}
+		}
+		goto out_unlock_release_sem;
 	}
 
 	/*
@@ -1361,6 +1374,22 @@ static int futex_lock_pi(u32 __user *uad
 	if (ret && q.pi_state->owner == curr) {
 		if (rt_mutex_trylock(&q.pi_state->pi_mutex))
 			ret = 0;
+		/* Holy crap... Now futex lock returns -EDEADLK
+		 * sometimes, because ownership was passed to us while
+		 * unlock of previous owner. Who wrote this?
+		 * Please, fix this correctly. For now:
+		 */
+		if (ret == -EDEADLK) {
+			pagefault_disable();
+			uval = futex_atomic_cmpxchg_inatomic(uaddr,
+							     0, 0);
+			pagefault_enable();
+			if (uval != -EFAULT &&
+			    (uval & FUTEX_TID_MASK) == current->pid) {
+				printk("ALERT1 %x\n", uval);
+				ret = 0;
+			}
+		}
 	}
 
 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q, hb);
Re: [RFC][PATCH] multiple bugs in PI futexes

* Alexey Kuznetsov <[EMAIL PROTECTED]> wrote:

> Hello!
>
> 1. New entries can be added to tsk->pi_state_list after the task
>    completed exit_pi_state_list(). The result is memory leakage and
>    deadlocks.
>
> 2. handle_mm_fault() is called under a spinlock. The result is obvious.
>
> 3. The state machine is broken. The kernel thinks it owns the futex
>    after it released all the locks. Ergo, it corrupts the futex. The
>    result is that two processes think they took a futex.
>
> All the bugs are trivially reproduced when running glibc's tst-robustpi7
> test long enough.
>
> The patch is not quite good (RFC!), because:
>
> 1. There is one case, when I did not figure out how to handle
>    the page fault correctly. I would do it by releasing the taken
>    rtmutex and hb->lock and retrying the futex from the very beginning.
>    It is quite ugly. Probably, the state machine can be fixed somehow.
>
> 2. Before this patch I had one unexplained oops inside rtmutex
>    in plist_del. I did _not_ fix this, but it does not want to
>    reproduce. Probably, the stronger locking made some race window too
>    narrow.

thanks for the fixes - they look all good and we'll check them in -rt.
We'll try to find a solution for the remaining problem too. Could your
#2 crash be explained via any of the bugs you fixed? (i.e. memory
corruption?) I'd exclude genuine rtmutex.c breakage for now because
that's the basis of all locking in -rt - but maybe the futex interfacing
upsets something ...

	Ingo