Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Jamie Lokier
Olof Johansson wrote: > How's this? I went with get_val_no_fault(), since it isn't really a > get_user.*() any more (ptr being passed in), and no_paging is a little > misleading (not all faults are due to paging). How ironic: I deliberately didn't choose "no_fault" because that function *does* tak

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Linus Torvalds
On Wed, 23 Feb 2005, Olof Johansson wrote: > > How's this? I went with get_val_no_fault(), since it isn't really a > get_user.*() any more (ptr being passed in), and no_paging is a little > misleading (not all faults are due to paging). Applied with minor cosmetic changes. I'm like a dog who li

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Olof Johansson
On Wed, Feb 23, 2005 at 06:49:46PM +, Jamie Lokier wrote: > Linus Torvalds wrote: > > > I suggest putting it into futex.c, and make it an inline function > > > which takes "u32 __user *". > > > > Agreed, except we've traditionally just made it "int __user *". > > The type signatures in futex.

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Jamie Lokier
Linus Torvalds wrote: > > I suggest putting it into futex.c, and make it an inline function > > which takes "u32 __user *". > > Agreed, except we've traditionally just made it "int __user *". The type signatures in futex.c are a bit mixed up - most places say "int __user *" but sys_futex() says "

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread David Howells
Olof Johansson <[EMAIL PROTECTED]> wrote: > > Alternately, you could just have do_page_fault() do: > > > > while (!down_read_trylock(¤t->mm->mmap_sem)) > > continue; > > > > However, note that this can suffer from starvation due to a never ending > > flow of mixed write-locks and

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Olof Johansson
On Wed, Feb 23, 2005 at 06:22:04PM +, Jamie Lokier wrote: > Olof Johansson wrote: > > On Wed, Feb 23, 2005 at 07:54:06AM -0800, Linus Torvalds wrote: > > > > > > Otherwise, a preempt attempt in get_user would not be seen > > > > until some future preempt_enable was executed. > > > > > > True.

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Linus Torvalds
On Wed, 23 Feb 2005, Jamie Lokier wrote: > > I suggest putting it into futex.c, and make it an inline function > which takes "u32 __user *". Agreed, except we've traditionally just made it "int __user *". Linus - To unsubscribe from this list: send the line "unsubscribe linux-k

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Jamie Lokier
Olof Johansson wrote: > On Wed, Feb 23, 2005 at 07:54:06AM -0800, Linus Torvalds wrote: > > > > Otherwise, a preempt attempt in get_user would not be seen > > > until some future preempt_enable was executed. > > > > True. I guess we should have a "preempt_check_resched()" there too. That's > > w

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Arjan van de Ven
On Wed, 2005-02-23 at 11:10 -0600, Olof Johansson wrote: > On Wed, Feb 23, 2005 at 07:54:06AM -0800, Linus Torvalds wrote: > > > > Otherwise, a preempt attempt in get_user would not be seen > > > until some future preempt_enable was executed. > > > > True. I guess we should have a "preempt_check_

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Olof Johansson
On Wed, Feb 23, 2005 at 07:54:06AM -0800, Linus Torvalds wrote: > > Otherwise, a preempt attempt in get_user would not be seen > > until some future preempt_enable was executed. > > True. I guess we should have a "preempt_check_resched()" there too. That's > what "kunmap_atomic()" does too (whic

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Olof Johansson
On Wed, Feb 23, 2005 at 11:39:08AM +, David Howells wrote: > Alternately, you could just have do_page_fault() do: > > while (!down_read_trylock(¤t->mm->mmap_sem)) > continue; > > However, note that this can suffer from starvation due to a never ending flow > of mixed write

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Linus Torvalds
On Wed, 23 Feb 2005, Joe Korty wrote: > > Perhaps this should be preempt_disable preempt_enable. No, the problem with preempt_disable/enable is that they go away if preemption is not enabled. So you really do have to do it by hand with the "inc_preempt_count". > Otherwise, a preempt att

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread Joe Korty
On Tue, Feb 22, 2005 at 01:30:27PM -0800, Linus Torvalds wrote: > > We really have this already, and it's called "current->preempt". It > handles any lock at all, and doesn't add yet another special case to all > the architectures. > > Just do > > repeat: > down_read(¤t->mm

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread David Howells
Andrew Morton <[EMAIL PROTECTED]> wrote: > wrt this down_read/down_write/down_read deadlock: iirc, the reason why > down_write() takes precedence over down_read() is to avoid the permanent > writer starvation which would occur if there is heavy down_read() traffic. down_write() doesn't actually t

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > That is uglee. True. You could just wrap it up in inline functions and hide it in a header file as I suggested in the email I've just sent. > We really have this already, and it's called "current->preempt". It > handles any lock at all, and doesn't a

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-23 Thread David Howells
Linus Torvalds <[EMAIL PROTECTED]> wrote: > It shouldn't be. If one read writer is active, another should be able to > come in, regardless of any pending writer trying to access it. At least > that's always been the rule for the rw-spinlocks _exactly_ for this > reaseon. But not with rw-semaph

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson
On Tue, Feb 22, 2005 at 03:20:27PM -0800, Andrew Morton wrote: > [EMAIL PROTECTED] (Olof Johansson) wrote: > > > > + inc_preempt_count(); > > + ret = get_user(curval, (int __user *)uaddr1); > > + dec_preempt_count(); > > That _should_ generate a might_sleep() warning,

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Andrew Morton
[EMAIL PROTECTED] (Olof Johansson) wrote: > > + inc_preempt_count(); > + ret = get_user(curval, (int __user *)uaddr1); > + dec_preempt_count(); That _should_ generate a might_sleep() warning, except it looks like we forgot to add a check to get_user(). It would

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Greg KH
On Tue, Feb 22, 2005 at 02:10:58PM -0800, Linus Torvalds wrote: > > Oh, well. The reason I hate the rwsem behaviour is exactly because it > results in this very subtle class of deadlocks. This one case is certainly > solvable several ways, but do we have other issues somewhere else? Things > like

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson
On Tue, Feb 22, 2005 at 10:34:57PM +, Jamie Lokier wrote: > There is one small but important error: the "return ret" mustn't just > return. It must call unqueue_me(&q) just like the code at out_unqueue, > _including_ the conditional "ret = 0", but _excluding_ the up_read(). Not only that, but

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier
Linus Torvalds wrote: > > queue_me(...) etc. > > current->flags |= PF_MMAP_SEM; <- new > > ret = get_user(...); > > current->flags &= PF_MMAP_SEM; <- new > > /* the rest */ > > That is uglee. > > We really have this already, and it's called "current->p

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt
On Tue, 2005-02-22 at 14:10 -0800, Linus Torvalds wrote: > Oh, well. The reason I hate the rwsem behaviour is exactly because it > results in this very subtle class of deadlocks. This one case is certainly > solvable several ways, but do we have other issues somewhere else? Things > like kobject m

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier
Olof Johansson wrote: > > That won't work because the vma lock must be help between key > > calculation and get_user() - otherwise futex is not reliable. It > > would work if the futex key calculation was inside the loop. > > Sure, but that's still true: It's just that the get_user() is done twic

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds
On Wed, 23 Feb 2005, Benjamin Herrenschmidt wrote: > > Yours is probably the most efficient too. Note sure what is best for > rwsems tho, there seem to be some interest preventing readers from > starving writers for ever, this has been debated endlessly iirc, > though I have no personal opinion

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds
On Tue, 22 Feb 2005, Andrew Morton wrote: > > However the pte can get unmapped by memory reclaim so we could still take a > minor fault, and hit the same deadlock, yes? You _could_ fix that by getting the pagetable spinlock, I guess. Which check_user_page_readable() assumes you'd be holding any

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt
On Tue, 2005-02-22 at 13:31 -0800, Linus Torvalds wrote: > > On Wed, 23 Feb 2005, Benjamin Herrenschmidt wrote: > > > > Isn't Olof scheme racy ? Can't the stuff get swapped out between the > > first get_user() and the "real" one ? > > Yes. But see my suggested modification (which I still think i

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Andrew Morton
Jamie Lokier <[EMAIL PROTECTED]> wrote: > > ... > > > > One attempt to fix this is included below. It works, but I'm not entirely > > > happy with the fact that it's a bit messy solution. If anyone has a > > > better idea for how to solve it I'd be all ears. > > > > It's fairly sane. Style-wise

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds
On Tue, 22 Feb 2005, Jamie Lokier wrote: > > A much simpler solution (and sorry for not offering it earlier, > because Andrew Morton pointed out this bug long ago, but I was busy), is: > > In futex.c: > > down_read(¤t->mm->mmap_sem); > get_futex_key(...) etc. > queue_me(...)

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds
On Wed, 23 Feb 2005, Benjamin Herrenschmidt wrote: > > Isn't Olof scheme racy ? Can't the stuff get swapped out between the > first get_user() and the "real" one ? Yes. But see my suggested modification (which I still think is "the thing that Olof does", except it's more efficient and avoids t

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier
Chris Friesen wrote: > > down_read(¤t->mm->mmap_sem); > > get_futex_key(...) etc. > > queue_me(...) etc. > > current->flags |= PF_MMAP_SEM; <- new > > ret = get_user(...); > > current->flags &= PF_MMAP_SEM; <- new > > /* the rest */ > > Should th

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson
On Tue, Feb 22, 2005 at 09:07:52PM +, Jamie Lokier wrote: > > That won't work because the vma lock must be help between key > calculation and get_user() - otherwise futex is not reliable. It > would work if the futex key calculation was inside the loop. Sure, but that's still true: It's just

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt
On Wed, 2005-02-23 at 08:16 +1100, Benjamin Herrenschmidt wrote: > On Tue, 2005-02-22 at 11:36 -0800, Linus Torvalds wrote: > > > DavidH - what's the word on nested read-semaphores like this? Are they > > supposed to work (like nested read-spinlocks), or do we need to do the > > things Olof does

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Chris Friesen
Jamie Lokier wrote: In futex.c: down_read(¤t->mm->mmap_sem); get_futex_key(...) etc. queue_me(...) etc. current->flags |= PF_MMAP_SEM; <- new ret = get_user(...); current->flags &= PF_MMAP_SEM; <- new /* the rest */ Sho

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Benjamin Herrenschmidt
On Tue, 2005-02-22 at 11:36 -0800, Linus Torvalds wrote: > DavidH - what's the word on nested read-semaphores like this? Are they > supposed to work (like nested read-spinlocks), or do we need to do the > things Olof does? Isn't Olof scheme racy ? Can't the stuff get swapped out between the fir

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Jamie Lokier
Andrew Morton wrote: > > This will quickly lock up, since the futex_wait code dows a > > down_read(mmap_sem), then a get_user(). > > > > The do_page_fault code on ppc64 (as well as other architectures) needs > > to take the same semaphore for reading. This is all good until the > > second thread c

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Andrew Morton
[EMAIL PROTECTED] (Olof Johansson) wrote: > > Hi, > > Consider a small testcase that spawns off two threads, either thread > doing a loop of: > > buf = mmap /dev/zero MAP_SHARED for 0x10 bytes > call sys_futex (buf+page, FUTEX_WAIT, 1, NULL, NULL) for each page in > said mmap >

Re: [PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Linus Torvalds
On Tue, 22 Feb 2005, Olof Johansson wrote: > > Consider a small testcase that spawns off two threads, either thread > doing a loop of: > > buf = mmap /dev/zero MAP_SHARED for 0x10 bytes > call sys_futex (buf+page, FUTEX_WAIT, 1, NULL, NULL) for each page in > said mmap >

[PATCH/RFC] Futex mmap_sem deadlock

2005-02-22 Thread Olof Johansson
Hi, Consider a small testcase that spawns off two threads, either thread doing a loop of: buf = mmap /dev/zero MAP_SHARED for 0x10 bytes call sys_futex (buf+page, FUTEX_WAIT, 1, NULL, NULL) for each page in said mmap munmap(buf) repeat This will quickly lock