Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
 Code like
   spin_lock(lock);
   if (copy_to_user(...))
   rc = ...
   spin_unlock(lock);
 really *should* generate warnings like it did before.
 
 And *only* code like
   spin_lock(lock);

Is only code like this valid or also with the spin_lock() dropped?
(e.g. the access in patch1 if I remember correctly)


So should page_fault_disable() increment the pagefault counter and the preempt
counter or only the first one?

   page_fault_disable();
   if (copy_to_user(...))
   rc = ...
   page_fault_enable();
   spin_unlock(lock);
 should not generate warnings, since the author hopefully knew what he did.
 
 We could achieve that by e.g. adding a couple of pagefault disabled bits
 within current_thread_info()-preempt_count, which would allow
 pagefault_disable() and pagefault_enable() to modify a different part of
 preempt_count than it does now, so there is a way to tell if pagefaults have
 been explicitly disabled or are just a side effect of preemption being
 disabled.
 This would allow might_fault() to restore its old sane behaviour for the
 !page_fault_disabled() case.

So we would have pagefault code rely on:

in_disabled_pagefault() ( pagefault_disabled() ... whatever ) instead of
in_atomic().

I agree with this approach, as this is basically what I suggested in one of my
previous mails.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread Heiko Carstens
On Thu, Nov 27, 2014 at 09:03:01AM +0100, David Hildenbrand wrote:
  Code like
  spin_lock(lock);
  if (copy_to_user(...))
  rc = ...
  spin_unlock(lock);
  really *should* generate warnings like it did before.
  
  And *only* code like
  spin_lock(lock);
 
 Is only code like this valid or also with the spin_lock() dropped?
 (e.g. the access in patch1 if I remember correctly)
 
 So should page_fault_disable() increment the pagefault counter and the preempt
 counter or only the first one?

Given that a sequence like

page_fault_disable();
if (copy_to_user(...))
rc = ...
page_fault_enable();

is correct code right now I think page_fault_disable() should increase both.
No need for surprising semantic changes.

 So we would have pagefault code rely on:
 
 in_disabled_pagefault() ( pagefault_disabled() ... whatever ) instead of
 in_atomic().

No, let's be more defensive: the page fault handler should do nothing if
in_atomic() just like now. But it could have a quick check and emit a one
time warning if page faults aren't disabled in addition.
That might help debugging but keeps the system more likely alive.

might_fault() however should call might_sleep() if page faults aren't
disabled, but that's what you proposed anyway I think.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
 On Thu, Nov 27, 2014 at 09:03:01AM +0100, David Hildenbrand wrote:
   Code like
 spin_lock(lock);
 if (copy_to_user(...))
 rc = ...
 spin_unlock(lock);
   really *should* generate warnings like it did before.
   
   And *only* code like
 spin_lock(lock);
  
  Is only code like this valid or also with the spin_lock() dropped?
  (e.g. the access in patch1 if I remember correctly)
  
  So should page_fault_disable() increment the pagefault counter and the 
  preempt
  counter or only the first one?
 
 Given that a sequence like
 
   page_fault_disable();
   if (copy_to_user(...))
   rc = ...
   page_fault_enable();
 
 is correct code right now I think page_fault_disable() should increase both.
 No need for surprising semantic changes.
 
  So we would have pagefault code rely on:
  
  in_disabled_pagefault() ( pagefault_disabled() ... whatever ) instead of
  in_atomic().
 
 No, let's be more defensive: the page fault handler should do nothing if
 in_atomic() just like now. But it could have a quick check and emit a one
 time warning if page faults aren't disabled in addition.
 That might help debugging but keeps the system more likely alive.

Sounds sane if we increase both counters!

 
 might_fault() however should call might_sleep() if page faults aren't
 disabled, but that's what you proposed anyway I think.

Jap, sounds good to me. Will see if I can come up with something.

Thanks!

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread Thomas Gleixner
On Thu, 27 Nov 2014, Heiko Carstens wrote:
 On Thu, Nov 27, 2014 at 09:03:01AM +0100, David Hildenbrand wrote:
   Code like
 spin_lock(lock);
 if (copy_to_user(...))
 rc = ...
 spin_unlock(lock);
   really *should* generate warnings like it did before.
   
   And *only* code like
 spin_lock(lock);
  
  Is only code like this valid or also with the spin_lock() dropped?
  (e.g. the access in patch1 if I remember correctly)
  
  So should page_fault_disable() increment the pagefault counter and the 
  preempt
  counter or only the first one?
 
 Given that a sequence like
 
   page_fault_disable();
   if (copy_to_user(...))
   rc = ...
   page_fault_enable();
 
 is correct code right now I think page_fault_disable() should increase both.
 No need for surprising semantic changes.

OTOH, there is no reason why we need to disable preemption over that
page_fault_disabled() region. There are code pathes which really do
not require to disable preemption for that.

We have that seperated in preempt-rt for obvious reasons and IIRC
Peter Zijlstra tried to distangle it in mainline some time ago. I
forgot why that never got merged.

We tie way too much stuff on the preemption count already, which is a
mightmare because we have no clear distinction of protection
scopes. 

Thanks,

tglx
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
 OTOH, there is no reason why we need to disable preemption over that
 page_fault_disabled() region. There are code pathes which really do
 not require to disable preemption for that.
 
 We have that seperated in preempt-rt for obvious reasons and IIRC
 Peter Zijlstra tried to distangle it in mainline some time ago. I
 forgot why that never got merged.
 

Of course, we can completely separate that in our page fault code by doing
pagefault_disabled() checks instead of in_atomic() checks (even in add on
patches later).

 We tie way too much stuff on the preemption count already, which is a
 mightmare because we have no clear distinction of protection
 scopes. 

Although it might not be optimal, but keeping a separate counter for
pagefault_disable() as part of the preemption counter seems to be the only
doable thing right now. I am not sure if a completely separated counter is even
possible, increasing the size of thread_info.

I am working on a prototype right now.

Thanks!

 
 Thanks,
 
   tglx
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Laight
From: David Hildenbrand
...
 Although it might not be optimal, but keeping a separate counter for
 pagefault_disable() as part of the preemption counter seems to be the only
 doable thing right now. I am not sure if a completely separated counter is 
 even
 possible, increasing the size of thread_info.

What about adding (say) 0x1 for the more restrictive test?

David

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
 From: David Hildenbrand
 ...
  Although it might not be optimal, but keeping a separate counter for
  pagefault_disable() as part of the preemption counter seems to be the only
  doable thing right now. I am not sure if a completely separated counter is 
  even
  possible, increasing the size of thread_info.
 
 What about adding (say) 0x1 for the more restrictive test?
 
   David
 

You mean as part of the preempt counter?

The current layout (on my branch) is

 * PREEMPT_MASK:0x00ff
 * SOFTIRQ_MASK:0xff00
 * HARDIRQ_MASK:0x000f
 * NMI_MASK:0x0010
 * PREEMPT_ACTIVE:  0x0020

I would have added
 * PAGEFAULT_MASK:  0x03C0

So 4 bit == 16 levels (tbd)

By implementing scope checks in the debug case like done for the regular
preempt_count_inc() preempt_count_dec(), we could catch over/underflows.

Thanks,

David

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

RE: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Laight
From: David Hildenbrand [mailto:d...@linux.vnet.ibm.com]
  From: David Hildenbrand
  ...
   Although it might not be optimal, but keeping a separate counter for
   pagefault_disable() as part of the preemption counter seems to be the only
   doable thing right now. I am not sure if a completely separated counter 
   is even
   possible, increasing the size of thread_info.
 
  What about adding (say) 0x1 for the more restrictive test?
 
  David
 
 
 You mean as part of the preempt counter?
 
 The current layout (on my branch) is
 
  * PREEMPT_MASK:0x00ff
  * SOFTIRQ_MASK:0xff00
  * HARDIRQ_MASK:0x000f
  * NMI_MASK:0x0010
  * PREEMPT_ACTIVE:  0x0020
 
 I would have added
  * PAGEFAULT_MASK:  0x03C0

I'm not sure where you'd need to add the bits.

I think the above works because disabling 'HARDIRQ' implicitly
disables 'SOFTIRQ' and 'PREEMPT' (etc), so if 256+ threads
disable PREEMPT everything still works.

So if disabling pagefaults implies that pre-emption is disabled
(but SOFTIRQ is still allowed) then you need to insert your bit(s)
between 0xff00 and 0x00ff.
OTOH if disabling pre-emption implies that pagefaults are disabled
then you'd need to use the lsb and change all the above values.

Which makes me think that 'PREEMPT_ACTIVE' isn't right at all.
Two threads disabling NMIs (or 32  disabling HARDIRQ) won't DTRT.

OTOH I'm only guessing at how this is used.

David



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
 From: David Hildenbrand [mailto:d...@linux.vnet.ibm.com]
   From: David Hildenbrand
   ...
Although it might not be optimal, but keeping a separate counter for
pagefault_disable() as part of the preemption counter seems to be the 
only
doable thing right now. I am not sure if a completely separated counter 
is even
possible, increasing the size of thread_info.
  
   What about adding (say) 0x1 for the more restrictive test?
  
 David
  
  
  You mean as part of the preempt counter?
  
  The current layout (on my branch) is
  
   * PREEMPT_MASK:0x00ff
   * SOFTIRQ_MASK:0xff00
   * HARDIRQ_MASK:0x000f
   * NMI_MASK:0x0010
   * PREEMPT_ACTIVE:  0x0020
  
  I would have added
   * PAGEFAULT_MASK:  0x03C0
 
 I'm not sure where you'd need to add the bits.
 
 I think the above works because disabling 'HARDIRQ' implicitly
 disables 'SOFTIRQ' and 'PREEMPT' (etc), so if 256+ threads
 disable PREEMPT everything still works.

AFAIK 256+ levels of preempt will break the system :)

Therefore with CONFIG_DEBUG_PREEMPT we verify that we don't have any
over/underflows.

But such bugs can only be found with CONFIG_DEBUG_PREEMPT enabled.

 
 So if disabling pagefaults implies that pre-emption is disabled
 (but SOFTIRQ is still allowed) then you need to insert your bit(s)
 between 0xff00 and 0x00ff.
 OTOH if disabling pre-emption implies that pagefaults are disabled
 then you'd need to use the lsb and change all the above values.
 
 Which makes me think that 'PREEMPT_ACTIVE' isn't right at all.
 Two threads disabling NMIs (or 32  disabling HARDIRQ) won't DTRT.

With threads you mean levels? This is a per thread information.

 
 OTOH I'm only guessing at how this is used.
 
   David
 
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[PATCH RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
Simple prototype to enable might_sleep() checks in might_fault(), avoiding false
positives for scenarios involving explicit pagefault_disable().

So this should work:
spin_lock(lock); /* also if left away */
pagefault_disable()
rc = copy_to_user(...)
pagefault_enable();
spin_unlock(lock); /*

And this should report a warning again:
spin_lock(lock);
rc = copy_to_user(...);
spin_unlock(lock);

Still missing:
- Split of preempt documentation update + preempt_active define reshuffle
- Debug version to test for over/underflows
- Change documentation of user access methods to reflect the real behavior
- Don't touch the preempt counter, only the pagefault disable counter (future
  work)

David Hildenbrand (2):
  preempt: track pagefault_disable() calls in the preempt counter
  mm, sched: trigger might_sleep() in might_fault() when pagefaults are
disabled

 include/linux/kernel.h   |  9 +++--
 include/linux/preempt_mask.h | 24 +++-
 include/linux/uaccess.h  | 21 ++---
 mm/memory.c  | 15 ---
 4 files changed, 44 insertions(+), 25 deletions(-)

-- 
1.8.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread Thomas Gleixner
On Thu, 27 Nov 2014, David Hildenbrand wrote:
  OTOH, there is no reason why we need to disable preemption over that
  page_fault_disabled() region. There are code pathes which really do
  not require to disable preemption for that.
  
  We have that seperated in preempt-rt for obvious reasons and IIRC
  Peter Zijlstra tried to distangle it in mainline some time ago. I
  forgot why that never got merged.
  
 
 Of course, we can completely separate that in our page fault code by doing
 pagefault_disabled() checks instead of in_atomic() checks (even in add on
 patches later).
 
  We tie way too much stuff on the preemption count already, which is a
  mightmare because we have no clear distinction of protection
  scopes. 
 
 Although it might not be optimal, but keeping a separate counter for
 pagefault_disable() as part of the preemption counter seems to be the only
 doable thing right now.

It needs to be seperate, if it should be useful. Otherwise we just
have a extra accounting in preempt_count() which does exactly the same
thing as we have now: disabling preemption.

Now you might say, that we could mask out that part when checking
preempt_count, but that wont work on x86 as x86 has the preempt
counter as a per cpu variable and not as a per thread one.

But if you want to distangle pagefault disable from preempt disable
then you must move it to the thread, because it is a property of the
thread. preempt count is very much a per cpu counter as you can only
go through schedule when it becomes 0.

Btw, I find the x86 representation way more clear, because it
documents that preempt count is a per cpu BKL and not a magic thread
property. And sadly that is how preempt count is used ...

 I am not sure if a completely separated counter is even possible,
 increasing the size of thread_info.

And adding a ulong to thread_info is going to create exactly which
problem?

Thanks,

tglx
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-27 Thread David Hildenbrand
 On Thu, 27 Nov 2014, David Hildenbrand wrote:
   OTOH, there is no reason why we need to disable preemption over that
   page_fault_disabled() region. There are code pathes which really do
   not require to disable preemption for that.
   
   We have that seperated in preempt-rt for obvious reasons and IIRC
   Peter Zijlstra tried to distangle it in mainline some time ago. I
   forgot why that never got merged.
   
  
  Of course, we can completely separate that in our page fault code by doing
  pagefault_disabled() checks instead of in_atomic() checks (even in add on
  patches later).
  
   We tie way too much stuff on the preemption count already, which is a
   mightmare because we have no clear distinction of protection
   scopes. 
  
  Although it might not be optimal, but keeping a separate counter for
  pagefault_disable() as part of the preemption counter seems to be the only
  doable thing right now.
 
 It needs to be seperate, if it should be useful. Otherwise we just
 have a extra accounting in preempt_count() which does exactly the same
 thing as we have now: disabling preemption.
 
 Now you might say, that we could mask out that part when checking
 preempt_count, but that wont work on x86 as x86 has the preempt
 counter as a per cpu variable and not as a per thread one.

Ah right, it's per cpu on x86. So it really belongs to a thread if we want to
demangle preemption and pagefault_disable.

Would work for now, but for x86 not on the long run.

 
 But if you want to distangle pagefault disable from preempt disable
 then you must move it to the thread, because it is a property of the
 thread. preempt count is very much a per cpu counter as you can only
 go through schedule when it becomes 0.

Thinking about it, this makes perfect sense!

 
 Btw, I find the x86 representation way more clear, because it
 documents that preempt count is a per cpu BKL and not a magic thread
 property. And sadly that is how preempt count is used ...
 
  I am not sure if a completely separated counter is even possible,
  increasing the size of thread_info.
 
 And adding a ulong to thread_info is going to create exactly which
 problem?

If we're allowed to increase the size of thread_info - absolutely fine with me!
(I am not sure if some archs have special constraints on the size)

Will see what I can come up with.

Thanks!

 
 Thanks,
 
   tglx
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread David Hildenbrand
Hi Michael,

thanks for your reply! some general thought:

This change was introduced mid 2013 but we don't have a single user relying
on this code change yet, right?

Disabling might_sleep() for functions that clearly state that they may sleep to
get some special case running feels wrong to me.

 On Tue, Nov 25, 2014 at 12:43:24PM +0100, David Hildenbrand wrote:
  I recently discovered that commit 662bbcb2747c2422cf98d3d97619509379eee466
  removed/skipped all might_sleep checks for might_fault() calls when in 
  atomic.
 
 Yes.  You can add e.g. might_sleep in your code if, for some reason, it is
 illegal to call it in an atomic context.
 But faults are legal in atomic if you handle the possible
 errors, and faults do not necessary cause caller to sleep, so might_fault
 should not call might_sleep.

My point is that and (almost at) everywhere where we use pagefault_disable, we
are using the inatomic variants. Also the documentation of copy_to_user()
clearly states at various points that this function may sleep:

- git grep This function may sleep yields
   Context: User context only.  This function may sleep. e.g. s390, x86, mips

Patching out the might_sleep() from these functions seems more to be a hack to
solve another problem - not using the inatomic variants or finding them not to
be sufficient for a task?

So calling might_sleep() in these functions seems very right to me.

 
  Reason was to allow calling copy_(to|from)_user while in 
  pagefault_disabled(),
  because otherwise, CONFIG_DEBUG_ATOMIC_SLEEP would find false positives.
 
 That wasn't the only reason BTW.
 Andi Kleen also showed that compiler did a terrible job optimizing
 get/put user when might_sleep was called.
 See e.g. this thread:
 https://lkml.org/lkml/2013/8/14/652
 There was even an lwn.net article about it, that I don't seem to be
 able to locate.

Thanks, I'll try to look it up! but:

might_sleep() will only be called when lock debugging / sleep in atomic is in,
so this doesn't seem to be a problem for me in a production environment. or am
I missing something?

 So might_sleep is not appropriate for lightweight operations like __get_user,
 which people literally expect to be a single instruction.

I agree that __.* variants should not call might_fault() (like I mentioned
after the table below).

 
 
 I also have a project going which handles very short packets by copying
 them into guest memory directly without waking up a thread.
 I do it by calling recvmsg on a socket, then switching to
 a thread if I get back EFAULT.
 Not yet clean enough to upstream but it does seem to cut
 the latency down quite a bit, so I'd like to have the option.
 
 
 Generally, a caller that does something like this under a spinlock:
 preempt_disable
 pagefault_disable
 error = copy_to_user

So why can't we use the inatomic variant? This seems to be
atomic environment to me. Calling a function that states that it may sleep
doesn't feel right to me.

 pagefault_enable
 preempt_enable_no_resched
 
 is not doing anything wrong and should not get a warning,
 as long as error is handled correctly later.
 
 You can also find the discussion around the patches
 educational:
 http://article.gmane.org/gmane.comp.emulators.kvm.devel/109928
 
  However
  we have the inatomic variants of these function for this purpose.
 
 Does inatomic install fixups now?

If not, I think this would rather be the way to go.

 
 Last I checked, it didn't so it did not satisfy this purpose.
 See this comment from x86:
 
  * Copy data from kernel space to user space.  Caller must check
  * the specified block with access_ok() before calling this function.
  * The caller should also make sure he pins the user space address
  * so that we don't result in page fault and sleep.
 
 
 Also - switching to inatomic would scatter if (atomic) all
 over code. It's much better to simply call the same
 function (e.g. recvmsg) and have it fail gracefully:
 after all, we have code to handle get/put user errors
 anyway.
 
  The result of this change was that all guest access (that correctly uses
  might_fault()) doesn't perform atomic checks when CONFIG_DEBUG_ATOMIC_SLEEP 
  is
  enabled. We lost a mighty debugging feature for user access.
 
 What's the path you are trying to debug?

Well, we had a problem where we held a spin_lock and called
copy_(from|to)_user(). We experienced very random deadlocks that took some guy
almost a week to debug. The simple might_sleep() check would have showed this
error immediately.

I would have said that in 99,9% of all copy_to_user() calls we want to check
might_sleep(). That pagefault_disable() is a special case that should be
handled differently - in my opinion.

 
 If your code can faults, then it's safe to call
 from atomic context.
 If it can't, it must pin the page.  You can not do access_ok checks and
 then assume access won't fault.
 
  As I wasn't able to come up with any other reason why this should be
  

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
  What's the path you are trying to debug?
 
 Well, we had a problem where we held a spin_lock and called
 copy_(from|to)_user(). We experienced very random deadlocks that took some guy
 almost a week to debug. The simple might_sleep() check would have showed this
 error immediately.

This must have been a very old kernel.
A modern kernel will return an error from copy_to_user.
Which is really the point of the patch you are trying to revert.
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
Hmm you sent same mail to me off-list, and I replied there.
Now there's a copy on list - I'm just going to assume
it's exactly identical, pasting my response here as well.
If there are more questions I missed, let me know.

On Wed, Nov 26, 2014 at 09:23:31AM +0100, David Hildenbrand wrote:
 Sorry I haven't put you on cc, must have lost you while packing my list :)
 Thanks for your answer!
 
 This change was introduced in 2013, and I haven't seen an instance making use
 of your described scenario, right?

My work is still out of tree. RHEL6 shipped some patches that use
this. I don't know whether any instances in-tree use this capability.

But it just doesn't make sense for might_fault to call might_sleep
because a fault does not imply sleep.

  On Tue, Nov 25, 2014 at 12:43:24PM +0100, David Hildenbrand wrote:
   I recently discovered that commit 662bbcb2747c2422cf98d3d97619509379eee466
   removed/skipped all might_sleep checks for might_fault() calls when in 
   atomic.
  
  Yes.  You can add e.g. might_sleep in your code if, for some reason, it is
  illegal to call it in an atomic context.
  But faults are legal in atomic if you handle the possible
  errors, and faults do not necessary cause caller to sleep, so might_fault
  should not call might_sleep.
 
 I think we should use in_atomic variants for this purpose (as done in the code
 for now) - especially as pagefault_disable has been relying on the preempt
 count for a long time. But see my comment below about existing documentation.

IIUC they are not interchangeable.
*in_atomic seems to require that page is pinned.
*user does not: it installs a fixup so you get an error code if you try
to access an invalid address.

Besides, this would just lead to a ton of
 if (atomic) return inatomic else return user
in code, for no good purpose.

  
   Reason was to allow calling copy_(to|from)_user while in 
   pagefault_disabled(),
   because otherwise, CONFIG_DEBUG_ATOMIC_SLEEP would find false positives.
  
  That wasn't the only reason BTW.
  Andi Kleen also showed that compiler did a terrible job optimizing
  get/put user when might_sleep was called.
 
 might_fault() should never call might_sleep() when lock debugging is off, so
 there should be no performance problem or am I missing something?

CONFIG_DEBUG_ATOMIC_SLEEP too, no?

  See e.g. this thread:
  https://lkml.org/lkml/2013/8/14/652
  There was even an lwn.net article about it, that I don't seem to be
  able to locate.
 
 Thanks, will see if I can find it.
 
  So might_sleep is not appropriate for lightweight operations like 
  __get_user,
  which people literally expect to be a single instruction.
 
 Yes, as discussed below, __.* variants should never call it.

So that would be even more inconsistent. They fault exactly the same as
the non __ variants.


  
  
  I also have a project going which handles very short packets by copying
  them into guest memory directly without waking up a thread.
  I do it by calling recvmsg on a socket, then switching to
  a thread if I get back EFAULT.
  Not yet clean enough to upstream but it does seem to cut
  the latency down quite a bit, so I'd like to have the option.
  
  
  Generally, a caller that does something like this under a spinlock:
  preempt_disable
  pagefault_disable
  error = copy_to_user
  pagefault_enable
  preempt_enable_no_resched
  
  is not doing anything wrong and should not get a warning,
  as long as error is handled correctly later.
 
 I think this would be a perfect fit for an inatomic variant, no?

No because inatomic does not handle faults.

 I mean even the copy_to_user documentation of e.g. s390, x86, mips
 clearly says:
   Context: User context only.-This function may sleep.

So the comment is incomplete - I didn't get around fixing that.
It may sleep but not in atomic context.

 So calling it from your described scenario is wrong. And as the interface 
 says,
 it might_sleep() and therefore also call the check in might_fault().

Exactly the reverse.
There's no way for it to sleep except on fault and that only if
preempttion is on.

  
  You can also find the discussion around the patches
  educational:
  http://article.gmane.org/gmane.comp.emulators.kvm.devel/109928
  
   However
   we have the inatomic variants of these function for this purpose.
  
  Does inatomic install fixups now?
 
 I think this varies between architectures but I am no expert. But as 99,9% of
 all pagefault_disable code uses inatomic, I would have guessed that this is
 rather the way to got than simply using the non atomic variant that clearly
 states on the interface that it might sleep?

Let's go ahead and make the comment more exact then.

  
  Last I checked, it didn't so it did not satisfy this purpose.
  See this comment from x86:
  
   * Copy data from kernel space to user space.  Caller must check
   * the specified block with access_ok() before calling this function.
   * The caller should also make sure 

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 05:17:29PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
   What's the path you are trying to debug?
  
  Well, we had a problem where we held a spin_lock and called
  copy_(from|to)_user(). We experienced very random deadlocks that took some 
  guy
  almost a week to debug. The simple might_sleep() check would have showed 
  this
  error immediately.
 
 This must have been a very old kernel.
 A modern kernel will return an error from copy_to_user.
 Which is really the point of the patch you are trying to revert.

That's assuming you disabled preemption. If you didn't, and take
a spinlock, you have deadlocks even without userspace access.

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Christian Borntraeger
Am 26.11.2014 um 16:17 schrieb Michael S. Tsirkin:
 On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
 What's the path you are trying to debug?

 Well, we had a problem where we held a spin_lock and called
 copy_(from|to)_user(). We experienced very random deadlocks that took some 
 guy
 almost a week to debug. The simple might_sleep() check would have showed this
 error immediately.
 

 This must have been a very old kernel.
 A modern kernel will return an error from copy_to_user.

I disagree. copy_to_user will not return while holding a spinlock, because it 
does not know! How should it?
See: spin_lock will call preempt_disable, but thats a no-op for a non-preempt 
kernel. So the mere fact that we hold a spin_lock is not known by any user 
access function. (or others). No?

Christian



___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread David Hildenbrand
 On Wed, Nov 26, 2014 at 05:17:29PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
What's the path you are trying to debug?
   
   Well, we had a problem where we held a spin_lock and called
   copy_(from|to)_user(). We experienced very random deadlocks that took 
   some guy
   almost a week to debug. The simple might_sleep() check would have showed 
   this
   error immediately.
  
  This must have been a very old kernel.
  A modern kernel will return an error from copy_to_user.
  Which is really the point of the patch you are trying to revert.
 
 That's assuming you disabled preemption. If you didn't, and take
 a spinlock, you have deadlocks even without userspace access.
 

(Thanks for your resent, my first email was sent directly to you ... grml)

This is what happened on our side (very recent kernel):

spin_lock(lock)
copy_to_user(...)
spin_unlock(lock)

1. s390 locks/unlocks a spin lock with a compare and swap, using the _cpu id_
   as old value
2. we slept during copy_to_user()
3. the thread got scheduled onto another cpu
4. spin_unlock failed as the _cpu id_ didn't match (another cpu that locked
   the spinlock tried to unlocked it).
5. lock remained locked - deadlock

Christian came up with the following explanation:
Without preemption, spin_lock() will not touch the preempt counter.
disable_pfault() will always touch it.

Therefore, with preemption disabled, copy_to_user() has no idea that it is
running in atomic context - and will therefore try to sleep.

So copy_to_user() will on s390:
1. run as atomic while spin_lock() with preemption enabled.
2. run as not atomic while spin_lock() with preemption disabled.
3.  run as atomic while pagefault_disabled() with preemption enabled or
disabled.
4. run as not atomic when really not atomic.

And exactly nr 2. is the thing that produced the deadlock in our scenario and
the reason why I want a might_sleep() :)

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 04:30:32PM +0100, Christian Borntraeger wrote:
 Am 26.11.2014 um 16:17 schrieb Michael S. Tsirkin:
  On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
  What's the path you are trying to debug?
 
  Well, we had a problem where we held a spin_lock and called
  copy_(from|to)_user(). We experienced very random deadlocks that took some 
  guy
  almost a week to debug. The simple might_sleep() check would have showed 
  this
  error immediately.
  
 
  This must have been a very old kernel.
  A modern kernel will return an error from copy_to_user.
 
 I disagree. copy_to_user will not return while holding a spinlock, because it 
 does not know! How should it?
 See: spin_lock will call preempt_disable, but thats a no-op for a non-preempt 
 kernel. So the mere fact that we hold a spin_lock is not known by any user 
 access function. (or others). No?
 
 Christian
 
 

Well might_sleep() merely checks preempt count and irqs_disabled too.
If you want debugging things to trigger, you need to enable
a bunch of config options.  That's not new.


-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 04:32:07PM +0100, David Hildenbrand wrote:
  On Wed, Nov 26, 2014 at 05:17:29PM +0200, Michael S. Tsirkin wrote:
   On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
 What's the path you are trying to debug?

Well, we had a problem where we held a spin_lock and called
copy_(from|to)_user(). We experienced very random deadlocks that took 
some guy
almost a week to debug. The simple might_sleep() check would have 
showed this
error immediately.
   
   This must have been a very old kernel.
   A modern kernel will return an error from copy_to_user.
   Which is really the point of the patch you are trying to revert.
  
  That's assuming you disabled preemption. If you didn't, and take
  a spinlock, you have deadlocks even without userspace access.
  
 
 (Thanks for your resent, my first email was sent directly to you ... grml)
 
 This is what happened on our side (very recent kernel):
 
 spin_lock(lock)
 copy_to_user(...)
 spin_unlock(lock)

That's a deadlock even without copy_to_user - it's
enough for the thread to be preempted and another one
to try taking the lock.


 1. s390 locks/unlocks a spin lock with a compare and swap, using the _cpu id_
as old value
 2. we slept during copy_to_user()
 3. the thread got scheduled onto another cpu
 4. spin_unlock failed as the _cpu id_ didn't match (another cpu that locked
the spinlock tried to unlocked it).
 5. lock remained locked - deadlock
 
 Christian came up with the following explanation:
 Without preemption, spin_lock() will not touch the preempt counter.
 disable_pfault() will always touch it.
 
 Therefore, with preemption disabled, copy_to_user() has no idea that it is
 running in atomic context - and will therefore try to sleep.

 So copy_to_user() will on s390:
 1. run as atomic while spin_lock() with preemption enabled.
 2. run as not atomic while spin_lock() with preemption disabled.
 3.  run as atomic while pagefault_disabled() with preemption enabled or
 disabled.
 4. run as not atomic when really not atomic.
 
 And exactly nr 2. is the thing that produced the deadlock in our scenario and
 the reason why I want a might_sleep() :)

IMHO it's not copy to user that causes the problem.
It's the misuse of spinlocks with preemption on.

So might_sleep would make you think copy_to_user is
the problem, and e.g. let you paper over it by
moving copy_to_user out.

Enable lock prover and you will see what the real
issue is, which is you didn't disable preempt.
and if you did, copy_to_user would be okay.

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread David Hildenbrand
  This is what happened on our side (very recent kernel):
  
  spin_lock(lock)
  copy_to_user(...)
  spin_unlock(lock)
 
 That's a deadlock even without copy_to_user - it's
 enough for the thread to be preempted and another one
 to try taking the lock.
 
 
  1. s390 locks/unlocks a spin lock with a compare and swap, using the _cpu 
  id_
 as old value
  2. we slept during copy_to_user()
  3. the thread got scheduled onto another cpu
  4. spin_unlock failed as the _cpu id_ didn't match (another cpu that locked
 the spinlock tried to unlocked it).
  5. lock remained locked - deadlock
  
  Christian came up with the following explanation:
  Without preemption, spin_lock() will not touch the preempt counter.
  disable_pfault() will always touch it.
  
  Therefore, with preemption disabled, copy_to_user() has no idea that it is
  running in atomic context - and will therefore try to sleep.
 
  So copy_to_user() will on s390:
  1. run as atomic while spin_lock() with preemption enabled.
  2. run as not atomic while spin_lock() with preemption disabled.
  3.  run as atomic while pagefault_disabled() with preemption enabled or
  disabled.
  4. run as not atomic when really not atomic.

should have been more clear at that point: 
preemption enabled == kernel compiled with preemption support
preemption disabled == kernel compiled without preemption support

  
  And exactly nr 2. is the thing that produced the deadlock in our scenario 
  and
  the reason why I want a might_sleep() :)
 
 IMHO it's not copy to user that causes the problem.
 It's the misuse of spinlocks with preemption on.

As I said, preemption was off.

 
 So might_sleep would make you think copy_to_user is
 the problem, and e.g. let you paper over it by
 moving copy_to_user out.

Actually implementing different way of locking easily fixed the problem for us.
The old might_sleep() checks would have given us the problem within a few
seconds (I tested it).

 
 Enable lock prover and you will see what the real
 issue is, which is you didn't disable preempt.
 and if you did, copy_to_user would be okay.
 

Our kernel is compiled without preemption and we turned on all lock/atomic
sleep debugging aid. No problem was detected.


But the question is if we shouldn't rather provide a:

  copy_to_user_nosleep() implementation that can be called from
pagefault_disable() because it won't sleep.
and a
  copy_to_user_sleep() implementation that cannot be called from
pagefault_disable().

Another way to fix it would be a reworked pagefault_disable() function that
somehow sets a flag, so copy_to_user() knows that it is in fact called from a
valid context, not just from some atomic context. So we could trigger
might_sleep() when detecting a !pagefault_disable context.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Christian Borntraeger
Am 26.11.2014 um 16:37 schrieb Michael S. Tsirkin:
 On Wed, Nov 26, 2014 at 04:30:32PM +0100, Christian Borntraeger wrote:
 Am 26.11.2014 um 16:17 schrieb Michael S. Tsirkin:
 On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
 What's the path you are trying to debug?

 Well, we had a problem where we held a spin_lock and called
 copy_(from|to)_user(). We experienced very random deadlocks that took some 
 guy
 almost a week to debug. The simple might_sleep() check would have showed 
 this
 error immediately.


 This must have been a very old kernel.
 A modern kernel will return an error from copy_to_user.

 I disagree. copy_to_user will not return while holding a spinlock, because 
 it does not know! How should it?
 See: spin_lock will call preempt_disable, but thats a no-op for a 
 non-preempt kernel. So the mere fact that we hold a spin_lock is not known 
 by any user access function. (or others). No?

 Christian


 
 Well might_sleep() merely checks preempt count and irqs_disabled too.
 If you want debugging things to trigger, you need to enable
 a bunch of config options.  That's not new.

You miss the point of the whole thread: The problem is that even with debug 
options enabled, holding a spinlock would not trigger a bug on copy_to_user. So 
the problem is not the good path, the problem is that a debugging aid for 
detecting a broken case was lost. Even with all kernel debugging enabled.

That is because CONFIG_DEBUG_ATOMIC_SLEEP selects PREEMPT_COUNT. That means: 
spin_lock will then be considered as in_atomic and no message comes. Without 
CONFIG_DEBUG_ATOMIC_SLEEP spin_lock will not touch the preempt_count but we 
also dont see a message because might_fault is now a nop

I understand that you dont like Davids changes due to other side effects that 
you have mentioned. So lets focus on how we can fix the debug option. Ok?

Christian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Christian Borntraeger
Am 26.11.2014 um 16:47 schrieb Michael S. Tsirkin:
 On Wed, Nov 26, 2014 at 04:32:07PM +0100, David Hildenbrand wrote:
 On Wed, Nov 26, 2014 at 05:17:29PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
 What's the path you are trying to debug?

 Well, we had a problem where we held a spin_lock and called
 copy_(from|to)_user(). We experienced very random deadlocks that took 
 some guy
 almost a week to debug. The simple might_sleep() check would have showed 
 this
 error immediately.

 This must have been a very old kernel.
 A modern kernel will return an error from copy_to_user.
 Which is really the point of the patch you are trying to revert.

 That's assuming you disabled preemption. If you didn't, and take
 a spinlock, you have deadlocks even without userspace access.


 (Thanks for your resent, my first email was sent directly to you ... grml)

 This is what happened on our side (very recent kernel):

 spin_lock(lock)
 copy_to_user(...)
 spin_unlock(lock)
 
 That's a deadlock even without copy_to_user - it's
 enough for the thread to be preempted and another one
 to try taking the lock.

Huh? With CONFIG_PREEMPT spin_lock will disable preemption. (we had preempt = 
server anyway).

But please: One step back. The problem is not the good path. The problem is 
that we lost a debugging aid for a known to be broken case. In other words: Our 
code had a bug. Older kernels detected that kind of bug. With your change we no 
longer saw the sleeping while atomic. Thats it. See my other mail.

Christian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 05:02:23PM +0100, David Hildenbrand wrote:
   This is what happened on our side (very recent kernel):
   
   spin_lock(lock)
   copy_to_user(...)
   spin_unlock(lock)
  
  That's a deadlock even without copy_to_user - it's
  enough for the thread to be preempted and another one
  to try taking the lock.
  
  
   1. s390 locks/unlocks a spin lock with a compare and swap, using the _cpu 
   id_
  as old value
   2. we slept during copy_to_user()
   3. the thread got scheduled onto another cpu
   4. spin_unlock failed as the _cpu id_ didn't match (another cpu that 
   locked
  the spinlock tried to unlocked it).
   5. lock remained locked - deadlock
   
   Christian came up with the following explanation:
   Without preemption, spin_lock() will not touch the preempt counter.
   disable_pfault() will always touch it.
   
   Therefore, with preemption disabled, copy_to_user() has no idea that it is
   running in atomic context - and will therefore try to sleep.
  
   So copy_to_user() will on s390:
   1. run as atomic while spin_lock() with preemption enabled.
   2. run as not atomic while spin_lock() with preemption disabled.
   3.  run as atomic while pagefault_disabled() with preemption enabled or
   disabled.
   4. run as not atomic when really not atomic.
 
 should have been more clear at that point: 
 preemption enabled == kernel compiled with preemption support
 preemption disabled == kernel compiled without preemption support
 
   
   And exactly nr 2. is the thing that produced the deadlock in our scenario 
   and
   the reason why I want a might_sleep() :)
  
  IMHO it's not copy to user that causes the problem.
  It's the misuse of spinlocks with preemption on.
 
 As I said, preemption was off.

off - disabled at compile time?

But the code is broken for people that do enable it.


  
  So might_sleep would make you think copy_to_user is
  the problem, and e.g. let you paper over it by
  moving copy_to_user out.
 
 Actually implementing different way of locking easily fixed the problem for 
 us.
 The old might_sleep() checks would have given us the problem within a few
 seconds (I tested it).


Or enable CONFIG_PREMPT, with same effect (copy_to_user will report
an error).

Do you check  return code from copy to user?
If not then you have another bug ...

  
  Enable lock prover and you will see what the real
  issue is, which is you didn't disable preempt.
  and if you did, copy_to_user would be okay.
  
 
 Our kernel is compiled without preemption and we turned on all lock/atomic
 sleep debugging aid. No problem was detected.

But your code is still buggy with preemption on, isn't it?


 
 But the question is if we shouldn't rather provide a:
 
   copy_to_user_nosleep() implementation that can be called from
 pagefault_disable() because it won't sleep.
 and a
   copy_to_user_sleep() implementation that cannot be called from
 pagefault_disable().
 
 Another way to fix it would be a reworked pagefault_disable() function that
 somehow sets a flag, so copy_to_user() knows that it is in fact called from 
 a
 valid context, not just from some atomic context. So we could trigger
 might_sleep() when detecting a !pagefault_disable contex

I think all this is just directing people to paper over the
problem. You should normally disable preemption if you take
spinlocks.
Yes it might happen to work if preempt is compiled out
and you don't trigger scheduler, but Linux might
add scheduler calls at any point without notice,
code must be preempt safe.

Maybe add a debug option warning about spinlocks taken
with preempt on.

That would make sense I think.


-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Christian Borntraeger
Am 26.11.2014 um 17:19 schrieb Michael S. Tsirkin:
 On Wed, Nov 26, 2014 at 05:02:23PM +0100, David Hildenbrand wrote:
 This is what happened on our side (very recent kernel):

 spin_lock(lock)
 copy_to_user(...)
 spin_unlock(lock)

 That's a deadlock even without copy_to_user - it's
 enough for the thread to be preempted and another one
 to try taking the lock.


 1. s390 locks/unlocks a spin lock with a compare and swap, using the _cpu 
 id_
as old value
 2. we slept during copy_to_user()
 3. the thread got scheduled onto another cpu
 4. spin_unlock failed as the _cpu id_ didn't match (another cpu that locked
the spinlock tried to unlocked it).
 5. lock remained locked - deadlock

 Christian came up with the following explanation:
 Without preemption, spin_lock() will not touch the preempt counter.
 disable_pfault() will always touch it.

 Therefore, with preemption disabled, copy_to_user() has no idea that it is
 running in atomic context - and will therefore try to sleep.

 So copy_to_user() will on s390:
 1. run as atomic while spin_lock() with preemption enabled.
 2. run as not atomic while spin_lock() with preemption disabled.
 3.  run as atomic while pagefault_disabled() with preemption enabled or
 disabled.
 4. run as not atomic when really not atomic.

 should have been more clear at that point: 
 preemption enabled == kernel compiled with preemption support
 preemption disabled == kernel compiled without preemption support


 And exactly nr 2. is the thing that produced the deadlock in our scenario 
 and
 the reason why I want a might_sleep() :)

 IMHO it's not copy to user that causes the problem.
 It's the misuse of spinlocks with preemption on.

 As I said, preemption was off.
 
 off - disabled at compile time?
 
 But the code is broken for people that do enable it.
[...]
 You should normally disable preemption if you take
 spinlocks.

Your are telling that any sequence of
spin_lock
...
spin_unlock

is broken with CONFIG_PREEMPT?
Michael, that is bullshit. spin_lock will take care of CONFIG_PREEMPT just 
fine. 

Only sequences like
spin_lock
...
schedule
...
spin_unlock
are broken.

But as I said. That is not the problem that we are discussing here.

Christian

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 05:07:13PM +0100, Christian Borntraeger wrote:
 Am 26.11.2014 um 16:47 schrieb Michael S. Tsirkin:
  On Wed, Nov 26, 2014 at 04:32:07PM +0100, David Hildenbrand wrote:
  On Wed, Nov 26, 2014 at 05:17:29PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 26, 2014 at 11:05:04AM +0100, David Hildenbrand wrote:
  What's the path you are trying to debug?
 
  Well, we had a problem where we held a spin_lock and called
  copy_(from|to)_user(). We experienced very random deadlocks that took 
  some guy
  almost a week to debug. The simple might_sleep() check would have 
  showed this
  error immediately.
 
  This must have been a very old kernel.
  A modern kernel will return an error from copy_to_user.
  Which is really the point of the patch you are trying to revert.
 
  That's assuming you disabled preemption. If you didn't, and take
  a spinlock, you have deadlocks even without userspace access.
 
 
  (Thanks for your resent, my first email was sent directly to you ... grml)
 
  This is what happened on our side (very recent kernel):
 
  spin_lock(lock)
  copy_to_user(...)
  spin_unlock(lock)
  
  That's a deadlock even without copy_to_user - it's
  enough for the thread to be preempted and another one
  to try taking the lock.
 
 Huh? With CONFIG_PREEMPT spin_lock will disable preemption. (we had preempt = 
 server anyway).

Are you sure? Can you point me where it does this please?

 But please: One step back. The problem is not the good path. The problem is 
 that we lost a debugging aid for a known to be broken case. In other words: 
 Our code had a bug. Older kernels detected that kind of bug. With your change 
 we no longer saw the sleeping while atomic. Thats it. See my other mail.
 
 Christian

You want to add more debugging tools, fine. But this one was
giving users in field false positives.

The point is that *_user is safe with preempt off.
It returns an error gracefully.
It does not sleep.
It does not trigger the scheduler in that context.


David's patch makes it say it does, so it's wrong.



-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 05:30:35PM +0100, Christian Borntraeger wrote:
 Am 26.11.2014 um 17:19 schrieb Michael S. Tsirkin:
  On Wed, Nov 26, 2014 at 05:02:23PM +0100, David Hildenbrand wrote:
  This is what happened on our side (very recent kernel):
 
  spin_lock(lock)
  copy_to_user(...)
  spin_unlock(lock)
 
  That's a deadlock even without copy_to_user - it's
  enough for the thread to be preempted and another one
  to try taking the lock.
 
 
  1. s390 locks/unlocks a spin lock with a compare and swap, using the 
  _cpu id_
 as old value
  2. we slept during copy_to_user()
  3. the thread got scheduled onto another cpu
  4. spin_unlock failed as the _cpu id_ didn't match (another cpu that 
  locked
 the spinlock tried to unlocked it).
  5. lock remained locked - deadlock
 
  Christian came up with the following explanation:
  Without preemption, spin_lock() will not touch the preempt counter.
  disable_pfault() will always touch it.
 
  Therefore, with preemption disabled, copy_to_user() has no idea that it 
  is
  running in atomic context - and will therefore try to sleep.
 
  So copy_to_user() will on s390:
  1. run as atomic while spin_lock() with preemption enabled.
  2. run as not atomic while spin_lock() with preemption disabled.
  3.  run as atomic while pagefault_disabled() with preemption enabled or
  disabled.
  4. run as not atomic when really not atomic.
 
  should have been more clear at that point: 
  preemption enabled == kernel compiled with preemption support
  preemption disabled == kernel compiled without preemption support
 
 
  And exactly nr 2. is the thing that produced the deadlock in our 
  scenario and
  the reason why I want a might_sleep() :)
 
  IMHO it's not copy to user that causes the problem.
  It's the misuse of spinlocks with preemption on.
 
  As I said, preemption was off.
  
  off - disabled at compile time?
  
  But the code is broken for people that do enable it.
 [...]
  You should normally disable preemption if you take
  spinlocks.
 
 Your are telling that any sequence of
 spin_lock
 ...
 spin_unlock
 
 is broken with CONFIG_PREEMPT?
 Michael, that is bullshit. spin_lock will take care of CONFIG_PREEMPT just 
 fine. 
 
 Only sequences like
 spin_lock
 ...
 schedule
 ...
 spin_unlock
 are broken.
 
 But as I said. That is not the problem that we are discussing here.
 
 Christian

I'm saying spin_lock without _irqsave is often a bug.


I am also saying this code in mm/fault.c:
__do_page_fault
...
/*
 * If we're in an interrupt, have no user context or are running
 * in an atomic region then we must not take the fault:
 */
if (unlikely(in_atomic() || !mm)) {
bad_area_nosemaphore(regs, error_code, address);
return;
}

means that a fault won't cause sleep if called in atomic context.

And a bunch of code relies on this.

This is why might_fault does:

 * it would be nicer only to annotate paths which are not under
 * pagefault_disable, however that requires a larger audit and
 * providing helpers like get_user_atomic.
 */
if (in_atomic())
return;

__might_sleep(__FILE__, __LINE__, 0);


If you see this violated, let's figure out why.

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Christian Borntraeger
Am 26.11.2014 um 17:32 schrieb Michael S. Tsirkin:
[...]
 This is what happened on our side (very recent kernel):

 spin_lock(lock)
 copy_to_user(...)
 spin_unlock(lock)

 That's a deadlock even without copy_to_user - it's
 enough for the thread to be preempted and another one
 to try taking the lock.

 Huh? With CONFIG_PREEMPT spin_lock will disable preemption. (we had preempt 
 = server anyway).
 
 Are you sure? Can you point me where it does this please?

spin_lock -- raw_spin_lock -- _raw_spin_lock -- __raw_spin_lock

static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
   preempt_disable();   -
spin_acquire(lock-dep_map, 0, 0, _RET_IP_);
LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
}


Michael, please be serious. The whole kernel would be broken if spin_lock would 
not disable preemption.


 
 But please: One step back. The problem is not the good path. The problem is 
 that we lost a debugging aid for a known to be broken case. In other words: 
 Our code had a bug. Older kernels detected that kind of bug. With your 
 change we no longer saw the sleeping while atomic. Thats it. See my other 
 mail.

 Christian
 
 You want to add more debugging tools, fine.

We dont want to add, we want to fix something that used to work

 But this one was  giving users in field false positives.

So lets try to fix those, ok? If we cant, then tough luck. But coming up with 
wrong statements is not helpful.

 
 The point is that *_user is safe with preempt off.
 It returns an error gracefully.
 It does not sleep.
 It does not trigger the scheduler in that context.

There are special cases where your statement is true. But its not in general.
copy_to_user might fault and that fault might sleep and reschedule. For example 
handle_mm_fault might go down to pud_alloc, pmd_alloc etc and all these 
functions could do an GFP_KERNEL allocation. Which might sleep. Which will 
schedule.


 
 
 David's patch makes it say it does, so it's wrong.
 
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 05:51:08PM +0100, Christian Borntraeger wrote:
  But this one was  giving users in field false positives.
 
 So lets try to fix those, ok? If we cant, then tough luck.

Sure.
I think the simplest way might be to make spinlock disable
premption when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

As a result, userspace access will fail and caller will
get a nice error.



 But coming up with wrong statements is not helpful.

True. Sorry that I did that.

  
  The point is that *_user is safe with preempt off.
  It returns an error gracefully.
  It does not sleep.
  It does not trigger the scheduler in that context.
 
 There are special cases where your statement is true. But its not in general.
 copy_to_user might fault and that fault might sleep and reschedule.

Yes. But not if called inatomic.



 For example handle_mm_fault might go down to pud_alloc, pmd_alloc etc and all 
 these functions could do an GFP_KERNEL allocation. Which might sleep. Which 
 will schedule.
 
 
  
  
  David's patch makes it say it does, so it's wrong.
  
  
  

Absolutely.
I think you can already debug your case easily, by enabling CONFIG_PREEMPT.
This seems counter-intuitive, and distro debug kernels don't seem to do this.

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Wed, Nov 26, 2014 at 07:04:47PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 26, 2014 at 05:51:08PM +0100, Christian Borntraeger wrote:
   But this one was  giving users in field false positives.
  
  So lets try to fix those, ok? If we cant, then tough luck.
 
 Sure.
 I think the simplest way might be to make spinlock disable
 premption when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.

Specifically maybe DEBUG_ATOMIC_SLEEP should select PREEMPT_COUNT?


 As a result, userspace access will fail and caller will
 get a nice error.
 
 
 
  But coming up with wrong statements is not helpful.
 
 True. Sorry that I did that.
 
   
   The point is that *_user is safe with preempt off.
   It returns an error gracefully.
   It does not sleep.
   It does not trigger the scheduler in that context.
  
  There are special cases where your statement is true. But its not in 
  general.
  copy_to_user might fault and that fault might sleep and reschedule.
 
 Yes. But not if called inatomic.
 
 
 
  For example handle_mm_fault might go down to pud_alloc, pmd_alloc etc and 
  all these functions could do an GFP_KERNEL allocation. Which might sleep. 
  Which will schedule.
  
  
   
   
   David's patch makes it say it does, so it's wrong.
   
   
   
 
 Absolutely.
 I think you can already debug your case easily, by enabling CONFIG_PREEMPT.
 This seems counter-intuitive, and distro debug kernels don't seem to do this.
 
 -- 
 MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Heiko Carstens
On Wed, Nov 26, 2014 at 07:04:47PM +0200, Michael S. Tsirkin wrote:
 On Wed, Nov 26, 2014 at 05:51:08PM +0100, Christian Borntraeger wrote:
   But this one was  giving users in field false positives.
  
  So lets try to fix those, ok? If we cant, then tough luck.
 
 Sure.
 I think the simplest way might be to make spinlock disable
 premption when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
 
 As a result, userspace access will fail and caller will
 get a nice error.

Yes, _userspace_ now sees unpredictable behaviour, instead of that the
kernel emits a big loud warning to the console.

Please consider this simple example:

int bar(char __user *ptr)
{
...
if (copy_to_user(ptr, ...)
return -EFAULT;
...
}

SYSCALL_DEFINE1(foo, char __user *, ptr)
{
int rc;

...
rc = bar(ptr);
if (rc)
goto out;
...
out:
return rc;  
}

The above simple system call just works fine, with and without your change,
however if somebody (incorrectly) changes sys_foo() to the code below:

spin_lock(lock);
rc = bar(ptr);
if (rc)
goto out;
out:
spin_unlock(lock);
return rc;  

Broken code like above used to generate warnings. With your change we won't
see any warnings anymore. Instead we get random and bad behaviour:

For !CONFIG_PREEMPT if the page at ptr is not mapped, the kernel will see
a fault, potentially schedule and potentially deadlock on lock.
Without _any_ warning anymore.

For CONFIG_PREEMPT if the page at ptr is mapped, everthing works. However if
the page is not mapped, userspace now all of the sudden will see an invalid(!)
-EFAULT return code, instead of that the kernel resolved the page fault.
Yes, the kernel can't resolve the fault since we hold a spinlock. But the
above bogus code did give warnings to give you an idea that something probably
is not correct.

Who on earth is supposed to debug crap like this???

What we really want is:

Code like
spin_lock(lock);
if (copy_to_user(...))
rc = ...
spin_unlock(lock);
really *should* generate warnings like it did before.

And *only* code like
spin_lock(lock);
page_fault_disable();
if (copy_to_user(...))
rc = ...
page_fault_enable();
spin_unlock(lock);
should not generate warnings, since the author hopefully knew what he did.

We could achieve that by e.g. adding a couple of pagefault disabled bits
within current_thread_info()-preempt_count, which would allow
pagefault_disable() and pagefault_enable() to modify a different part of
preempt_count than it does now, so there is a way to tell if pagefaults have
been explicitly disabled or are just a side effect of preemption being
disabled.
This would allow might_fault() to restore its old sane behaviour for the
!page_fault_disabled() case.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-26 Thread Michael S. Tsirkin
On Thu, Nov 27, 2014 at 08:09:19AM +0100, Heiko Carstens wrote:
 On Wed, Nov 26, 2014 at 07:04:47PM +0200, Michael S. Tsirkin wrote:
  On Wed, Nov 26, 2014 at 05:51:08PM +0100, Christian Borntraeger wrote:
But this one was  giving users in field false positives.
   
   So lets try to fix those, ok? If we cant, then tough luck.
  
  Sure.
  I think the simplest way might be to make spinlock disable
  premption when CONFIG_DEBUG_ATOMIC_SLEEP is enabled.
  
  As a result, userspace access will fail and caller will
  get a nice error.
 
 Yes, _userspace_ now sees unpredictable behaviour, instead of that the
 kernel emits a big loud warning to the console.

So I don't object to adding more debugging at all.
Sure, would be nice.

But the fix is not an unconditional might_sleep
within might_fault, this would trigger false positives.

Rather, detect that you took a spinlock
without disabling preemption.


 Please consider this simple example:
 
 int bar(char __user *ptr)
 {
   ...
   if (copy_to_user(ptr, ...)
   return -EFAULT;
   ...
 }
 
 SYSCALL_DEFINE1(foo, char __user *, ptr)
 {
   int rc;
 
   ...
   rc = bar(ptr);
   if (rc)
   goto out;
   ...
 out:
   return rc;  
 }
 
 The above simple system call just works fine, with and without your change,
 however if somebody (incorrectly) changes sys_foo() to the code below:
 
   spin_lock(lock);
   rc = bar(ptr);
   if (rc)
   goto out;
 out:
   spin_unlock(lock);
   return rc;  
 
 Broken code like above used to generate warnings. With your change we won't
 see any warnings anymore. Instead we get random and bad behaviour:
 
 For !CONFIG_PREEMPT if the page at ptr is not mapped, the kernel will see
 a fault, potentially schedule and potentially deadlock on lock.
 Without _any_ warning anymore.
 
 For CONFIG_PREEMPT if the page at ptr is mapped, everthing works. However if
 the page is not mapped, userspace now all of the sudden will see an invalid(!)
 -EFAULT return code, instead of that the kernel resolved the page fault.
 Yes, the kernel can't resolve the fault since we hold a spinlock. But the
 above bogus code did give warnings to give you an idea that something probably
 is not correct.
 
 Who on earth is supposed to debug crap like this???
 
 What we really want is:
 
 Code like
   spin_lock(lock);
   if (copy_to_user(...))
   rc = ...
   spin_unlock(lock);
 really *should* generate warnings like it did before.
 
 And *only* code like
   spin_lock(lock);
   page_fault_disable();
   if (copy_to_user(...))
   rc = ...
   page_fault_enable();
   spin_unlock(lock);
 should not generate warnings, since the author hopefully knew what he did.
 
 We could achieve that by e.g. adding a couple of pagefault disabled bits
 within current_thread_info()-preempt_count, which would allow
 pagefault_disable() and pagefault_enable() to modify a different part of
 preempt_count than it does now, so there is a way to tell if pagefaults have
 been explicitly disabled or are just a side effect of preemption being
 disabled.
 This would allow might_fault() to restore its old sane behaviour for the
 !page_fault_disabled() case.

Exactly. I agree, that would be a useful debugging tool.

In fact this comment in mm/memory.c hints at this:
 * it would be nicer only to annotate paths which are not under
 * pagefault_disable,

it further says
 * however that requires a larger audit and
 * providing helpers like get_user_atomic.

but I think that what you outline is a better way to do this.

-- 
MST
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-25 Thread David Hildenbrand
I recently discovered that commit 662bbcb2747c2422cf98d3d97619509379eee466
removed/skipped all might_sleep checks for might_fault() calls when in atomic.

Reason was to allow calling copy_(to|from)_user while in pagefault_disabled(),
because otherwise, CONFIG_DEBUG_ATOMIC_SLEEP would find false positives. However
we have the inatomic variants of these function for this purpose.

The result of this change was that all guest access (that correctly uses
might_fault()) doesn't perform atomic checks when CONFIG_DEBUG_ATOMIC_SLEEP is
enabled. We lost a mighty debugging feature for user access.

As I wasn't able to come up with any other reason why this should be
necessary, I suggest turning the might_sleep() checks on again in the
might_fault() code.

pagefault_disable/pagefault_enable seems to be used mainly for futex, perf event
and kmap.

Going over all changes since that commit, it seems like most code already uses 
the
inatomic variants of copy_(to|from)_user. Code relying on (get|put)_user during
pagefault_disable() don't make use of any might_fault() in their (get|put)_user
implementation. Examples:
- arch/m68k/include/asm/futex.h
- arch/parisc/include/asm/futex.h
- arch/sh/include/asm/futex-irq.h
- arch/tile/include/asm/futex.h
So changing might_fault() back to trigger might_sleep() won't change a thing for
them. Hope I haven't missed anything.

I only identified one might_fault() that would get triggered by get_user() on
powerpc and fixed it by using the inatomic variant (not tested). I am not sure
if we need some non-sleeping access_ok() prior to __get_user_inatomic().

By looking at the code I was wondering where the correct place for might_fault()
calls is? Doesn't seem to be very consistent. Examples:

   | asm-generic | arm | arm64 | frv | m32r | x86 and s390
---
get_user() |   Yes   | Yes | Yes   | No  | Yes  | Yes
__get_user()   |   No| Yes | No| No  | Yes  | No
put_user() |   Yes   | Yes | Yes   | No  | Yes  | Yes
__put_user()   |   No| Yes | No| No  | Yes  | No
copy_to_user() |   Yes   | No  | No| Yes | Yes  | Yes
__copy_to_user()   |   No| No  | No| Yes | No   | No
copy_from_user()   |   Yes   | No  | No| Yes | Yes  | Yes
__copy_from_user() |   No| No  | No| Yes | No   | No

So I would have assume that the way asm-generic, x86 and s390 (+ propably
others) do this is the right way?
So we can speed up multiple calls to e.g. copy_to_user() by doing the access
check manually (and also the might_fault() if relevant), then calling
__copy_to_user().

So in general, I conclude that the concept is:
1. __.* variants perform no checking and don't call might_fault()
2. [a-z].* variants perform access checking (access_ok() if implemented)
3. [a-z].* variants call might_fault()
4. .*_inatomic variants usually don't perform access checks
5. .*_inatomic variants don't call might_fault()
6. If common code uses the __.* variants, it has to trigger access_ok() and
   call might_fault()
7. For pagefault_disable(), the inatomic variants are to be used

Comments? Opinions?


David Hildenbrand (2):
  powerpc/fsl-pci: atomic get_user when pagefault_disabled
  mm, sched: trigger might_sleep() in might_fault() when atomic

 arch/powerpc/sysdev/fsl_pci.c |  2 +-
 include/linux/kernel.h|  8 ++--
 mm/memory.c   | 11 ---
 3 files changed, 11 insertions(+), 10 deletions(-)

-- 
1.8.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [RFC 0/2] Reenable might_sleep() checks for might_fault() when atomic

2014-11-25 Thread Michael S. Tsirkin
On Tue, Nov 25, 2014 at 12:43:24PM +0100, David Hildenbrand wrote:
 I recently discovered that commit 662bbcb2747c2422cf98d3d97619509379eee466
 removed/skipped all might_sleep checks for might_fault() calls when in atomic.

Yes.  You can add e.g. might_sleep in your code if, for some reason, it is
illegal to call it in an atomic context.
But faults are legal in atomic if you handle the possible
errors, and faults do not necessary cause caller to sleep, so might_fault
should not call might_sleep.

 Reason was to allow calling copy_(to|from)_user while in pagefault_disabled(),
 because otherwise, CONFIG_DEBUG_ATOMIC_SLEEP would find false positives.

That wasn't the only reason BTW.
Andi Kleen also showed that compiler did a terrible job optimizing
get/put user when might_sleep was called.
See e.g. this thread:
https://lkml.org/lkml/2013/8/14/652
There was even an lwn.net article about it, that I don't seem to be
able to locate.
So might_sleep is not appropriate for lightweight operations like __get_user,
which people literally expect to be a single instruction.


I also have a project going which handles very short packets by copying
them into guest memory directly without waking up a thread.
I do it by calling recvmsg on a socket, then switching to
a thread if I get back EFAULT.
Not yet clean enough to upstream but it does seem to cut
the latency down quite a bit, so I'd like to have the option.


Generally, a caller that does something like this under a spinlock:
preempt_disable
pagefault_disable
error = copy_to_user
pagefault_enable
preempt_enable_no_resched

is not doing anything wrong and should not get a warning,
as long as error is handled correctly later.

You can also find the discussion around the patches
educational:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/109928

 However
 we have the inatomic variants of these function for this purpose.

Does inatomic install fixups now?

Last I checked, it didn't so it did not satisfy this purpose.
See this comment from x86:

 * Copy data from kernel space to user space.  Caller must check
 * the specified block with access_ok() before calling this function.
 * The caller should also make sure he pins the user space address
 * so that we don't result in page fault and sleep.


Also - switching to inatomic would scatter if (atomic) all
over code. It's much better to simply call the same
function (e.g. recvmsg) and have it fail gracefully:
after all, we have code to handle get/put user errors
anyway.

 The result of this change was that all guest access (that correctly uses
 might_fault()) doesn't perform atomic checks when CONFIG_DEBUG_ATOMIC_SLEEP is
 enabled. We lost a mighty debugging feature for user access.

What's the path you are trying to debug?

If your code can faults, then it's safe to call
from atomic context.
If it can't, it must pin the page.  You can not do access_ok checks and
then assume access won't fault.

 As I wasn't able to come up with any other reason why this should be
 necessary, I suggest turning the might_sleep() checks on again in the
 might_fault() code.

Faults triggered with pagefault_disabled do not cause
caller to sleep, merely to fail and get an error,
so might_sleep is simply wrong.

 
 pagefault_disable/pagefault_enable seems to be used mainly for futex, perf 
 event
 and kmap.
 
 Going over all changes since that commit, it seems like most code already 
 uses the
 inatomic variants of copy_(to|from)_user. Code relying on (get|put)_user 
 during
 pagefault_disable() don't make use of any might_fault() in their 
 (get|put)_user
 implementation. Examples:
 - arch/m68k/include/asm/futex.h
 - arch/parisc/include/asm/futex.h
 - arch/sh/include/asm/futex-irq.h
 - arch/tile/include/asm/futex.h
 So changing might_fault() back to trigger might_sleep() won't change a thing 
 for
 them. Hope I haven't missed anything.

Did you check the generated code?
On x86 it seems to me this patchset is definitely going to
slow things down, in fact putting back in a performance regression
that Andi found.


 I only identified one might_fault() that would get triggered by get_user() on
 powerpc and fixed it by using the inatomic variant (not tested). I am not sure
 if we need some non-sleeping access_ok() prior to __get_user_inatomic().
 
 By looking at the code I was wondering where the correct place for 
 might_fault()
 calls is? Doesn't seem to be very consistent. Examples:
 
| asm-generic | arm | arm64 | frv | m32r | x86 and s390
 ---
 get_user() |   Yes   | Yes | Yes   | No  | Yes  | Yes
 __get_user()   |   No| Yes | No| No  | Yes  | No
 put_user() |   Yes   | Yes | Yes   | No  | Yes  | Yes
 __put_user()   |   No| Yes | No| No  | Yes  | No
 copy_to_user() |   Yes   | No  | No| Yes | Yes  | Yes
 __copy_to_user()