Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-09-07 Thread Peter Feiner
On Thu, Sep 04, 2014 at 09:43:11AM -0700, Peter Feiner wrote:
> On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
> > That sets me wondering: have you placed the VM_SOFTDIRTY check in the
> > right place in this series of tests?
> > 
> > I think, once pgprot_modify() is correct on all architectures,
> > it should be possible to drop that pgprot_val() check from
> > vma_wants_writenotify() - which would be a welcome simplification.
> > 
> > But what about the VM_PFNMAP test below it?  If that test was necessary,
> > then having your VM_SOFTDIRTY check before it seems dangerous.  But I'm
> > hoping we can persuade ourselves that the VM_PFNMAP test was unnecessary,
> > and simply delete it.
> 
> If VM_PFNMAP is necessary, then I definitely put the VM_SOFTDIRTY check in the
> wrong spot :-) I don't know much (i.e., anything) about VM_PFNMAP, so I'll
> have to bone up on a lot of code before I have an informed opinion about the
> necessity of the check.

AFAICT, the VM_PFNMAP check is unnecessary since I can't find any drivers that
set VM_PFNMAP and enable dirty accounting on their mappings. If anything,
VM_PFNMAP precludes mapping dirty tracking since set_page_dirty takes a
struct_page argument! Perhaps the VM_PFNMAP check was originally put in
vma_wants_writenotify as a safeguard against bogus calls to set_page_dirty?
In any case, it seems harmless to me to put the VM_SOFTDIRTY check before the
VM_PFNMAP check since none of the fault handling code in mm/memory.c calls
set_page_dirty on a VM_PFNMAP fault because either vm_normal_page() returns
NULL or ->fault() / ->page_mkwrite() return VM_FAULT_NOPAGE. Moreover, for
the purpose of softdirty tracking, enabling write notifications on VM_PFNMAP
VMAs is OK since do_wp_page does the right thing when vm_normal_page() returns
NULL.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-09-07 Thread Peter Feiner
On Thu, Sep 04, 2014 at 09:43:11AM -0700, Peter Feiner wrote:
 On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
  That sets me wondering: have you placed the VM_SOFTDIRTY check in the
  right place in this series of tests?
  
  I think, once pgprot_modify() is correct on all architectures,
  it should be possible to drop that pgprot_val() check from
  vma_wants_writenotify() - which would be a welcome simplification.
  
  But what about the VM_PFNMAP test below it?  If that test was necessary,
  then having your VM_SOFTDIRTY check before it seems dangerous.  But I'm
  hoping we can persuade ourselves that the VM_PFNMAP test was unnecessary,
  and simply delete it.
 
 If VM_PFNMAP is necessary, then I definitely put the VM_SOFTDIRTY check in the
 wrong spot :-) I don't know much (i.e., anything) about VM_PFNMAP, so I'll
 have to bone up on a lot of code before I have an informed opinion about the
 necessity of the check.

AFAICT, the VM_PFNMAP check is unnecessary since I can't find any drivers that
set VM_PFNMAP and enable dirty accounting on their mappings. If anything,
VM_PFNMAP precludes mapping dirty tracking since set_page_dirty takes a
struct_page argument! Perhaps the VM_PFNMAP check was originally put in
vma_wants_writenotify as a safeguard against bogus calls to set_page_dirty?
In any case, it seems harmless to me to put the VM_SOFTDIRTY check before the
VM_PFNMAP check since none of the fault handling code in mm/memory.c calls
set_page_dirty on a VM_PFNMAP fault because either vm_normal_page() returns
NULL or -fault() / -page_mkwrite() return VM_FAULT_NOPAGE. Moreover, for
the purpose of softdirty tracking, enabling write notifications on VM_PFNMAP
VMAs is OK since do_wp_page does the right thing when vm_normal_page() returns
NULL.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-09-04 Thread Peter Feiner
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
> On Sun, 24 Aug 2014, Peter Feiner wrote:
> > With this patch, write notifications are enabled when VM_SOFTDIRTY is
> > cleared. Furthermore, to avoid unnecessary faults, write
> > notifications are disabled when VM_SOFTDIRTY is reset.
> 
> "reset" is often a synonym for "cleared": "whenever VM_SOFTDIRTY is set"?

Agreed, "set" sounds good.

> > As a side effect of enabling and disabling write notifications with
> > care, this patch fixes a bug in mprotect where vm_page_prot bits set
> > by drivers were zapped on mprotect. An analogous bug was fixed in mmap
> > by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f.
> 
> 
> Commit c9d0bf241451 ("mm: uncached vma support with writenotify").
> Adding Magnus to the Cc list: I have some doubt as to whether his
> bugfix is in fact preserved below, and would like him to check.

I believe the fix is preserved as long as pgprot_modify preserves cache flags.
As you explain below, pgprot_modify only does this on x86 and tile. So this
patch does indeed break c9d0bf241451 on most architectures. Furthermore, as you
said below, this patch would break the build on the other architectures :-)

> I like Kirill's suggestion to approach this via writenotify,
> but find the disable/enable rather confusing (partly because
> enabling writenotify amounts to disabling write access).
> I may be alone in my confusion.

I agree about the confusion. I wasn't too happy with the names myself.
Furthermore, I only really use vma_disable_writenotify to set vm_page_prot
from vm_flags. So I think the enable / disable idea is pretty broken. As you
suggest below, I'm going to give vma_set_page_prot a try.

> > +   if (type == CLEAR_REFS_SOFT_DIRTY &&
> > +   (vma->vm_flags & VM_SOFTDIRTY)) {
> > +   if (!write) {
> > +   r = -EAGAIN;
> > +   break;
> 
> Hmm.  For a long time I thought you were fixing another important bug
> with down_write, since we "always" use down_write to modify vm_flags.
> 
> But now I'm realizing that if this is the _only_ place which modifies
> vm_flags with down_read, then it's "probably" safe.  I've a vague
> feeling that this was discussed before - is that so, Cyrill?
> 
> It certainly feels fragile to depend on this; but conversely, I don't
> like replacing a long down_read scan by an indefinite down_read scan
> followed by a long down_write scan.
> 
> I see that you earlier persuaded yourself that the races are benign
> if you stick with down_read.  I can't confirm or deny that at present:
> seems more important right now to get this mail out to you than think
> through that aspect.

Your observation is correct: clear_refs_write is the only place that vm_flags
is modified without an exclusive lock on mmap_sem.

I was wrong about the race between clear_refs_write modifying vm_flags and the
fault handler reading vm_flags being benign. I had thought that since
clear_refs_write zaps all of the PTEs in the VMA after it modifies vm_flags,
it was ok for a writable PTE to be temporarily installed to handle a read
fault. However, if a write happened after the read fault and before
clear_refs_write zapped the PTE, then we'd miss the write. Therefore I'm
convinced that its necessary to serialize changes to vm_flags and fault
handling.

There are a few ways to accomplish this serialization, all with their pros and
cons:

* One down_read scan followed by a down_write scan, if necessary. This
  is the current implementation.
  Pros: won't take exclusive lock when VMAs haven't changed.
  Cons: might hold exclusive lock during page table walk.

* Per-vma lock, as Cyrill and Kirill were discussing.
  Pros: handle faults on other VMAs when vm_flags is changing.
  Cons: another lock acquired in fault path.

* Iterate over VMAs and modify vm_flags with down_write, then
  downgrade to down_read for page table scan, as Kirill suggested.
  Pros: won't hold exclusive lock during page table walk.
  Cons: clear_refs_write always grabs exclusive lock.

I think the extra lock in the fault handling path rules the per-vma lock out.
Whether the first or third approach is better depends on whether or not VMAs
are changing, which is obviously application specific behavior. A hybrid
approach offers the best of both worlds (i.e., an optimistic down_read scan
that bails out if there's a VM_SOFTDIRTY VMA and falls back to the downgrading
approach).

> 
> > +   }
> > +   vma->vm_flags &= ~VM_SOFTDIRTY;
> > +   vma_enable_writenotify(vma);
> 
> That's an example of how the vma_enable_writenotify() interface
> may be confusing.  I thought for a while that that line was unsafe,
> there being quite other reasons why write protection may be needed;
> then realized it's okay because "enable" is the restrictive one.

Yep, agreed. 

Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-09-04 Thread Peter Feiner
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
 On Sun, 24 Aug 2014, Peter Feiner wrote:
  With this patch, write notifications are enabled when VM_SOFTDIRTY is
  cleared. Furthermore, to avoid unnecessary faults, write
  notifications are disabled when VM_SOFTDIRTY is reset.
 
 reset is often a synonym for cleared: whenever VM_SOFTDIRTY is set?

Agreed, set sounds good.

  As a side effect of enabling and disabling write notifications with
  care, this patch fixes a bug in mprotect where vm_page_prot bits set
  by drivers were zapped on mprotect. An analogous bug was fixed in mmap
  by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f.
 
 
 Commit c9d0bf241451 (mm: uncached vma support with writenotify).
 Adding Magnus to the Cc list: I have some doubt as to whether his
 bugfix is in fact preserved below, and would like him to check.

I believe the fix is preserved as long as pgprot_modify preserves cache flags.
As you explain below, pgprot_modify only does this on x86 and tile. So this
patch does indeed break c9d0bf241451 on most architectures. Furthermore, as you
said below, this patch would break the build on the other architectures :-)

 I like Kirill's suggestion to approach this via writenotify,
 but find the disable/enable rather confusing (partly because
 enabling writenotify amounts to disabling write access).
 I may be alone in my confusion.

I agree about the confusion. I wasn't too happy with the names myself.
Furthermore, I only really use vma_disable_writenotify to set vm_page_prot
from vm_flags. So I think the enable / disable idea is pretty broken. As you
suggest below, I'm going to give vma_set_page_prot a try.

  +   if (type == CLEAR_REFS_SOFT_DIRTY 
  +   (vma-vm_flags  VM_SOFTDIRTY)) {
  +   if (!write) {
  +   r = -EAGAIN;
  +   break;
 
 Hmm.  For a long time I thought you were fixing another important bug
 with down_write, since we always use down_write to modify vm_flags.
 
 But now I'm realizing that if this is the _only_ place which modifies
 vm_flags with down_read, then it's probably safe.  I've a vague
 feeling that this was discussed before - is that so, Cyrill?
 
 It certainly feels fragile to depend on this; but conversely, I don't
 like replacing a long down_read scan by an indefinite down_read scan
 followed by a long down_write scan.
 
 I see that you earlier persuaded yourself that the races are benign
 if you stick with down_read.  I can't confirm or deny that at present:
 seems more important right now to get this mail out to you than think
 through that aspect.

Your observation is correct: clear_refs_write is the only place that vm_flags
is modified without an exclusive lock on mmap_sem.

I was wrong about the race between clear_refs_write modifying vm_flags and the
fault handler reading vm_flags being benign. I had thought that since
clear_refs_write zaps all of the PTEs in the VMA after it modifies vm_flags,
it was ok for a writable PTE to be temporarily installed to handle a read
fault. However, if a write happened after the read fault and before
clear_refs_write zapped the PTE, then we'd miss the write. Therefore I'm
convinced that its necessary to serialize changes to vm_flags and fault
handling.

There are a few ways to accomplish this serialization, all with their pros and
cons:

* One down_read scan followed by a down_write scan, if necessary. This
  is the current implementation.
  Pros: won't take exclusive lock when VMAs haven't changed.
  Cons: might hold exclusive lock during page table walk.

* Per-vma lock, as Cyrill and Kirill were discussing.
  Pros: handle faults on other VMAs when vm_flags is changing.
  Cons: another lock acquired in fault path.

* Iterate over VMAs and modify vm_flags with down_write, then
  downgrade to down_read for page table scan, as Kirill suggested.
  Pros: won't hold exclusive lock during page table walk.
  Cons: clear_refs_write always grabs exclusive lock.

I think the extra lock in the fault handling path rules the per-vma lock out.
Whether the first or third approach is better depends on whether or not VMAs
are changing, which is obviously application specific behavior. A hybrid
approach offers the best of both worlds (i.e., an optimistic down_read scan
that bails out if there's a VM_SOFTDIRTY VMA and falls back to the downgrading
approach).

 
  +   }
  +   vma-vm_flags = ~VM_SOFTDIRTY;
  +   vma_enable_writenotify(vma);
 
 That's an example of how the vma_enable_writenotify() interface
 may be confusing.  I thought for a while that that line was unsafe,
 there being quite other reasons why write protection may be needed;
 then realized it's okay because enable is the restrictive one.

Yep, agreed. It'll look like

vma-vm_flags = ~VM_SOFTDIRTY;
vma_set_page_prot(vma);


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-28 Thread Cyrill Gorcunov
On Wed, Aug 27, 2014 at 04:12:43PM -0700, Hugh Dickins wrote:
> > > 
> > > Weak argument to me.
> 
> Yes.  However rarely it's modified, we don't want any chance of it
> corrupting another flag.
> 
> VM_SOFTDIRTY is special in the sense that it's maintained in a very
> different way from the other VM_flags.  If we had a little alignment
> padding space somewhere in struct vm_area_struct, I think I'd jump at
> Kirill's suggestion to move it out of vm_flags and into a new field:
> that would remove some other special casing, like the vma merge issue.
> 
> But I don't think we have such padding space, and we'd prefer not to
> bloat struct vm_area_struct for it; so maybe it should stay for now.
> Besides, with Peter's patch, we're also talking about the locking on
> modifications to vm_page_prot, aren't we?

I think so.

> > > What about walk through vmas twice: first with down_write() to modify
> > > vm_flags and vm_page_prot, then downgrade_write() and do
> > > walk_page_range() on every vma?
> > 
> > I still it's undeeded,
> 
> Yes, so long as nothing else is doing the same.
> No bug yet, that we can see, but a bug in waiting.

:-)

> 
> > but for sure using write-lock/downgrade won't hurt,
> > so no argues from my side.
> 
> Yes, Kirill's two-stage suggestion seems the best:
> 
> down_write
> quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot
> downgrade_write (or up_write, down_read?)
> slowly walk page tables write protecting and clearing soft-dirty on ptes
> up_read
> 
> But please don't mistake me for someone who has a good grasp of
> soft-dirty: I don't.

Thanks for sharing opinion, Hugh! (And thanks for second email about
vma-flags) So lets move it to Kirill's way, otherwise indeed one day
it might end up in a bug which for sure will not be easy to catch.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-28 Thread Cyrill Gorcunov
On Wed, Aug 27, 2014 at 04:12:43PM -0700, Hugh Dickins wrote:
   
   Weak argument to me.
 
 Yes.  However rarely it's modified, we don't want any chance of it
 corrupting another flag.
 
 VM_SOFTDIRTY is special in the sense that it's maintained in a very
 different way from the other VM_flags.  If we had a little alignment
 padding space somewhere in struct vm_area_struct, I think I'd jump at
 Kirill's suggestion to move it out of vm_flags and into a new field:
 that would remove some other special casing, like the vma merge issue.
 
 But I don't think we have such padding space, and we'd prefer not to
 bloat struct vm_area_struct for it; so maybe it should stay for now.
 Besides, with Peter's patch, we're also talking about the locking on
 modifications to vm_page_prot, aren't we?

I think so.

   What about walk through vmas twice: first with down_write() to modify
   vm_flags and vm_page_prot, then downgrade_write() and do
   walk_page_range() on every vma?
  
  I still it's undeeded,
 
 Yes, so long as nothing else is doing the same.
 No bug yet, that we can see, but a bug in waiting.

:-)

 
  but for sure using write-lock/downgrade won't hurt,
  so no argues from my side.
 
 Yes, Kirill's two-stage suggestion seems the best:
 
 down_write
 quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot
 downgrade_write (or up_write, down_read?)
 slowly walk page tables write protecting and clearing soft-dirty on ptes
 up_read
 
 But please don't mistake me for someone who has a good grasp of
 soft-dirty: I don't.

Thanks for sharing opinion, Hugh! (And thanks for second email about
vma-flags) So lets move it to Kirill's way, otherwise indeed one day
it might end up in a bug which for sure will not be easy to catch.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-27 Thread Hugh Dickins
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote:
> On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote:
> > On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote:
> > > > Basically, it's safe if only soft-dirty is allowed to modify vm_flags
> > > > without down_write(). But why is soft-dirty so special?
> > > 
> > > because how we use this bit, i mean in normal workload this bit won't
> > > be used intensively i think so it's not widespread in kernel code
> > 
> > Weak argument to me.

Yes.  However rarely it's modified, we don't want any chance of it
corrupting another flag.

VM_SOFTDIRTY is special in the sense that it's maintained in a very
different way from the other VM_flags.  If we had a little alignment
padding space somewhere in struct vm_area_struct, I think I'd jump at
Kirill's suggestion to move it out of vm_flags and into a new field:
that would remove some other special casing, like the vma merge issue.

But I don't think we have such padding space, and we'd prefer not to
bloat struct vm_area_struct for it; so maybe it should stay for now.
Besides, with Peter's patch, we're also talking about the locking on
modifications to vm_page_prot, aren't we?

> > 
> > What about walk through vmas twice: first with down_write() to modify
> > vm_flags and vm_page_prot, then downgrade_write() and do
> > walk_page_range() on every vma?
> 
> I still it's undeeded,

Yes, so long as nothing else is doing the same.
No bug yet, that we can see, but a bug in waiting.

> but for sure using write-lock/downgrade won't hurt,
> so no argues from my side.

Yes, Kirill's two-stage suggestion seems the best:

down_write
quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot
downgrade_write (or up_write, down_read?)
slowly walk page tables write protecting and clearing soft-dirty on ptes
up_read

But please don't mistake me for someone who has a good grasp of
soft-dirty: I don't.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-27 Thread Hugh Dickins
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote:
> On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
> > 
> > Hmm.  For a long time I thought you were fixing another important bug
> > with down_write, since we "always" use down_write to modify vm_flags.
> > 
> > But now I'm realizing that if this is the _only_ place which modifies
> > vm_flags with down_read, then it's "probably" safe.  I've a vague
> > feeling that this was discussed before - is that so, Cyrill?
> 
> Well, as far as I remember we were not talking before about vm_flags
> and read-lock in this function, maybe it was on some unrelated lkml thread
> without me CC'ed? Until I miss something obvious using read-lock here
> for vm_flags modification should be safe, since the only thing which is
> important (in context of vma-softdirty) is the vma's presence. Hugh,
> mind to refresh my memory, how long ago the discussion took place?

Sorry for making you think you were losing your mind, Cyrill.

I myself have no recollection of any such conversation with you;
but afraid that I might have lost _my_ memory of it - I didn't want
to get too strident about how fragile (though probably not yet buggy)
this down_read-for-updating-VM_SOFTDIRTY-onlyi is, if there had already
been such a discussion, coming to the conclusion that it is okay for now.

I am fairly sure that I have had some such discussion before; but
probably with someone else, probably still about mmap_sem and vm_flags,
but probably some other VM_flag: the surprising realization that it may
be safe but fragile to use just down_read for updating one particular flag.

Hugh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-27 Thread Hugh Dickins
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote:
 On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
  
  Hmm.  For a long time I thought you were fixing another important bug
  with down_write, since we always use down_write to modify vm_flags.
  
  But now I'm realizing that if this is the _only_ place which modifies
  vm_flags with down_read, then it's probably safe.  I've a vague
  feeling that this was discussed before - is that so, Cyrill?
 
 Well, as far as I remember we were not talking before about vm_flags
 and read-lock in this function, maybe it was on some unrelated lkml thread
 without me CC'ed? Until I miss something obvious using read-lock here
 for vm_flags modification should be safe, since the only thing which is
 important (in context of vma-softdirty) is the vma's presence. Hugh,
 mind to refresh my memory, how long ago the discussion took place?

Sorry for making you think you were losing your mind, Cyrill.

I myself have no recollection of any such conversation with you;
but afraid that I might have lost _my_ memory of it - I didn't want
to get too strident about how fragile (though probably not yet buggy)
this down_read-for-updating-VM_SOFTDIRTY-onlyi is, if there had already
been such a discussion, coming to the conclusion that it is okay for now.

I am fairly sure that I have had some such discussion before; but
probably with someone else, probably still about mmap_sem and vm_flags,
but probably some other VM_flag: the surprising realization that it may
be safe but fragile to use just down_read for updating one particular flag.

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-27 Thread Hugh Dickins
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote:
 On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote:
  On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote:
Basically, it's safe if only soft-dirty is allowed to modify vm_flags
without down_write(). But why is soft-dirty so special?
   
   because how we use this bit, i mean in normal workload this bit won't
   be used intensively i think so it's not widespread in kernel code
  
  Weak argument to me.

Yes.  However rarely it's modified, we don't want any chance of it
corrupting another flag.

VM_SOFTDIRTY is special in the sense that it's maintained in a very
different way from the other VM_flags.  If we had a little alignment
padding space somewhere in struct vm_area_struct, I think I'd jump at
Kirill's suggestion to move it out of vm_flags and into a new field:
that would remove some other special casing, like the vma merge issue.

But I don't think we have such padding space, and we'd prefer not to
bloat struct vm_area_struct for it; so maybe it should stay for now.
Besides, with Peter's patch, we're also talking about the locking on
modifications to vm_page_prot, aren't we?

  
  What about walk through vmas twice: first with down_write() to modify
  vm_flags and vm_page_prot, then downgrade_write() and do
  walk_page_range() on every vma?
 
 I still it's undeeded,

Yes, so long as nothing else is doing the same.
No bug yet, that we can see, but a bug in waiting.

 but for sure using write-lock/downgrade won't hurt,
 so no argues from my side.

Yes, Kirill's two-stage suggestion seems the best:

down_write
quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot
downgrade_write (or up_write, down_read?)
slowly walk page tables write protecting and clearing soft-dirty on ptes
up_read

But please don't mistake me for someone who has a good grasp of
soft-dirty: I don't.

Hugh
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote:
> > > Basically, it's safe if only soft-dirty is allowed to modify vm_flags
> > > without down_write(). But why is soft-dirty so special?
> > 
> > because how we use this bit, i mean in normal workload this bit won't
> > be used intensively i think so it's not widespread in kernel code
> 
> Weak argument to me.
> 
> What about walk through vmas twice: first with down_write() to modify
> vm_flags and vm_page_prot, then downgrade_write() and do
> walk_page_range() on every vma?

I still it's undeeded, but for sure using write-lock/downgrade won't hurt,
so no argues from my side.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Kirill A. Shutemov
On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote:
> > Basically, it's safe if only soft-dirty is allowed to modify vm_flags
> > without down_write(). But why is soft-dirty so special?
> 
> because how we use this bit, i mean in normal workload this bit won't
> be used intensively i think so it's not widespread in kernel code

Weak argument to me.

What about walk through vmas twice: first with down_write() to modify
vm_flags and vm_page_prot, then downgrade_write() and do
walk_page_range() on every vma?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Tue, Aug 26, 2014 at 05:56:12PM +0300, Kirill A. Shutemov wrote:
> > > 
> > > It seems safe in vma-softdirty context. But if somebody else will decide 
> > > that
> > > it's fine to modify vm_flags without down_write (in their context), we
> > > will get trouble. Sasha will come with weird bug report one day ;)
> > > 
> > > At least vm_flags must be updated atomically to avoid race in middle of
> > > load-modify-store.
> > 
> > Which race you mean here? Two concurrent clear-refs?
> 
> Two concurent clear-refs is fine. But if somebody else will exploit the
> same approch to set/clear other VM_FOO and it will race with clear-refs
> we get trouble: some modifications can be lost.

yup, i see

> Basically, it's safe if only soft-dirty is allowed to modify vm_flags
> without down_write(). But why is soft-dirty so special?

because how we use this bit, i mean in normal workload this bit won't
be used intensively i think so it's not widespread in kernel code

> Should we consider moving protection of some vma fields under per-vma lock
> rather use over-loaded mmap_sem?

Hard to say, if vma-softdirty bit is the reason then I guess no, probably
it worth to estimate how much profit we would have if using per-vma lock.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Kirill A. Shutemov
On Tue, Aug 26, 2014 at 06:19:14PM +0400, Cyrill Gorcunov wrote:
> On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote:
> > > > 
> > > > But now I'm realizing that if this is the _only_ place which modifies
> > > > vm_flags with down_read, then it's "probably" safe.  I've a vague
> > > > feeling that this was discussed before - is that so, Cyrill?
> > > 
> > > Well, as far as I remember we were not talking before about vm_flags
> > > and read-lock in this function, maybe it was on some unrelated lkml thread
> > > without me CC'ed? Until I miss something obvious using read-lock here
> > > for vm_flags modification should be safe, since the only thing which is
> > > important (in context of vma-softdirty) is the vma's presence. Hugh,
> > > mind to refresh my memory, how long ago the discussion took place?
> > 
> > It seems safe in vma-softdirty context. But if somebody else will decide 
> > that
> > it's fine to modify vm_flags without down_write (in their context), we
> > will get trouble. Sasha will come with weird bug report one day ;)
> > 
> > At least vm_flags must be updated atomically to avoid race in middle of
> > load-modify-store.
> 
> Which race you mean here? Two concurrent clear-refs?

Two concurent clear-refs is fine. But if somebody else will exploit the
same approch to set/clear other VM_FOO and it will race with clear-refs
we get trouble: some modifications can be lost.

Basically, it's safe if only soft-dirty is allowed to modify vm_flags
without down_write(). But why is soft-dirty so special?

Should we consider moving protection of some vma fields under per-vma lock
rather use over-loaded mmap_sem?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote:
> > > 
> > > But now I'm realizing that if this is the _only_ place which modifies
> > > vm_flags with down_read, then it's "probably" safe.  I've a vague
> > > feeling that this was discussed before - is that so, Cyrill?
> > 
> > Well, as far as I remember we were not talking before about vm_flags
> > and read-lock in this function, maybe it was on some unrelated lkml thread
> > without me CC'ed? Until I miss something obvious using read-lock here
> > for vm_flags modification should be safe, since the only thing which is
> > important (in context of vma-softdirty) is the vma's presence. Hugh,
> > mind to refresh my memory, how long ago the discussion took place?
> 
> It seems safe in vma-softdirty context. But if somebody else will decide that
> it's fine to modify vm_flags without down_write (in their context), we
> will get trouble. Sasha will come with weird bug report one day ;)
> 
> At least vm_flags must be updated atomically to avoid race in middle of
> load-modify-store.

Which race you mean here? Two concurrent clear-refs?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Kirill A. Shutemov
On Tue, Aug 26, 2014 at 10:49:52AM +0400, Cyrill Gorcunov wrote:
> On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
> > > +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
> > > +  int write)
> > > +{
> ...
> > > +
> > > + if (write)
> > > + down_write(>mmap_sem);
> > > + else
> > > + down_read(>mmap_sem);
> > > +
> > > + if (type == CLEAR_REFS_SOFT_DIRTY)
> > > + mmu_notifier_invalidate_range_start(mm, 0, -1);
> > > +
> > > + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > > + cp.vma = vma;
> > > + if (is_vm_hugetlb_page(vma))
> > > + continue;
> ...
> > > + if (type == CLEAR_REFS_ANON && vma->vm_file)
> > > + continue;
> > > + if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
> > > + continue;
> > > + if (type == CLEAR_REFS_SOFT_DIRTY &&
> > > + (vma->vm_flags & VM_SOFTDIRTY)) {
> > > + if (!write) {
> > > + r = -EAGAIN;
> > > + break;
> > 
> > Hmm.  For a long time I thought you were fixing another important bug
> > with down_write, since we "always" use down_write to modify vm_flags.
> > 
> > But now I'm realizing that if this is the _only_ place which modifies
> > vm_flags with down_read, then it's "probably" safe.  I've a vague
> > feeling that this was discussed before - is that so, Cyrill?
> 
> Well, as far as I remember we were not talking before about vm_flags
> and read-lock in this function, maybe it was on some unrelated lkml thread
> without me CC'ed? Until I miss something obvious using read-lock here
> for vm_flags modification should be safe, since the only thing which is
> important (in context of vma-softdirty) is the vma's presence. Hugh,
> mind to refresh my memory, how long ago the discussion took place?

It seems safe in vma-softdirty context. But if somebody else will decide that
it's fine to modify vm_flags without down_write (in their context), we
will get trouble. Sasha will come with weird bug report one day ;)

At least vm_flags must be updated atomically to avoid race in middle of
load-modify-store.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
> > +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
> > +  int write)
> > +{
...
> > +
> > +   if (write)
> > +   down_write(>mmap_sem);
> > +   else
> > +   down_read(>mmap_sem);
> > +
> > +   if (type == CLEAR_REFS_SOFT_DIRTY)
> > +   mmu_notifier_invalidate_range_start(mm, 0, -1);
> > +
> > +   for (vma = mm->mmap; vma; vma = vma->vm_next) {
> > +   cp.vma = vma;
> > +   if (is_vm_hugetlb_page(vma))
> > +   continue;
...
> > +   if (type == CLEAR_REFS_ANON && vma->vm_file)
> > +   continue;
> > +   if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
> > +   continue;
> > +   if (type == CLEAR_REFS_SOFT_DIRTY &&
> > +   (vma->vm_flags & VM_SOFTDIRTY)) {
> > +   if (!write) {
> > +   r = -EAGAIN;
> > +   break;
> 
> Hmm.  For a long time I thought you were fixing another important bug
> with down_write, since we "always" use down_write to modify vm_flags.
> 
> But now I'm realizing that if this is the _only_ place which modifies
> vm_flags with down_read, then it's "probably" safe.  I've a vague
> feeling that this was discussed before - is that so, Cyrill?

Well, as far as I remember we were not talking before about vm_flags
and read-lock in this function, maybe it was on some unrelated lkml thread
without me CC'ed? Until I miss something obvious using read-lock here
for vm_flags modification should be safe, since the only thing which is
important (in context of vma-softdirty) is the vma's presence. Hugh,
mind to refresh my memory, how long ago the discussion took place?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
  +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
  +  int write)
  +{
...
  +
  +   if (write)
  +   down_write(mm-mmap_sem);
  +   else
  +   down_read(mm-mmap_sem);
  +
  +   if (type == CLEAR_REFS_SOFT_DIRTY)
  +   mmu_notifier_invalidate_range_start(mm, 0, -1);
  +
  +   for (vma = mm-mmap; vma; vma = vma-vm_next) {
  +   cp.vma = vma;
  +   if (is_vm_hugetlb_page(vma))
  +   continue;
...
  +   if (type == CLEAR_REFS_ANON  vma-vm_file)
  +   continue;
  +   if (type == CLEAR_REFS_MAPPED  !vma-vm_file)
  +   continue;
  +   if (type == CLEAR_REFS_SOFT_DIRTY 
  +   (vma-vm_flags  VM_SOFTDIRTY)) {
  +   if (!write) {
  +   r = -EAGAIN;
  +   break;
 
 Hmm.  For a long time I thought you were fixing another important bug
 with down_write, since we always use down_write to modify vm_flags.
 
 But now I'm realizing that if this is the _only_ place which modifies
 vm_flags with down_read, then it's probably safe.  I've a vague
 feeling that this was discussed before - is that so, Cyrill?

Well, as far as I remember we were not talking before about vm_flags
and read-lock in this function, maybe it was on some unrelated lkml thread
without me CC'ed? Until I miss something obvious using read-lock here
for vm_flags modification should be safe, since the only thing which is
important (in context of vma-softdirty) is the vma's presence. Hugh,
mind to refresh my memory, how long ago the discussion took place?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Kirill A. Shutemov
On Tue, Aug 26, 2014 at 10:49:52AM +0400, Cyrill Gorcunov wrote:
 On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote:
   +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
   +  int write)
   +{
 ...
   +
   + if (write)
   + down_write(mm-mmap_sem);
   + else
   + down_read(mm-mmap_sem);
   +
   + if (type == CLEAR_REFS_SOFT_DIRTY)
   + mmu_notifier_invalidate_range_start(mm, 0, -1);
   +
   + for (vma = mm-mmap; vma; vma = vma-vm_next) {
   + cp.vma = vma;
   + if (is_vm_hugetlb_page(vma))
   + continue;
 ...
   + if (type == CLEAR_REFS_ANON  vma-vm_file)
   + continue;
   + if (type == CLEAR_REFS_MAPPED  !vma-vm_file)
   + continue;
   + if (type == CLEAR_REFS_SOFT_DIRTY 
   + (vma-vm_flags  VM_SOFTDIRTY)) {
   + if (!write) {
   + r = -EAGAIN;
   + break;
  
  Hmm.  For a long time I thought you were fixing another important bug
  with down_write, since we always use down_write to modify vm_flags.
  
  But now I'm realizing that if this is the _only_ place which modifies
  vm_flags with down_read, then it's probably safe.  I've a vague
  feeling that this was discussed before - is that so, Cyrill?
 
 Well, as far as I remember we were not talking before about vm_flags
 and read-lock in this function, maybe it was on some unrelated lkml thread
 without me CC'ed? Until I miss something obvious using read-lock here
 for vm_flags modification should be safe, since the only thing which is
 important (in context of vma-softdirty) is the vma's presence. Hugh,
 mind to refresh my memory, how long ago the discussion took place?

It seems safe in vma-softdirty context. But if somebody else will decide that
it's fine to modify vm_flags without down_write (in their context), we
will get trouble. Sasha will come with weird bug report one day ;)

At least vm_flags must be updated atomically to avoid race in middle of
load-modify-store.

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote:
   
   But now I'm realizing that if this is the _only_ place which modifies
   vm_flags with down_read, then it's probably safe.  I've a vague
   feeling that this was discussed before - is that so, Cyrill?
  
  Well, as far as I remember we were not talking before about vm_flags
  and read-lock in this function, maybe it was on some unrelated lkml thread
  without me CC'ed? Until I miss something obvious using read-lock here
  for vm_flags modification should be safe, since the only thing which is
  important (in context of vma-softdirty) is the vma's presence. Hugh,
  mind to refresh my memory, how long ago the discussion took place?
 
 It seems safe in vma-softdirty context. But if somebody else will decide that
 it's fine to modify vm_flags without down_write (in their context), we
 will get trouble. Sasha will come with weird bug report one day ;)
 
 At least vm_flags must be updated atomically to avoid race in middle of
 load-modify-store.

Which race you mean here? Two concurrent clear-refs?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Kirill A. Shutemov
On Tue, Aug 26, 2014 at 06:19:14PM +0400, Cyrill Gorcunov wrote:
 On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote:

But now I'm realizing that if this is the _only_ place which modifies
vm_flags with down_read, then it's probably safe.  I've a vague
feeling that this was discussed before - is that so, Cyrill?
   
   Well, as far as I remember we were not talking before about vm_flags
   and read-lock in this function, maybe it was on some unrelated lkml thread
   without me CC'ed? Until I miss something obvious using read-lock here
   for vm_flags modification should be safe, since the only thing which is
   important (in context of vma-softdirty) is the vma's presence. Hugh,
   mind to refresh my memory, how long ago the discussion took place?
  
  It seems safe in vma-softdirty context. But if somebody else will decide 
  that
  it's fine to modify vm_flags without down_write (in their context), we
  will get trouble. Sasha will come with weird bug report one day ;)
  
  At least vm_flags must be updated atomically to avoid race in middle of
  load-modify-store.
 
 Which race you mean here? Two concurrent clear-refs?

Two concurent clear-refs is fine. But if somebody else will exploit the
same approch to set/clear other VM_FOO and it will race with clear-refs
we get trouble: some modifications can be lost.

Basically, it's safe if only soft-dirty is allowed to modify vm_flags
without down_write(). But why is soft-dirty so special?

Should we consider moving protection of some vma fields under per-vma lock
rather use over-loaded mmap_sem?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Tue, Aug 26, 2014 at 05:56:12PM +0300, Kirill A. Shutemov wrote:
   
   It seems safe in vma-softdirty context. But if somebody else will decide 
   that
   it's fine to modify vm_flags without down_write (in their context), we
   will get trouble. Sasha will come with weird bug report one day ;)
   
   At least vm_flags must be updated atomically to avoid race in middle of
   load-modify-store.
  
  Which race you mean here? Two concurrent clear-refs?
 
 Two concurent clear-refs is fine. But if somebody else will exploit the
 same approch to set/clear other VM_FOO and it will race with clear-refs
 we get trouble: some modifications can be lost.

yup, i see

 Basically, it's safe if only soft-dirty is allowed to modify vm_flags
 without down_write(). But why is soft-dirty so special?

because how we use this bit, i mean in normal workload this bit won't
be used intensively i think so it's not widespread in kernel code

 Should we consider moving protection of some vma fields under per-vma lock
 rather use over-loaded mmap_sem?

Hard to say, if vma-softdirty bit is the reason then I guess no, probably
it worth to estimate how much profit we would have if using per-vma lock.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Kirill A. Shutemov
On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote:
  Basically, it's safe if only soft-dirty is allowed to modify vm_flags
  without down_write(). But why is soft-dirty so special?
 
 because how we use this bit, i mean in normal workload this bit won't
 be used intensively i think so it's not widespread in kernel code

Weak argument to me.

What about walk through vmas twice: first with down_write() to modify
vm_flags and vm_page_prot, then downgrade_write() and do
walk_page_range() on every vma?

-- 
 Kirill A. Shutemov
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-26 Thread Cyrill Gorcunov
On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote:
 On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote:
   Basically, it's safe if only soft-dirty is allowed to modify vm_flags
   without down_write(). But why is soft-dirty so special?
  
  because how we use this bit, i mean in normal workload this bit won't
  be used intensively i think so it's not widespread in kernel code
 
 Weak argument to me.
 
 What about walk through vmas twice: first with down_write() to modify
 vm_flags and vm_page_prot, then downgrade_write() and do
 walk_page_range() on every vma?

I still it's undeeded, but for sure using write-lock/downgrade won't hurt,
so no argues from my side.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-25 Thread Hugh Dickins
On Sun, 24 Aug 2014, Peter Feiner wrote:

> For VMAs that don't want write notifications, PTEs created for read
> faults have their write bit set. If the read fault happens after
> VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain
> clear after subsequent writes.

Good catch.  Worrying that it's not been noticed until now.
But I find quite a lot that needs thinking about in the fix.

> 
> Here's a simple code snippet to demonstrate the bug:
> 
>   char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
>  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
>   system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
>   assert(*m == '\0'); /* new PTE allows write access */
>   assert(!soft_dirty(x));
>   *m = 'x';   /* should dirty the page */
>   assert(soft_dirty(x));  /* fails */
> 
> With this patch, write notifications are enabled when VM_SOFTDIRTY is
> cleared. Furthermore, to avoid unnecessary faults, write
> notifications are disabled when VM_SOFTDIRTY is reset.

"reset" is often a synonym for "cleared": "whenever VM_SOFTDIRTY is set"?

> 
> As a side effect of enabling and disabling write notifications with
> care, this patch fixes a bug in mprotect where vm_page_prot bits set
> by drivers were zapped on mprotect. An analogous bug was fixed in mmap
> by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f.


Commit c9d0bf241451 ("mm: uncached vma support with writenotify").
Adding Magnus to the Cc list: I have some doubt as to whether his
bugfix is in fact preserved below, and would like him to check.

> 
> Reported-by: Peter Feiner 
> Suggested-by: Kirill A. Shutemov 
> Reviewed-by: Kirill A. Shutemov 
> Reviewed-by: Cyrill Gorcunov 
> Signed-off-by: Peter Feiner 

I like Kirill's suggestion to approach this via writenotify,
but find the disable/enable rather confusing (partly because
enabling writenotify amounts to disabling write access).
I may be alone in my confusion.

> 
> ---
> 
> v1 -> v2: Instead of checking VM_SOFTDIRTY in the fault handler,
>   enable write notifications on vm_page_prot when we clear
>   VM_SOFTDIRTY.
> 
> v2 -> v3: * Grab the mmap_sem in write mode if any VMAs have
> VM_SOFTDIRTY set. This involved refactoring clear_refs_write
> to make it less unwieldy.
> 
>   * In mprotect, don't inadvertently disable write notifications on 
> VMAs
> that have had VM_SOFTDIRTY cleared
> 
>   * The mprotect fix and mmap cleanup that comprised the
> second and third patches in v2 were swallowed by the main
> patch because of vm_page_prot corner case handling.
> 
> v3 -> v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have
>   enabled write notifications for all VMAs in this case.
> 
> v4 -> v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ...
> ---
>  fs/proc/task_mmu.c | 113 
> +
>  include/linux/mm.h |  14 +++
>  mm/mmap.c  |  24 +---
>  mm/mprotect.c  |   6 +--
>  4 files changed, 97 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index dfc791c..f5e75c6 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned 
> long addr,
>   return 0;
>  }
>  
> +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
> +  int write)
> +{
> + int r = 0;
> + struct vm_area_struct *vma;
> + struct clear_refs_private cp = {
> + .type = type,
> + };
> + struct mm_walk clear_refs_walk = {
> + .pmd_entry = clear_refs_pte_range,
> + .mm = mm,
> + .private = ,
> + };
> +
> + if (write)
> + down_write(>mmap_sem);
> + else
> + down_read(>mmap_sem);
> +
> + if (type == CLEAR_REFS_SOFT_DIRTY)
> + mmu_notifier_invalidate_range_start(mm, 0, -1);
> +
> + for (vma = mm->mmap; vma; vma = vma->vm_next) {
> + cp.vma = vma;
> + if (is_vm_hugetlb_page(vma))
> + continue;
> + /*
> +  * Writing 1 to /proc/pid/clear_refs affects all pages.
> +  *
> +  * Writing 2 to /proc/pid/clear_refs only affects
> +  * Anonymous pages.
> +  *
> +  * Writing 3 to /proc/pid/clear_refs only affects file
> +  * mapped pages.
> +  *
> +  * Writing 4 to /proc/pid/clear_refs affects all pages.
> +  */
> + if (type == CLEAR_REFS_ANON && vma->vm_file)
> + continue;
> + if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
> + continue;
> + if (type == CLEAR_REFS_SOFT_DIRTY &&
> + (vma->vm_flags & VM_SOFTDIRTY)) {
> + if (!write) {
> +

Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-25 Thread Hugh Dickins
On Sun, 24 Aug 2014, Peter Feiner wrote:

 For VMAs that don't want write notifications, PTEs created for read
 faults have their write bit set. If the read fault happens after
 VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain
 clear after subsequent writes.

Good catch.  Worrying that it's not been noticed until now.
But I find quite a lot that needs thinking about in the fix.

 
 Here's a simple code snippet to demonstrate the bug:
 
   char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
  MAP_ANONYMOUS | MAP_SHARED, -1, 0);
   system(echo 4  /proc/$PPID/clear_refs); /* clear VM_SOFTDIRTY */
   assert(*m == '\0'); /* new PTE allows write access */
   assert(!soft_dirty(x));
   *m = 'x';   /* should dirty the page */
   assert(soft_dirty(x));  /* fails */
 
 With this patch, write notifications are enabled when VM_SOFTDIRTY is
 cleared. Furthermore, to avoid unnecessary faults, write
 notifications are disabled when VM_SOFTDIRTY is reset.

reset is often a synonym for cleared: whenever VM_SOFTDIRTY is set?

 
 As a side effect of enabling and disabling write notifications with
 care, this patch fixes a bug in mprotect where vm_page_prot bits set
 by drivers were zapped on mprotect. An analogous bug was fixed in mmap
 by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f.


Commit c9d0bf241451 (mm: uncached vma support with writenotify).
Adding Magnus to the Cc list: I have some doubt as to whether his
bugfix is in fact preserved below, and would like him to check.

 
 Reported-by: Peter Feiner pfei...@google.com
 Suggested-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
 Reviewed-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
 Reviewed-by: Cyrill Gorcunov gorcu...@openvz.org
 Signed-off-by: Peter Feiner pfei...@google.com

I like Kirill's suggestion to approach this via writenotify,
but find the disable/enable rather confusing (partly because
enabling writenotify amounts to disabling write access).
I may be alone in my confusion.

 
 ---
 
 v1 - v2: Instead of checking VM_SOFTDIRTY in the fault handler,
   enable write notifications on vm_page_prot when we clear
   VM_SOFTDIRTY.
 
 v2 - v3: * Grab the mmap_sem in write mode if any VMAs have
 VM_SOFTDIRTY set. This involved refactoring clear_refs_write
 to make it less unwieldy.
 
   * In mprotect, don't inadvertently disable write notifications on 
 VMAs
 that have had VM_SOFTDIRTY cleared
 
   * The mprotect fix and mmap cleanup that comprised the
 second and third patches in v2 were swallowed by the main
 patch because of vm_page_prot corner case handling.
 
 v3 - v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have
   enabled write notifications for all VMAs in this case.
 
 v4 - v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ...
 ---
  fs/proc/task_mmu.c | 113 
 +
  include/linux/mm.h |  14 +++
  mm/mmap.c  |  24 +---
  mm/mprotect.c  |   6 +--
  4 files changed, 97 insertions(+), 60 deletions(-)
 
 diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
 index dfc791c..f5e75c6 100644
 --- a/fs/proc/task_mmu.c
 +++ b/fs/proc/task_mmu.c
 @@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned 
 long addr,
   return 0;
  }
  
 +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
 +  int write)
 +{
 + int r = 0;
 + struct vm_area_struct *vma;
 + struct clear_refs_private cp = {
 + .type = type,
 + };
 + struct mm_walk clear_refs_walk = {
 + .pmd_entry = clear_refs_pte_range,
 + .mm = mm,
 + .private = cp,
 + };
 +
 + if (write)
 + down_write(mm-mmap_sem);
 + else
 + down_read(mm-mmap_sem);
 +
 + if (type == CLEAR_REFS_SOFT_DIRTY)
 + mmu_notifier_invalidate_range_start(mm, 0, -1);
 +
 + for (vma = mm-mmap; vma; vma = vma-vm_next) {
 + cp.vma = vma;
 + if (is_vm_hugetlb_page(vma))
 + continue;
 + /*
 +  * Writing 1 to /proc/pid/clear_refs affects all pages.
 +  *
 +  * Writing 2 to /proc/pid/clear_refs only affects
 +  * Anonymous pages.
 +  *
 +  * Writing 3 to /proc/pid/clear_refs only affects file
 +  * mapped pages.
 +  *
 +  * Writing 4 to /proc/pid/clear_refs affects all pages.
 +  */
 + if (type == CLEAR_REFS_ANON  vma-vm_file)
 + continue;
 + if (type == CLEAR_REFS_MAPPED  !vma-vm_file)
 + continue;
 + if (type == CLEAR_REFS_SOFT_DIRTY 
 + (vma-vm_flags  VM_SOFTDIRTY)) {
 + if (!write) {
 + r = -EAGAIN;

[PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-24 Thread Peter Feiner
For VMAs that don't want write notifications, PTEs created for read
faults have their write bit set. If the read fault happens after
VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain
clear after subsequent writes.

Here's a simple code snippet to demonstrate the bug:

  char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
 MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
  assert(*m == '\0'); /* new PTE allows write access */
  assert(!soft_dirty(x));
  *m = 'x';   /* should dirty the page */
  assert(soft_dirty(x));  /* fails */

With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared. Furthermore, to avoid unnecessary faults, write
notifications are disabled when VM_SOFTDIRTY is reset.

As a side effect of enabling and disabling write notifications with
care, this patch fixes a bug in mprotect where vm_page_prot bits set
by drivers were zapped on mprotect. An analogous bug was fixed in mmap
by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f.

Reported-by: Peter Feiner 
Suggested-by: Kirill A. Shutemov 
Reviewed-by: Kirill A. Shutemov 
Reviewed-by: Cyrill Gorcunov 
Signed-off-by: Peter Feiner 

---

v1 -> v2: Instead of checking VM_SOFTDIRTY in the fault handler,
  enable write notifications on vm_page_prot when we clear
  VM_SOFTDIRTY.

v2 -> v3: * Grab the mmap_sem in write mode if any VMAs have
VM_SOFTDIRTY set. This involved refactoring clear_refs_write
to make it less unwieldy.

  * In mprotect, don't inadvertently disable write notifications on VMAs
that have had VM_SOFTDIRTY cleared

  * The mprotect fix and mmap cleanup that comprised the
second and third patches in v2 were swallowed by the main
patch because of vm_page_prot corner case handling.

v3 -> v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have
  enabled write notifications for all VMAs in this case.

v4 -> v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ...
---
 fs/proc/task_mmu.c | 113 +
 include/linux/mm.h |  14 +++
 mm/mmap.c  |  24 +---
 mm/mprotect.c  |   6 +--
 4 files changed, 97 insertions(+), 60 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfc791c..f5e75c6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long 
addr,
return 0;
 }
 
+static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
+  int write)
+{
+   int r = 0;
+   struct vm_area_struct *vma;
+   struct clear_refs_private cp = {
+   .type = type,
+   };
+   struct mm_walk clear_refs_walk = {
+   .pmd_entry = clear_refs_pte_range,
+   .mm = mm,
+   .private = ,
+   };
+
+   if (write)
+   down_write(>mmap_sem);
+   else
+   down_read(>mmap_sem);
+
+   if (type == CLEAR_REFS_SOFT_DIRTY)
+   mmu_notifier_invalidate_range_start(mm, 0, -1);
+
+   for (vma = mm->mmap; vma; vma = vma->vm_next) {
+   cp.vma = vma;
+   if (is_vm_hugetlb_page(vma))
+   continue;
+   /*
+* Writing 1 to /proc/pid/clear_refs affects all pages.
+*
+* Writing 2 to /proc/pid/clear_refs only affects
+* Anonymous pages.
+*
+* Writing 3 to /proc/pid/clear_refs only affects file
+* mapped pages.
+*
+* Writing 4 to /proc/pid/clear_refs affects all pages.
+*/
+   if (type == CLEAR_REFS_ANON && vma->vm_file)
+   continue;
+   if (type == CLEAR_REFS_MAPPED && !vma->vm_file)
+   continue;
+   if (type == CLEAR_REFS_SOFT_DIRTY &&
+   (vma->vm_flags & VM_SOFTDIRTY)) {
+   if (!write) {
+   r = -EAGAIN;
+   break;
+   }
+   vma->vm_flags &= ~VM_SOFTDIRTY;
+   vma_enable_writenotify(vma);
+   }
+   walk_page_range(vma->vm_start, vma->vm_end,
+   _refs_walk);
+   }
+
+   if (type == CLEAR_REFS_SOFT_DIRTY)
+   mmu_notifier_invalidate_range_end(mm, 0, -1);
+
+   if (!r)
+   flush_tlb_mm(mm);
+
+   if (write)
+   up_write(>mmap_sem);
+   else
+   up_read(>mmap_sem);
+
+   return r;
+}
+
 static ssize_t clear_refs_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
 {
struct task_struct *task;
char 

[PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared

2014-08-24 Thread Peter Feiner
For VMAs that don't want write notifications, PTEs created for read
faults have their write bit set. If the read fault happens after
VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain
clear after subsequent writes.

Here's a simple code snippet to demonstrate the bug:

  char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
 MAP_ANONYMOUS | MAP_SHARED, -1, 0);
  system(echo 4  /proc/$PPID/clear_refs); /* clear VM_SOFTDIRTY */
  assert(*m == '\0'); /* new PTE allows write access */
  assert(!soft_dirty(x));
  *m = 'x';   /* should dirty the page */
  assert(soft_dirty(x));  /* fails */

With this patch, write notifications are enabled when VM_SOFTDIRTY is
cleared. Furthermore, to avoid unnecessary faults, write
notifications are disabled when VM_SOFTDIRTY is reset.

As a side effect of enabling and disabling write notifications with
care, this patch fixes a bug in mprotect where vm_page_prot bits set
by drivers were zapped on mprotect. An analogous bug was fixed in mmap
by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f.

Reported-by: Peter Feiner pfei...@google.com
Suggested-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
Reviewed-by: Kirill A. Shutemov kirill.shute...@linux.intel.com
Reviewed-by: Cyrill Gorcunov gorcu...@openvz.org
Signed-off-by: Peter Feiner pfei...@google.com

---

v1 - v2: Instead of checking VM_SOFTDIRTY in the fault handler,
  enable write notifications on vm_page_prot when we clear
  VM_SOFTDIRTY.

v2 - v3: * Grab the mmap_sem in write mode if any VMAs have
VM_SOFTDIRTY set. This involved refactoring clear_refs_write
to make it less unwieldy.

  * In mprotect, don't inadvertently disable write notifications on VMAs
that have had VM_SOFTDIRTY cleared

  * The mprotect fix and mmap cleanup that comprised the
second and third patches in v2 were swallowed by the main
patch because of vm_page_prot corner case handling.

v3 - v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have
  enabled write notifications for all VMAs in this case.

v4 - v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ...
---
 fs/proc/task_mmu.c | 113 +
 include/linux/mm.h |  14 +++
 mm/mmap.c  |  24 +---
 mm/mprotect.c  |   6 +--
 4 files changed, 97 insertions(+), 60 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index dfc791c..f5e75c6 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long 
addr,
return 0;
 }
 
+static int clear_refs(struct mm_struct *mm, enum clear_refs_types type,
+  int write)
+{
+   int r = 0;
+   struct vm_area_struct *vma;
+   struct clear_refs_private cp = {
+   .type = type,
+   };
+   struct mm_walk clear_refs_walk = {
+   .pmd_entry = clear_refs_pte_range,
+   .mm = mm,
+   .private = cp,
+   };
+
+   if (write)
+   down_write(mm-mmap_sem);
+   else
+   down_read(mm-mmap_sem);
+
+   if (type == CLEAR_REFS_SOFT_DIRTY)
+   mmu_notifier_invalidate_range_start(mm, 0, -1);
+
+   for (vma = mm-mmap; vma; vma = vma-vm_next) {
+   cp.vma = vma;
+   if (is_vm_hugetlb_page(vma))
+   continue;
+   /*
+* Writing 1 to /proc/pid/clear_refs affects all pages.
+*
+* Writing 2 to /proc/pid/clear_refs only affects
+* Anonymous pages.
+*
+* Writing 3 to /proc/pid/clear_refs only affects file
+* mapped pages.
+*
+* Writing 4 to /proc/pid/clear_refs affects all pages.
+*/
+   if (type == CLEAR_REFS_ANON  vma-vm_file)
+   continue;
+   if (type == CLEAR_REFS_MAPPED  !vma-vm_file)
+   continue;
+   if (type == CLEAR_REFS_SOFT_DIRTY 
+   (vma-vm_flags  VM_SOFTDIRTY)) {
+   if (!write) {
+   r = -EAGAIN;
+   break;
+   }
+   vma-vm_flags = ~VM_SOFTDIRTY;
+   vma_enable_writenotify(vma);
+   }
+   walk_page_range(vma-vm_start, vma-vm_end,
+   clear_refs_walk);
+   }
+
+   if (type == CLEAR_REFS_SOFT_DIRTY)
+   mmu_notifier_invalidate_range_end(mm, 0, -1);
+
+   if (!r)
+   flush_tlb_mm(mm);
+
+   if (write)
+   up_write(mm-mmap_sem);
+   else
+   up_read(mm-mmap_sem);
+
+   return r;
+}
+
 static ssize_t clear_refs_write(struct file *file, const char __user *buf,