Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Thu, Sep 04, 2014 at 09:43:11AM -0700, Peter Feiner wrote: > On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: > > That sets me wondering: have you placed the VM_SOFTDIRTY check in the > > right place in this series of tests? > > > > I think, once pgprot_modify() is correct on all architectures, > > it should be possible to drop that pgprot_val() check from > > vma_wants_writenotify() - which would be a welcome simplification. > > > > But what about the VM_PFNMAP test below it? If that test was necessary, > > then having your VM_SOFTDIRTY check before it seems dangerous. But I'm > > hoping we can persuade ourselves that the VM_PFNMAP test was unnecessary, > > and simply delete it. > > If VM_PFNMAP is necessary, then I definitely put the VM_SOFTDIRTY check in the > wrong spot :-) I don't know much (i.e., anything) about VM_PFNMAP, so I'll > have to bone up on a lot of code before I have an informed opinion about the > necessity of the check. AFAICT, the VM_PFNMAP check is unnecessary since I can't find any drivers that set VM_PFNMAP and enable dirty accounting on their mappings. If anything, VM_PFNMAP precludes mapping dirty tracking since set_page_dirty takes a struct_page argument! Perhaps the VM_PFNMAP check was originally put in vma_wants_writenotify as a safeguard against bogus calls to set_page_dirty? In any case, it seems harmless to me to put the VM_SOFTDIRTY check before the VM_PFNMAP check since none of the fault handling code in mm/memory.c calls set_page_dirty on a VM_PFNMAP fault because either vm_normal_page() returns NULL or ->fault() / ->page_mkwrite() return VM_FAULT_NOPAGE. Moreover, for the purpose of softdirty tracking, enabling write notifications on VM_PFNMAP VMAs is OK since do_wp_page does the right thing when vm_normal_page() returns NULL. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Thu, Sep 04, 2014 at 09:43:11AM -0700, Peter Feiner wrote: On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: That sets me wondering: have you placed the VM_SOFTDIRTY check in the right place in this series of tests? I think, once pgprot_modify() is correct on all architectures, it should be possible to drop that pgprot_val() check from vma_wants_writenotify() - which would be a welcome simplification. But what about the VM_PFNMAP test below it? If that test was necessary, then having your VM_SOFTDIRTY check before it seems dangerous. But I'm hoping we can persuade ourselves that the VM_PFNMAP test was unnecessary, and simply delete it. If VM_PFNMAP is necessary, then I definitely put the VM_SOFTDIRTY check in the wrong spot :-) I don't know much (i.e., anything) about VM_PFNMAP, so I'll have to bone up on a lot of code before I have an informed opinion about the necessity of the check. AFAICT, the VM_PFNMAP check is unnecessary since I can't find any drivers that set VM_PFNMAP and enable dirty accounting on their mappings. If anything, VM_PFNMAP precludes mapping dirty tracking since set_page_dirty takes a struct_page argument! Perhaps the VM_PFNMAP check was originally put in vma_wants_writenotify as a safeguard against bogus calls to set_page_dirty? In any case, it seems harmless to me to put the VM_SOFTDIRTY check before the VM_PFNMAP check since none of the fault handling code in mm/memory.c calls set_page_dirty on a VM_PFNMAP fault because either vm_normal_page() returns NULL or -fault() / -page_mkwrite() return VM_FAULT_NOPAGE. Moreover, for the purpose of softdirty tracking, enabling write notifications on VM_PFNMAP VMAs is OK since do_wp_page does the right thing when vm_normal_page() returns NULL. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: > On Sun, 24 Aug 2014, Peter Feiner wrote: > > With this patch, write notifications are enabled when VM_SOFTDIRTY is > > cleared. Furthermore, to avoid unnecessary faults, write > > notifications are disabled when VM_SOFTDIRTY is reset. > > "reset" is often a synonym for "cleared": "whenever VM_SOFTDIRTY is set"? Agreed, "set" sounds good. > > As a side effect of enabling and disabling write notifications with > > care, this patch fixes a bug in mprotect where vm_page_prot bits set > > by drivers were zapped on mprotect. An analogous bug was fixed in mmap > > by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f. > > > Commit c9d0bf241451 ("mm: uncached vma support with writenotify"). > Adding Magnus to the Cc list: I have some doubt as to whether his > bugfix is in fact preserved below, and would like him to check. I believe the fix is preserved as long as pgprot_modify preserves cache flags. As you explain below, pgprot_modify only does this on x86 and tile. So this patch does indeed break c9d0bf241451 on most architectures. Furthermore, as you said below, this patch would break the build on the other architectures :-) > I like Kirill's suggestion to approach this via writenotify, > but find the disable/enable rather confusing (partly because > enabling writenotify amounts to disabling write access). > I may be alone in my confusion. I agree about the confusion. I wasn't too happy with the names myself. Furthermore, I only really use vma_disable_writenotify to set vm_page_prot from vm_flags. So I think the enable / disable idea is pretty broken. As you suggest below, I'm going to give vma_set_page_prot a try. > > + if (type == CLEAR_REFS_SOFT_DIRTY && > > + (vma->vm_flags & VM_SOFTDIRTY)) { > > + if (!write) { > > + r = -EAGAIN; > > + break; > > Hmm. For a long time I thought you were fixing another important bug > with down_write, since we "always" use down_write to modify vm_flags. > > But now I'm realizing that if this is the _only_ place which modifies > vm_flags with down_read, then it's "probably" safe. I've a vague > feeling that this was discussed before - is that so, Cyrill? > > It certainly feels fragile to depend on this; but conversely, I don't > like replacing a long down_read scan by an indefinite down_read scan > followed by a long down_write scan. > > I see that you earlier persuaded yourself that the races are benign > if you stick with down_read. I can't confirm or deny that at present: > seems more important right now to get this mail out to you than think > through that aspect. Your observation is correct: clear_refs_write is the only place that vm_flags is modified without an exclusive lock on mmap_sem. I was wrong about the race between clear_refs_write modifying vm_flags and the fault handler reading vm_flags being benign. I had thought that since clear_refs_write zaps all of the PTEs in the VMA after it modifies vm_flags, it was ok for a writable PTE to be temporarily installed to handle a read fault. However, if a write happened after the read fault and before clear_refs_write zapped the PTE, then we'd miss the write. Therefore I'm convinced that its necessary to serialize changes to vm_flags and fault handling. There are a few ways to accomplish this serialization, all with their pros and cons: * One down_read scan followed by a down_write scan, if necessary. This is the current implementation. Pros: won't take exclusive lock when VMAs haven't changed. Cons: might hold exclusive lock during page table walk. * Per-vma lock, as Cyrill and Kirill were discussing. Pros: handle faults on other VMAs when vm_flags is changing. Cons: another lock acquired in fault path. * Iterate over VMAs and modify vm_flags with down_write, then downgrade to down_read for page table scan, as Kirill suggested. Pros: won't hold exclusive lock during page table walk. Cons: clear_refs_write always grabs exclusive lock. I think the extra lock in the fault handling path rules the per-vma lock out. Whether the first or third approach is better depends on whether or not VMAs are changing, which is obviously application specific behavior. A hybrid approach offers the best of both worlds (i.e., an optimistic down_read scan that bails out if there's a VM_SOFTDIRTY VMA and falls back to the downgrading approach). > > > + } > > + vma->vm_flags &= ~VM_SOFTDIRTY; > > + vma_enable_writenotify(vma); > > That's an example of how the vma_enable_writenotify() interface > may be confusing. I thought for a while that that line was unsafe, > there being quite other reasons why write protection may be needed; > then realized it's okay because "enable" is the restrictive one. Yep, agreed.
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: On Sun, 24 Aug 2014, Peter Feiner wrote: With this patch, write notifications are enabled when VM_SOFTDIRTY is cleared. Furthermore, to avoid unnecessary faults, write notifications are disabled when VM_SOFTDIRTY is reset. reset is often a synonym for cleared: whenever VM_SOFTDIRTY is set? Agreed, set sounds good. As a side effect of enabling and disabling write notifications with care, this patch fixes a bug in mprotect where vm_page_prot bits set by drivers were zapped on mprotect. An analogous bug was fixed in mmap by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f. Commit c9d0bf241451 (mm: uncached vma support with writenotify). Adding Magnus to the Cc list: I have some doubt as to whether his bugfix is in fact preserved below, and would like him to check. I believe the fix is preserved as long as pgprot_modify preserves cache flags. As you explain below, pgprot_modify only does this on x86 and tile. So this patch does indeed break c9d0bf241451 on most architectures. Furthermore, as you said below, this patch would break the build on the other architectures :-) I like Kirill's suggestion to approach this via writenotify, but find the disable/enable rather confusing (partly because enabling writenotify amounts to disabling write access). I may be alone in my confusion. I agree about the confusion. I wasn't too happy with the names myself. Furthermore, I only really use vma_disable_writenotify to set vm_page_prot from vm_flags. So I think the enable / disable idea is pretty broken. As you suggest below, I'm going to give vma_set_page_prot a try. + if (type == CLEAR_REFS_SOFT_DIRTY + (vma-vm_flags VM_SOFTDIRTY)) { + if (!write) { + r = -EAGAIN; + break; Hmm. For a long time I thought you were fixing another important bug with down_write, since we always use down_write to modify vm_flags. But now I'm realizing that if this is the _only_ place which modifies vm_flags with down_read, then it's probably safe. I've a vague feeling that this was discussed before - is that so, Cyrill? It certainly feels fragile to depend on this; but conversely, I don't like replacing a long down_read scan by an indefinite down_read scan followed by a long down_write scan. I see that you earlier persuaded yourself that the races are benign if you stick with down_read. I can't confirm or deny that at present: seems more important right now to get this mail out to you than think through that aspect. Your observation is correct: clear_refs_write is the only place that vm_flags is modified without an exclusive lock on mmap_sem. I was wrong about the race between clear_refs_write modifying vm_flags and the fault handler reading vm_flags being benign. I had thought that since clear_refs_write zaps all of the PTEs in the VMA after it modifies vm_flags, it was ok for a writable PTE to be temporarily installed to handle a read fault. However, if a write happened after the read fault and before clear_refs_write zapped the PTE, then we'd miss the write. Therefore I'm convinced that its necessary to serialize changes to vm_flags and fault handling. There are a few ways to accomplish this serialization, all with their pros and cons: * One down_read scan followed by a down_write scan, if necessary. This is the current implementation. Pros: won't take exclusive lock when VMAs haven't changed. Cons: might hold exclusive lock during page table walk. * Per-vma lock, as Cyrill and Kirill were discussing. Pros: handle faults on other VMAs when vm_flags is changing. Cons: another lock acquired in fault path. * Iterate over VMAs and modify vm_flags with down_write, then downgrade to down_read for page table scan, as Kirill suggested. Pros: won't hold exclusive lock during page table walk. Cons: clear_refs_write always grabs exclusive lock. I think the extra lock in the fault handling path rules the per-vma lock out. Whether the first or third approach is better depends on whether or not VMAs are changing, which is obviously application specific behavior. A hybrid approach offers the best of both worlds (i.e., an optimistic down_read scan that bails out if there's a VM_SOFTDIRTY VMA and falls back to the downgrading approach). + } + vma-vm_flags = ~VM_SOFTDIRTY; + vma_enable_writenotify(vma); That's an example of how the vma_enable_writenotify() interface may be confusing. I thought for a while that that line was unsafe, there being quite other reasons why write protection may be needed; then realized it's okay because enable is the restrictive one. Yep, agreed. It'll look like vma-vm_flags = ~VM_SOFTDIRTY; vma_set_page_prot(vma);
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Wed, Aug 27, 2014 at 04:12:43PM -0700, Hugh Dickins wrote: > > > > > > Weak argument to me. > > Yes. However rarely it's modified, we don't want any chance of it > corrupting another flag. > > VM_SOFTDIRTY is special in the sense that it's maintained in a very > different way from the other VM_flags. If we had a little alignment > padding space somewhere in struct vm_area_struct, I think I'd jump at > Kirill's suggestion to move it out of vm_flags and into a new field: > that would remove some other special casing, like the vma merge issue. > > But I don't think we have such padding space, and we'd prefer not to > bloat struct vm_area_struct for it; so maybe it should stay for now. > Besides, with Peter's patch, we're also talking about the locking on > modifications to vm_page_prot, aren't we? I think so. > > > What about walk through vmas twice: first with down_write() to modify > > > vm_flags and vm_page_prot, then downgrade_write() and do > > > walk_page_range() on every vma? > > > > I still it's undeeded, > > Yes, so long as nothing else is doing the same. > No bug yet, that we can see, but a bug in waiting. :-) > > > but for sure using write-lock/downgrade won't hurt, > > so no argues from my side. > > Yes, Kirill's two-stage suggestion seems the best: > > down_write > quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot > downgrade_write (or up_write, down_read?) > slowly walk page tables write protecting and clearing soft-dirty on ptes > up_read > > But please don't mistake me for someone who has a good grasp of > soft-dirty: I don't. Thanks for sharing opinion, Hugh! (And thanks for second email about vma-flags) So lets move it to Kirill's way, otherwise indeed one day it might end up in a bug which for sure will not be easy to catch. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Wed, Aug 27, 2014 at 04:12:43PM -0700, Hugh Dickins wrote: Weak argument to me. Yes. However rarely it's modified, we don't want any chance of it corrupting another flag. VM_SOFTDIRTY is special in the sense that it's maintained in a very different way from the other VM_flags. If we had a little alignment padding space somewhere in struct vm_area_struct, I think I'd jump at Kirill's suggestion to move it out of vm_flags and into a new field: that would remove some other special casing, like the vma merge issue. But I don't think we have such padding space, and we'd prefer not to bloat struct vm_area_struct for it; so maybe it should stay for now. Besides, with Peter's patch, we're also talking about the locking on modifications to vm_page_prot, aren't we? I think so. What about walk through vmas twice: first with down_write() to modify vm_flags and vm_page_prot, then downgrade_write() and do walk_page_range() on every vma? I still it's undeeded, Yes, so long as nothing else is doing the same. No bug yet, that we can see, but a bug in waiting. :-) but for sure using write-lock/downgrade won't hurt, so no argues from my side. Yes, Kirill's two-stage suggestion seems the best: down_write quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot downgrade_write (or up_write, down_read?) slowly walk page tables write protecting and clearing soft-dirty on ptes up_read But please don't mistake me for someone who has a good grasp of soft-dirty: I don't. Thanks for sharing opinion, Hugh! (And thanks for second email about vma-flags) So lets move it to Kirill's way, otherwise indeed one day it might end up in a bug which for sure will not be easy to catch. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote: > On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote: > > On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote: > > > > Basically, it's safe if only soft-dirty is allowed to modify vm_flags > > > > without down_write(). But why is soft-dirty so special? > > > > > > because how we use this bit, i mean in normal workload this bit won't > > > be used intensively i think so it's not widespread in kernel code > > > > Weak argument to me. Yes. However rarely it's modified, we don't want any chance of it corrupting another flag. VM_SOFTDIRTY is special in the sense that it's maintained in a very different way from the other VM_flags. If we had a little alignment padding space somewhere in struct vm_area_struct, I think I'd jump at Kirill's suggestion to move it out of vm_flags and into a new field: that would remove some other special casing, like the vma merge issue. But I don't think we have such padding space, and we'd prefer not to bloat struct vm_area_struct for it; so maybe it should stay for now. Besides, with Peter's patch, we're also talking about the locking on modifications to vm_page_prot, aren't we? > > > > What about walk through vmas twice: first with down_write() to modify > > vm_flags and vm_page_prot, then downgrade_write() and do > > walk_page_range() on every vma? > > I still it's undeeded, Yes, so long as nothing else is doing the same. No bug yet, that we can see, but a bug in waiting. > but for sure using write-lock/downgrade won't hurt, > so no argues from my side. Yes, Kirill's two-stage suggestion seems the best: down_write quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot downgrade_write (or up_write, down_read?) slowly walk page tables write protecting and clearing soft-dirty on ptes up_read But please don't mistake me for someone who has a good grasp of soft-dirty: I don't. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote: > On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: > > > > Hmm. For a long time I thought you were fixing another important bug > > with down_write, since we "always" use down_write to modify vm_flags. > > > > But now I'm realizing that if this is the _only_ place which modifies > > vm_flags with down_read, then it's "probably" safe. I've a vague > > feeling that this was discussed before - is that so, Cyrill? > > Well, as far as I remember we were not talking before about vm_flags > and read-lock in this function, maybe it was on some unrelated lkml thread > without me CC'ed? Until I miss something obvious using read-lock here > for vm_flags modification should be safe, since the only thing which is > important (in context of vma-softdirty) is the vma's presence. Hugh, > mind to refresh my memory, how long ago the discussion took place? Sorry for making you think you were losing your mind, Cyrill. I myself have no recollection of any such conversation with you; but afraid that I might have lost _my_ memory of it - I didn't want to get too strident about how fragile (though probably not yet buggy) this down_read-for-updating-VM_SOFTDIRTY-onlyi is, if there had already been such a discussion, coming to the conclusion that it is okay for now. I am fairly sure that I have had some such discussion before; but probably with someone else, probably still about mmap_sem and vm_flags, but probably some other VM_flag: the surprising realization that it may be safe but fragile to use just down_read for updating one particular flag. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote: On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: Hmm. For a long time I thought you were fixing another important bug with down_write, since we always use down_write to modify vm_flags. But now I'm realizing that if this is the _only_ place which modifies vm_flags with down_read, then it's probably safe. I've a vague feeling that this was discussed before - is that so, Cyrill? Well, as far as I remember we were not talking before about vm_flags and read-lock in this function, maybe it was on some unrelated lkml thread without me CC'ed? Until I miss something obvious using read-lock here for vm_flags modification should be safe, since the only thing which is important (in context of vma-softdirty) is the vma's presence. Hugh, mind to refresh my memory, how long ago the discussion took place? Sorry for making you think you were losing your mind, Cyrill. I myself have no recollection of any such conversation with you; but afraid that I might have lost _my_ memory of it - I didn't want to get too strident about how fragile (though probably not yet buggy) this down_read-for-updating-VM_SOFTDIRTY-onlyi is, if there had already been such a discussion, coming to the conclusion that it is okay for now. I am fairly sure that I have had some such discussion before; but probably with someone else, probably still about mmap_sem and vm_flags, but probably some other VM_flag: the surprising realization that it may be safe but fragile to use just down_read for updating one particular flag. Hugh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, 26 Aug 2014, Cyrill Gorcunov wrote: On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote: On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote: Basically, it's safe if only soft-dirty is allowed to modify vm_flags without down_write(). But why is soft-dirty so special? because how we use this bit, i mean in normal workload this bit won't be used intensively i think so it's not widespread in kernel code Weak argument to me. Yes. However rarely it's modified, we don't want any chance of it corrupting another flag. VM_SOFTDIRTY is special in the sense that it's maintained in a very different way from the other VM_flags. If we had a little alignment padding space somewhere in struct vm_area_struct, I think I'd jump at Kirill's suggestion to move it out of vm_flags and into a new field: that would remove some other special casing, like the vma merge issue. But I don't think we have such padding space, and we'd prefer not to bloat struct vm_area_struct for it; so maybe it should stay for now. Besides, with Peter's patch, we're also talking about the locking on modifications to vm_page_prot, aren't we? What about walk through vmas twice: first with down_write() to modify vm_flags and vm_page_prot, then downgrade_write() and do walk_page_range() on every vma? I still it's undeeded, Yes, so long as nothing else is doing the same. No bug yet, that we can see, but a bug in waiting. but for sure using write-lock/downgrade won't hurt, so no argues from my side. Yes, Kirill's two-stage suggestion seems the best: down_write quickly scan vmas clearing VM_SOFT_DIRTY and updating vm_page_prot downgrade_write (or up_write, down_read?) slowly walk page tables write protecting and clearing soft-dirty on ptes up_read But please don't mistake me for someone who has a good grasp of soft-dirty: I don't. Hugh -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote: > On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote: > > > Basically, it's safe if only soft-dirty is allowed to modify vm_flags > > > without down_write(). But why is soft-dirty so special? > > > > because how we use this bit, i mean in normal workload this bit won't > > be used intensively i think so it's not widespread in kernel code > > Weak argument to me. > > What about walk through vmas twice: first with down_write() to modify > vm_flags and vm_page_prot, then downgrade_write() and do > walk_page_range() on every vma? I still it's undeeded, but for sure using write-lock/downgrade won't hurt, so no argues from my side. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote: > > Basically, it's safe if only soft-dirty is allowed to modify vm_flags > > without down_write(). But why is soft-dirty so special? > > because how we use this bit, i mean in normal workload this bit won't > be used intensively i think so it's not widespread in kernel code Weak argument to me. What about walk through vmas twice: first with down_write() to modify vm_flags and vm_page_prot, then downgrade_write() and do walk_page_range() on every vma? -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 05:56:12PM +0300, Kirill A. Shutemov wrote: > > > > > > It seems safe in vma-softdirty context. But if somebody else will decide > > > that > > > it's fine to modify vm_flags without down_write (in their context), we > > > will get trouble. Sasha will come with weird bug report one day ;) > > > > > > At least vm_flags must be updated atomically to avoid race in middle of > > > load-modify-store. > > > > Which race you mean here? Two concurrent clear-refs? > > Two concurent clear-refs is fine. But if somebody else will exploit the > same approch to set/clear other VM_FOO and it will race with clear-refs > we get trouble: some modifications can be lost. yup, i see > Basically, it's safe if only soft-dirty is allowed to modify vm_flags > without down_write(). But why is soft-dirty so special? because how we use this bit, i mean in normal workload this bit won't be used intensively i think so it's not widespread in kernel code > Should we consider moving protection of some vma fields under per-vma lock > rather use over-loaded mmap_sem? Hard to say, if vma-softdirty bit is the reason then I guess no, probably it worth to estimate how much profit we would have if using per-vma lock. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 06:19:14PM +0400, Cyrill Gorcunov wrote: > On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote: > > > > > > > > But now I'm realizing that if this is the _only_ place which modifies > > > > vm_flags with down_read, then it's "probably" safe. I've a vague > > > > feeling that this was discussed before - is that so, Cyrill? > > > > > > Well, as far as I remember we were not talking before about vm_flags > > > and read-lock in this function, maybe it was on some unrelated lkml thread > > > without me CC'ed? Until I miss something obvious using read-lock here > > > for vm_flags modification should be safe, since the only thing which is > > > important (in context of vma-softdirty) is the vma's presence. Hugh, > > > mind to refresh my memory, how long ago the discussion took place? > > > > It seems safe in vma-softdirty context. But if somebody else will decide > > that > > it's fine to modify vm_flags without down_write (in their context), we > > will get trouble. Sasha will come with weird bug report one day ;) > > > > At least vm_flags must be updated atomically to avoid race in middle of > > load-modify-store. > > Which race you mean here? Two concurrent clear-refs? Two concurent clear-refs is fine. But if somebody else will exploit the same approch to set/clear other VM_FOO and it will race with clear-refs we get trouble: some modifications can be lost. Basically, it's safe if only soft-dirty is allowed to modify vm_flags without down_write(). But why is soft-dirty so special? Should we consider moving protection of some vma fields under per-vma lock rather use over-loaded mmap_sem? -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote: > > > > > > But now I'm realizing that if this is the _only_ place which modifies > > > vm_flags with down_read, then it's "probably" safe. I've a vague > > > feeling that this was discussed before - is that so, Cyrill? > > > > Well, as far as I remember we were not talking before about vm_flags > > and read-lock in this function, maybe it was on some unrelated lkml thread > > without me CC'ed? Until I miss something obvious using read-lock here > > for vm_flags modification should be safe, since the only thing which is > > important (in context of vma-softdirty) is the vma's presence. Hugh, > > mind to refresh my memory, how long ago the discussion took place? > > It seems safe in vma-softdirty context. But if somebody else will decide that > it's fine to modify vm_flags without down_write (in their context), we > will get trouble. Sasha will come with weird bug report one day ;) > > At least vm_flags must be updated atomically to avoid race in middle of > load-modify-store. Which race you mean here? Two concurrent clear-refs? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 10:49:52AM +0400, Cyrill Gorcunov wrote: > On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: > > > +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, > > > + int write) > > > +{ > ... > > > + > > > + if (write) > > > + down_write(>mmap_sem); > > > + else > > > + down_read(>mmap_sem); > > > + > > > + if (type == CLEAR_REFS_SOFT_DIRTY) > > > + mmu_notifier_invalidate_range_start(mm, 0, -1); > > > + > > > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > > > + cp.vma = vma; > > > + if (is_vm_hugetlb_page(vma)) > > > + continue; > ... > > > + if (type == CLEAR_REFS_ANON && vma->vm_file) > > > + continue; > > > + if (type == CLEAR_REFS_MAPPED && !vma->vm_file) > > > + continue; > > > + if (type == CLEAR_REFS_SOFT_DIRTY && > > > + (vma->vm_flags & VM_SOFTDIRTY)) { > > > + if (!write) { > > > + r = -EAGAIN; > > > + break; > > > > Hmm. For a long time I thought you were fixing another important bug > > with down_write, since we "always" use down_write to modify vm_flags. > > > > But now I'm realizing that if this is the _only_ place which modifies > > vm_flags with down_read, then it's "probably" safe. I've a vague > > feeling that this was discussed before - is that so, Cyrill? > > Well, as far as I remember we were not talking before about vm_flags > and read-lock in this function, maybe it was on some unrelated lkml thread > without me CC'ed? Until I miss something obvious using read-lock here > for vm_flags modification should be safe, since the only thing which is > important (in context of vma-softdirty) is the vma's presence. Hugh, > mind to refresh my memory, how long ago the discussion took place? It seems safe in vma-softdirty context. But if somebody else will decide that it's fine to modify vm_flags without down_write (in their context), we will get trouble. Sasha will come with weird bug report one day ;) At least vm_flags must be updated atomically to avoid race in middle of load-modify-store. -- Kirill A. Shutemov -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: > > +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, > > + int write) > > +{ ... > > + > > + if (write) > > + down_write(>mmap_sem); > > + else > > + down_read(>mmap_sem); > > + > > + if (type == CLEAR_REFS_SOFT_DIRTY) > > + mmu_notifier_invalidate_range_start(mm, 0, -1); > > + > > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > > + cp.vma = vma; > > + if (is_vm_hugetlb_page(vma)) > > + continue; ... > > + if (type == CLEAR_REFS_ANON && vma->vm_file) > > + continue; > > + if (type == CLEAR_REFS_MAPPED && !vma->vm_file) > > + continue; > > + if (type == CLEAR_REFS_SOFT_DIRTY && > > + (vma->vm_flags & VM_SOFTDIRTY)) { > > + if (!write) { > > + r = -EAGAIN; > > + break; > > Hmm. For a long time I thought you were fixing another important bug > with down_write, since we "always" use down_write to modify vm_flags. > > But now I'm realizing that if this is the _only_ place which modifies > vm_flags with down_read, then it's "probably" safe. I've a vague > feeling that this was discussed before - is that so, Cyrill? Well, as far as I remember we were not talking before about vm_flags and read-lock in this function, maybe it was on some unrelated lkml thread without me CC'ed? Until I miss something obvious using read-lock here for vm_flags modification should be safe, since the only thing which is important (in context of vma-softdirty) is the vma's presence. Hugh, mind to refresh my memory, how long ago the discussion took place? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, + int write) +{ ... + + if (write) + down_write(mm-mmap_sem); + else + down_read(mm-mmap_sem); + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_start(mm, 0, -1); + + for (vma = mm-mmap; vma; vma = vma-vm_next) { + cp.vma = vma; + if (is_vm_hugetlb_page(vma)) + continue; ... + if (type == CLEAR_REFS_ANON vma-vm_file) + continue; + if (type == CLEAR_REFS_MAPPED !vma-vm_file) + continue; + if (type == CLEAR_REFS_SOFT_DIRTY + (vma-vm_flags VM_SOFTDIRTY)) { + if (!write) { + r = -EAGAIN; + break; Hmm. For a long time I thought you were fixing another important bug with down_write, since we always use down_write to modify vm_flags. But now I'm realizing that if this is the _only_ place which modifies vm_flags with down_read, then it's probably safe. I've a vague feeling that this was discussed before - is that so, Cyrill? Well, as far as I remember we were not talking before about vm_flags and read-lock in this function, maybe it was on some unrelated lkml thread without me CC'ed? Until I miss something obvious using read-lock here for vm_flags modification should be safe, since the only thing which is important (in context of vma-softdirty) is the vma's presence. Hugh, mind to refresh my memory, how long ago the discussion took place? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 10:49:52AM +0400, Cyrill Gorcunov wrote: On Mon, Aug 25, 2014 at 09:45:34PM -0700, Hugh Dickins wrote: +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, + int write) +{ ... + + if (write) + down_write(mm-mmap_sem); + else + down_read(mm-mmap_sem); + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_start(mm, 0, -1); + + for (vma = mm-mmap; vma; vma = vma-vm_next) { + cp.vma = vma; + if (is_vm_hugetlb_page(vma)) + continue; ... + if (type == CLEAR_REFS_ANON vma-vm_file) + continue; + if (type == CLEAR_REFS_MAPPED !vma-vm_file) + continue; + if (type == CLEAR_REFS_SOFT_DIRTY + (vma-vm_flags VM_SOFTDIRTY)) { + if (!write) { + r = -EAGAIN; + break; Hmm. For a long time I thought you were fixing another important bug with down_write, since we always use down_write to modify vm_flags. But now I'm realizing that if this is the _only_ place which modifies vm_flags with down_read, then it's probably safe. I've a vague feeling that this was discussed before - is that so, Cyrill? Well, as far as I remember we were not talking before about vm_flags and read-lock in this function, maybe it was on some unrelated lkml thread without me CC'ed? Until I miss something obvious using read-lock here for vm_flags modification should be safe, since the only thing which is important (in context of vma-softdirty) is the vma's presence. Hugh, mind to refresh my memory, how long ago the discussion took place? It seems safe in vma-softdirty context. But if somebody else will decide that it's fine to modify vm_flags without down_write (in their context), we will get trouble. Sasha will come with weird bug report one day ;) At least vm_flags must be updated atomically to avoid race in middle of load-modify-store. -- Kirill A. Shutemov -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote: But now I'm realizing that if this is the _only_ place which modifies vm_flags with down_read, then it's probably safe. I've a vague feeling that this was discussed before - is that so, Cyrill? Well, as far as I remember we were not talking before about vm_flags and read-lock in this function, maybe it was on some unrelated lkml thread without me CC'ed? Until I miss something obvious using read-lock here for vm_flags modification should be safe, since the only thing which is important (in context of vma-softdirty) is the vma's presence. Hugh, mind to refresh my memory, how long ago the discussion took place? It seems safe in vma-softdirty context. But if somebody else will decide that it's fine to modify vm_flags without down_write (in their context), we will get trouble. Sasha will come with weird bug report one day ;) At least vm_flags must be updated atomically to avoid race in middle of load-modify-store. Which race you mean here? Two concurrent clear-refs? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 06:19:14PM +0400, Cyrill Gorcunov wrote: On Tue, Aug 26, 2014 at 05:04:19PM +0300, Kirill A. Shutemov wrote: But now I'm realizing that if this is the _only_ place which modifies vm_flags with down_read, then it's probably safe. I've a vague feeling that this was discussed before - is that so, Cyrill? Well, as far as I remember we were not talking before about vm_flags and read-lock in this function, maybe it was on some unrelated lkml thread without me CC'ed? Until I miss something obvious using read-lock here for vm_flags modification should be safe, since the only thing which is important (in context of vma-softdirty) is the vma's presence. Hugh, mind to refresh my memory, how long ago the discussion took place? It seems safe in vma-softdirty context. But if somebody else will decide that it's fine to modify vm_flags without down_write (in their context), we will get trouble. Sasha will come with weird bug report one day ;) At least vm_flags must be updated atomically to avoid race in middle of load-modify-store. Which race you mean here? Two concurrent clear-refs? Two concurent clear-refs is fine. But if somebody else will exploit the same approch to set/clear other VM_FOO and it will race with clear-refs we get trouble: some modifications can be lost. Basically, it's safe if only soft-dirty is allowed to modify vm_flags without down_write(). But why is soft-dirty so special? Should we consider moving protection of some vma fields under per-vma lock rather use over-loaded mmap_sem? -- Kirill A. Shutemov -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 05:56:12PM +0300, Kirill A. Shutemov wrote: It seems safe in vma-softdirty context. But if somebody else will decide that it's fine to modify vm_flags without down_write (in their context), we will get trouble. Sasha will come with weird bug report one day ;) At least vm_flags must be updated atomically to avoid race in middle of load-modify-store. Which race you mean here? Two concurrent clear-refs? Two concurent clear-refs is fine. But if somebody else will exploit the same approch to set/clear other VM_FOO and it will race with clear-refs we get trouble: some modifications can be lost. yup, i see Basically, it's safe if only soft-dirty is allowed to modify vm_flags without down_write(). But why is soft-dirty so special? because how we use this bit, i mean in normal workload this bit won't be used intensively i think so it's not widespread in kernel code Should we consider moving protection of some vma fields under per-vma lock rather use over-loaded mmap_sem? Hard to say, if vma-softdirty bit is the reason then I guess no, probably it worth to estimate how much profit we would have if using per-vma lock. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote: Basically, it's safe if only soft-dirty is allowed to modify vm_flags without down_write(). But why is soft-dirty so special? because how we use this bit, i mean in normal workload this bit won't be used intensively i think so it's not widespread in kernel code Weak argument to me. What about walk through vmas twice: first with down_write() to modify vm_flags and vm_page_prot, then downgrade_write() and do walk_page_range() on every vma? -- Kirill A. Shutemov -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Tue, Aug 26, 2014 at 06:43:55PM +0300, Kirill A. Shutemov wrote: On Tue, Aug 26, 2014 at 07:18:13PM +0400, Cyrill Gorcunov wrote: Basically, it's safe if only soft-dirty is allowed to modify vm_flags without down_write(). But why is soft-dirty so special? because how we use this bit, i mean in normal workload this bit won't be used intensively i think so it's not widespread in kernel code Weak argument to me. What about walk through vmas twice: first with down_write() to modify vm_flags and vm_page_prot, then downgrade_write() and do walk_page_range() on every vma? I still it's undeeded, but for sure using write-lock/downgrade won't hurt, so no argues from my side. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Sun, 24 Aug 2014, Peter Feiner wrote: > For VMAs that don't want write notifications, PTEs created for read > faults have their write bit set. If the read fault happens after > VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain > clear after subsequent writes. Good catch. Worrying that it's not been noticed until now. But I find quite a lot that needs thinking about in the fix. > > Here's a simple code snippet to demonstrate the bug: > > char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE, > MAP_ANONYMOUS | MAP_SHARED, -1, 0); > system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */ > assert(*m == '\0'); /* new PTE allows write access */ > assert(!soft_dirty(x)); > *m = 'x'; /* should dirty the page */ > assert(soft_dirty(x)); /* fails */ > > With this patch, write notifications are enabled when VM_SOFTDIRTY is > cleared. Furthermore, to avoid unnecessary faults, write > notifications are disabled when VM_SOFTDIRTY is reset. "reset" is often a synonym for "cleared": "whenever VM_SOFTDIRTY is set"? > > As a side effect of enabling and disabling write notifications with > care, this patch fixes a bug in mprotect where vm_page_prot bits set > by drivers were zapped on mprotect. An analogous bug was fixed in mmap > by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f. Commit c9d0bf241451 ("mm: uncached vma support with writenotify"). Adding Magnus to the Cc list: I have some doubt as to whether his bugfix is in fact preserved below, and would like him to check. > > Reported-by: Peter Feiner > Suggested-by: Kirill A. Shutemov > Reviewed-by: Kirill A. Shutemov > Reviewed-by: Cyrill Gorcunov > Signed-off-by: Peter Feiner I like Kirill's suggestion to approach this via writenotify, but find the disable/enable rather confusing (partly because enabling writenotify amounts to disabling write access). I may be alone in my confusion. > > --- > > v1 -> v2: Instead of checking VM_SOFTDIRTY in the fault handler, > enable write notifications on vm_page_prot when we clear > VM_SOFTDIRTY. > > v2 -> v3: * Grab the mmap_sem in write mode if any VMAs have > VM_SOFTDIRTY set. This involved refactoring clear_refs_write > to make it less unwieldy. > > * In mprotect, don't inadvertently disable write notifications on > VMAs > that have had VM_SOFTDIRTY cleared > > * The mprotect fix and mmap cleanup that comprised the > second and third patches in v2 were swallowed by the main > patch because of vm_page_prot corner case handling. > > v3 -> v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have > enabled write notifications for all VMAs in this case. > > v4 -> v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ... > --- > fs/proc/task_mmu.c | 113 > + > include/linux/mm.h | 14 +++ > mm/mmap.c | 24 +--- > mm/mprotect.c | 6 +-- > 4 files changed, 97 insertions(+), 60 deletions(-) > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index dfc791c..f5e75c6 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned > long addr, > return 0; > } > > +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, > + int write) > +{ > + int r = 0; > + struct vm_area_struct *vma; > + struct clear_refs_private cp = { > + .type = type, > + }; > + struct mm_walk clear_refs_walk = { > + .pmd_entry = clear_refs_pte_range, > + .mm = mm, > + .private = , > + }; > + > + if (write) > + down_write(>mmap_sem); > + else > + down_read(>mmap_sem); > + > + if (type == CLEAR_REFS_SOFT_DIRTY) > + mmu_notifier_invalidate_range_start(mm, 0, -1); > + > + for (vma = mm->mmap; vma; vma = vma->vm_next) { > + cp.vma = vma; > + if (is_vm_hugetlb_page(vma)) > + continue; > + /* > + * Writing 1 to /proc/pid/clear_refs affects all pages. > + * > + * Writing 2 to /proc/pid/clear_refs only affects > + * Anonymous pages. > + * > + * Writing 3 to /proc/pid/clear_refs only affects file > + * mapped pages. > + * > + * Writing 4 to /proc/pid/clear_refs affects all pages. > + */ > + if (type == CLEAR_REFS_ANON && vma->vm_file) > + continue; > + if (type == CLEAR_REFS_MAPPED && !vma->vm_file) > + continue; > + if (type == CLEAR_REFS_SOFT_DIRTY && > + (vma->vm_flags & VM_SOFTDIRTY)) { > + if (!write) { > +
Re: [PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
On Sun, 24 Aug 2014, Peter Feiner wrote: For VMAs that don't want write notifications, PTEs created for read faults have their write bit set. If the read fault happens after VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain clear after subsequent writes. Good catch. Worrying that it's not been noticed until now. But I find quite a lot that needs thinking about in the fix. Here's a simple code snippet to demonstrate the bug: char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0); system(echo 4 /proc/$PPID/clear_refs); /* clear VM_SOFTDIRTY */ assert(*m == '\0'); /* new PTE allows write access */ assert(!soft_dirty(x)); *m = 'x'; /* should dirty the page */ assert(soft_dirty(x)); /* fails */ With this patch, write notifications are enabled when VM_SOFTDIRTY is cleared. Furthermore, to avoid unnecessary faults, write notifications are disabled when VM_SOFTDIRTY is reset. reset is often a synonym for cleared: whenever VM_SOFTDIRTY is set? As a side effect of enabling and disabling write notifications with care, this patch fixes a bug in mprotect where vm_page_prot bits set by drivers were zapped on mprotect. An analogous bug was fixed in mmap by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f. Commit c9d0bf241451 (mm: uncached vma support with writenotify). Adding Magnus to the Cc list: I have some doubt as to whether his bugfix is in fact preserved below, and would like him to check. Reported-by: Peter Feiner pfei...@google.com Suggested-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Reviewed-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Reviewed-by: Cyrill Gorcunov gorcu...@openvz.org Signed-off-by: Peter Feiner pfei...@google.com I like Kirill's suggestion to approach this via writenotify, but find the disable/enable rather confusing (partly because enabling writenotify amounts to disabling write access). I may be alone in my confusion. --- v1 - v2: Instead of checking VM_SOFTDIRTY in the fault handler, enable write notifications on vm_page_prot when we clear VM_SOFTDIRTY. v2 - v3: * Grab the mmap_sem in write mode if any VMAs have VM_SOFTDIRTY set. This involved refactoring clear_refs_write to make it less unwieldy. * In mprotect, don't inadvertently disable write notifications on VMAs that have had VM_SOFTDIRTY cleared * The mprotect fix and mmap cleanup that comprised the second and third patches in v2 were swallowed by the main patch because of vm_page_prot corner case handling. v3 - v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have enabled write notifications for all VMAs in this case. v4 - v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ... --- fs/proc/task_mmu.c | 113 + include/linux/mm.h | 14 +++ mm/mmap.c | 24 +--- mm/mprotect.c | 6 +-- 4 files changed, 97 insertions(+), 60 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index dfc791c..f5e75c6 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, return 0; } +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, + int write) +{ + int r = 0; + struct vm_area_struct *vma; + struct clear_refs_private cp = { + .type = type, + }; + struct mm_walk clear_refs_walk = { + .pmd_entry = clear_refs_pte_range, + .mm = mm, + .private = cp, + }; + + if (write) + down_write(mm-mmap_sem); + else + down_read(mm-mmap_sem); + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_start(mm, 0, -1); + + for (vma = mm-mmap; vma; vma = vma-vm_next) { + cp.vma = vma; + if (is_vm_hugetlb_page(vma)) + continue; + /* + * Writing 1 to /proc/pid/clear_refs affects all pages. + * + * Writing 2 to /proc/pid/clear_refs only affects + * Anonymous pages. + * + * Writing 3 to /proc/pid/clear_refs only affects file + * mapped pages. + * + * Writing 4 to /proc/pid/clear_refs affects all pages. + */ + if (type == CLEAR_REFS_ANON vma-vm_file) + continue; + if (type == CLEAR_REFS_MAPPED !vma-vm_file) + continue; + if (type == CLEAR_REFS_SOFT_DIRTY + (vma-vm_flags VM_SOFTDIRTY)) { + if (!write) { + r = -EAGAIN;
[PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
For VMAs that don't want write notifications, PTEs created for read faults have their write bit set. If the read fault happens after VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain clear after subsequent writes. Here's a simple code snippet to demonstrate the bug: char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0); system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */ assert(*m == '\0'); /* new PTE allows write access */ assert(!soft_dirty(x)); *m = 'x'; /* should dirty the page */ assert(soft_dirty(x)); /* fails */ With this patch, write notifications are enabled when VM_SOFTDIRTY is cleared. Furthermore, to avoid unnecessary faults, write notifications are disabled when VM_SOFTDIRTY is reset. As a side effect of enabling and disabling write notifications with care, this patch fixes a bug in mprotect where vm_page_prot bits set by drivers were zapped on mprotect. An analogous bug was fixed in mmap by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f. Reported-by: Peter Feiner Suggested-by: Kirill A. Shutemov Reviewed-by: Kirill A. Shutemov Reviewed-by: Cyrill Gorcunov Signed-off-by: Peter Feiner --- v1 -> v2: Instead of checking VM_SOFTDIRTY in the fault handler, enable write notifications on vm_page_prot when we clear VM_SOFTDIRTY. v2 -> v3: * Grab the mmap_sem in write mode if any VMAs have VM_SOFTDIRTY set. This involved refactoring clear_refs_write to make it less unwieldy. * In mprotect, don't inadvertently disable write notifications on VMAs that have had VM_SOFTDIRTY cleared * The mprotect fix and mmap cleanup that comprised the second and third patches in v2 were swallowed by the main patch because of vm_page_prot corner case handling. v3 -> v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have enabled write notifications for all VMAs in this case. v4 -> v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ... --- fs/proc/task_mmu.c | 113 + include/linux/mm.h | 14 +++ mm/mmap.c | 24 +--- mm/mprotect.c | 6 +-- 4 files changed, 97 insertions(+), 60 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index dfc791c..f5e75c6 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, return 0; } +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, + int write) +{ + int r = 0; + struct vm_area_struct *vma; + struct clear_refs_private cp = { + .type = type, + }; + struct mm_walk clear_refs_walk = { + .pmd_entry = clear_refs_pte_range, + .mm = mm, + .private = , + }; + + if (write) + down_write(>mmap_sem); + else + down_read(>mmap_sem); + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_start(mm, 0, -1); + + for (vma = mm->mmap; vma; vma = vma->vm_next) { + cp.vma = vma; + if (is_vm_hugetlb_page(vma)) + continue; + /* +* Writing 1 to /proc/pid/clear_refs affects all pages. +* +* Writing 2 to /proc/pid/clear_refs only affects +* Anonymous pages. +* +* Writing 3 to /proc/pid/clear_refs only affects file +* mapped pages. +* +* Writing 4 to /proc/pid/clear_refs affects all pages. +*/ + if (type == CLEAR_REFS_ANON && vma->vm_file) + continue; + if (type == CLEAR_REFS_MAPPED && !vma->vm_file) + continue; + if (type == CLEAR_REFS_SOFT_DIRTY && + (vma->vm_flags & VM_SOFTDIRTY)) { + if (!write) { + r = -EAGAIN; + break; + } + vma->vm_flags &= ~VM_SOFTDIRTY; + vma_enable_writenotify(vma); + } + walk_page_range(vma->vm_start, vma->vm_end, + _refs_walk); + } + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_end(mm, 0, -1); + + if (!r) + flush_tlb_mm(mm); + + if (write) + up_write(>mmap_sem); + else + up_read(>mmap_sem); + + return r; +} + static ssize_t clear_refs_write(struct file *file, const char __user *buf, size_t count, loff_t *ppos) { struct task_struct *task; char
[PATCH v5] mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
For VMAs that don't want write notifications, PTEs created for read faults have their write bit set. If the read fault happens after VM_SOFTDIRTY is cleared, then the PTE's softdirty bit will remain clear after subsequent writes. Here's a simple code snippet to demonstrate the bug: char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0); system(echo 4 /proc/$PPID/clear_refs); /* clear VM_SOFTDIRTY */ assert(*m == '\0'); /* new PTE allows write access */ assert(!soft_dirty(x)); *m = 'x'; /* should dirty the page */ assert(soft_dirty(x)); /* fails */ With this patch, write notifications are enabled when VM_SOFTDIRTY is cleared. Furthermore, to avoid unnecessary faults, write notifications are disabled when VM_SOFTDIRTY is reset. As a side effect of enabling and disabling write notifications with care, this patch fixes a bug in mprotect where vm_page_prot bits set by drivers were zapped on mprotect. An analogous bug was fixed in mmap by c9d0bf241451a3ab7d02e1652c22b80cd7d93e8f. Reported-by: Peter Feiner pfei...@google.com Suggested-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Reviewed-by: Kirill A. Shutemov kirill.shute...@linux.intel.com Reviewed-by: Cyrill Gorcunov gorcu...@openvz.org Signed-off-by: Peter Feiner pfei...@google.com --- v1 - v2: Instead of checking VM_SOFTDIRTY in the fault handler, enable write notifications on vm_page_prot when we clear VM_SOFTDIRTY. v2 - v3: * Grab the mmap_sem in write mode if any VMAs have VM_SOFTDIRTY set. This involved refactoring clear_refs_write to make it less unwieldy. * In mprotect, don't inadvertently disable write notifications on VMAs that have had VM_SOFTDIRTY cleared * The mprotect fix and mmap cleanup that comprised the second and third patches in v2 were swallowed by the main patch because of vm_page_prot corner case handling. v3 - v4: Handle !defined(CONFIG_MEM_SOFT_DIRTY): old patch would have enabled write notifications for all VMAs in this case. v4 - v5: IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) instead of #ifdef ... --- fs/proc/task_mmu.c | 113 + include/linux/mm.h | 14 +++ mm/mmap.c | 24 +--- mm/mprotect.c | 6 +-- 4 files changed, 97 insertions(+), 60 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index dfc791c..f5e75c6 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -785,13 +785,80 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr, return 0; } +static int clear_refs(struct mm_struct *mm, enum clear_refs_types type, + int write) +{ + int r = 0; + struct vm_area_struct *vma; + struct clear_refs_private cp = { + .type = type, + }; + struct mm_walk clear_refs_walk = { + .pmd_entry = clear_refs_pte_range, + .mm = mm, + .private = cp, + }; + + if (write) + down_write(mm-mmap_sem); + else + down_read(mm-mmap_sem); + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_start(mm, 0, -1); + + for (vma = mm-mmap; vma; vma = vma-vm_next) { + cp.vma = vma; + if (is_vm_hugetlb_page(vma)) + continue; + /* +* Writing 1 to /proc/pid/clear_refs affects all pages. +* +* Writing 2 to /proc/pid/clear_refs only affects +* Anonymous pages. +* +* Writing 3 to /proc/pid/clear_refs only affects file +* mapped pages. +* +* Writing 4 to /proc/pid/clear_refs affects all pages. +*/ + if (type == CLEAR_REFS_ANON vma-vm_file) + continue; + if (type == CLEAR_REFS_MAPPED !vma-vm_file) + continue; + if (type == CLEAR_REFS_SOFT_DIRTY + (vma-vm_flags VM_SOFTDIRTY)) { + if (!write) { + r = -EAGAIN; + break; + } + vma-vm_flags = ~VM_SOFTDIRTY; + vma_enable_writenotify(vma); + } + walk_page_range(vma-vm_start, vma-vm_end, + clear_refs_walk); + } + + if (type == CLEAR_REFS_SOFT_DIRTY) + mmu_notifier_invalidate_range_end(mm, 0, -1); + + if (!r) + flush_tlb_mm(mm); + + if (write) + up_write(mm-mmap_sem); + else + up_read(mm-mmap_sem); + + return r; +} + static ssize_t clear_refs_write(struct file *file, const char __user *buf,