Re: [PATCH 01/13] dax: update to new mmu_notifier semantic

2017-09-02 Thread Andrea Arcangeli
On Thu, Aug 31, 2017 at 05:17:26PM -0400, Jerome Glisse wrote:
> + if (start && end) {

"&& end" can be dropped from above and the other places, but it can be optimized later.

Thanks,
Andrea

Re: [PATCH 00/13] mmu_notifier kill invalidate_page callback v2

2017-09-02 Thread Andrea Arcangeli
date to new mmu_notifier semantic
> xen/gntdev: update to new mmu_notifier semantic
> KVM: update to new mmu_notifier semantic
> mm/mmu_notifier: kill invalidate_page

Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-31 Thread Andrea Arcangeli
On Wed, Aug 30, 2017 at 08:47:19PM -0400, Jerome Glisse wrote: > On Wed, Aug 30, 2017 at 04:25:54PM -0700, Nadav Amit wrote: > > For both CoW and KSM, the correctness is maintained by calling > > ptep_clear_flush_notify(). If you defer the secondary MMU invalidation > > (i.e., replacing

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-30 Thread Andrea Arcangeli
On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote: > On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli wrote: > > > > I pointed out in earlier email ->invalidate_range can only be > > implemented (as mutually exclusive alternative to > > ->invalid

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-30 Thread Andrea Arcangeli
On Wed, Aug 30, 2017 at 04:45:49PM -0400, Jerome Glisse wrote: > So i look at both AMD and Intel IOMMU. AMD always flush and current pte value > do not matter AFAICT (i doubt that hardware rewalk the page table just to > decide not to flush that would be terribly dumb for hardware engineer to do >

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-30 Thread Andrea Arcangeli
On Wed, Aug 30, 2017 at 11:00:32AM -0700, Nadav Amit wrote: > It is not trivial to flush TLBs (primary or secondary) without holding the > page-table lock, and as we recently encountered this resulted in several > bugs [1]. The main problem is that even if you perform the TLB flush > immediately

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-30 Thread Andrea Arcangeli
On Wed, Aug 30, 2017 at 11:40:08AM -0700, Nadav Amit wrote: > The mmu_notifier users would have to be aware that invalidations may be > deferred. If they perform their ``invalidations’’ unconditionally, it may be > ok. If the notifier users avoid invalidations based on the PTE in the > secondary

Re: [RFC PATCH] mm, oom_reaper: skip mm structs with mmu notifiers

2017-08-30 Thread Andrea Arcangeli
Hello Michal,

On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> + * TODO: we really want to get rid of this ugly hack and make sure that
> + * notifiers cannot block for unbounded amount of time and add
> + * mmu_notifier_invalidate_range_{start,end} around

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-30 Thread Andrea Arcangeli
On Tue, Aug 29, 2017 at 07:46:07PM -0700, Nadav Amit wrote: > Therefore, IIUC, try_to_umap_one() should only call > mmu_notifier_invalidate_range() after ptep_get_and_clear() and That would trigger an unnecessarily double call to ->invalidate_range() both from mmu_notifier_invalidate_range()

Re: [PATCH 02/13] mm/rmap: update to new mmu_notifier semantic

2017-08-30 Thread Andrea Arcangeli
Hello Jerome, On Tue, Aug 29, 2017 at 07:54:36PM -0400, Jerome Glisse wrote: > Replacing all mmu_notifier_invalidate_page() by mmu_notifier_invalidat_range() > and making sure it is bracketed by call to > mmu_notifier_invalidate_range_start/ > end. > > Note that because we can not presume the

Re: kvm splat in mmu_spte_clear_track_bits

2017-08-29 Thread Andrea Arcangeli
Hello Linus, On Tue, Aug 29, 2017 at 12:38:43PM -0700, Linus Torvalds wrote: > On Tue, Aug 29, 2017 at 12:13 PM, Jerome Glisse wrote: > > > > Yes and i am fine with page traversal being under spinlock and not > > being able to sleep during that. I agree doing otherwise would be > > insane. It is

Re: kvm splat in mmu_spte_clear_track_bits

2017-08-29 Thread Andrea Arcangeli
Hello, On Tue, Aug 29, 2017 at 02:59:23PM +0200, Adam Borowski wrote: > On Tue, Aug 29, 2017 at 02:45:41PM +0200, Takashi Iwai wrote: > > [Put more people to Cc, sorry for growing too much...] > > We're all interested in 4.13.0 not crashing on us, so that's ok. > > > On Tue, 29 Aug 2017

Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer

2017-08-11 Thread Andrea Arcangeli
On Fri, Aug 11, 2017 at 12:22:56PM +0200, Andrea Arcangeli wrote: > disk block? This would happen on ext4 as well if mounted with -o > journal=data instead of -o journal=ordered in fact, perhaps you simply Oops above I meant journal=writeback, journal=data is even stronger than journal=o

Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer

2017-08-11 Thread Andrea Arcangeli
On Fri, Aug 11, 2017 at 04:54:36PM +0900, Tetsuo Handa wrote: > Michal Hocko wrote: > > On Fri 11-08-17 11:28:52, Tetsuo Handa wrote: > > > Michal Hocko wrote: > > > > +/* > > > > + * Checks whether a page fault on the given mm is still reliable. > > > > + * This is no longer true if the oom

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-08-10 Thread Andrea Arcangeli
On Thu, Aug 10, 2017 at 10:16:32AM +0200, Michal Hocko wrote: > Andrea has proposed and alternative solution [4] which should be > equivalent functionally similar to {ksm,khugepaged}_exit. I have to > confess I really don't like that approach but I can live with it if > that is a preferred way (to

Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer

2017-08-09 Thread Andrea Arcangeli
On Wed, Aug 09, 2017 at 08:35:36AM +0900, Tetsuo Handa wrote: > I don't think so. We spent a lot of time in order to remove possible locations > which can lead to failing to invoke the OOM killer when out_of_memory() is > called. It's not clear the connection between failing to invoke the OOM

Re: [PATCH 2/2] mm, oom: fix potential data corruption when oom_reaper races with writer

2017-08-08 Thread Andrea Arcangeli
Hello, On Mon, Aug 07, 2017 at 01:38:39PM +0200, Michal Hocko wrote: > From: Michal Hocko > > Wenwei Tao has noticed that our current assumption that the oom victim > is dying and never doing any visible changes after it dies, and so the > oom_reaper can tear it down, is not entirely true. > >

Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

2017-08-03 Thread Andrea Arcangeli
On Thu, Aug 03, 2017 at 08:24:43PM +0300, Mike Rapoport wrote: > Now, seriously, I believe there are not many users of non-cooperative uffd > if at all and it is very unlikely anybody has it in production. > > I'll send a patch with s/ENOSPC/ESRCH in the next few days. Ok. Some more thought on

Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

2017-08-02 Thread Andrea Arcangeli
On Wed, Aug 02, 2017 at 06:22:49PM +0200, Michal Hocko wrote: > ESRCH refers to "no such process". Strictly speaking userfaultfd code is > about a mm which is gone but that is a mere detail. In fact the owner of Well this whole issue about which retval, is about a mere detail in the first place,

Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

2017-08-02 Thread Andrea Arcangeli
On Wed, Aug 02, 2017 at 03:34:41PM +0300, Mike Rapoport wrote: > I surely can take care of CRIU, but I don't know if QEMU or certain > database application that uses userfaultfd rely on this API, not mentioning > there maybe other unknown users. > > Andrea, what do you think? The manpage would

Re: [PATCH v2 4/4] mm: fix KSM data corruption

2017-08-01 Thread Andrea Arcangeli
[1] http://lkml.kernel.org/r/bd3a0ebe-ecf4-41d4-87fa-c755ea9ab...@gmail.com
>
> Note:
> I failed to reproduce this problem through Nadav's test program which
> need to tune timing in my system speed so didn't confirm it work.
> Nadav, Could you test this patch on your test machine?

Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: [PATCH] userfaultfd_zeropage: return -ENOSPC in case mm has gone

2017-07-31 Thread Andrea Arcangeli
On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote: > On Thu 27-07-17 09:26:59, Mike Rapoport wrote: > > In the non-cooperative userfaultfd case, the process exit may race with > > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC > > instead of -EINVAL when mm is

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-27 Thread Andrea Arcangeli
On Thu, Jul 27, 2017 at 08:50:24AM +0200, Michal Hocko wrote:
> Yes this will work and it won't depend on the oom_lock. But isn't it
> just more ugly than simply doing
>
> if (tsk_is_oom_victim) {
>         down_write(&mm->mmap_sem);
>         locked = true;
> }
>
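A userspace sketch of the conditional-locking pattern in the quoted snippet, for illustration only. The quote is cut off after the closing brace, so the matching unlock is an assumption about how such a pattern is usually completed, and the `toy_lock` type and the `teardown` helper are hypothetical stand-ins rather than code from the thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a writer lock; it only tracks whether it is held. */
struct toy_lock {
    bool held;
};

static void toy_down_write(struct toy_lock *l)
{
    assert(!l->held);   /* no recursive locking in this toy model */
    l->held = true;
}

static void toy_up_write(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * Take the lock only when the caller is an OOM victim, remember that
 * decision in `locked`, and release only if it was actually taken.
 * Returns whether the lock was taken during the teardown.
 */
static bool teardown(struct toy_lock *l, bool is_oom_victim)
{
    bool locked = false;

    if (is_oom_victim) {
        toy_down_write(l);
        locked = true;
    }

    /* ... teardown work that must not race with the reaper ... */

    if (locked)
        toy_up_write(l);

    return locked;
}
```

Recording the decision in `locked` keeps the unlock tied to what was actually done on entry, instead of re-evaluating a condition that may have changed in the meantime.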

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-26 Thread Andrea Arcangeli
On Wed, Jul 26, 2017 at 06:29:12PM +0200, Andrea Arcangeli wrote:
> From 3d9001490ee1a71f39c7bfaf19e96821f9d3ff16 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli
> Date: Tue, 25 Jul 2017 20:02:27 +0200
> Subject: [PATCH 1/1] mm: oom: let oom_reap_task and exit_mmap to run
>

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-26 Thread Andrea Arcangeli
On Wed, Jul 26, 2017 at 07:45:33AM +0200, Michal Hocko wrote: > Yes, exit_aio is the only blocking call I know of currently. But I would > like this to be as robust as possible and so I do not want to rely on > the current implementation. This can change in future and I can > guarantee that nobody

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-26 Thread Andrea Arcangeli
On Wed, Jul 26, 2017 at 07:45:57AM +0200, Michal Hocko wrote:
> On Tue 25-07-17 21:19:52, Andrea Arcangeli wrote:
> > On Tue, Jul 25, 2017 at 06:04:00PM +0200, Michal Hocko wrote:
> > > - down_write(&mm->mmap_sem);
> > > + if (tsk_is_oom_victim(current))
> &

Re: [RESEND PATCH 1/2] userfaultfd: Add feature to request for a signal delivery

2017-07-26 Thread Andrea Arcangeli
> leading to this proposal as suggested by Andrea.
>
> http://www.spinics.net/lists/linux-mm/msg129224.html
>
> Signed-off-by: Prakash Sangappa <prakash.sanga...@oracle.com>
> ---
> fs/userfaultfd.c |3 +++
> include/uapi/linux/userfaultfd.h | 10 +-
> 2 files changed, 12 insertions(+), 1 deletions(-)

Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: [RESEND PATCH 2/2] userfaultfd: selftest: Add tests for UFFD_FREATURE_SIGBUS

2017-07-26 Thread Andrea Arcangeli
On Tue, Jul 25, 2017 at 12:47:42AM -0400, Prakash Sangappa wrote: > Signed-off-by: Prakash Sangappa > --- > tools/testing/selftests/vm/userfaultfd.c | 121 > +- > 1 files changed, 118 insertions(+), 3 deletions(-) Like Mike said, some comment about the test would

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-25 Thread Andrea Arcangeli
On Tue, Jul 25, 2017 at 06:04:00PM +0200, Michal Hocko wrote:
> - down_write(&mm->mmap_sem);
> + if (tsk_is_oom_victim(current))
> + down_write(&mm->mmap_sem);
> free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> tlb_finish_mmu(&tlb, 0, -1);
>
> @@ -3012,7

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-25 Thread Andrea Arcangeli
urther adjustment as the bit isn't used only for the test_and_set_bit locking, but I didn't see much issues with other set_bit/test_bit.

From f414244480fdc1f771b3148feb3fac77ec813e4c Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli
Date: Tue, 25 Jul 2017 20:02:27 +0200
Subject: [PATCH 1/1] mm: oom:

Re: [PATCH] mm, oom: allow oom reaper to race with exit_mmap

2017-07-25 Thread Andrea Arcangeli
On Mon, Jul 24, 2017 at 09:23:32AM +0200, Michal Hocko wrote: > From: Michal Hocko > > David has noticed that the oom killer might kill additional tasks while > the exiting oom victim hasn't terminated yet because the oom_reaper marks > the curent victim MMF_OOM_SKIP too early when mm->mm_users

Re: mm, something wrong in page_lock_anon_vma_read()?

2017-07-20 Thread Andrea Arcangeli
On Thu, Jul 20, 2017 at 02:58:35PM +0200, Andrea Arcangeli wrote: > but if zap_pte in a fremap fails to drop the anon page that was under > memory migration/compaction the exact same thing will happen. Either ... except it is ok to clear a migration entry, it will be migration that wil

Re: mm, something wrong in page_lock_anon_vma_read()?

2017-07-20 Thread Andrea Arcangeli
On Wed, Jul 19, 2017 at 05:59:01PM +0800, Xishi Qiu wrote: > I find two patches from upstream. > 887843961c4b4681ee993c36d4997bf4b4aa8253 Do you use the remap_file_pages syscall? Such syscall has been dropped upstream so very few apps should possibly try to use it on 64bit archs. It would also

Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-07-17 Thread Andrea Arcangeli
On Mon, Jul 17, 2017 at 04:45:10PM +0200, Christoffer Dall wrote: > I would also very much like to get to the bottom of this, and at the > very least try to get a valid explanation as to how a thread can be > *running* for a process where there are zero references to the struct > mm? A thread

Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality

2017-07-13 Thread Andrea Arcangeli
On Thu, Jul 13, 2017 at 11:11:37AM -0700, Mike Kravetz wrote:
> Here is my understanding of how things work for old_len == 0 of anon
> mappings:
> - shared mappings
>   - New vma is created at new virtual address
>   - vma refers to the same underlying object/pages as old vma
>   -
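For reference, a minimal userspace sketch of the shared-mapping case described in the quote, assuming Linux and a shared anonymous mapping. `map_shared` and `mirror_shared` are hypothetical helper names introduced here; the only real interfaces used are mmap() and mremap() with an old size of 0:

```c
#define _GNU_SOURCE             /* needed for mremap() and MREMAP_MAYMOVE */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of shared anonymous memory; returns MAP_FAILED on error. */
static void *map_shared(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
}

/*
 * Ask for a second mapping of the same pages by passing old_size == 0.
 * The original mapping at addr stays in place; the kernel picks the new
 * address because MREMAP_MAYMOVE is set. Returns MAP_FAILED on error.
 */
static void *mirror_shared(void *addr, size_t len)
{
    return mremap(addr, 0, len, MREMAP_MAYMOVE);
}
```

With both mappings in place, a store through one pointer is expected to be visible through the other, since both vmas refer to the same underlying pages.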

Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality

2017-07-13 Thread Andrea Arcangeli
On Thu, Jul 13, 2017 at 09:01:54AM -0700, Mike Kravetz wrote: > Sent a patch (in separate e-mail thread) to return EINVAL for private > mappings. The way old_len == 0 behaves for MAP_PRIVATE seems more sane to me than the alternative of copying pagetables for anon pages (as behaving the way that

Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality

2017-07-11 Thread Andrea Arcangeli
On Tue, Jul 11, 2017 at 02:57:38PM -0700, Mike Kravetz wrote: > Well, the JVM has had a config option for the use of hugetlbfs for quite > some time. I assume they have already had to deal with these issues. Yes, the config tweak exists well before THP existed but in production I know nobody who

Re: [RFC PATCH 1/1] mm/mremap: add MREMAP_MIRROR flag for existing mirroring functionality

2017-07-11 Thread Andrea Arcangeli
On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote:
> I was surprised as well when a JVM developer pointed this out.
>
> From the old e-mail thread, here is original use case:
> shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732
> shmat(11337732, 0, 0) = 0x40299000
>

Re: [PATCH v2] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-07-06 Thread Andrea Arcangeli
Hello,

On Thu, Jul 06, 2017 at 09:45:13AM +0200, Christoffer Dall wrote:
> Let's look at the callers to stage2_get_pmd, which is the only caller of
> stage2_get_pud, where the problem was observed:
>
> user_mem_abort
>   -> stage2_set_pmd_huge
>     -> stage2_get_pmd
>
> user_mem_abort

Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery

2017-07-04 Thread Andrea Arcangeli
On Fri, Jun 30, 2017 at 05:55:08PM -0700, prakash sangappa wrote: > Interesting that UFFDIO_COPY is faster then fallocate(). In the DB use case > the page does not need to be allocated at the time a process trips on > the hugetlbfs > file hole and receives SIGBUS. fallocate() is called on the

Re: [PATCH] KVM: arm/arm64: Handle hva aging while destroying the vm

2017-07-03 Thread Andrea Arcangeli
Hello, On Mon, Jul 03, 2017 at 10:48:03AM +0200, Alexander Graf wrote: > On 07/03/2017 10:03 AM, Christoffer Dall wrote: > > Hi Alex, > > > > On Fri, Jun 23, 2017 at 05:21:59PM +0200, Alexander Graf wrote: > >> If we want to age an HVA while the VM is getting destroyed, we have a > >> tiny race

Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery

2017-06-30 Thread Andrea Arcangeli
On Fri, Jun 30, 2017 at 11:47:35AM +0200, Michal Hocko wrote: > [CC John, the thread started > http://lkml.kernel.org/r/9363561f-a9cd-7ab6-9c11-ab9a99dc8...@oracle.com] > > On Thu 29-06-17 14:41:22, prakash.sangappa wrote: > > > > > > On 06/29/2017 01:09 AM, Michal Hocko wrote: > > >On Wed

Re: [PATCHv2 3/3] mm: Use updated pmdp_invalidate() inteface to track dirty/accessed bits

2017-06-16 Thread Andrea Arcangeli
Hello Minchan, On Fri, Jun 16, 2017 at 10:52:09PM +0900, Minchan Kim wrote: > > > > @@ -1995,8 +1984,6 @@ static void __split_huge_pmd_locked(struct > > > > vm_area_struct *vma, pmd_t *pmd, > > > > if (soft_dirty) > > > > entry =

Re: [PATCHv2 2/3] mm: Do not loose dirty and access bits in pmdp_invalidate()

2017-06-16 Thread Andrea Arcangeli
On Thu, Jun 15, 2017 at 05:52:23PM +0300, Kirill A. Shutemov wrote: > -void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, > +pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, >pmd_t *pmdp) > { > - pmd_t entry = *pmdp; > -

Re: [PATCHv2 1/3] x86/mm: Provide pmdp_establish() helper

2017-06-16 Thread Andrea Arcangeli
Hello Kirill, On Thu, Jun 15, 2017 at 05:52:22PM +0300, Kirill A. Shutemov wrote: > +static inline pmd_t pmdp_establish(pmd_t *pmdp, pmd_t pmd) > +{ > + pmd_t old; > + > + /* > + * We cannot assume what is value of pmd here, so there's no easy way > + * to set if half by half. We

Re: [PATCH RFC] hugetlbfs 'noautofill' mount option

2017-06-16 Thread Andrea Arcangeli
Hello Prakash, On Tue, May 09, 2017 at 01:59:34PM -0700, Prakash Sangappa wrote: > > > On 5/9/17 1:58 AM, Christoph Hellwig wrote: > > On Mon, May 08, 2017 at 03:12:42PM -0700, prakash.sangappa wrote: > >> Regarding #3 as a general feature, do we want to > >> consider this and the complexity

Re: [PATCH 1/3] x86/mm: Provide pmdp_mknotpresent() helper

2017-06-14 Thread Andrea Arcangeli
On Wed, Jun 14, 2017 at 04:51:41PM +0300, Kirill A. Shutemov wrote: > We need an atomic way to make pmd page table entry not-present. > This is required to implement pmdp_invalidate() that doesn't loose dirty > or access bits. What does the cmpxchg() loop achieves compared to xchg() and then

Re: [PATCH 3/3] mm, thp: Do not loose dirty bit in __split_huge_pmd_locked()

2017-06-14 Thread Andrea Arcangeli
Hello, On Wed, Jun 14, 2017 at 04:18:57PM +0200, Martin Schwidefsky wrote: > Could we change pmdp_invalidate to make it return the old pmd entry? That to me seems the simplest fix to avoid losing the dirty bit. I earlier suggested to replace pmdp_invalidate with something like old_pmd =

Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

2017-06-01 Thread Andrea Arcangeli
On Thu, Jun 01, 2017 at 10:09:09AM +0200, Michal Hocko wrote: > That is a bit surprising. I didn't think that the userfault syscall > (ioctl) can be faster than a regular #PF but considering that > __mcopy_atomic bypasses the page fault path and it can be optimized for > the anon case suggests

Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

2017-05-31 Thread Andrea Arcangeli
On Wed, May 31, 2017 at 04:32:17PM +0200, Michal Hocko wrote: > I would assume such a patch would be backported to stable trees because > to me it sounds like the current semantic is simply broken and needs > fixing anyway but it shouldn't be much different from any other bugs. So the program

Re: [PATCH] mm: introduce MADV_CLR_HUGEPAGE

2017-05-31 Thread Andrea Arcangeli
On Wed, May 31, 2017 at 03:39:22PM +0300, Mike Rapoport wrote: > For the CRIU usecase, disabling THP for a while and re-enabling it > back will do the trick, provided VMAs flags are not affected, like > in the patch you've sent. Moreover, we may even get away with Are you going to check uname -r
