On Thu, Aug 31, 2017 at 05:17:26PM -0400, Jerome Glisse wrote:
> + if (start && end) {
"&& end" can be dropped from above and the other places but it can be
optimized later..
Thanks,
Andrea
date to new mmu_notifier semantic
> xen/gntdev: update to new mmu_notifier semantic
> KVM: update to new mmu_notifier semantic
> mm/mmu_notifier: kill invalidate_page
Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>
On Wed, Aug 30, 2017 at 08:47:19PM -0400, Jerome Glisse wrote:
> On Wed, Aug 30, 2017 at 04:25:54PM -0700, Nadav Amit wrote:
> > For both CoW and KSM, the correctness is maintained by calling
> > ptep_clear_flush_notify(). If you defer the secondary MMU invalidation
> > (i.e., replacing
On Wed, Aug 30, 2017 at 02:53:38PM -0700, Linus Torvalds wrote:
> On Wed, Aug 30, 2017 at 9:52 AM, Andrea Arcangeli <aarca...@redhat.com> wrote:
> >
> > I pointed out in earlier email ->invalidate_range can only be
> > implemented (as mutually exclusive alternative to
> > ->invalid
On Wed, Aug 30, 2017 at 04:45:49PM -0400, Jerome Glisse wrote:
> So i look at both AMD and Intel IOMMU. AMD always flush and current pte value
> do not matter AFAICT (i doubt that hardware rewalk the page table just to
> decide not to flush that would be terribly dumb for hardware engineer to do
>
On Wed, Aug 30, 2017 at 11:00:32AM -0700, Nadav Amit wrote:
> It is not trivial to flush TLBs (primary or secondary) without holding the
> page-table lock, and as we recently encountered this resulted in several
> bugs [1]. The main problem is that even if you perform the TLB flush
> immediately
On Wed, Aug 30, 2017 at 11:40:08AM -0700, Nadav Amit wrote:
> The mmu_notifier users would have to be aware that invalidations may be
> deferred. If they perform their ``invalidations’’ unconditionally, it may be
> ok. If the notifier users avoid invalidations based on the PTE in the
> secondary
Hello Michal,
On Wed, Aug 30, 2017 at 10:46:00AM +0200, Michal Hocko wrote:
> + * TODO: we really want to get rid of this ugly hack and make sure that
> + * notifiers cannot block for unbounded amount of time and add
> + * mmu_notifier_invalidate_range_{start,end} around
On Tue, Aug 29, 2017 at 07:46:07PM -0700, Nadav Amit wrote:
> Therefore, IIUC, try_to_unmap_one() should only call
> mmu_notifier_invalidate_range() after ptep_get_and_clear() and
That would trigger an unnecessary double call to
->invalidate_range() both from mmu_notifier_invalidate_range()
Hello Jerome,
On Tue, Aug 29, 2017 at 07:54:36PM -0400, Jerome Glisse wrote:
> Replacing all mmu_notifier_invalidate_page() by mmu_notifier_invalidate_range()
> and making sure it is bracketed by calls to
> mmu_notifier_invalidate_range_start/end.
>
> Note that because we can not presume the
Hello Linus,
On Tue, Aug 29, 2017 at 12:38:43PM -0700, Linus Torvalds wrote:
> On Tue, Aug 29, 2017 at 12:13 PM, Jerome Glisse wrote:
> >
> > Yes and i am fine with page traversal being under spinlock and not
> > being able to sleep during that. I agree doing otherwise would be
> > insane. It is
Hello,
On Tue, Aug 29, 2017 at 02:59:23PM +0200, Adam Borowski wrote:
> On Tue, Aug 29, 2017 at 02:45:41PM +0200, Takashi Iwai wrote:
> > [Put more people to Cc, sorry for growing too much...]
>
> We're all interested in 4.13.0 not crashing on us, so that's ok.
>
> > On Tue, 29 Aug 2017
On Fri, Aug 11, 2017 at 12:22:56PM +0200, Andrea Arcangeli wrote:
> disk block? This would happen on ext4 as well if mounted with -o
> journal=data instead of -o journal=ordered in fact, perhaps you simply
Oops above I meant journal=writeback, journal=data is even stronger
than journal=o
On Fri, Aug 11, 2017 at 04:54:36PM +0900, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 11-08-17 11:28:52, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > +/*
> > > > + * Checks whether a page fault on the given mm is still reliable.
> > > > + * This is no longer true if the oom
On Thu, Aug 10, 2017 at 10:16:32AM +0200, Michal Hocko wrote:
> Andrea has proposed an alternative solution [4] which should be
> functionally similar to {ksm,khugepaged}_exit. I have to
> confess I really don't like that approach but I can live with it if
> that is a preferred way (to
On Wed, Aug 09, 2017 at 08:35:36AM +0900, Tetsuo Handa wrote:
> I don't think so. We spent a lot of time in order to remove possible locations
> which can lead to failing to invoke the OOM killer when out_of_memory() is
> called.
It's not clear what the connection is between failing to invoke the OOM
Hello,
On Mon, Aug 07, 2017 at 01:38:39PM +0200, Michal Hocko wrote:
> From: Michal Hocko
>
> Wenwei Tao has noticed that our current assumption that the oom victim
> is dying and never doing any visible changes after it dies, and so the
> oom_reaper can tear it down, is not entirely true.
>
>
On Thu, Aug 03, 2017 at 08:24:43PM +0300, Mike Rapoport wrote:
> Now, seriously, I believe there are not many users of non-cooperative uffd
> if at all and it is very unlikely anybody has it in production.
>
> I'll send a patch with s/ENOSPC/ESRCH in the next few days.
Ok.
Some more thought on
On Wed, Aug 02, 2017 at 06:22:49PM +0200, Michal Hocko wrote:
> ESRCH refers to "no such process". Strictly speaking userfaultfd code is
> about a mm which is gone but that is a mere detail. In fact the owner of
Well, this whole issue about which retval to return is a mere detail in
the first place,
On Wed, Aug 02, 2017 at 03:34:41PM +0300, Mike Rapoport wrote:
> I surely can take care of CRIU, but I don't know if QEMU or certain
> database application that uses userfaultfd rely on this API, not mentioning
> there maybe other unknown users.
>
> Andrea, what do you think?
The manpage would
[1] http://lkml.kernel.org/r/bd3a0ebe-ecf4-41d4-87fa-c755ea9ab...@gmail.com
>
> Note:
> I failed to reproduce this problem with Nadav's test program, which
> needs its timing tuned to my system's speed, so I didn't confirm it works.
> Nadav, could you test this patch on your test machine?
Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>
On Mon, Jul 31, 2017 at 02:22:04PM +0200, Michal Hocko wrote:
> On Thu 27-07-17 09:26:59, Mike Rapoport wrote:
> > In the non-cooperative userfaultfd case, the process exit may race with
> > outstanding mcopy_atomic called by the uffd monitor. Returning -ENOSPC
> > instead of -EINVAL when mm is
On Thu, Jul 27, 2017 at 08:50:24AM +0200, Michal Hocko wrote:
> Yes this will work and it won't depend on the oom_lock. But isn't it
> just more ugly than simply doing
>
> if (tsk_is_oom_victim) {
> down_write(&mm->mmap_sem);
> locked = true;
> }
>
On Wed, Jul 26, 2017 at 06:29:12PM +0200, Andrea Arcangeli wrote:
> From 3d9001490ee1a71f39c7bfaf19e96821f9d3ff16 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarca...@redhat.com>
> Date: Tue, 25 Jul 2017 20:02:27 +0200
> Subject: [PATCH 1/1] mm: oom: let oom_reap_task and exit_mmap to run
>
On Wed, Jul 26, 2017 at 07:45:33AM +0200, Michal Hocko wrote:
> Yes, exit_aio is the only blocking call I know of currently. But I would
> like this to be as robust as possible and so I do not want to rely on
> the current implementation. This can change in future and I can
> guarantee that nobody
On Wed, Jul 26, 2017 at 07:45:57AM +0200, Michal Hocko wrote:
> On Tue 25-07-17 21:19:52, Andrea Arcangeli wrote:
> > On Tue, Jul 25, 2017 at 06:04:00PM +0200, Michal Hocko wrote:
> > > - down_write(&mm->mmap_sem);
> > > + if (tsk_is_oom_victim(current))
> &
; leading to this proposal as suggested by Andrea.
>
> http://www.spinics.net/lists/linux-mm/msg129224.html
>
> Signed-off-by: Prakash Sangappa <prakash.sanga...@oracle.com>
> ---
> fs/userfaultfd.c |3 +++
> include/uapi/linux/userfaultfd.h | 10 +-
> 2 files changed, 12 insertions(+), 1 deletions(-)
Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>
On Tue, Jul 25, 2017 at 12:47:42AM -0400, Prakash Sangappa wrote:
> Signed-off-by: Prakash Sangappa
> ---
> tools/testing/selftests/vm/userfaultfd.c | 121
> +-
> 1 files changed, 118 insertions(+), 3 deletions(-)
Like Mike said, some comment about the test would
On Tue, Jul 25, 2017 at 06:04:00PM +0200, Michal Hocko wrote:
> - down_write(&mm->mmap_sem);
> + if (tsk_is_oom_victim(current))
> + down_write(&mm->mmap_sem);
> free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> tlb_finish_mmu(&tlb, 0, -1);
>
> @@ -3012,7
urther adjustment as
the bit isn't used only for the test_and_set_bit locking, but I didn't
see many issues with other set_bit/test_bit.
From f414244480fdc1f771b3148feb3fac77ec813e4c Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarca...@redhat.com>
Date: Tue, 25 Jul 2017 20:02:27 +0200
Subject: [PATCH 1/1] mm: oom:
On Mon, Jul 24, 2017 at 09:23:32AM +0200, Michal Hocko wrote:
> From: Michal Hocko
>
> David has noticed that the oom killer might kill additional tasks while
> the exiting oom victim hasn't terminated yet because the oom_reaper marks
> the current victim MMF_OOM_SKIP too early when mm->mm_users
On Thu, Jul 20, 2017 at 02:58:35PM +0200, Andrea Arcangeli wrote:
> but if zap_pte in a fremap fails to drop the anon page that was under
> memory migration/compaction the exact same thing will happen. Either
... except it is ok to clear a migration entry, it will be migration
that wil
On Wed, Jul 19, 2017 at 05:59:01PM +0800, Xishi Qiu wrote:
> I find two patches from upstream.
> 887843961c4b4681ee993c36d4997bf4b4aa8253
Do you use the remap_file_pages syscall? Such syscall has been dropped
upstream so very few apps should possibly try to use it on 64bit
archs.
It would also
On Mon, Jul 17, 2017 at 04:45:10PM +0200, Christoffer Dall wrote:
> I would also very much like to get to the bottom of this, and at the
> very least try to get a valid explanation as to how a thread can be
> *running* for a process where there are zero references to the struct
> mm?
A thread
On Thu, Jul 13, 2017 at 11:11:37AM -0700, Mike Kravetz wrote:
> Here is my understanding of how things work for old_len == 0 of anon
> mappings:
> - shared mappings
> - New vma is created at new virtual address
> - vma refers to the same underlying object/pages as old vma
> -
On Thu, Jul 13, 2017 at 09:01:54AM -0700, Mike Kravetz wrote:
> Sent a patch (in separate e-mail thread) to return EINVAL for private
> mappings.
The way old_len == 0 behaves for MAP_PRIVATE seems more sane to me
than the alternative of copying pagetables for anon pages (as behaving
the way that
On Tue, Jul 11, 2017 at 02:57:38PM -0700, Mike Kravetz wrote:
> Well, the JVM has had a config option for the use of hugetlbfs for quite
> some time. I assume they have already had to deal with these issues.
Yes, the config tweak exists well before THP existed but in production
I know nobody who
On Tue, Jul 11, 2017 at 11:23:19AM -0700, Mike Kravetz wrote:
> I was surprised as well when a JVM developer pointed this out.
>
> From the old e-mail thread, here is original use case:
> shmget(IPC_PRIVATE, 31498240, 0x1c0|0600) = 11337732
> shmat(11337732, 0, 0) = 0x40299000
>
Hello,
On Thu, Jul 06, 2017 at 09:45:13AM +0200, Christoffer Dall wrote:
> Let's look at the callers to stage2_get_pmd, which is the only caller of
> stage2_get_pud, where the problem was observed:
>
> user_mem_abort
>-> stage2_set_pmd_huge
> -> stage2_get_pmd
>
> user_mem_abort
On Fri, Jun 30, 2017 at 05:55:08PM -0700, prakash sangappa wrote:
> Interesting that UFFDIO_COPY is faster than fallocate(). In the DB use case
> the page does not need to be allocated at the time a process trips on
> the hugetlbfs
> file hole and receives SIGBUS. fallocate() is called on the
Hello,
On Mon, Jul 03, 2017 at 10:48:03AM +0200, Alexander Graf wrote:
> On 07/03/2017 10:03 AM, Christoffer Dall wrote:
> > Hi Alex,
> >
> > On Fri, Jun 23, 2017 at 05:21:59PM +0200, Alexander Graf wrote:
> >> If we want to age an HVA while the VM is getting destroyed, we have a
> >> tiny race
On Fri, Jun 30, 2017 at 11:47:35AM +0200, Michal Hocko wrote:
> [CC John, the thread started
> http://lkml.kernel.org/r/9363561f-a9cd-7ab6-9c11-ab9a99dc8...@oracle.com]
>
> On Thu 29-06-17 14:41:22, prakash.sangappa wrote:
> >
> >
> > On 06/29/2017 01:09 AM, Michal Hocko wrote:
> > >On Wed
Hello Minchan,
On Fri, Jun 16, 2017 at 10:52:09PM +0900, Minchan Kim wrote:
> > > > @@ -1995,8 +1984,6 @@ static void __split_huge_pmd_locked(struct
> > > > vm_area_struct *vma, pmd_t *pmd,
> > > > if (soft_dirty)
> > > > entry =
On Thu, Jun 15, 2017 at 05:52:23PM +0300, Kirill A. Shutemov wrote:
> -void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
> +pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>pmd_t *pmdp)
> {
> - pmd_t entry = *pmdp;
> -
Hello Kirill,
On Thu, Jun 15, 2017 at 05:52:22PM +0300, Kirill A. Shutemov wrote:
> +static inline pmd_t pmdp_establish(pmd_t *pmdp, pmd_t pmd)
> +{
> + pmd_t old;
> +
> + /*
> + * We cannot assume what is value of pmd here, so there's no easy way
> + * to set if half by half. We
Hello Prakash,
On Tue, May 09, 2017 at 01:59:34PM -0700, Prakash Sangappa wrote:
>
>
> On 5/9/17 1:58 AM, Christoph Hellwig wrote:
> > On Mon, May 08, 2017 at 03:12:42PM -0700, prakash.sangappa wrote:
> >> Regarding #3 as a general feature, do we want to
> >> consider this and the complexity
On Wed, Jun 14, 2017 at 04:51:41PM +0300, Kirill A. Shutemov wrote:
> We need an atomic way to make pmd page table entry not-present.
> This is required to implement pmdp_invalidate() that doesn't lose dirty
> or access bits.
What does the cmpxchg() loop achieve compared to xchg() and then
Hello,
On Wed, Jun 14, 2017 at 04:18:57PM +0200, Martin Schwidefsky wrote:
> Could we change pmdp_invalidate to make it return the old pmd entry?
That to me seems the simplest fix to avoid losing the dirty bit.
I earlier suggested to replace pmdp_invalidate with something like
old_pmd =
On Thu, Jun 01, 2017 at 10:09:09AM +0200, Michal Hocko wrote:
> That is a bit surprising. I didn't think that the userfault syscall
> (ioctl) can be faster than a regular #PF but considering that
> __mcopy_atomic bypasses the page fault path and it can be optimized for
> the anon case suggests
On Wed, May 31, 2017 at 04:32:17PM +0200, Michal Hocko wrote:
> I would assume such a patch would be backported to stable trees because
> to me it sounds like the current semantic is simply broken and needs
> fixing anyway but it shouldn't be much different from any other bugs.
So the program
On Wed, May 31, 2017 at 03:39:22PM +0300, Mike Rapoport wrote:
> For the CRIU usecase, disabling THP for a while and re-enabling it
> back will do the trick, provided VMAs flags are not affected, like
> in the patch you've sent. Moreover, we may even get away with
Are you going to check uname -r