Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()

2019-03-12 Thread Andrea Arcangeli
On Tue, Mar 12, 2019 at 02:19:15PM -0700, James Bottomley wrote: > I mean in the sequence > > flush_dcache_page(page); > flush_dcache_page(page); > > The first flush_dcache_page did all the work and the second is a > tightly pipelined no-op. That's what I mean by there not really being > a doubl

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-08 Thread Andrea Arcangeli
Hello Jason, On Fri, Mar 08, 2019 at 04:50:36PM +0800, Jason Wang wrote: > Just to make sure I understand here. For boosting through huge TLB, do > you mean we can do that in the future (e.g. by mapping more userspace > pages to kernel) or it can be done by this series (only about three 4K > pag

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-08 Thread Andrea Arcangeli
On Fri, Mar 08, 2019 at 04:58:44PM +0800, Jason Wang wrote: > Can I simply call set_page_dirty() before vunmap() in the mmu notifier > callback, or is there any reason that it must be called within vunmap()? I also don't see any problem in doing it before vunmap. As far as the mmu notifier and set_

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-08 Thread Andrea Arcangeli
On Fri, Mar 08, 2019 at 05:13:26PM +0800, Jason Wang wrote: > Actually not wrapping around, the pages for used ring were marked as > dirty after a round of virtqueue processing when we're sure vhost wrote > something there. Thanks for the clarification. So we need to convert it to set_page_dirty

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-07 Thread Andrea Arcangeli
Hello Jerome, On Thu, Mar 07, 2019 at 03:17:22PM -0500, Jerome Glisse wrote: > So for the above the easiest thing is to call set_page_dirty() from > the mmu notifier callback. It is always safe to use the non locking > variant from such callback. Well it is safe only if the page was > map with wri

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-07 Thread Andrea Arcangeli
On Thu, Mar 07, 2019 at 02:09:10PM -0500, Jerome Glisse wrote: > I thought this patch was only for anonymous memory ie not file back ? Yes, the other common usages are on hugetlbfs/tmpfs that also don't need to implement writeback and are obviously safe too. > If so then set dirty is mostly usele

Re: [RFC PATCH V2 5/5] vhost: access vq metadata through kernel virtual address

2019-03-07 Thread Andrea Arcangeli
On Thu, Mar 07, 2019 at 12:56:45PM -0500, Michael S. Tsirkin wrote: > On Thu, Mar 07, 2019 at 10:47:22AM -0500, Michael S. Tsirkin wrote: > > On Wed, Mar 06, 2019 at 02:18:12AM -0500, Jason Wang wrote: > > > +static const struct mmu_notifier_ops vhost_mmu_notifier_ops = { > > > + .invalidate_range

Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

2019-03-06 Thread Andrea Arcangeli
Hello Zhong, On Wed, Mar 06, 2019 at 09:07:00PM +0800, zhong jiang wrote: > The patch use call_rcu to delay free the task_struct, but It is possible to > free the task_struct > ahead of get_mem_cgroup_from_mm. is it right? Yes it is possible to free before get_mem_cgroup_from_mm, but if it's fre

Re: KASAN: use-after-free Read in get_mem_cgroup_from_mm

2019-03-05 Thread Andrea Arcangeli
k really ever used such mm). However that mm is on its way to exit_mmap as soon as the ioctl returns and this only ever happens during race conditions, so given the way the CRIU monitor works there wasn't anything fundamentally concerning about this detail, despite it being remarkably "strange". O

Re: [PATCH v2] mm/memory.c: do_fault: avoid usage of stale vm_area_struct

2019-03-02 Thread Andrea Arcangeli
rdered after up_read(mmap_sem) either. Other than the above detail: Reviewed-by: Andrea Arcangeli Thanks, Andrea

Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

2019-02-18 Thread Andrea Arcangeli
Hello, On Mon, Feb 18, 2019 at 03:47:22PM -0800, Alexander Duyck wrote: > essentially fragmented them. I guess hugepaged went through and > started trying to reassemble the huge pages and as a result there have > been apps that ended up consuming more memory than they would have > otherwise since

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-18 Thread Andrea Arcangeli
On Mon, Feb 18, 2019 at 11:04:13AM -0500, Jerome Glisse wrote: > So i run 2 exact same VMs side by side (copy of same COW image) and > built the same kernel tree inside each (that is the only important > workload that exist ;)) but the change_pte did not have any impact: > > before mean {real: 1

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-14 Thread Andrea Arcangeli
On Thu, Feb 14, 2019 at 04:07:37PM +0800, Huang, Ying wrote: > Before, we choose to use stop_machine() to reduce the overhead of hot > path (page fault handler) as much as possible. But now, I found > rcu_read_lock_sched() is just a wrapper of preempt_disable(). So maybe > we can switch to RCU ve

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-14 Thread Andrea Arcangeli
Hello, On Thu, Feb 14, 2019 at 12:30:02PM -0800, Andrew Morton wrote: > This was discussed to death and I think the changelog explains the > conclusions adequately. swapoff is super-rare so a stop_machine() in > that path is appropriate if its use permits more efficiency in the > regular swap cod

Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations

2019-02-13 Thread Andrea Arcangeli
Hello everyone, On Mon, Feb 11, 2019 at 04:38:46PM +0800, Huang, Ying wrote: > @@ -2386,7 +2463,17 @@ static void enable_swap_info(struct swap_info_struct > *p, int prio, > frontswap_init(p->type, frontswap_map); > spin_lock(&swap_lock); > spin_lock(&p->lock); > - _enable_s

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-11 Thread Andrea Arcangeli
On Mon, Feb 11, 2019 at 02:09:31PM -0500, Jerome Glisse wrote: > Yeah, between do you have any good workload for me to test this ? I > was thinking of running few same VM and having KSM work on them. Is > there some way to trigger KVM to fork ? As the other case is breaking > COW after fork. KVM c

Re: [RFC PATCH 2/4] mm/mmu_notifier: use unsigned for event field in range struct

2019-02-01 Thread Andrea Arcangeli
On Thu, Jan 31, 2019 at 01:37:04PM -0500, Jerome Glisse wrote: > From: Jérôme Glisse > > Use unsigned for event field in range struct so that we can also set > flags with the event. This patch change the field and introduce the > helper. > > Signed-off-by: Jérôme Glisse

Re: [RFC PATCH 1/4] uprobes: use set_pte_at() not set_pte_at_notify()

2019-02-01 Thread Andrea Arcangeli
On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote: > @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, > unsigned long addr, > > flush_cache_page(vma, addr, pte_pfn(*pvmw.pte)); > ptep_clear_flush_notify(vma, addr, pvmw.pte); > - set_pte_at_not

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-01 Thread Andrea Arcangeli
On Fri, Feb 01, 2019 at 06:57:38PM -0500, Andrea Arcangeli wrote: > If it's cleared with ptep_clear_flush_notify, change_pte still won't > work. The above text needs updating with > "ptep_clear_flush". set_pte_at_notify is all about having > ptep_clear_flush on

Re: [RFC PATCH 0/4] Restore change_pte optimization to its former glory

2019-02-01 Thread Andrea Arcangeli
Hello everyone, On Thu, Jan 31, 2019 at 01:37:02PM -0500, Jerome Glisse wrote: > From: Jérôme Glisse > > This patchset is on top of my patchset to add context information to > mmu notifier [1] you can find a branch with everything [2]. I have not > tested it but i wanted to get the discussion st

Re: [PATCH] powerpc/powernv/npu: Remove redundant change_pte() hook

2019-01-31 Thread Andrea Arcangeli
in > invalidate_range() already. > > CC: Benjamin Herrenschmidt > CC: Paul Mackerras > CC: Michael Ellerman > CC: Alistair Popple > CC: Alexey Kardashevskiy > CC: Mark Hairgrove > CC: Balbir Singh > CC: David Gibson > CC: Andrea Arcangeli > CC: Jerome Gliss

Re: [LSF/MM TOPIC]: userfaultfd (was: [LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE)

2019-01-30 Thread Andrea Arcangeli
Hello Mike, On Wed, Jan 30, 2019 at 10:13:36AM +0200, Mike Rapoport wrote: > We (CRIU) have some concerns about obsoleting soft-dirty in favor of > uffd-wp. If there are other soft-dirty users these concerns would be > relevant to them as well. > > With soft-dirty we collect the information about

[LSF/MM TOPIC] NUMA remote THP vs NUMA local non-THP under MADV_HUGEPAGE

2019-01-29 Thread Andrea Arcangeli
Hello, I'd like to attend the LSF/MM Summit 2019. I'm interested in most MM topics and it's enlightening to listen to the common non-MM topics too. One current topic that could be of interest is the THP / NUMA tradeoff in subject. One issue about a change in MADV_HUGEPAGE behavior made ~3 years

Re: [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups

2019-01-09 Thread Andrea Arcangeli
On Wed, Jan 09, 2019 at 10:07:51AM +, Mel Gorman wrote: > I agree with Mike here. Many previous attempts to strictly obey the strict > hint has led to regressions elsewhere -- specifically a task waking 2+ > wakees that temporarily stack on one CPU when nearby CPUs sharing LLC sync-waking 2 wa

Re: [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups

2019-01-09 Thread Andrea Arcangeli
Hello Mike, On Wed, Jan 09, 2019 at 05:19:48AM +0100, Mike Galbraith wrote: > On Tue, 2019-01-08 at 22:49 -0500, Andrea Arcangeli wrote: > > Hello, > > > > we noticed some unexpected performance regressions in the scheduler by > > switching the guest CPU topology fro

[PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups

2019-01-08 Thread Andrea Arcangeli
wait(NULL); } else { while (n--) { write(pipe1[1], buf, 1); read(pipe2[0], buf, 1); } } return 0; } Andrea Arcangeli (1): sched/fair: skip select_idle_sibling() in presence of sync wakeups ker

[PATCH 1/1] sched/fair: skip select_idle_sibling() in presence of sync wakeups

2019-01-08 Thread Andrea Arcangeli
a single CPU used at 100% utilization and that increases performance for those common workloads. Signed-off-by: Andrea Arcangeli --- kernel/sched/fair.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d1907506318a..b2ac152

Re: [PATCH V6 1/4] mm/cma: Add PF flag to force non cma alloc

2019-01-08 Thread Andrea Arcangeli
take a page pin by > migrating pages from CMA region. Marking the section PF_MEMALLOC_NOCMA ensures > that we avoid unnecessary page migration later. > > Suggested-by: Andrea Arcangeli > Signed-off-by: Aneesh Kumar K.V Reviewed-by: Andrea Arcangeli

[PATCH 0/1] mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT

2019-01-08 Thread Andrea Arcangeli
ultfd reproduces it easily because it's a heavy user of VM_FAULT_RETRY retvals. Thanks, Andrea Andrea Arcangeli (1): mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT mm/hugetlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

[PATCH 1/1] mm/hugetlb.c: teach follow_hugetlb_page() to handle FOLL_NOWAIT

2019-01-08 Thread Andrea Arcangeli
vm: switch get_user_page_nowait() to get_user_pages_unlocked()") Signed-off-by: Andrea Arcangeli Tested-by: "Dr. David Alan Gilbert" Reported-by: "Dr. David Alan Gilbert" --- mm/hugetlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.

Re: [PATCH V6 3/4] powerpc/mm/iommu: Allow migration of cma allocated pages during mm_iommu_get

2019-01-08 Thread Andrea Arcangeli
Hello, On Tue, Jan 08, 2019 at 10:21:09AM +0530, Aneesh Kumar K.V wrote: > @@ -187,41 +149,25 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, > unsigned long ua, > goto unlock_exit; > } > > + ret = get_user_pages_cma_migrate(ua, entries, 1, mem->hpages); In terms

Re: KASAN: use-after-free Read in handle_userfault (2)

2019-01-04 Thread Andrea Arcangeli
On Wed, Jan 02, 2019 at 02:37:58PM +0100, Dmitry Vyukov wrote: > If we are proceeding with "mm: some enhancements to the page fault > mechanism", that's good as it will eliminate at least part of this > output. Agreed. > There are 2 types of debug configs: ones add additional checks for > machine

Re: [PATCH v2] mm: page_mapped: don't assume compound page is huge or THP

2019-01-04 Thread Andrea Arcangeli
ithub.com/jstancek/reproducers/blob/master/kernel/page_mapped_crash/repro.c > > Fix the loop to iterate for "1 << compound_order" pages. > > Debugged-by: Laszlo Ersek > Suggested-by: "Kirill A. Shutemov" > Signed-off-by: Jan Stancek > --- > mm/

Re: KASAN: use-after-free Read in handle_userfault (2)

2018-12-30 Thread Andrea Arcangeli
Hello, On Sun, Dec 30, 2018 at 08:48:05AM +0100, Dmitry Vyukov wrote: > On Wed, Dec 12, 2018 at 10:58 AM Dmitry Vyukov wrote: > > > > On Wed, Dec 12, 2018 at 10:45 AM syzbot > > wrote: > > > > > > Hello, > > > > > > syzbot found the following crash on: > > > > > > HEAD commit:14cf8c1d5b90 Ad

Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low

2018-12-21 Thread Andrea Arcangeli
Hello Yang, On Thu, Dec 20, 2018 at 10:33:26PM -0800, Yang Shi wrote: > > > On 12/20/18 10:04 PM, Hugh Dickins wrote: > > On Thu, 20 Dec 2018, Andrew Morton wrote: > >> Is anyone interested in reviewing this? Seems somewhat serious. > >> Thanks. > > Somewhat serious, but no need to rush. > > >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-12 Thread Andrea Arcangeli
On Wed, Dec 12, 2018 at 10:50:51AM +0100, Michal Hocko wrote: > I can be convinced that larger pages really require a different behavior > than base pages but you should better show _real_ numbers on a wider > variety workloads to back your claims. I have only heard hand waving and I agree with yo

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-12 Thread Andrea Arcangeli
Hello, I now found a two socket EPYC (is this Naples?) to try to confirm the THP effect of intra-socket THP. CPU(s):128 On-line CPU(s) list: 0-127 Thread(s) per core:2 Core(s) per socket:32 Socket(s): 2 NUMA node(s): 8 NUMA node0 CPU(s): 0-7,64-7

Re: [PATCH] userfaultfd: clear flag if remap event not enabled

2018-12-10 Thread Andrea Arcangeli
ould not generate the remap event, and at the same > > time we should clear all the uffd flags on the new VMA. Without > > this patch, we can still have the VM_UFFD_MISSING|VM_UFFD_WP > > flags on the new VMA even the fault handling process does not > > even know the exista

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-09 Thread Andrea Arcangeli
Hello, On Sun, Dec 09, 2018 at 04:29:13PM -0800, David Rientjes wrote: > [..] on this platform, at least, hugepages are > preferred on the same socket but there isn't a significant benefit from > getting a cross socket hugepage over small page. [..] You didn't release the proprietary software t

[PATCH 0/1] userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

2018-12-06 Thread Andrea Arcangeli
This should be applied on top of 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas") to shut off the false positive warning. Thanks, Andrea Andrea Arcangeli (1): userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered fs/us

[PATCH 1/1] userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

2018-12-06 Thread Andrea Arcangeli
ow to register VM_MAYWRITE vmas") Reported-by: syzbot+06c7092e7d71218a2...@syzkaller.appspotmail.com Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index cd58939dc977..7a85e609f

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 04:18:14PM -0800, David Rientjes wrote: > On Wed, 5 Dec 2018, Andrea Arcangeli wrote: > > > __GFP_COMPACT_ONLY gave an hope it could give some middle ground but > > it shows awful compaction results, it basically destroys compaction > > effec

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread Andrea Arcangeli
ions This makes it possible for QEMU to use transparent huge pages (THP) when transparent_hugepage/enabled=madvise. Otherwise THP is only used when it's enabled system wide. Signed-off-by: Luiz Capitulino Signed-off-by: Anthony Liguori Signed-off-by: Andrea Arcangeli --- exec.c |

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
Hello, On Wed, Dec 05, 2018 at 01:59:32PM -0800, David Rientjes wrote: > [..] and the kernel test robot has reported, [..] Just for completeness you may have missed one email: https://lkml.kernel.org/r/87tvk1yjkp@yhuang-dev.intel.com 'So I think the report should have been a "performance imp

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 02:03:10PM -0800, Linus Torvalds wrote: > On Wed, Dec 5, 2018 at 12:40 PM Andrea Arcangeli wrote: > > > > So ultimately we decided that the saner behavior that gives the least > > risk of regression for the short term, until we can do something > >

Re: [patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 09:15:28PM +0100, Michal Hocko wrote: > If the __GFP_THISNODE should be really used then it should be applied to > all other types of pages. Not only THP. And as such done in a separate > patch. Not a part of the revert. The cleanup was meant to unify THP > allocations and t

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 11:49:26AM -0800, David Rientjes wrote: > High thp utilization is not always better, especially when those hugepages > are accessed remotely and introduce the regressions that I've reported. > Seeking high thp utilization at all costs is not the goal if it causes > workl

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
Hello, Sorry, it has been challenging to keep up with all fast replies, so I'll start by answering to the critical result below: On Tue, Dec 04, 2018 at 10:45:58AM +, Mel Gorman wrote: > thpscale Percentage Faults Huge >4.20.0-rc4 4.20.0-rc4 >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread Andrea Arcangeli
On Mon, Dec 03, 2018 at 11:28:07AM -0800, Linus Torvalds wrote: > On Mon, Dec 3, 2018 at 10:59 AM Michal Hocko wrote: > > > > You are misinterpreting my words. I haven't dismissed anything. I do > > recognize both usecases under discussion. > > > > I have merely said that a better THP locality nee

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread Andrea Arcangeli
On Mon, Dec 03, 2018 at 07:59:54PM +0100, Michal Hocko wrote: > I have merely said that a better THP locality needs more work and during > the review discussion I have even volunteered to work on that. There > are other reclaim related fixes under work right now. All I am saying > is that MADV_TRAN

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-11-28 Thread Andrea Arcangeli
On Wed, Nov 28, 2018 at 08:48:46AM -0800, Linus Torvalds wrote: > On Tue, Nov 27, 2018 at 7:20 PM Huang, Ying wrote: > > > > From the above data, for the parent commit 3 processes exited within > > 14s, another 3 exited within 100s. For this commit, the first process > > exited at 203s. That is,

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-11-27 Thread Andrea Arcangeli
urrent fix you merged is simpler overall and puts us back to a "stable" state without introducing new (minor) features. The below is for further review of the potential alternative (which has still margin for improvement). === From: Andrea Arcangeli Subject: [PATCH 1/2] mm: thp: conso

Re: [patch V2 27/28] x86/speculation: Add seccomp Spectre v2 user space protection mode

2018-11-26 Thread Andrea Arcangeli
Hello, On Sun, Nov 25, 2018 at 11:28:59PM +0100, Thomas Gleixner wrote: > Indeed. Just checked the documentation again, it's also not clear whether > IBPB is required if STIPB is in use. I tried to ask this question too earlier: https://lkml.kernel.org/r/20181119234528.gj29...@redhat.com If the

[PATCH 1/5] userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails

2018-11-26 Thread Andrea Arcangeli
ultfd support") Signed-off-by: Andrea Arcangeli --- mm/hugetlb.c | 2 +- mm/shmem.c | 2 +- mm/userfaultfd.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 7f2a28ab46d5..705a3e9cc910 100644 --- a/mm/hugetlb.c +++ b/mm

[PATCH 4/5] userfaultfd: shmem: add i_size checks

2018-11-26 Thread Andrea Arcangeli
("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Cc: sta...@vger.kernel.org Signed-off-by: Andrea Arcangeli --- mm/shmem.c | 18 -- mm/userfaultfd.c | 26 -- 2 files changed, 40 insertions(+), 4 deletions(-) diff

[PATCH 2/5] userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem

2018-11-26 Thread Andrea Arcangeli
eropage pte writable. Reported-by: Mike Rapoport Reviewed-by: Hugh Dickins Cc: sta...@vger.kernel.org Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Andrea Arcangeli --- mm/userfaultfd.c | 15 +-- 1 file changed,

[PATCH 0/5] userfaultfd shmem updates

2018-11-26 Thread Andrea Arcangeli
Thank you, Andrea Andrea Arcangeli (5): userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas userfaultfd: shmem: add i_size checks u

[PATCH 5/5] userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set

2018-11-26 Thread Andrea Arcangeli
_atomic_pte for userfaultfd support") Reported-by: Hugh Dickins Signed-off-by: Andrea Arcangeli --- mm/shmem.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c3ece7a51949..82a381d463bc 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2272,6 +2

[PATCH 3/5] userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas

2018-11-26 Thread Andrea Arcangeli
ger.kernel.org Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 15 +++ mm/userfaultfd.c | 15 ++- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 356d2b8568c1..cd58939dc977 100644 --- a/fs/userfaultfd

Re: [PATCH] mm: put_and_wait_on_page_locked() while page is migrated

2018-11-24 Thread Andrea Arcangeli
scan.c's is_page_cache_freeable() > and __remove_mapping() now treat a PageWaiters page as if an extra > reference were held? Perhaps, but I don't think it matters much, since > shrink_page_list() already had to win its trylock_page(), so waiters are > not very common there: I noticed no difference when trying the bigger > change, and it's surely not needed while put_and_wait_on_page_locked() > is only used for page migration. > > Reported-and-tested-by: Baoquan He > Signed-off-by: Hugh Dickins > Acked-by: Michal Hocko Reviewed-by: Andrea Arcangeli

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-20 Thread Andrea Arcangeli
On Tue, Nov 20, 2018 at 12:11:22PM +0300, Kirill A. Shutemov wrote: > On Sat, Nov 10, 2018 at 11:44:12AM -0500, Andrea Arcangeli wrote: > > I would prefer to add intelligence to detect when COWs after fork > > should be done at 2m or 4k granularity (in the latter case by > >

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
On Mon, Nov 19, 2018 at 03:25:41PM -0800, Dave Hansen wrote: > On 11/19/18 3:16 PM, Andrea Arcangeli wrote: > > So you may want to ask why it wasn't written as your "any" vs "any" email: > > Presumably because the authors really and truly meant what they s

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
On Mon, Nov 19, 2018 at 01:33:08PM -0800, Dave Hansen wrote: > On 11/19/18 11:32 AM, Andrea Arcangeli wrote: > > The specs don't say if by making it immune from BTB mistraining, it > > also could prevent to mistrain the BTB in order to attack what's > > outside the

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
On Mon, Nov 19, 2018 at 08:39:41PM +0100, Jiri Kosina wrote: > On Mon, 19 Nov 2018, Andrea Arcangeli wrote: > > > Generally speaking the untrusted code that would try to use spectrev2 > > to attack the other processes is more likely to run inside SECCOMP > > jail tha

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
Hello everyone, On Mon, Nov 19, 2018 at 02:49:36PM +0100, Jiri Kosina wrote: > On Mon, 19 Nov 2018, Thomas Gleixner wrote: > > > > On Sat, 17 Nov 2018, Jiri Kosina wrote: > > > > > Subject: [PATCH] x86/speculation: enforce STIBP for SECCOMP tasks in lite > > > mode > > > > > > If 'lite' mode o

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-10 Thread Andrea Arcangeli
On Sat, Nov 10, 2018 at 01:22:49PM +, Mel Gorman wrote: > On Fri, Nov 09, 2018 at 02:51:50PM -0500, Andrea Arcangeli wrote: > > And if you're in the camp that is concerned about the use of more RAM > > or/and about the higher latency of COW faults, I'm afraid the >

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-09 Thread Andrea Arcangeli
Hello, On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote: > On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote: > > The basic idea as outlined by Mel Gorman in [2] is: > > > > 1) On first fault in a sufficiently sized range, allocate a huge page > >sized and align

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-29 Thread Andrea Arcangeli
Hello, On Mon, Oct 29, 2018 at 11:08:34AM +0100, Michal Hocko wrote: > This seems like a separate issue which should better be debugged. Please > open a new thread describing the problem and the state of the node. Yes, in my view it should be evaluated separately too, because it's overall less co

Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Andrea Arcangeli
On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote: > I was not sure about this, and expected someone could come up with > something better. It just seems there are filesystems like hugetlbfs, > where it makes no sense wasting cycles traversing the filesystem. So, > let's not even try.

Re: [PATCH v3 2/2] sysctl: handle overflow for file-max

2018-10-18 Thread Andrea Arcangeli
Hi Al, On Wed, Oct 17, 2018 at 01:35:48AM +0100, Al Viro wrote: > On Wed, Oct 17, 2018 at 12:33:22AM +0200, Christian Brauner wrote: > > Currently, when writing > > > > echo 18446744073709551616 > /proc/sys/fs/file-max > > > > /proc/sys/fs/file-max will overflow and be set to 0. That quickly > >
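Editor's sketch, not part of the thread: the quoted description says that writing 18446744073709551616 (2^64) leaves file-max at 0. The userspace C fragment below only illustrates that arithmetic, assuming the value is accumulated in a 64-bit unsigned integer without an overflow check. Both helpers, parse_wrapping() and parse_checked(), are hypothetical names made up for this illustration; neither is the kernel's parser nor the fix proposed in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the kernel's parser): accumulate decimal
 * digits into a 64-bit unsigned value with no overflow check, so the
 * result silently wraps modulo 2^64.
 */
static uint64_t parse_wrapping(const char *s)
{
    uint64_t v = 0;

    while (*s >= '0' && *s <= '9')
        v = v * 10 + (uint64_t)(*s++ - '0');
    return v;
}

/*
 * Hypothetical checked variant: returns 0 on success, or -1 if the
 * value does not fit in 64 bits, so a caller could reject the write
 * instead of storing a wrapped value.
 */
static int parse_checked(const char *s, uint64_t *out)
{
    uint64_t v = 0;

    for (; *s >= '0' && *s <= '9'; s++) {
        uint64_t d = (uint64_t)(*s - '0');

        if (v > (UINT64_MAX - d) / 10)
            return -1;  /* v * 10 + d would exceed UINT64_MAX */
        v = v * 10 + d;
    }
    *out = v;
    return 0;
}
```

With these definitions, parse_wrapping("18446744073709551616") yields 0, matching the behaviour described in the quoted text, while parse_checked() reports the same input as out of range and still accepts 18446744073709551615.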

Re: possible deadlock in aio_poll

2018-10-17 Thread Andrea Arcangeli
>fd_wqh.lock); > + spin_lock_irq(&ctx->fd_wqh.lock); > } > __remove_wait_queue(&ctx->fd_wqh, &wait); > __set_current_state(TASK_RUNNING); > - spin_unlock(&ctx->fd_wqh.lock); > + spin_unlock_irq(&ctx->fd_wqh.lock);

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Andrea Arcangeli
Hello Zi, On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote: > Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns > true > for a THP under splitting? Does it cause problems when ARM64’s pmd_present() > returns false in the same situation? !pmd_present means it's a

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Andrea Arcangeli
Hello, On Tue, Oct 16, 2018 at 03:37:15PM -0700, Andrew Morton wrote: > we'll still make it into 4.19.1. Am reluctant to merge this while > discussion, testing and possibly more development are ongoing. I think there can be definitely more developments primarily to make the compact deferred logi

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrea Arcangeli
Hello Andrew, On Mon, Oct 15, 2018 at 03:44:59PM -0700, Andrew Morton wrote: > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > wrote: > > Would it be possible to test with my > > patch[*] that does not try reclaim to address the thrashing issue? > > Yes please. It'd also be great i

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrea Arcangeli
On Mon, Oct 15, 2018 at 03:30:17PM -0700, David Rientjes wrote: > At the risk of beating a dead horse that has already been beaten, what are > the plans for this patch when the merge window opens? It would be rather > unfortunate for us to start incurring a 14% increase in access latency and >

Re: [PATCH] mm/thp: fix call to mmu_notifier in set_pmd_migration_entry()

2018-10-12 Thread Andrea Arcangeli
On Fri, Oct 12, 2018 at 01:35:19PM -0400, Jerome Glisse wrote: > On Fri, Oct 12, 2018 at 01:24:22PM -0400, Andrea Arcangeli wrote: > > Hello, > > > > On Fri, Oct 12, 2018 at 12:20:54PM -0400, Zi Yan wrote: > > > On 12 Oct 2018, at 12:09, jgli...@redhat.com wrot

Re: [PATCH] mm/thp: fix call to mmu_notifier in set_pmd_migration_entry()

2018-10-12 Thread Andrea Arcangeli
(it's not _range_only_end, if it was _range_only_end the above would be needed) > > calling mmu_notifier_invalidate_range_start/end() inside the function > > calling set_pmd_migration_entry() (see try_to_unmap_one()). > > > > Signed-off-by: Jérôme Glisse > > Repo

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Tue, Oct 09, 2018 at 04:25:10PM +0200, Michal Hocko wrote: > On Tue 09-10-18 14:00:34, Mel Gorman wrote: > > On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > > > [Sorry for being slow in responding but I was mostly offline last few > > > days] > > > > > > On Tue 09-10-18 10:48:2

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Tue, Oct 09, 2018 at 03:17:30PM -0700, David Rientjes wrote: > causes workloads to severely regress both in fault and access latency when > we know that direct reclaim is unlikely to make direct compaction free an > entire pageblock. It's more likely than not that the reclaim was > pointless

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Mon, Oct 08, 2018 at 01:41:09PM -0700, David Rientjes wrote: > The page allocator is expecting __GFP_NORETRY for thp allocations per its > comment: > > /* >* Checks for costly allocations with __GFP_NORETRY, which >* includes THP page fault allocat

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hello, On Thu, Oct 04, 2018 at 04:05:26PM -0700, David Rientjes wrote: > The source of the problem needs to be addressed: memory compaction. We > regress because we lose __GFP_NORETRY and pointlessly try reclaim, but I commented in detail about the __GFP_NORETRY topic in the other email so I w

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hi, On Fri, Oct 05, 2018 at 01:35:15PM -0700, David Rientjes wrote: > Why is it ever appropriate to do heavy reclaim and swap activity to > allocate a transparent hugepage? This is exactly what the __GFP_NORETRY > check for high-order allocations is attempting to avoid, and it explicitly > sta

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread Andrea Arcangeli
Hello David, On Thu, Oct 04, 2018 at 01:16:32PM -0700, David Rientjes wrote: > There are ways to address this without introducing regressions for > existing users of MADV_HUGEPAGE: introduce an madvise() mode to accept > remote thp allocations, which users of this library would never set, or >

Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-12 Thread Andrea Arcangeli
Hello, On Tue, Sep 11, 2018 at 01:56:13PM +0200, Michal Hocko wrote: > Well, it seems that expectations differ for users. It seems that kvm > users do not really agree with your interpretation. Like David also mentioned here: lkml.kernel.org/r/alpine.deb.2.21.1808211021110.258...@chino.kir.corp.

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-05 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 08:29:07PM +0200, Jiri Kosina wrote: > (and no, my testing of the patch I sent on current tree didn't produce any > hangs -- was there a reliable way to trigger it on 3.10?). Only a very specific libvirt acceptance test found this after a while and it wasn't a customer it

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-05 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 08:58:23AM -0700, Andi Kleen wrote: > > So, after giving it a bit more thought, I still believe "I want spectre V2 > > protection" vs. "I do not care about spectre V2 on my system > > (=nospectre_v2)" are the sane options we should provide; so I'll respin v4 > > of my pat

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 01:00:37AM +, Schaufler, Casey wrote: > Sorry, I've been working in security too long for my > optimistic streak to be very wide. Eheh. So I was simply trying to follow in context, but it wasn't entirely clear, so I tried to take it out of context and then it was even l

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Andrea Arcangeli
Hello, On Tue, Sep 04, 2018 at 06:10:47PM +, Schaufler, Casey wrote: > The real reason to use an LSM based approach is that overloading ptrace > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a > processor interface. Even if ptrace_may_access() does exactly what you

Re: [PATCH] KVM: try __get_user_pages_fast even if not in atomic context

2018-08-06 Thread Andrea Arcangeli
s worth is that we shouldn't be calling get_user_pages_unlocked in hva_to_pfn_slow if we could pass FOLL_HWPOISON to get_user_pages_fast. And get_user_pages_fast is really just __get_user_pages_fast + get_user_pages_unlocked with just a difference (see below). Reviewed-by: Andrea Arcangeli > > >

Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access

2018-07-03 Thread Andrea Arcangeli
Hello, On Wed, Jun 27, 2018 at 10:47:44AM +0200, Janosch Frank wrote: > On 26.06.2018 19:00, Mike Kravetz wrote: > > On 06/26/2018 06:24 AM, Janosch Frank wrote: > >> Use huge_ptep_get to translate huge ptes to normal ptes so we can > >> check them with the huge_pte_* functions. Otherwise some arc

Re: [PATCH v2] mm/ksm: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm

2018-06-07 Thread Andrea Arcangeli
ejia...@gmail.com > Signed-off-by: Jia He > Cc: Suzuki K Poulose > Cc: Andrea Arcangeli > Cc: Minchan Kim > Cc: Claudio Imbrenda > Cc: Arvind Yadav > Cc: Mike Rapoport > Cc: Jia He > Cc: > Signed-off-by: Andrew Morton > --- > Reviewed-by: Andrea Arcangeli

Re: [patch V2 1/2] sysfs/cpu: Add vulnerability folder

2018-01-26 Thread Andrea Arcangeli
especially if this would be used for any other equivalent issue in the future and it won't stick to these 3 files, I didn't implement that yet, because it's less urgent if nobody adds any more files soon. From 578b411c8dcb1435dd1f94a6cd062f4eedb70fb5 Mon Sep 17 00:00:00 2001 From

Re: [PATCH v2] mm: Reduce memory bloat with THP

2018-01-25 Thread Andrea Arcangeli
On Thu, Jan 25, 2018 at 10:58:32AM +0100, Michal Hocko wrote: > Ohh, absolutely. And that is why we have changed the default in upstream > 444eb2a449ef ("mm: thp: set THP defrag by default to madvise and add a > stall-free defrag option") Agreed, that direct compaction change should already addres

Re: [PATCH v2] mm: Reduce memory bloat with THP

2018-01-25 Thread Andrea Arcangeli
On Thu, Jan 25, 2018 at 11:41:03AM -0800, Nitin Gupta wrote: > I'm trying to address many different THP issues and memory bloat is > first among them. You quoted redis in an earlier email, the redis issue has nothing to do with MADV_DONTNEED. I can quickly explain the redis issue. Redis uses for

Re: [RH72 Spectre] ibpb_enabled = 1 leads to hard LOCKUP under x86_64 host machine

2018-01-20 Thread Andrea Arcangeli
Hello everyone, On Sat, Jan 20, 2018 at 01:56:08PM +, Van De Ven, Arjan wrote: > well first of all don't use IBRS, use retpoline This issue triggers in the IBPB code during user to user context switch and IBPB is still needed there no matter if kernel is using retpolines or if it uses kernel

Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-19 Thread Andrea Arcangeli
On Fri, Jan 19, 2018 at 04:15:33AM +, Van De Ven, Arjan wrote: > there is no such guarantee. Some of the IBRS implementations will > actually flush rather than disable, or flush parts and disable other > parts. To me it helps in order to memorize the spec to understand why the spec is the way

Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-18 Thread Andrea Arcangeli
Hello, On Thu, Jan 18, 2018 at 03:25:25PM -0800, Andy Lutomirski wrote: > I read the whitepaper that documented the new MSRs a couple days ago > and I'm now completely unable to find it. If anyone could send the > link, that would be great. I see Andrew posted a link. > From memory, however, th

Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-18 Thread Andrea Arcangeli
On Thu, Jan 18, 2018 at 12:24:31PM -0600, Josh Poimboeuf wrote: > On Thu, Jan 18, 2018 at 06:12:36PM +0100, Paolo Bonzini wrote: > > On 18/01/2018 18:08, Dave Hansen wrote: > > > On 01/18/2018 08:37 AM, Josh Poimboeuf wrote: > > >>> > > >>> --- a/Documentation/admin-guide/kernel-parameters.txt > >

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Andrea Arcangeli
On Thu, Jan 18, 2018 at 06:45:00AM -0800, Dave Hansen wrote: > On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote: > > [ 10.084024] diff: -858690919 > > [ 10.084258] hpage_nr_pages: 1 > > [ 10.084386] check1: 0 > > [ 10.084478] check2: 0 > ... > > diff --git a/mm/page_vma_mapped.c b/mm/page_v

Re: [PATCH] x86/pti: unpoison pgd for trusted boot

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 02:49:39PM -0800, Dave Hansen wrote: > > Updated to make this on top of x86/pti. Reviewed-by: Andrea Arcangeli
