[PATCH 0/1] userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

2018-12-06 Thread Andrea Arcangeli
. This should be applied on top of 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas") to shut off the false positive warning. Thanks, Andrea Andrea Arcangeli (1): userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered fs/userfau

[PATCH 1/1] userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered

2018-12-06 Thread Andrea Arcangeli
allow to register VM_MAYWRITE vmas") Reported-by: syzbot+06c7092e7d71218a2...@syzkaller.appspotmail.com Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index cd58939dc977..7a85e609f

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 04:18:14PM -0800, David Rientjes wrote: > On Wed, 5 Dec 2018, Andrea Arcangeli wrote: > > > __GFP_COMPACT_ONLY gave an hope it could give some middle ground but > > it shows awful compaction results, it basically destroys compaction > > effec

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread Andrea Arcangeli
use transparent huge pages (THP) when transparent_hugepage/enabled=madvise. Otherwise THP is only used when it's enabled system wide. Signed-off-by: Luiz Capitulino Signed-off-by: Anthony Liguori Signed-off-by: Andrea Arcangeli --- exec.c | 1 + osdep.h | 5 + 2 files changed, 6 i
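
Illustrative sketch (not the patch quoted above): when transparent_hugepage/enabled is set to madvise, a process has to opt a memory range into THP explicitly with madvise(MADV_HUGEPAGE). The helper name alloc_thp_hinted and the 2 MiB alignment are assumptions made for this example only.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* expose madvise() and MADV_* on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumed huge page size (2 MiB, as on x86-64); hypothetical constant. */
#define HPAGE_SIZE_SKETCH (2UL << 20)

/*
 * Hypothetical helper: allocate `len` bytes aligned to the assumed huge
 * page size and hint the kernel that THP is wanted for the range.
 * *hint_rc receives the madvise() result (0 on success, -1 on failure
 * or when MADV_HUGEPAGE is not available at build time).
 */
void *alloc_thp_hinted(size_t len, int *hint_rc)
{
    void *p = NULL;

    assert(hint_rc != NULL);
    if (posix_memalign(&p, HPAGE_SIZE_SKETCH, len) != 0)
        return NULL;
#ifdef MADV_HUGEPAGE
    *hint_rc = madvise(p, len, MADV_HUGEPAGE);
#else
    *hint_rc = -1;
#endif
    return p;
}
```

A failed hint is not fatal: the range simply stays backed by regular pages, so a caller would normally log the failure and carry on.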

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
Hello, On Wed, Dec 05, 2018 at 01:59:32PM -0800, David Rientjes wrote: > [..] and the kernel test robot has reported, [..] Just for completeness you may have missed one email: https://lkml.kernel.org/r/87tvk1yjkp@yhuang-dev.intel.com 'So I think the report should have been a "performance

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 02:03:10PM -0800, Linus Torvalds wrote: > On Wed, Dec 5, 2018 at 12:40 PM Andrea Arcangeli wrote: > > > > So ultimately we decided that the saner behavior that gives the least > > risk of regression for the short term, until we can do something >

Re: [patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 09:15:28PM +0100, Michal Hocko wrote: > If the __GFP_THISNODE should be really used then it should be applied to > all other types of pages. Not only THP. And as such done in a separate > patch. Not a part of the revert. The cleanup was meant to unify THP > allocations and

Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-05 Thread Andrea Arcangeli
On Wed, Dec 05, 2018 at 11:49:26AM -0800, David Rientjes wrote: > High thp utilization is not always better, especially when those hugepages > are accessed remotely and introduce the regressions that I've reported. > Seeking high thp utilization at all costs is not the goal if it causes >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-05 Thread Andrea Arcangeli
Hello, Sorry, it has been challenging to keep up with all fast replies, so I'll start by answering to the critical result below: On Tue, Dec 04, 2018 at 10:45:58AM +0000, Mel Gorman wrote: > thpscale Percentage Faults Huge >4.20.0-rc4 4.20.0-rc4 >

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread Andrea Arcangeli
On Mon, Dec 03, 2018 at 11:28:07AM -0800, Linus Torvalds wrote: > On Mon, Dec 3, 2018 at 10:59 AM Michal Hocko wrote: > > > > You are misinterpreting my words. I haven't dismissed anything. I do > > recognize both usecases under discussion. > > > > I have merely said that a better THP locality

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-12-03 Thread Andrea Arcangeli
On Mon, Dec 03, 2018 at 07:59:54PM +0100, Michal Hocko wrote: > I have merely said that a better THP locality needs more work and during > the review discussion I have even volunteered to work on that. There > are other reclaim related fixes under work right now. All I am saying > is that

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-11-28 Thread Andrea Arcangeli
On Wed, Nov 28, 2018 at 08:48:46AM -0800, Linus Torvalds wrote: > On Tue, Nov 27, 2018 at 7:20 PM Huang, Ying wrote: > > > > From the above data, for the parent commit 3 processes exited within > > 14s, another 3 exited within 100s. For this commit, the first process > > exited at 203s. That

Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression

2018-11-27 Thread Andrea Arcangeli
a "stable" state without introducing new (minor) features. The below is for further review of the potential alternative (which has still margin for improvement). === From: Andrea Arcangeli Subject: [PATCH 1/2] mm: thp: consolidate policy_nodemask call Just a minor cleanup. Signed-o

Re: [patch V2 27/28] x86/speculation: Add seccomp Spectre v2 user space protection mode

2018-11-26 Thread Andrea Arcangeli
Hello, On Sun, Nov 25, 2018 at 11:28:59PM +0100, Thomas Gleixner wrote: > Indeed. Just checked the documentation again, it's also not clear whether > IBPB is required if STIPB is in use. I tried to ask this question too earlier: https://lkml.kernel.org/r/20181119234528.gj29...@redhat.com If

[PATCH 1/5] userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails

2018-11-26 Thread Andrea Arcangeli
ultfd support") Signed-off-by: Andrea Arcangeli --- mm/hugetlb.c | 2 +- mm/shmem.c | 2 +- mm/userfaultfd.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 7f2a28ab46d5..705a3e9cc910 100644 --- a/mm/hugetlb.c +++ b/mm

[PATCH 4/5] userfaultfd: shmem: add i_size checks

2018-11-26 Thread Andrea Arcangeli
("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Cc: sta...@vger.kernel.org Signed-off-by: Andrea Arcangeli --- mm/shmem.c | 18 -- mm/userfaultfd.c | 26 -- 2 files changed, 40 insertions(+), 4 deletions(-) diff

[PATCH 2/5] userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem

2018-11-26 Thread Andrea Arcangeli
. Reported-by: Mike Rapoport Reviewed-by: Hugh Dickins Cc: sta...@vger.kernel.org Fixes: 4c27fe4c4c84 ("userfaultfd: shmem: add shmem_mcopy_atomic_pte for userfaultfd support") Signed-off-by: Andrea Arcangeli --- mm/userfaultfd.c | 15 +-- 1 file changed, 13 insertions(+), 2

[PATCH 0/5] userfaultfd shmem updates

2018-11-26 Thread Andrea Arcangeli
, Andrea Andrea Arcangeli (5): userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas userfaultfd: shmem: add i_size checks userfaultfd

[PATCH 5/5] userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set

2018-11-26 Thread Andrea Arcangeli
c_pte for userfaultfd support") Reported-by: Hugh Dickins Signed-off-by: Andrea Arcangeli --- mm/shmem.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c3ece7a51949..82a381d463bc 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2272,6 +2272,16 @@

[PATCH 3/5] userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas

2018-11-26 Thread Andrea Arcangeli
ernel.org Signed-off-by: Andrea Arcangeli --- fs/userfaultfd.c | 15 +++ mm/userfaultfd.c | 15 ++- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 356d2b8568c1..cd58939dc977 100644 --- a/fs/userfaultfd.c ++

Re: [PATCH] mm: put_and_wait_on_page_locked() while page is migrated

2018-11-24 Thread Andrea Arcangeli
w treat a PageWaiters page as if an extra > reference were held? Perhaps, but I don't think it matters much, since > shrink_page_list() already had to win its trylock_page(), so waiters are > not very common there: I noticed no difference when trying the bigger > change, and it's surely not needed while put_and_wait_on_page_locked() > is only used for page migration. > > Reported-and-tested-by: Baoquan He > Signed-off-by: Hugh Dickins > Acked-by: Michal Hocko Reviewed-by: Andrea Arcangeli

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-20 Thread Andrea Arcangeli
On Tue, Nov 20, 2018 at 12:11:22PM +0300, Kirill A. Shutemov wrote: > On Sat, Nov 10, 2018 at 11:44:12AM -0500, Andrea Arcangeli wrote: > > I would prefer to add intelligence to detect when COWs after fork > > should be done at 2m or 4k granularity (in the latter case by > >

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
On Mon, Nov 19, 2018 at 03:25:41PM -0800, Dave Hansen wrote: > On 11/19/18 3:16 PM, Andrea Arcangeli wrote: > > So you may want to ask why it wasn't written as your "any" vs "any" email: > > Presumably because the authors really and truly meant what they said.

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
On Mon, Nov 19, 2018 at 01:33:08PM -0800, Dave Hansen wrote: > On 11/19/18 11:32 AM, Andrea Arcangeli wrote: > > The specs don't say if by making it immune from BTB mistraining, it > > also could prevent to mistrain the BTB in order to attack what's > > outside the SECCOMP ja

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
On Mon, Nov 19, 2018 at 08:39:41PM +0100, Jiri Kosina wrote: > On Mon, 19 Nov 2018, Andrea Arcangeli wrote: > > > Generally speaking the untrusted code that would try to use spectrev2 > > to attack the other processes is more likely to run inside SECCOMP > > jail tha

Re: [Patch v5 11/16] x86/speculation: Add Spectre v2 app to app protection modes

2018-11-19 Thread Andrea Arcangeli
Hello everyone, On Mon, Nov 19, 2018 at 02:49:36PM +0100, Jiri Kosina wrote: > On Mon, 19 Nov 2018, Thomas Gleixner wrote: > > > > On Sat, 17 Nov 2018, Jiri Kosina wrote: > > > > > Subject: [PATCH] x86/speculation: enforce STIBP for SECCOMP tasks in lite > > > mode > > > > > > If 'lite' mode

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-10 Thread Andrea Arcangeli
On Sat, Nov 10, 2018 at 01:22:49PM +0000, Mel Gorman wrote: > On Fri, Nov 09, 2018 at 02:51:50PM -0500, Andrea Arcangeli wrote: > > And if you're in the camp that is concerned about the use of more RAM > > or/and about the higher latency of COW faults, I'm afraid the > >

Re: [RFC PATCH] mm: thp: implement THP reservations for anonymous memory

2018-11-09 Thread Andrea Arcangeli
Hello, On Fri, Nov 09, 2018 at 03:13:18PM +0300, Kirill A. Shutemov wrote: > On Thu, Nov 08, 2018 at 10:48:58PM -0800, Anthony Yznaga wrote: > > The basic idea as outlined by Mel Gorman in [2] is: > > > > 1) On first fault in a sufficiently sized range, allocate a huge page > >sized and

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-29 Thread Andrea Arcangeli
Hello, On Mon, Oct 29, 2018 at 11:08:34AM +0100, Michal Hocko wrote: > This seems like a separate issue which should better be debugged. Please > open a new thread describing the problem and the state of the node. Yes, in my view it should be evaluated separately too, because it's overall less

Re: [PATCH] hugetlbfs: dirty pages as they are added to pagecache

2018-10-18 Thread Andrea Arcangeli
On Thu, Oct 18, 2018 at 04:16:40PM -0700, Mike Kravetz wrote: > I was not sure about this, and expected someone could come up with > something better. It just seems there are filesystems like hugetlbfs, > where it makes no sense wasting cycles traversing the filesystem. So, > let's not even try.

Re: [PATCH v3 2/2] sysctl: handle overflow for file-max

2018-10-18 Thread Andrea Arcangeli
Hi Al, On Wed, Oct 17, 2018 at 01:35:48AM +0100, Al Viro wrote: > On Wed, Oct 17, 2018 at 12:33:22AM +0200, Christian Brauner wrote: > > Currently, when writing > > > > echo 18446744073709551616 > /proc/sys/fs/file-max > > > > /proc/sys/fs/file-max will overflow and be set to 0. That quickly >

Re: possible deadlock in aio_poll

2018-10-17 Thread Andrea Arcangeli
break; > } > - spin_unlock(&ctx->fd_wqh.lock); > + spin_unlock_irq(&ctx->fd_wqh.lock); > schedule(); > - spin_lock(&ctx->fd_wqh.lock); > + spin_lock_irq(&ctx->fd_wqh.lock); > } > __remove_wait_queue(&ctx->fd_wq

Re: [PATCH] mm/thp: Correctly differentiate between mapped THP and PMD migration entry

2018-10-16 Thread Andrea Arcangeli
Hello Zi, On Sun, Oct 14, 2018 at 08:53:55PM -0400, Zi Yan wrote: > Hi Andrea, what is the purpose/benefit of making x86’s pmd_present() returns > true > for a THP under splitting? Does it cause problems when ARM64’s pmd_present() > returns false in the same situation? !pmd_present means it's a

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Andrea Arcangeli
Hello, On Tue, Oct 16, 2018 at 03:37:15PM -0700, Andrew Morton wrote: > we'll still make it into 4.19.1. Am reluctant to merge this while > discussion, testing and possibly more development are ongoing. I think there can be definitely more developments primarily to make the compact deferred

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrea Arcangeli
Hello Andrew, On Mon, Oct 15, 2018 at 03:44:59PM -0700, Andrew Morton wrote: > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > wrote: > > Would it be possible to test with my > > patch[*] that does not try reclaim to address the thrashing issue? > > Yes please. It'd also be great

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrea Arcangeli
On Mon, Oct 15, 2018 at 03:30:17PM -0700, David Rientjes wrote: > At the risk of beating a dead horse that has already been beaten, what are > the plans for this patch when the merge window opens? It would be rather > unfortunate for us to start incurring a 14% increase in access latency and >

Re: [PATCH] mm/thp: fix call to mmu_notifier in set_pmd_migration_entry()

2018-10-12 Thread Andrea Arcangeli
On Fri, Oct 12, 2018 at 01:35:19PM -0400, Jerome Glisse wrote: > On Fri, Oct 12, 2018 at 01:24:22PM -0400, Andrea Arcangeli wrote: > > Hello, > > > > On Fri, Oct 12, 2018 at 12:20:54PM -0400, Zi Yan wrote: > > > On 12 Oct 2018, at 12:09, jgli...@redhat.com wrot

Re: [PATCH] mm/thp: fix call to mmu_notifier in set_pmd_migration_entry()

2018-10-12 Thread Andrea Arcangeli
s not _range_only_end, if it was _range_only_end the above would be needed) > > calling mmu_notifier_invalidate_range_start/end() inside the function > > calling set_pmd_migration_entry() (see try_to_unmap_one()). > > > > Signed-off-by: Jérôme Glisse > > Reported-by: A

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Tue, Oct 09, 2018 at 04:25:10PM +0200, Michal Hocko wrote: > On Tue 09-10-18 14:00:34, Mel Gorman wrote: > > On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > > > [Sorry for being slow in responding but I was mostly offline last few > > > days] > > > > > > On Tue 09-10-18

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Tue, Oct 09, 2018 at 03:17:30PM -0700, David Rientjes wrote: > causes workloads to severely regress both in fault and access latency when > we know that direct reclaim is unlikely to make direct compaction free an > entire pageblock. It's more likely than not that the reclaim was >

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Mon, Oct 08, 2018 at 01:41:09PM -0700, David Rientjes wrote: > The page allocator is expecting __GFP_NORETRY for thp allocations per its > comment: > > /* >* Checks for costly allocations with __GFP_NORETRY, which >* includes THP page fault

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hello, On Thu, Oct 04, 2018 at 04:05:26PM -0700, David Rientjes wrote: > The source of the problem needs to be addressed: memory compaction. We > regress because we lose __GFP_NORETRY and pointlessly try reclaim, but I commented in detail about the __GFP_NORETRY topic in the other email so I

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hi, On Fri, Oct 05, 2018 at 01:35:15PM -0700, David Rientjes wrote: > Why is it ever appropriate to do heavy reclaim and swap activity to > allocate a transparent hugepage? This is exactly what the __GFP_NORETRY > check for high-order allocations is attempting to avoid, and it explicitly >

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread Andrea Arcangeli
Hello David, On Thu, Oct 04, 2018 at 01:16:32PM -0700, David Rientjes wrote: > There are ways to address this without introducing regressions for > existing users of MADV_HUGEPAGE: introduce an madvise() mode to accept > remote thp allocations, which users of this library would never set, or >

Re: [PATCH] mm, thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-12 Thread Andrea Arcangeli
Hello, On Tue, Sep 11, 2018 at 01:56:13PM +0200, Michal Hocko wrote: > Well, it seems that expectations differ for users. It seems that kvm > users do not really agree with your interpretation. Like David also mentioned here:

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-05 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 08:29:07PM +0200, Jiri Kosina wrote: > (and no, my testing of the patch I sent on current tree didn't produce any > hangs -- was there a reliable way to trigger it on 3.10?). Only a very specific libvirt acceptance test found this after a while and it wasn't a customer it

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-05 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 08:58:23AM -0700, Andi Kleen wrote: > > So, after giving it a bit more thought, I still believe "I want spectre V2 > > protection" vs. "I do not care about spectre V2 on my system > > (=nospectre_v2)" are the sane options we should provide; so I'll respin v4 > > of my

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Andrea Arcangeli
On Wed, Sep 05, 2018 at 01:00:37AM +0000, Schaufler, Casey wrote: > Sorry, I've been working in security too long for my > optimistic streak to be very wide. Eheh. So I was simply trying to follow in context, but it wasn't entirely clear, so I tried to take it out of context and then it was even

Re: [PATCH v3 1/3] ptrace: Provide ___ptrace_may_access() that can be applied on arbitrary tasks

2018-09-04 Thread Andrea Arcangeli
Hello, On Tue, Sep 04, 2018 at 06:10:47PM +0000, Schaufler, Casey wrote: > The real reason to use an LSM based approach is that overloading ptrace > checks is a Really Bad Idea. Ptrace is a user interface. Side-channel is a > processor interface. Even if ptrace_may_access() does exactly what you

Re: [PATCH] KVM: try __get_user_pages_fast even if not in atomic context

2018-08-06 Thread Andrea Arcangeli
n't be calling get_user_pages_unlocked in hva_to_pfn_slow if we could pass FOLL_HWPOISON to get_user_pages_fast. And get_user_pages_fast is really just __get_user_pages_fast + get_user_pages_unlocked with just a difference (see below). Reviewed-by: Andrea Arcangeli > > > Can we apply this tech to

Re: [PATCH] userfaultfd: hugetlbfs: Fix userfaultfd_huge_must_wait pte access

2018-07-03 Thread Andrea Arcangeli
Hello, On Wed, Jun 27, 2018 at 10:47:44AM +0200, Janosch Frank wrote: > On 26.06.2018 19:00, Mike Kravetz wrote: > > On 06/26/2018 06:24 AM, Janosch Frank wrote: > >> Use huge_ptep_get to translate huge ptes to normal ptes so we can > >> check them with the huge_pte_* functions. Otherwise some

Re: [PATCH v2] mm/ksm: ignore STABLE_FLAG of rmap_item->address in rmap_walk_ksm

2018-06-07 Thread Andrea Arcangeli
..@gmail.com > Signed-off-by: Jia He > Cc: Suzuki K Poulose > Cc: Andrea Arcangeli > Cc: Minchan Kim > Cc: Claudio Imbrenda > Cc: Arvind Yadav > Cc: Mike Rapoport > Cc: Jia He > Cc: > Signed-off-by: Andrew Morton > --- > Reviewed-by: Andrea Arcangeli

Re: [patch V2 1/2] sysfs/cpu: Add vulnerability folder

2018-01-26 Thread Andrea Arcangeli
his would be used for any other equivalent issue in the future and it won't stick to these 3 files, I didn't implement that yet, because it's less urgent if nobody adds any more files soon. From 578b411c8dcb1435dd1f94a6cd062f4eedb70fb5 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli <aarc

Re: [PATCH v2] mm: Reduce memory bloat with THP

2018-01-25 Thread Andrea Arcangeli
On Thu, Jan 25, 2018 at 10:58:32AM +0100, Michal Hocko wrote: > Ohh, absolutely. And that is why we have changed the default in upstream > 444eb2a449ef ("mm: thp: set THP defrag by default to madvise and add a > stall-free defrag option") Agreed, that direct compaction change should already

Re: [PATCH v2] mm: Reduce memory bloat with THP

2018-01-25 Thread Andrea Arcangeli
On Thu, Jan 25, 2018 at 11:41:03AM -0800, Nitin Gupta wrote: > I'm trying to address many different THP issues and memory bloat is > first among them. You quoted redis in an earlier email, the redis issue has nothing to do with MADV_DONTNEED. I can quickly explain the redis issue. Redis uses

Re: [RH72 Spectre] ibpb_enabled = 1 leads to hard LOCKUP under x86_64 host machine

2018-01-20 Thread Andrea Arcangeli
Hello everyone, On Sat, Jan 20, 2018 at 01:56:08PM +0000, Van De Ven, Arjan wrote: > well first of all don't use IBRS, use retpoline This issue triggers in the IBPB code during user to user context switch and IBPB is still needed there no matter if kernel is using retpolines or if it uses kernel

Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-19 Thread Andrea Arcangeli
On Fri, Jan 19, 2018 at 04:15:33AM +0000, Van De Ven, Arjan wrote: > there is no such guarantee. Some of the IBRS implementations will > actually flush rather than disable, or flush parts and disable other > parts. To me it helps in order to memorize the spec to understand why the spec is the way

Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-18 Thread Andrea Arcangeli
Hello, On Thu, Jan 18, 2018 at 03:25:25PM -0800, Andy Lutomirski wrote: > I read the whitepaper that documented the new MSRs a couple days ago > and I'm now completely unable to find it. If anyone could send the > link, that would be great. I see Andrew posted a link. > From memory, however,

Re: [PATCH 23/35] x86/speculation: Add basic speculation control code

2018-01-18 Thread Andrea Arcangeli
On Thu, Jan 18, 2018 at 12:24:31PM -0600, Josh Poimboeuf wrote: > On Thu, Jan 18, 2018 at 06:12:36PM +0100, Paolo Bonzini wrote: > > On 18/01/2018 18:08, Dave Hansen wrote: > > > On 01/18/2018 08:37 AM, Josh Poimboeuf wrote: > > >>> > > >>> --- a/Documentation/admin-guide/kernel-parameters.txt > >

Re: [mm 4.15-rc8] Random oopses under memory pressure.

2018-01-18 Thread Andrea Arcangeli
On Thu, Jan 18, 2018 at 06:45:00AM -0800, Dave Hansen wrote: > On 01/18/2018 04:25 AM, Kirill A. Shutemov wrote: > > [ 10.084024] diff: -858690919 > > [ 10.084258] hpage_nr_pages: 1 > > [ 10.084386] check1: 0 > > [ 10.084478] check2: 0 > ... > > diff --git a/mm/page_vma_mapped.c

Re: [PATCH] x86/pti: unpoison pgd for trusted boot

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 02:49:39PM -0800, Dave Hansen wrote: > > Updated to make this on top of x86/pti. Reviewed-by: Andrea Arcangeli <aarca...@redhat.com>

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 01:35:45PM -0800, Tim Chen wrote: > time may not provide full protection on all cpu models. All right no problem at all, it's fixed up. Until very recently the majority of microcodes wasn't available in the first place so I guess it's no big issue if in a subset of those

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 03:24:17PM +0000, David Woodhouse wrote: > Since it achieves nothing¹ but to make userspace run slower, there's no > need to write it again on returning to userspace. It will perform that > function just fine without doing so. Ok, very glad we are on the same page now.

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 06:59:54AM -0800, Dave Hansen wrote: > On 01/10/2018 06:10 AM, Andrea Arcangeli wrote: > > Tim and Dave please comment too, Tim you originally wrote that code > > that leaves IBRS always on and never toggles it in the kernel entry > > point so yo

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
Hello, On Wed, Jan 10, 2018 at 02:46:22PM +0100, Thomas Gleixner wrote: > So here is the simple list of questions all to be answered with YES or > NO. I don't want to see any of the 'but, though ...'. We all know by now > that it's CPU dependent and slow and whatever and that IBRS_ATT will be in

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 01:45:52PM +0000, Van De Ven, Arjan wrote: > > > Andrea, what you're saying is directly contradicting what I've heard > > from Intel. > > > > The documentation already distinguishes between IBRS on current > > hardware, and IBRS_ATT on future hardware. If it was the case

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 02:10:10PM +0100, Andrea Arcangeli wrote: > It's still incredibly faster to shutdown part of the CPU temporarily > than to flush its internal state as a whole with IBPB. If it wouldn't > be the case ibrs_enabled 0 ibpb_enabled 2 special mode would perform

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 02:05:51PM +0100, Andrea Arcangeli wrote: > Also note, the slowdown of setting IBRS varies with older CPUs being To give a further detail, older CPUs to provide IBRS semantics have to do something even less fine-grained that doesn't just restrict speculation across I

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 02:02:02PM +0100, Andrea Arcangeli wrote: > On Wed, Jan 10, 2018 at 12:51:13PM +0000, David Woodhouse wrote: > > If it worked as Andrea suggests, then there would be absolutely no > > point in the patches we've seen which add the IBRS-frobbing on sy

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 12:51:13PM +0000, David Woodhouse wrote: > If it worked as Andrea suggests, then there would be absolutely no > point in the patches we've seen which add the IBRS-frobbing on syscall > entry and vmexit. This is perhaps what you're missing I think, there is a huge point:

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 01:47:22PM +0100, Jiri Kosina wrote: > On Wed, 10 Jan 2018, Andrea Arcangeli wrote: > > > Perhaps the confusion comes from "less privileged prediction mode" and > > you thought that meant "less privileged ring mode". It says "

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 12:29:44PM +0000, David Woodhouse wrote: > On Wed, 2018-01-10 at 13:17 +0100, Andrea Arcangeli wrote: > > On Wed, Jan 10, 2018 at 12:09:34PM +0000, David Woodhouse wrote: > > > That is not consistent with the documentation I've seen, which Intel > >

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 01:20:45PM +0100, Andrea Arcangeli wrote: > On Wed, Jan 10, 2018 at 12:12:53PM +0000, David Woodhouse wrote: > > IBRS is like a barrier. You must write it between the 'problematic' > > loading of the branch targets, and the kernel code which might

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 12:12:53PM +0000, David Woodhouse wrote: > IBRS is like a barrier. You must write it between the 'problematic' > loading of the branch targets, and the kernel code which might be > affected. > > You cannot, on current hardware, merely set it once and forget about > it.

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 12:09:34PM +0000, David Woodhouse wrote: > That is not consistent with the documentation I've seen, which Intel > have so far utterly failed to publish AFAICT. > > "a near indirect jump/call/return may be affected by code in a less privileged > prediction mode that

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 01:01:58PM +0100, Andrea Arcangeli wrote: > On Wed, Jan 10, 2018 at 11:58:54AM +0000, David Woodhouse wrote: > > On Wed, 2018-01-10 at 12:54 +0100, Andrea Arcangeli wrote: > > > On Wed, Jan 10, 2018 at 09:27:59AM +0000, David Woodhouse wrote: > > >

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 11:58:54AM +0000, David Woodhouse wrote: > On Wed, 2018-01-10 at 12:54 +0100, Andrea Arcangeli wrote: > > On Wed, Jan 10, 2018 at 09:27:59AM +0000, David Woodhouse wrote: > > > I don't know why you're calling that 'IBRS=2'; are you getting > > co

Re: [patch RFC 5/5] x86/speculation: Add basic speculation control code

2018-01-10 Thread Andrea Arcangeli
On Wed, Jan 10, 2018 at 09:27:59AM +0000, David Woodhouse wrote: > I don't know why you're calling that 'IBRS=2'; are you getting confused > by Andrea's distro horridness? Eh, yes he's got confused. ibrs_enabled 2 simply means to leave IBRS set in SPEC_CTRL 100% of the time, except in guest mode.

Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Andrea Arcangeli
m> Cc: Ning Sun <ning@intel.com> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Ingo Molnar <mi...@redhat.com> Cc: "H. Peter Anvin" <h...@zytor.com> Cc: x...@kernel.org Cc: tboot-de...@lists.sourceforge.net Cc: linux-kernel@vger.kernel.org Signed-off-by: Andrea

Re: [PATCH v2 4/8] x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

2018-01-08 Thread Andrea Arcangeli
On Mon, Jan 08, 2018 at 08:43:40PM +0300, Alexey Dobriyan wrote: > > + len = sprintf(buf, "%d\n", READ_ONCE(*field)); > > READ_ONCE isn't necessary as there is only one read being made. Others might disagree but you shouldn't ever let gcc touch any memory that can change under gcc without
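
Illustrative sketch of the pattern under discussion, written for userspace: the value is loaded exactly once into a local through a volatile access, and only that snapshot is formatted. READ_ONCE_SKETCH and format_field are hypothetical names; the macro is a simplified stand-in built on the GCC/Clang __typeof__ extension, not the kernel's actual READ_ONCE definition.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for READ_ONCE(): the volatile-qualified access
 * asks the compiler to perform the load once and not re-read or split it.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/*
 * Format the current value of *field into buf. The value is snapshotted
 * first, so the output reflects a single load even if another context
 * changes *field concurrently. Returns the snprintf() result.
 */
int format_field(char *buf, size_t size, int *field)
{
    int snapshot;

    assert(buf != NULL && field != NULL);
    snapshot = READ_ONCE_SKETCH(*field);
    return snprintf(buf, size, "%d\n", snapshot);
}
```

The sketch uses snprintf with an explicit size rather than the sprintf call quoted above; that is a choice made for the example, not something taken from the patch.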

Re: Avoid speculative indirect calls in kernel

2018-01-08 Thread Andrea Arcangeli
Signed-off-by: Dave Hansen <dave.han...@linux.intel.com> Cc: Ning Sun <ning@intel.com> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Ingo Molnar <mi...@redhat.com> Cc: "H. Peter Anvin" <h...@zytor.com> Cc: x...@kernel.org Cc: tboot-de...@lists.sourcefor

Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok

2018-01-08 Thread Andrea Arcangeli
On Sat, Jan 06, 2018 at 08:41:34PM +0100, Thomas Gleixner wrote: > optimized argumentation. We need to make sure that we have a solution which > kills the problem safely and then take it from there. Correctness first, > optimization later is the rule for this. Better safe than sorry. Agreed,

Re: [RFC] boot failed when enable KAISER/KPTI

2018-01-06 Thread Andrea Arcangeli
Hello Xishi, On Sat, Jan 06, 2018 at 02:45:30PM +0800, Xishi Qiu wrote: > How about this fix patch? I tested and it works. > > diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c > index 088681d..f6c32f5 100644 > --- a/arch/x86/kernel/tboot.c > +++ b/arch/x86/kernel/tboot.c > @@

Re: [PATCH 05/23] x86, kaiser: unmap kernel from userspace page tables (core patch)

2018-01-06 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 11:51:38PM -0800, Dave Hansen wrote: > On 01/05/2018 10:28 PM, Hanjun Guo wrote: > >> + > >>p4d = p4d_alloc(_mm, pgd, vaddr); > > Seems pgd will be re-set after p4d_alloc(), so should > > we put the code behind (or after pud_alloc())? Thanks Dave and Jiri for these two

Re: [PATCH 05/23] x86, kaiser: unmap kernel from userspace page tables (core patch)

2018-01-05 Thread Andrea Arcangeli
n it, and therefore EFI can't execute its code. Fix that by forcefully clearing _PAGE_NX from the PGD (this can't be done by the pgprot API). _PAGE_NX will be automatically reintroduced in efi_call_phys_epilog(), as _set_pgd() will again notice that this is _PAGE_USER, and set _PAGE_NX on it. Signed-

Re: [PATCH 0/7] IBRS patch series

2018-01-05 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 04:37:30PM +, David Woodhouse wrote: > You are completely ignoring pre-Skylake here. > > On pre-Skylake, retpoline is perfectly sufficient and it's a *lot* > faster than the IBRS option which is almost prohibitively slow. > > We didn't do it just for fun. And it's

Re: [PATCH 5/7] x86: Use IBRS for firmware update path

2018-01-05 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 05:08:48PM +0100, Greg Kroah-Hartman wrote: > On Thu, Jan 04, 2018 at 08:08:55PM +, Woodhouse, David wrote: > > On Thu, 2018-01-04 at 21:05 +0100, Greg KH wrote: > > > On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote: > > > > > > > > From: David Woodhouse

Re: [PATCH 0/7] IBRS patch series

2018-01-05 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 03:38:24PM +, David Woodhouse wrote: > We had IBRS first, and especially on Broadwell and earlier, its > performance really is painful. > > Then came retpoline, purely as an optimisation. A very *important* > performance improvement, but an optimisation nonetheless. >

Re: [PATCH 7/7] x86/microcode: Recheck IBRS features on microcode reload

2018-01-05 Thread Andrea Arcangeli
Hello everyone, On Fri, Jan 05, 2018 at 02:32:17PM +0100, Greg Kroah-Hartman wrote: > On Thu, Jan 04, 2018 at 07:50:33PM +0100, Borislav Petkov wrote: > > On Thu, Jan 04, 2018 at 07:34:30PM +0100, Andrea Arcangeli wrote: > > > void spec_ctrl_rescan_cpuid(void) >

Re: [PATCH 0/7] IBRS patch series

2018-01-05 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 02:52:33PM +, Van De Ven, Arjan wrote: > I'm sorry but your whole statement reeks a little bit of "perfect is the > enemy of good" My point is exactly that this sentence could apply to spectre variant#2 in the first place.. If we start moving in any direction,

Re: [PATCH 0/7] IBRS patch series

2018-01-05 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 09:22:34PM +, Van De Ven, Arjan wrote: > personally I am comfortable with retpoline on Skylake, but I would > like to have IBRS as an opt in for the paranoid. I think this whole variant#2 issue has to be fixed mathematically or not at all, the reason is that it's

Re: [PATCH 1/7] x86/feature: Detect the x86 feature to control Speculation

2018-01-05 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 02:09:43PM +0100, Thomas Gleixner wrote: > On Thu, 4 Jan 2018, Tim Chen wrote: > > +#define MSR_IA32_SPEC_CTRL 0x0048 > > +#define SPEC_CTRL_FEATURE_DISABLE_IBRS (0 << 0) > > +#define SPEC_CTRL_FEATURE_ENABLE_IBRS (1 << 0) > > + > > +#define
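As a rough illustration of how the quoted constants might be combined, the sketch below computes the value one would write to the MSR to set or clear the IBRS bit. The three #defines are copied from the quoted patch; spec_ctrl_with_ibrs() is a hypothetical helper, and no actual MSR write is performed since that requires kernel privileges.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as quoted from the patch under review. */
#define MSR_IA32_SPEC_CTRL              0x0048
#define SPEC_CTRL_FEATURE_DISABLE_IBRS  (0 << 0)
#define SPEC_CTRL_FEATURE_ENABLE_IBRS   (1 << 0)

/* Hypothetical helper: return cur with bit 0 (IBRS) set or cleared,
 * leaving every other bit of the register value untouched. */
static uint64_t spec_ctrl_with_ibrs(uint64_t cur, int enable)
{
    uint64_t val = cur & ~(uint64_t)SPEC_CTRL_FEATURE_ENABLE_IBRS;

    if (enable)
        val |= SPEC_CTRL_FEATURE_ENABLE_IBRS;

    /* Only bit 0 may differ between the input and the result. */
    assert((val & ~(uint64_t)1) == (cur & ~(uint64_t)1));
    return val;
}
```

Note that SPEC_CTRL_FEATURE_DISABLE_IBRS evaluates to 0, so it documents intent rather than acting as a mask; clearing the bit has to be done with the complement of the enable constant, as above.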

Re: [PATCH 05/23] x86, kaiser: unmap kernel from userspace page tables (core patch)

2018-01-05 Thread Andrea Arcangeli
running into this. This is the first time I hear about this, sorry about that. I fixed it with the upstream solution, greatly appreciated the pointer Dave. I don't have hardware to verify it though so we've to follow up on bz. Thanks, Andrea From 74e2d799b7c22f00a8d3158958e3d6d9fa45c1d2 Mon

Re: [PATCH 4/7] x86/idle: Disable IBRS entering idle and enable it on wakeup

2018-01-04 Thread Andrea Arcangeli
On Fri, Jan 05, 2018 at 12:45:58AM +0100, Thomas Gleixner wrote: > What's the problem to make the early update mandatory for this? That will make a few differences. A host reboot will be required to use the microcode features, if you upgrade the kernel before the microcode_ctl package and you

Re: [PATCH 6/7] x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

2018-01-04 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 03:26:52PM -0800, Tim Chen wrote: > On 01/04/2018 02:54 PM, Peter Zijlstra wrote: > > On Thu, Jan 04, 2018 at 09:56:47AM -0800, Tim Chen wrote: > >> .macro ENABLE_IBRS > >> - ALTERNATIVE "jmp 10f", "", X86_FEATURE_SPEC_CTRL > >> + testl $SPEC_CTRL_IBRS_INUSE,

Re: [PATCH 4/7] x86/idle: Disable IBRS entering idle and enable it on wakeup

2018-01-04 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 03:22:09PM -0800, Tim Chen wrote: > No one should be calling this with IRQs enabled. This check is probably > just paranoid. I can get rid of it. Yes, confirmed. > It probably doesn't matter as we will be switching the check > to the spec_ctrl_ibrs a couple of patches

Re: [PATCH 3/7] x86/enter: Use IBRS on syscall and interrupts

2018-01-04 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 11:33:21PM +0100, Peter Zijlstra wrote: > So not only did we add a CR3 write, we're now adding an MSR write to the > entry/exit paths. Please tell me that these are 'fast' MSRs? Given > people are already reporting stupid numbers with just the existing > PTI/CR3, what kind

Re: [PATCH 4/7] x86/idle: Disable IBRS entering idle and enable it on wakeup

2018-01-04 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 11:47:31PM +0100, Peter Zijlstra wrote: > Argh.. no. Who is calling this with IRQs enabled? And why can't we frob > the MSR with IRQs enabled? That comment doesn't seem to explain > anything. Why we can't is easy to explain, the irq handler would run in such case and that

Re: [PATCH 5/7] x86: Use IBRS for firmware update path

2018-01-04 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 09:05:15PM +0100, Greg Kroah-Hartman wrote: > On Thu, Jan 04, 2018 at 09:56:46AM -0800, Tim Chen wrote: > > From: David Woodhouse > > > > We are impervious to the indirect branch prediction attack with retpoline > > but firmware won't be, so we still

Re: [PATCH 6/7] x86/spec_ctrl: Add sysctl knobs to enable/disable SPEC_CTRL feature

2018-01-04 Thread Andrea Arcangeli
On Thu, Jan 04, 2018 at 07:52:19PM +0100, Borislav Petkov wrote: > So why not "IBRS always" or off? No need for the "IBRS only in the > kernel" setting. Because it's slower (or much slower depending on how much stuff the microcode has to disable in the CPU to provide IBRS) and you only need that
