Re: [PATCH v2 5/9] mm: Initialize struct vm_unmapped_area_info

2024-02-28 Thread Kirill A. Shutemov
et = 0; return vm_unmapped_area(); } -- Kiryl Shutsemau / Kirill A. Shutemov

Re: [RFC PATCH v11 10/29] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

2023-07-26 Thread Kirill A . Shutemov
On Tue, Jul 25, 2023 at 01:51:55PM +0100, Matthew Wilcox wrote: > On Tue, Jul 25, 2023 at 01:24:03PM +0300, Kirill A . Shutemov wrote: > > On Tue, Jul 18, 2023 at 04:44:53PM -0700, Sean Christopherson wrote: > > > diff --git a/mm/compaction.c b/mm/compaction.c >

Re: [RFC PATCH v11 10/29] mm: Add AS_UNMOVABLE to mark mapping as completely unmovable

2023-07-25 Thread Kirill A . Shutemov
mapping is still tied to the folio). Vlastimil, any comments? -- Kiryl Shutsemau / Kirill A. Shutemov

Re: [PATCH 00/14] arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER

2023-03-23 Thread Kirill A. Shutemov
h/m68k/Kconfig.cpu | 16 +--- > arch/nios2/Kconfig| 17 + > arch/powerpc/Kconfig | 22 +- > arch/sh/mm/Kconfig| 19 +-- > arch/sparc/Kconfig| 16 +++++--- > arch/xtensa/Kconfig | 16 +--- > 10 files changed, 76 insertions(+), 80 deletions(-) Acked-by: Kirill A. Shutemov -- Kiryl Shutsemau / Kirill A. Shutemov

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-24 Thread Kirill A. Shutemov
On Thu, Sep 23, 2021 at 08:21:03PM +0200, Borislav Petkov wrote: > On Thu, Sep 23, 2021 at 12:05:58AM +0300, Kirill A. Shutemov wrote: > > Unless we find other way to guarantee RIP-relative access, we must use > > fixup_pointer() to access any global variables. > > Yah, I've

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-22 Thread Kirill A. Shutemov
On Wed, Sep 22, 2021 at 09:52:07PM +0200, Borislav Petkov wrote: > On Wed, Sep 22, 2021 at 05:30:15PM +0300, Kirill A. Shutemov wrote: > > Not fine, but waiting to blowup with random build environment change. > > Why is it not fine? > > Are you suspecting that the co

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-22 Thread Kirill A. Shutemov
On Wed, Sep 22, 2021 at 08:40:43AM -0500, Tom Lendacky wrote: > On 9/21/21 4:58 PM, Kirill A. Shutemov wrote: > > On Tue, Sep 21, 2021 at 04:43:59PM -0500, Tom Lendacky wrote: > > > On 9/21/21 4:34 PM, Kirill A. Shutemov wrote: > > > > On Tue, Sep 21, 2021 at 11:

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Kirill A. Shutemov
On Tue, Sep 21, 2021 at 04:43:59PM -0500, Tom Lendacky wrote: > On 9/21/21 4:34 PM, Kirill A. Shutemov wrote: > > On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote: > > > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > > >

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Kirill A. Shutemov
On Tue, Sep 21, 2021 at 11:27:17PM +0200, Borislav Petkov wrote: > On Wed, Sep 22, 2021 at 12:20:59AM +0300, Kirill A. Shutemov wrote: > > I still believe calling cc_platform_has() from __startup_64() is totally > > broken as it lacks proper wrapping while accessing global varia

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-21 Thread Kirill A. Shutemov
rypt_identity.c @@ -288,7 +288,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp) unsigned long pgtable_area_len; unsigned long decrypted_base; - if (!cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) + if (1 || !cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)) return; /* -- Kirill A. Shutemov

Re: [PATCH v3 5/8] x86/sme: Replace occurrences of sme_active() with cc_platform_has()

2021-09-20 Thread Kirill A. Shutemov
have a special version of the helper). Note that only AMD requires these cc_platform_has() to return true. -- Kirill A. Shutemov

Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-12 Thread Kirill A. Shutemov
On Wed, Aug 11, 2021 at 10:52:55AM -0500, Tom Lendacky wrote: > On 8/11/21 7:19 AM, Kirill A. Shutemov wrote: > > On Tue, Aug 10, 2021 at 02:48:54PM -0500, Tom Lendacky wrote: > >> On 8/10/21 1:45 PM, Kuppuswamy, Sathyanarayanan wrote: > >>> > >>>

Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-11 Thread Kirill A. Shutemov
ared/unencrypted > area, though? Or since it is shared, there's actually nothing you need to > do (the bss decrypted section exists even if CONFIG_AMD_MEM_ENCRYPT is not > configured)? AFAICS, only kvmclock uses __bss_decrypted. We don't enable kvmclock in TDX at the moment. It may change in the future. -- Kirill A. Shutemov

Re: [PATCH v7 01/11] mm/mremap: Fix race between MOVE_PMD mremap and pageout

2021-06-08 Thread Kirill A. Shutemov
On Tue, Jun 08, 2021 at 04:47:19PM +0530, Aneesh Kumar K.V wrote: > On 6/8/21 3:12 PM, Kirill A. Shutemov wrote: > > On Tue, Jun 08, 2021 at 01:22:23PM +0530, Aneesh Kumar K.V wrote: > > > > > > Hi Hugh, > > > > > > Hugh Dickins writes: > >

Re: [PATCH v7 01/11] mm/mremap: Fix race between MOVE_PMD mremap and pageout

2021-06-08 Thread Kirill A. Shutemov
> and old pfn > > unlock(pud_ptl) > ptep_clear_flush() > old pfn is free. > > Stale TLB entry > > Both the above race condition can be fixed if we force mremap path to > take rmap lock. > > Signed-off-by: Aneesh Kumar K.V Looks like it should be enough to address the race. It would be nice to understand what the performance overhead of the additional locking is. Is it still faster to move a single PMD page table under these locks compared to moving PTE page table entries without the locks? -- Kirill A. Shutemov

Re: [PATCH] Raise the minimum GCC version to 5.2

2021-05-03 Thread Kirill A. Shutemov
you need to check it per distro. For Debian it would be here: https://distrowatch.com/table.php?distribution=debian -- Kirill A. Shutemov

Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

2020-11-16 Thread Kirill A. Shutemov
e it's an issue, but strictly speaking, the page size according to the page table tree doesn't mean a pagewalk would fill a TLB entry of that size. A CPU may support 1G pages in the page table tree without any 1G TLB at all. IIRC, current Intel CPUs still don't have any 1G iTLB entries and fill 2M iTLB entries instead. -- Kirill A. Shutemov

Re: [PATCH v3 2/4] PM: hibernate: make direct map manipulations more explicit

2020-11-03 Thread Kirill A. Shutemov
On Tue, Nov 03, 2020 at 02:13:50PM +0200, Mike Rapoport wrote: > On Tue, Nov 03, 2020 at 02:08:16PM +0300, Kirill A. Shutemov wrote: > > On Sun, Nov 01, 2020 at 07:08:13PM +0200, Mike Rapoport wrote: > > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c >

Re: [PATCH v3 0/4] arch, mm: improve robustness of direct map manipulation

2020-11-03 Thread Kirill A. Shutemov
arch, mm: make kernel_page_present() always available The series looks good to me (apart from the minor nit): Acked-by: Kirill A. Shutemov -- Kirill A. Shutemov

Re: [PATCH v3 2/4] PM: hibernate: make direct map manipulations more explicit

2020-11-03 Thread Kirill A. Shutemov
} else { > + debug_pagealloc_map_pages(page, 1, enable); > + } > +} > + > static int swsusp_page_is_free(struct page *); > static void swsusp_set_page_forbidden(struct page *); > static void swsusp_unset_page_forbidden(struct page *); -- Kirill A. Shutemov

Re: [PATCH v11 01/25] mm/gup: factor out duplicate code from four routines

2019-12-18 Thread Kirill A. Shutemov
On Wed, Dec 18, 2019 at 02:15:53PM -0800, John Hubbard wrote: > On 12/18/19 7:52 AM, Kirill A. Shutemov wrote: > > On Mon, Dec 16, 2019 at 02:25:13PM -0800, John Hubbard wrote: > > > +static void put_compound_head(struct page *page, int refs) > > > +{ > > > + /*

Re: [PATCH v11 06/25] mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM

2019-12-18 Thread Kirill A. Shutemov
arg is NULL) > + * and return -ENOTSUPP if DAX isn't allowed in this case: > + */ > + return __gup_longterm_locked(tsk, mm, start, nr_pages, pages, > + vmas, gup_flags | FOLL_TOUCH | > + FOLL_REMOTE); > + } > > return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, > locked, > -- > 2.24.1 > -- Kirill A. Shutemov

Re: [PATCH v11 04/25] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages

2019-12-18 Thread Kirill A. Shutemov
ould save you an indentation level. > + int count = page_ref_dec_return(page); > + > + /* > + * devmap page refcounts are 1-based, rather than 0-based: if > + * refcount is 1, then the page is free and the refcount is > + * stable because nobody holds a reference on the page. > + */ > + if (count == 1) > + free_devmap_managed_page(page); > + else if (!count) > + __put_page(page); > + } > + > + return is_devmap; > +} > +EXPORT_SYMBOL(put_devmap_managed_page); > +#endif > -- > 2.24.1 > > -- Kirill A. Shutemov

Re: [PATCH v11 01/25] mm/gup: factor out duplicate code from four routines

2019-12-18 Thread Kirill A. Shutemov
age(page); > +} It's not terribly efficient. Maybe something like: VM_BUG_ON_PAGE(page_ref_count(page) < ref, page); if (refs > 2) page_ref_sub(page, refs - 1); put_page(page); ? -- Kirill A. Shutemov
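The suggestion above (one bulk subtraction followed by a single final put) can be sketched in userspace C. This is only an illustration: `struct fake_page`, `fake_put_page()` and `fake_put_refs()` are hypothetical stand-ins for the kernel's `struct page`, `put_page()` and the helper under review, and the `refs > 1` threshold is an assumption about the intended condition, not a quote from the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count is modeled. */
struct fake_page {
    atomic_int refcount;
    bool freed;
};

/* Models put_page(): drop one reference and free the page on the final put. */
static void fake_put_page(struct fake_page *page)
{
    if (atomic_fetch_sub(&page->refcount, 1) == 1)
        page->freed = true;
}

/* Drop 'refs' references: one bulk subtraction, then one ordinary put. */
static void fake_put_refs(struct fake_page *page, int refs)
{
    /* Plays the role of the VM_BUG_ON_PAGE() sanity check. */
    assert(atomic_load(&page->refcount) >= refs);

    if (refs > 1)   /* assumed threshold, see the note above */
        atomic_fetch_sub(&page->refcount, refs - 1);
    fake_put_page(page);
}
```

Only the last reference goes through the put path, so the free-on-zero logic lives in one place while the remaining references cost a single atomic operation.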

Re: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks

2019-10-08 Thread Kirill A. Shutemov
. Yes we do. MADV_DONTNEED is used a lot by userspace memory allocators and it would be a very noticeable performance regression if we switched it to down_write(mmap_sem). -- Kirill A. Shutemov

Re: [PATCH V4 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-10-07 Thread Kirill A. Shutemov
On Mon, Oct 07, 2019 at 03:51:58PM +0200, Ingo Molnar wrote: > > * Kirill A. Shutemov wrote: > > > On Mon, Oct 07, 2019 at 03:06:17PM +0200, Ingo Molnar wrote: > > > > > > * Anshuman Khandual wrote: > > > > > > > This adds a t

Re: [PATCH V4 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-10-07 Thread Kirill A. Shutemov
inline function + define. Something like: #define mm_p4d_folded mm_p4d_folded static inline bool mm_p4d_folded(struct mm_struct *mm) { return !pgtable_l5_enabled(); } But I don't see much reason to be more verbose here than needed. -- Kirill A. Shutemov
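The "inline function + define" idiom mentioned above can be tried outside the kernel. The sketch below is a simplified model: `struct mm_struct` is a placeholder, `pgtable_l5_enabled()` is a stand-in driven by a plain variable, and the fallback's return value is purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_struct { int placeholder; };   /* stand-in, not the kernel structure */

/* Stand-in for the arch's runtime query of the paging mode. */
static bool l5_enabled;
static bool pgtable_l5_enabled(void) { return l5_enabled; }

/* "Arch header": define the helper and announce it with a same-named macro. */
#define mm_p4d_folded mm_p4d_folded
static inline bool mm_p4d_folded(struct mm_struct *mm)
{
    assert(mm != NULL);
    return !pgtable_l5_enabled();
}

/* "Generic header": the fallback is compiled only if no arch version was announced. */
#ifndef mm_p4d_folded
static inline bool mm_p4d_folded(struct mm_struct *mm)
{
    (void)mm;
    return false;   /* illustrative value only */
}
#endif
```

Because the macro expands to its own name, callers still reach the real function, with its type checking, while `#ifndef` lets generic code detect the override.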

Re: [PATCH v4 03/11] mm/gup: Applies counting method to monitor gup_pgd_range

2019-09-30 Thread Kirill A. Shutemov
On Fri, Sep 27, 2019 at 08:40:00PM -0300, Leonardo Bras wrote: > As decribed, gup_pgd_range is a lockless pagetable walk. So, in order to ^ typo -- Kirill A. Shutemov

Re: [PATCH V3 0/2] mm/debug: Add tests for architecture exported page table helpers

2019-09-24 Thread Kirill A. Shutemov
wn) in pud_clear_tests() as there were no available > __pgd() definitions. > > - ARM32 > - IA64 Hm. Grep shows __pgd() definitions for both of them. Is it for specific config? -- Kirill A. Shutemov

Re: [PATCH V2 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-09-13 Thread Kirill A. Shutemov
but it's just a single line. Kirill suggested this in the > previous version. There is a generic fallback definition but s390 has its > own. This change overrides the generic one for x86 probably as a fix or as > an improvement. Kirill should be able to help classify it in which case it > can be a separate patch. I don't think it's worth a separate patch. -- Kirill A. Shutemov

Re: [PATCH] mm/pgtable/debug: Fix test validating architecture page table helpers

2019-09-13 Thread Kirill A. Shutemov
/x86/mm/highmem_32.c#L34 > >> > >> I have not checked others, but I guess it is like that for all. > >> > > > > > > Seems like I answered too quickly. All kmap_atomic() do preempt_disable(), > > but not all pte_alloc_map() call kmap_atomic(). > > > > However, for instance ARM does: > > > > https://elixir.bootlin.com/linux/v5.3-rc8/source/arch/arm/include/asm/pgtable.h#L200 > > > > And X86 as well: > > > > https://elixir.bootlin.com/linux/v5.3-rc8/source/arch/x86/include/asm/pgtable_32.h#L51 > > > > Microblaze also: > > > > https://elixir.bootlin.com/linux/v5.3-rc8/source/arch/microblaze/include/asm/pgtable.h#L495 > > All the above platforms checks out to be using k[un]map_atomic(). I am > wondering whether > any of the intermediate levels will have similar problems on any these 32 bit > platforms > or any other platforms which might be using generic k[un]map_atomic(). No. Kernel only allocates pte page table from highmem. All other page tables are always visible in kernel address space. -- Kirill A. Shutemov

Re: [PATCH V2 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-09-12 Thread Kirill A. Shutemov
code here __init (or its variants) so it can be discarded on boot. It has no use after that. -- Kirill A. Shutemov

Re: [PATCH 1/1] mm/pgtable/debug: Add test validating architecture page table helpers

2019-09-09 Thread Kirill A. Shutemov
le entry from generic code like this test case is bit tricky. That > >> is because there are not enough helpers to create entries with an absolute > >> value. This would have been easier if all the platforms provided functions > >> like __pxx() which is not the case now. Otherwise something like this > >> should > >> have worked. > >> > >> > >> pud_t pud = READ_ONCE(*pudp); > >> pud = __pud(pud_val(pud) | RANDOM_VALUE (keeping lower 12 bits 0)) > >> WRITE_ONCE(*pudp, pud); > >> > >> But __pud() will fail to build in many platforms. > > > > Hmm, I simply used this on my system to make pud_clear_tests() work, not > > sure if it works on all archs: > > > > pud_val(*pudp) |= RANDOM_NZVALUE; > > Which compiles on arm64 but then fails on x86 because of the way pmd_val() > has been defined there. Use instead *pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE); It *should* be more portable. -- Kirill A. Shutemov
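Why the assignment form is likely to be more portable than `pud_val(*pudp) |= ...` can be shown with a small model. It assumes an architecture where `pud_val()` is an inline function and therefore yields a value rather than an lvalue; the `pud_t` layout, the helper name `pud_set_garbage()` and the value chosen for `RANDOM_NZVALUE` are illustrative and not taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions are per-architecture. */
typedef struct { uint64_t pud; } pud_t;

/* Function-style accessor: its result is a value, so "pud_val(x) |= v" would not compile. */
static inline uint64_t pud_val(pud_t pud) { return pud.pud; }
static inline pud_t __pud(uint64_t val) { return (pud_t){ val }; }

#define RANDOM_NZVALUE UINT64_C(0xbe)   /* illustrative non-zero value */

/* Read the entry, OR in the garbage value, and write the rebuilt entry back. */
static void pud_set_garbage(pud_t *pudp)
{
    *pudp = __pud(pud_val(*pudp) | RANDOM_NZVALUE);
    assert(pud_val(*pudp) != 0);   /* the entry must now look non-empty */
}
```

The assignment form relies only on `pud_val()` producing a value and `__pud()` building an entry, so it works whether the accessors are macros or functions, provided the platform defines `__pud()` at all.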

Re: [PATCH 1/1] mm/pgtable/debug: Add test validating architecture page table helpers

2019-09-05 Thread Kirill A. Shutemov
, (pgtable_t) page); - pud_populate_tests(mm, pudp, pmdp); - p4d_populate_tests(mm, p4dp, pudp); - pgd_populate_tests(mm, pgdp, p4dp); + pud_populate_tests(mm, pudp, saved_pmdp); + p4d_populate_tests(mm, p4dp, saved_pudp); + pgd_populate_tests(mm, pgdp, saved_p4dp); p4d_free(mm, saved_p4dp); pud_free(mm, saved_pudp); -- Kirill A. Shutemov

Re: [PATCH 1/1] mm/pgtable/debug: Add test validating architecture page table helpers

2019-09-04 Thread Kirill A. Shutemov
ith random or garbage > + * values. These saved addresses will be used for freeing > + * page table pages. > + */ > + saved_p4dp = p4d_offset(pgdp, 0UL); > + saved_pudp = pud_offset(p4dp, 0UL); > + saved_pmdp = pmd_offset(pudp, 0UL); > + saved_ptep = pte_offset_map(pmdp, 0UL); > + > + pte_basic_tests(page, prot); > + pmd_basic_tests(page, prot); > + pud_basic_tests(page, prot); > + p4d_basic_tests(page, prot); > + pgd_basic_tests(page, prot); > + > + pte_clear_tests(ptep); > + pmd_clear_tests(pmdp); > + pud_clear_tests(pudp); > + p4d_clear_tests(p4dp); > + pgd_clear_tests(pgdp); > + > + pmd_populate_tests(mm, pmdp, (pgtable_t) page); This is not correct for architectures that defines pgtable_t as pte_t pointer, not struct page pointer. > + pud_populate_tests(mm, pudp, pmdp); > + p4d_populate_tests(mm, p4dp, pudp); > + pgd_populate_tests(mm, pgdp, p4dp); This is wrong. All p?dp points to the second entry in page table entry. This is not valid pointer for page table and triggers p?d_bad() on x86. Use saved_p?dp instead. > + > + p4d_free(mm, saved_p4dp); > + pud_free(mm, saved_pudp); > + pmd_free(mm, saved_pmdp); > + pte_free(mm, (pgtable_t) virt_to_page(saved_ptep)); > + > + mm_dec_nr_puds(mm); > + mm_dec_nr_pmds(mm); > + mm_dec_nr_ptes(mm); > + __mmdrop(mm); > + > + free_mapped_page(page); > + return 0; > +} > + > +static void __exit arch_pgtable_tests_exit(void) { } > + > +module_init(arch_pgtable_tests_init); > +module_exit(arch_pgtable_tests_exit); > + > +MODULE_LICENSE("GPL v2"); > +MODULE_AUTHOR("Anshuman Khandual "); > +MODULE_DESCRIPTION("Test archicture page table helpers"); > -- > 2.20.1 > > -- Kirill A. Shutemov

Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-19 Thread Kirill A. Shutemov
ges in that case? > > The problem with the transparent_hugepage/enabled interface is that it > conflates performing compaction work to produce THP-pages with the > ability to map huge pages at all. That's not [entirely] true. transparent_hugepage/defrag gates heavy-duty compaction. We do only very limited compaction if it's not advised by transparent_hugepage/defrag. I believe DAX has to respect transparent_hugepage/enabled. Or not advertise its huge pages as THP. It's confusing for user. -- Kirill A. Shutemov

Re: [PATCH 2/2] mm/dax: Don't enable huge dax mapping by default

2019-03-06 Thread Kirill A. Shutemov
ocated out of /dev/dax/ or > /dev/pmem*. Do we have a reason not to use hugepages for mapping pages in > that case? Yes. Like when you don't want dax to compete for TLB with mission-critical application (which uses hugetlb for instance). -- Kirill A. Shutemov

Re: [PATCH] mmap.2: describe the 5level paging hack

2019-02-12 Thread Kirill A. Shutemov
recommending (void *) -1 as such address. > .\" Before Linux 2.6.24, the address was rounded up to the next page > .\" boundary; since 2.6.24, it is rounded down! > The address of the new mapping is returned as the result of the call. > -- > 2.20.1.791.gb4d0f1c61a-goog > -- Kirill A. Shutemov

Re: [PATCH] mm/zsmalloc.c: Fix zsmalloc 32-bit PAE support

2018-12-10 Thread Kirill A. Shutemov
efine OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) Have you tested it with CONFIG_X86_5LEVEL=y? AFAICS, the patch makes OBJ_INDEX_BITS and what depends on it dynamic -- it depends on what paging mode we are booting in. ZS_SIZE_CLASSES depends indirectly on OBJ_INDEX_BITS and I don't see how struct zs_pool definition can compile with dynamic ZS_SIZE_CLASSES. Hm? -- Kirill A. Shutemov

Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-25 Thread Kirill A. Shutemov
On Wed, Oct 24, 2018 at 07:09:07PM -0700, Joel Fernandes wrote: > On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote: > > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: >

Re: [PATCH 1/4] treewide: remove unused address argument from pte_alloc functions (v2)

2018-10-25 Thread Kirill A. Shutemov
's not needed anymore. Page allocator and SL?B are good enough now. See 3c936465249f ("[SPARC64]: Kill pgtable quicklists and use SLAB.") -- Kirill A. Shutemov

Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-24 Thread Kirill A. Shutemov
On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > index

Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)

2018-10-24 Thread Kirill A. Shutemov
Set the new pmd */ > + set_pmd_at(mm, new_addr, new_pmd, pmd); > + if (new_ptl != old_ptl) > + spin_unlock(new_ptl); > + spin_unlock(old_ptl); > + > + *need_flush = true; > + return true; > + } > + return false; > +} > + -- Kirill A. Shutemov

Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-12 Thread Kirill A. Shutemov
On Fri, Oct 12, 2018 at 05:42:24PM +0100, Anton Ivanov wrote: > > On 10/12/18 3:48 PM, Anton Ivanov wrote: > > On 12/10/2018 15:37, Kirill A. Shutemov wrote: > > > On Fri, Oct 12, 2018 at 03:09:49PM +0100, Anton Ivanov wrote: > > > > On 10/12/18 2:37

Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-12 Thread Kirill A. Shutemov
On Fri, Oct 12, 2018 at 09:57:19AM -0700, Joel Fernandes wrote: > On Fri, Oct 12, 2018 at 04:19:46PM +0300, Kirill A. Shutemov wrote: > > On Fri, Oct 12, 2018 at 05:50:46AM -0700, Joel Fernandes wrote: > > > On Fri, Oct 12, 2018 at 02:30:56PM +0300, Kirill A. Shutemov wrote: >

Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-12 Thread Kirill A. Shutemov
/* Set the new pmd */ > > + set_pmd_at(mm, new_addr, new_pmd, pmd); > > UML does not have set_pmd_at at all Every architecture does. :) But it may not come from the arch code. > If I read the code right, MIPS completely ignores the address argument so > set_pmd_at there may not have the effect which this patch is trying to > achieve. Ignoring address is fine. Most architectures do that. The idea is to move the page table to the new pmd slot. It has nothing to do with the address passed to set_pmd_at(). -- Kirill A. Shutemov

Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-12 Thread Kirill A. Shutemov
On Fri, Oct 12, 2018 at 05:50:46AM -0700, Joel Fernandes wrote: > On Fri, Oct 12, 2018 at 02:30:56PM +0300, Kirill A. Shutemov wrote: > > On Thu, Oct 11, 2018 at 06:37:56PM -0700, Joel Fernandes (Google) wrote: > > > Android needs to mremap large regions of memory during

Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-12 Thread Kirill A. Shutemov
On Fri, Oct 12, 2018 at 02:30:56PM +0300, Kirill A. Shutemov wrote: > On Thu, Oct 11, 2018 at 06:37:56PM -0700, Joel Fernandes (Google) wrote: > > @@ -239,7 +287,21 @@ unsigned long move_page_tables(struct vm_area_struct > > *vma, > > split_huge_pmd(v

Re: [PATCH v2 2/2] mm: speed up mremap by 500x on large regions

2018-10-12 Thread Kirill A. Shutemov
ue; > + } else if (extent == PMD_SIZE) { Hm. What guarantees that new_addr is PMD_SIZE-aligned? It's not obvious to me. -- Kirill A. Shutemov
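The alignment question raised above can be made concrete with a standalone check. The constants assume a 2 MiB PMD span (4 KiB pages on x86-64), and `can_move_whole_pmd()` is a hypothetical helper written for this illustration, not the condition the patch ended up using.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values: one PMD entry covers 2 MiB. */
#define PMD_SHIFT 21
#define PMD_SIZE  (UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK  (~(PMD_SIZE - 1))

static_assert((PMD_SIZE & (PMD_SIZE - 1)) == 0, "PMD_SIZE must be a power of two");

/* A whole PMD entry can only be moved if the extent and both addresses line up. */
static bool can_move_whole_pmd(uint64_t old_addr, uint64_t new_addr, uint64_t extent)
{
    return extent == PMD_SIZE &&
           (old_addr & ~PMD_MASK) == 0 &&
           (new_addr & ~PMD_MASK) == 0;
}
```

An extent equal to PMD_SIZE says nothing about the destination on its own: a source at 0x200000 with a destination at 0x401000 passes the size test yet straddles two PMD slots on the new side.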

Re: [PATCH v2 1/2] treewide: remove unused address argument from pte_alloc functions

2018-10-12 Thread Kirill A. Shutemov
pte_quicklist = (unsigned long *)(*ret); > - ret[0] = 0; > - pgtable_cache_size--; > - } > - return (pte_t *)ret; > -} > - Ditto. -- Kirill A. Shutemov

Re: [Update] Regression in 4.18 - 32-bit PowerPC crashes on boot - bisected to commit 1d40a5ea01d5

2018-06-29 Thread Kirill A. Shutemov
tlb_flush_pgtable(tlb, address); - pgtable_page_dtor(table); pgtable_free_tlb(tlb, page_address(table), 0); } #endif /* _ASM_POWERPC_PGALLOC_32_H */ -- Kirill A. Shutemov

Re: [PATCH] selftests/vm: Update max va test to check for high address return.

2018-02-28 Thread Kirill A. Shutemov
y address, not restricted to 47-bit address space. It doesn't mean the application *requires* the address to be above 47-bit. At least on x86, -1 just shifts the upper boundary of the address range where we can look for an unmapped area. -- Kirill A. Shutemov

Re: [mainline][Memory off/on][83e3c48] kernel Oops with memory hot-unplug on ppc

2018-02-19 Thread Kirill A. Shutemov
> > > The code was first introduced with commit( 83e3c48: mm/sparsemem: > > Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y) Any chance to bisect it? Could you check if the commit just before 83e3c48729d9 is fine? -- Kirill A. Shutemov

Re: [PATCH v6 00/24] Speculative page faults

2018-01-16 Thread Kirill A. Shutemov
t case scenario? Like when we go far enough into speculative code path on every page fault and then fallback to normal page fault? -- Kirill A. Shutemov

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-08 Thread Kirill A. Shutemov
> > > 3) We don't switch to large address space if hint_addr + len > 128TB. > > The decision to switch to large address space is primarily based on hint > > addr > > But does the mmap succeed in that case or not? > > ie: mmap(0x7000, 0x2000, ...) = ? It does, but resulting address doesn't match the hint. It's somewhere below 47-bit border. -- Kirill A. Shutemov

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-07 Thread Kirill A. Shutemov
For everything else we search in < 128TB space if hint addr is below > 128TB > > 3) We don't switch to large address space if hint_addr + len > 128TB. The > decision to switch to large address space is primarily based on hint addr > > Is there any other rule we need to outline? Or is any of the above not > correct? That's correct. -- Kirill A. Shutemov

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-07 Thread Kirill A. Shutemov
On Tue, Nov 07, 2017 at 02:05:42PM +0100, Florian Weimer wrote: > On 11/07/2017 12:44 PM, Kirill A. Shutemov wrote: > > On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote: > > > On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote: > > > > > > >

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-07 Thread Kirill A. Shutemov
might use high bits of pointer returned by > that library because those are never satisfied today and the library > would fall back. If you want to point out that it's an ABI break, yes it is. But we allow ABI breaks as long as nobody notices. I think it's reasonable to expect that nobody relies on such corner cases. If we find any piece of software affected by the change, we would need to reconsider. -- Kirill A. Shutemov

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-07 Thread Kirill A. Shutemov
On Tue, Nov 07, 2017 at 12:26:12PM +0100, Florian Weimer wrote: > On 11/07/2017 12:15 PM, Kirill A. Shutemov wrote: > > > > First of all, using addr and MAP_FIXED to develop our heuristic can > > > never really give unchanged ABI. It's an in-band signal. brk()

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-07 Thread Kirill A. Shutemov
see > out-of-range addresses, but I expected a full opt-out based on RLIMIT_AS > would be sufficient for them. Just use mmap(-1), without MAP_FIXED to get full address space. -- Kirill A. Shutemov
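A minimal userspace sketch of the advice above, assuming Linux: pass a hint of (void *)-1 without MAP_FIXED and let the kernel choose the address. `map_with_high_hint()` is a hypothetical wrapper name. Where the mapping actually lands depends on the architecture, the paging mode and the kernel version, so the sketch makes no claim about the returned address.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Request an anonymous private mapping with a hint above the default window.
 * Without MAP_FIXED the hint is only a request: the kernel may place the
 * mapping anywhere it considers suitable.
 */
static void *map_with_high_hint(size_t len)
{
    assert(len > 0);

    void *hint = (void *)-1;
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Callers that keep tags in the upper pointer bits simply never pass such a hint and stay inside the traditional window, which is the opt-in property discussed in the thread.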

Re: POWER: Unexpected fault when writing to brk-allocated memory

2017-11-07 Thread Kirill A. Shutemov
ersonality. Something out-of-band. I don't want to > get too far into that discussion yet. First we need to agree whether > or not the code in the tree today is a problem. Well, we've already discussed all the options you are proposing. Linus wanted a minimalistic interface, so we took this path for now. We can always add more ways to get access to full address space later. -- Kirill A. Shutemov

Re: [PATCH] mm: remove unnecessary WARN_ONCE in page_vma_mapped_walk().

2017-10-03 Thread Kirill A. Shutemov
xes: 616b8371539a ("mm: thp: enable thp migration in generic path") > Reported-and-tested-by: Abdul Haleem <abdha...@linux.vnet.ibm.com> > Signed-off-by: Zi Yan <zi@cs.rutgers.edu> > Cc: "Kirill A. Shutemov" <kirill.shute...@linux.intel.com> > Cc: An

Re: [PATCH v2 14/20] mm: Provide speculative fault infrastructure

2017-08-26 Thread Kirill A. Shutemov
md = pmd; > + vmf.pgoff = linear_page_index(vma, address); > + vmf.gfp_mask = __get_fault_gfp_mask(vma); > + vmf.sequence = seq; > + vmf.flags = flags; > + > + local_irq_enable(); > + > + /* > + * We need to re-validate the VMA after chec

Re: [PATCH 05/16] mm: Protect VMA modifications using VMA sequence count

2017-08-10 Thread Kirill A. Shutemov
On Thu, Aug 10, 2017 at 10:27:50AM +0200, Laurent Dufour wrote: > On 10/08/2017 02:58, Kirill A. Shutemov wrote: > > On Wed, Aug 09, 2017 at 12:43:33PM +0200, Laurent Dufour wrote: > >> On 09/08/2017 12:12, Kirill A. Shutemov wrote: > >>> On Tue, Aug 08, 2017 at 04

Re: [PATCH 05/16] mm: Protect VMA modifications using VMA sequence count

2017-08-09 Thread Kirill A. Shutemov
On Wed, Aug 09, 2017 at 12:43:33PM +0200, Laurent Dufour wrote: > On 09/08/2017 12:12, Kirill A. Shutemov wrote: > > On Tue, Aug 08, 2017 at 04:35:38PM +0200, Laurent Dufour wrote: > >> The VMA sequence count has been introduced to allow fast detection of > >> VMA modif

Re: [PATCH 05/16] mm: Protect VMA modifications using VMA sequence count

2017-08-09 Thread Kirill A. Shutemov
here near complete list of places where we touch vm_flags. What is your plan for the rest? -- Kirill A. Shutemov

Re: [PATCH 02/16] mm: Prepare for FAULT_FLAG_SPECULATIVE

2017-08-09 Thread Kirill A. Shutemov
vmf->orig_pte))) { > if (old_page) { > if (!PageAnon(old_page)) { -- Kirill A. Shutemov

Re: [RFC PATCH 1/3] powerpc/mm: update pmdp_invalidate to return old pmd value

2017-07-27 Thread Kirill A. Shutemov
kernel.org/r/20170615145224.66200-1-kirill.shute...@linux.intel.com -- Kirill A. Shutemov

Re: 5-level pagetable patches break ppc64le

2017-03-13 Thread Kirill A. Shutemov
haven't had a chance to narrow it down yet. Please check if patch by this link helps: http://lkml.kernel.org/r/20170313052213.11411-1-kirill.shute...@linux.intel.com -- Kirill A. Shutemov

Re: [PATCH] mm: stop leaking PageTables

2017-01-08 Thread Kirill A. Shutemov
r implementation, perhaps? Delete it. > > Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64") > Signed-off-by: Hugh Dickins <hu...@google.com> Sorry, that I missed this initially. Acked-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> -- Kirill A. Shutemov

Re: [RFC 1/4] mm: remove unused TASK_SIZE_OF()

2017-01-02 Thread Kirill A. Shutemov
ev@lists.ozlabs.org > Cc: linux-s...@vger.kernel.org > Cc: sparcli...@vger.kernel.org > Signed-off-by: Dmitry Safonov <dsafo...@virtuozzo.com> I've noticed this too. Acked-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> -- Kirill A. Shutemov

Re: [PATCH V3 2/2] mm: THP page cache support for ppc64

2016-11-13 Thread Kirill A. Shutemov
hp_split_page 51518 > thp_split_page_failed 1 > thp_deferred_split_page 73566 > thp_split_pmd 665 > thp_zero_page_alloc 3 > thp_zero_page_alloc_failed 0 > > Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> One nit-pick below, but otherwise Acked-by: Kir

Re: [PATCH 2/2] mm: THP page cache support for ppc64

2016-11-11 Thread Kirill A. Shutemov
On Fri, Nov 11, 2016 at 05:42:11PM +0530, Aneesh Kumar K.V wrote: > "Kirill A. Shutemov" <kir...@shutemov.name> writes: > > > On Mon, Nov 07, 2016 at 02:04:41PM +0530, Aneesh Kumar K.V wrote: > >> @@ -2953,6 +2966,13 @@ static int do_set_pmd(struct f

Re: [PATCH 2/2] mm: THP page cache support for ppc64

2016-11-11 Thread Kirill A. Shutemov
-ENOMEM handling? I think we should do this way before this point. Maybe in do_fault() or something. -- Kirill A. Shutemov

Re: [PATCH V2] mm: move vma_is_anonymous check within pmd_move_must_withdraw

2016-11-11 Thread Kirill A. Shutemov
ned-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> Acked-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> -- Kirill A. Shutemov

Re: [RFC PATCH] powerpc/mm: THP page cache support

2016-09-26 Thread Kirill A. Shutemov
addr) > { > @@ -1359,6 +1367,8 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct > vm_area_struct *vma, > atomic_long_dec(>mm->nr_ptes); > add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR); > } else { > + if (arch_needs_pgtable_deposit()) Just hide the arch_needs_pgtable_deposit() check in zap_deposited_table(). > + zap_deposited_table(tlb->mm, pmd); > add_mm_counter(tlb->mm, MM_FILEPAGES, -HPAGE_PMD_NR); > } > spin_unlock(ptl); -- Kirill A. Shutemov

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-25 Thread Kirill A. Shutemov
at 07:19:07PM +0100, Gerald Schaefer wrote: > >> On Tue, 23 Feb 2016 13:32:21 +0300 > >> "Kirill A. Shutemov" <kir...@shutemov.name> wrote: > >> > The theory is that the splitting bit effectively masked bogus pmd_present(): > >> > we had p

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-23 Thread Kirill A. Shutemov
for (i = 0; i < HPAGE_PMD_NR; i++) { page_remove_rmap(page + i, false); put_page(page + i); } -- Kirill A. Shutemov

Re: Question on follow_page_mask

2016-02-23 Thread Kirill A. Shutemov
is the purpose behind the BUG_ON. I would guess requesting pin on non-reclaimable page is considered useless, meaning suspicious behavior. BUG_ON() is overkill, I think. WARN_ON_ONCE() would make it. Not that this follow_huge_addr() on Power is not reachable via do_move_page_to_node_array(), becaus

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-23 Thread Kirill A. Shutemov
644 --- a/arch/s390/include/asm/pgtable.h +++ b/arch/s390/include/asm/pgtable.h @@ -490,7 +490,7 @@ static inline int pud_bad(pud_t pud) static inline int pmd_present(pmd_t pmd) { - return pmd_val(pmd) != _SEGMENT_ENTRY_INVALID; + return !(pmd_val(pmd) & _SEGMENT_ENTRY_INVA
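The hunk above is cut off, but it appears to replace an equality comparison with a bit test. Assuming that reading, the sketch below shows why the two can disagree: an entry with the invalid bit plus any other bit set still counts as "present" under the comparison. The constants `ENTRY_INVALID` and `ENTRY_OTHER` are illustrative values, not the real s390 definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; NOT the real s390 definitions. */
#define ENTRY_INVALID 0x20u   /* hypothetical "invalid" bit */
#define ENTRY_OTHER   0x10u   /* hypothetical unrelated flag bit */

/* Comparison style: present unless the entry equals exactly the invalid marker. */
static bool present_by_compare(uint64_t val)
{
    return val != ENTRY_INVALID;
}

/* Bit-test style: present only if the invalid bit is clear. */
static bool present_by_bit(uint64_t val)
{
    return !(val & ENTRY_INVALID);
}
```

Both versions agree for an entry of exactly `ENTRY_INVALID` and for an entry of 0; they differ for `ENTRY_INVALID | ENTRY_OTHER`, which only the bit test treats as not present.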

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-18 Thread Kirill A. Shutemov
On Thu, Feb 18, 2016 at 04:00:37PM +0100, Gerald Schaefer wrote: > On Thu, 18 Feb 2016 01:58:08 +0200 > "Kirill A. Shutemov" <kir...@shutemov.name> wrote: > > > On Wed, Feb 17, 2016 at 08:13:40PM +0100, Gerald Schaefer wrote: > > > On Sat, 13 Feb 2016 12:58

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-17 Thread Kirill A. Shutemov
ther VM_BUG_ONs in mm/huge_memory.c that check the same? > > This behavior is not new, it was the same before the THP rework, so I do not > assume that it is related to the current problems, maybe with the exception > of this specific crash. I never saw the BUG at mm/huge_memo

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-17 Thread Kirill A. Shutemov
On Tue, Feb 16, 2016 at 05:24:44PM +0100, Gerald Schaefer wrote: > On Mon, 15 Feb 2016 23:35:26 +0200 > "Kirill A. Shutemov" <kir...@shutemov.name> wrote: > > > Is there any chance that I'll be able to trigger the bug using QEMU? > > Does anybody have an Q

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-15 Thread Kirill A. Shutemov
On Mon, Feb 15, 2016 at 07:37:02PM +0100, Gerald Schaefer wrote: > On Mon, 15 Feb 2016 13:31:59 +0200 > "Kirill A. Shutemov" <kir...@shutemov.name> wrote: > > > On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote: > > > > > > On Sat, 13

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-15 Thread Kirill A. Shutemov
On Sat, Feb 13, 2016 at 12:58:31PM +0100, Sebastian Ott wrote: > > On Sat, 13 Feb 2016, Kirill A. Shutemov wrote: > > Could you check if revert of fecffad25458 helps? > > I reverted fecffad25458 on top of 721675fcf277cf - it oopsed with: > > [ 1851.721062] Unable

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-12 Thread Kirill A. Shutemov
On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote: > On Thu, 11 Feb 2016 21:09:42 +0200 > "Kirill A. Shutemov" <kir...@shutemov.name> wrote: > > > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote: > > > Hi, > > > &

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-12 Thread Kirill A. Shutemov
On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote: > On Fri, 12 Feb 2016 16:57:27 +0100 > Christian Borntraeger <borntrae...@de.ibm.com> wrote: > > > On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote: > > > On Thu, Feb 11, 2016 at 08:57:02P

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-11 Thread Kirill A. Shutemov
On Thu, Feb 11, 2016 at 09:09:42PM +0200, Kirill A. Shutemov wrote: > On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote: > > Hi, > > > > Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 and > > he also bisected this to commit 61f5d698 &

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-11 Thread Kirill A. Shutemov
c pmdp_invalidate() would do the trick, right? If yes, I'll prepare patch tomorrow (some sleep required). -- Kirill A. Shutemov ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH V2] powerpc/mm: Fix Multi hit ERAT cause by recent THP update

2016-02-07 Thread Kirill A. Shutemov
t *pmdp) > +{ > + > +} > +#endif > + > #ifndef __HAVE_ARCH_PTE_SAME > static inline int pte_same(pte_t pte_a, pte_t pte_b) > { > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 36c070167b71..b52d16a86e91 100644 > --- a/mm/huge_memory.c > ++

Re: [PATCH] powerpc/mm: Fix Multi hit ERAT cause by recent THP update

2016-02-05 Thread Kirill A. Shutemov
/huge_memory.c > index 36c070167b71..b52d16a86e91 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2860,6 +2860,7 @@ static void __split_huge_pmd_locked(struct > vm_area_struct *vma, pmd_t *pmd, > young = pmd_young(*pmd); > dirty = pmd_dirty(*pmd); > > + pmdp_huge_splitting_flush(vma, haddr, pmd); > pgtable = pgtable_trans_huge_withdraw(mm, pmd); > pmd_populate(mm, &_pmd, pgtable); > > -- > 2.5.0 > -- Kirill A. Shutemov

Re: [PATCH V5 5/7] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-27 Thread Kirill A. Shutemov
On Mon, Jul 27, 2015 at 09:41:26AM -0400, Eric B Munson wrote: On Mon, 27 Jul 2015, Kirill A. Shutemov wrote: On Fri, Jul 24, 2015 at 05:28:43PM -0400, Eric B Munson wrote: The cost of faulting in all memory to be locked can be very high when working with large mappings. If only

Re: [PATCH V5 2/7] mm: mlock: Add new mlock system call

2015-07-27 Thread Kirill A. Shutemov
/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/include/uapi/asm/mman.h| 5 + Define MLOCK_LOCKED in include/uapi/asm-generic/mman-common.h. This way you can drop changes in powerpc, sparc and tile. Otherwise looks good. -- Kirill A. Shutemov

Re: [PATCH V5 4/7] mm: mlock: Add mlock flags to enable VM_LOCKONFAULT usage

2015-07-27 Thread Kirill A. Shutemov
. Signed-off-by: Eric B Munson <emun...@akamai.com> Cc: Michal Hocko <mho...@suse.cz> Cc: Vlastimil Babka <vba...@suse.cz> Cc: Jonathan Corbet <cor...@lwn.net> Cc: Kirill A. Shutemov <kir...@shutemov.name> Cc: linux-al...@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: linux-m...@linux-mips.org Cc

Re: [PATCH V5 5/7] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-27 Thread Kirill A. Shutemov
it's demonstrably useful. [1] http://lkml.kernel.org/g/20150114095019.gc4...@dhcp22.suse.cz -- Kirill A. Shutemov

Re: [PATCH V4 5/6] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-22 Thread Kirill A. Shutemov
On Wed, Jul 22, 2015 at 10:32:20AM -0400, Eric B Munson wrote: On Wed, 22 Jul 2015, Kirill A. Shutemov wrote: On Tue, Jul 21, 2015 at 03:59:40PM -0400, Eric B Munson wrote: The cost of faulting in all memory to be locked can be very high when working with large mappings. If only

Re: [PATCH V4 5/6] mm: mmap: Add mmap flag to request VM_LOCKONFAULT

2015-07-22 Thread Kirill A. Shutemov
for the locked but not present state, expose it as an mmap option like MAP_LOCKED -> VM_LOCKED. What is the advantage over mmap() + mlock(MLOCK_ONFAULT)? -- Kirill A. Shutemov

Re: [PATCH V4 1/3] mm/thp: Split out pmd collpase flush into a separate functions

2015-05-12 Thread Kirill A. Shutemov
does code movement for clarity. There should not be any change in functionality. Signed-off-by: Aneesh Kumar K.V <aneesh.ku...@linux.vnet.ibm.com> For the patchset: Acked-by: Kirill A. Shutemov <kirill.shute...@linux.intel.com> -- Kirill A. Shutemov

Re: [PATCH V3] powerpc/thp: Serialize pmd clear against a linux page table walk.

2015-05-11 Thread Kirill A. Shutemov
address. Such a clear needs to wait for the parallel find_linux_pte_or_hugepte to finish. With zap_huge_pmd, we can run into issues, with a hugepage pte getting zapped due to a MADV_DONTNEED while another CPU faults it in as small pages. Reported-by: Kirill A. Shutemov kirill.shute

Re: [PATCH V2 1/2] mm/thp: Split out pmd collpase flush into a seperate functions

2015-05-07 Thread Kirill A. Shutemov
, mmun_start, mmun_end); -- 2.1.4 -- Kirill
