On 07/04/2017 10:16 AM, Martin Schwidefsky wrote:
> On Tue, 4 Jul 2017 09:48:11 +0200
> Christian Borntraeger <borntrae...@de.ibm.com> wrote:
>
>> On 07/03/2017 09:07 PM, Dr. David Alan Gilbert wrote:
>>> * Michael S. Tsirkin (m...@redhat.com) wrote:
>>>> On Fri, Jun 30, 2017 at 05:31:39PM +0100, Dr. David Alan Gilbert wrote:
>>>>> * Christian Borntraeger (borntrae...@de.ibm.com) wrote:
>>>>>> On 04/26/2017 01:45 PM, Christian Borntraeger wrote:
>>>>>>
>>>>>>>> Hmm, I have a theory: if the flags field has bit 1 set, i.e.
>>>>>>>> RAM_SAVE_FLAG_COMPRESS, then try changing ram_handle_compressed
>>>>>>>> to always do the memset.
>>>>>>>
>>>>>>> FWIW, changing ram_handle_compressed to always memset makes the
>>>>>>> problem go away.
>>>>>>
>>>>>> It is still running fine now with the "always memset" change.
>>>>>
>>>>> Did we ever nail down a fix for this? As I remember, Andrea said
>>>>> we shouldn't need to do that memset, but we came to the conclusion
>>>>> it was something specific to how s390 protection keys worked.
>>>>>
>>>>> Dave
>>>>
>>>> No, we didn't. Let's merge that for now, with a comment that
>>>> we don't really understand what's going on?
>>>
>>> Hmm no, I don't really want to change the !s390 behaviour, especially
>>> since it causes allocation that we otherwise avoid, and Andrea's
>>> reply to the original post pointed out it's not necessary.
>>
>> Since storage keys are per physical page, we must not have shared pages.
>> Therefore in s390_enable_skey we already do a break_ksm (via ksm_madvise);
>> in other words, we already allocate pages on the keyless->keyed switch.
>>
>> So why not do the same for zero pages: instead of invalidating the page
>> table entry, let's just do a proper COW.
>>
>> Something like this maybe (Andrea, Martin, any better way to do that?):
>>
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index 4fb3d3c..11475c7 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2149,13 +2149,18 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
>>  static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>>  			      unsigned long next, struct mm_walk *walk)
>>  {
>> +	struct page *page;
>>  	/*
>> -	 * Remove all zero page mappings,
>> +	 * Remove all zero page mappings with a COW;
>>  	 * after establishing a policy to forbid zero page mappings,
>>  	 * following faults for that page will get fresh anonymous pages
>>  	 */
>> -	if (is_zero_pfn(pte_pfn(*pte)))
>> -		ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
>> +	if (is_zero_pfn(pte_pfn(*pte))) {
>> +		if (get_user_pages(addr, 1, FOLL_WRITE, &page, NULL) == 1)
>> +			put_page(page);
>> +		else
>> +			return -ENOMEM;
>> +	}
>>  	/* Clear storage key */
>>  	ptep_zap_key(walk->mm, addr, pte);
>>  	return 0;
>
> I do not quite get the problem here. The zero-page mappings are always
> marked as _PAGE_SPECIAL. These should be safe to replace with an empty
> pte. We do mark all VMAs as unmergeable prior to the page table walk
> that replaces all zero-page mappings. What is the get_user_pages() call
> supposed to do?
>
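For context, the QEMU-side behaviour under discussion can be modelled in plain userspace C. This is only a simplified sketch of the idea behind ram_handle_compressed (a page-fill record carries one fill byte; the write is skipped when the byte is zero and the destination already reads as zero, which leaves an untouched page backed by the shared zero page), not the actual QEMU code; handle_fill and buffer_is_zero here are hypothetical stand-ins, and the always_memset flag models the experiment tried above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a "buffer is all zero" helper: true iff every byte is 0. */
static bool buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        if (p[i])
            return false;
    return true;
}

/*
 * Simplified model of the page-fill path: skip the memset when the
 * fill byte is zero and the destination already reads as zero, so an
 * untouched anonymous page keeps its shared zero-page mapping. The
 * "always memset" experiment corresponds to always_memset == true,
 * which forces every destination page to be written (and thus COWed).
 */
static void handle_fill(unsigned char *host, unsigned char ch, size_t size,
                        bool always_memset)
{
    if (always_memset || ch != 0 || !buffer_is_zero(host, size))
        memset(host, ch, size);
}
```

The point of the condition is exactly what the thread argues about: with it, a zero-fill record for a never-touched page leaves the page table entry pointing at the shared zero page instead of allocating a private frame.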
After talking to Martin, we decided that it might be a good trial to
simply not use the empty zero page at all for KVM guests. Something
like this:

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 57057fb..65ab11d 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -505,7 +505,7 @@ static inline int mm_alloc_pgste(struct mm_struct *mm)
  * In the case that a guest uses storage keys
  * faults should no longer be backed by zero pages
  */
-#define mm_forbids_zeropage mm_use_skey
+#define mm_forbids_zeropage mm_has_pgste
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 4fb3d3c..88f502a 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2149,13 +2149,6 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
 static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 			      unsigned long next, struct mm_walk *walk)
 {
-	/*
-	 * Remove all zero page mappings,
-	 * after establishing a policy to forbid zero page mappings
-	 * following faults for that page will get fresh anonymous pages
-	 */
-	if (is_zero_pfn(pte_pfn(*pte)))
-		ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
 	/* Clear storage key */
 	ptep_zap_key(walk->mm, addr, pte);
 	return 0;

seems to do the trick. I will have a look at what that means for memory
usage in the usual cases.