Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-27 Thread Martin Schwidefsky
On Mon, 27 Apr 2015 15:48:42 +0200
Alexander Graf ag...@suse.de wrote:

 On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
  On Thu, 23 Apr 2015 14:01:23 +0200
  Alexander Graf ag...@suse.de wrote:
 
  As far as alternative approaches go, I don't have a great idea otoh.
  We could have an elf flag indicating that this process needs 4k page
  tables to limit the impact to a single process. In fact, could we
  maybe still limit the scope to non-global? A personality may work
  as well. Or ulimit?
  I tried the ELF flag approach; it does not work. The trouble is that
  allocate_mm() has to create the page tables as 4K tables if you
  want to change the page table layout later on. We have learned the
  hard way that going from 2K to 4K does not work due to races
  in the mm.
 
  Now there are two major cases: 1) fork + execve and 2) fork only.
  The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
  2) is required for apps that use lots of forking, e.g. database or
  web servers. Same goes for the approach with a personality flag or
  ulimit.
 
  We would have to distinguish the two cases for allocate_mm():
  if the new mm is allocated for a fork, the current mm decides
  2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
  with 4K and do the downgrade after the ELF flag has been evaluated.
 
 Well, you could also make it a personality flag for example, no? Then 
 every new process below a certain one always gets 4k page tables until 
 they drop the personality, at which point each child would only get 2k 
 page tables again.
 
 I'm mostly concerned that people will end up mixing VMs and other 
 workloads on the same LPAR, so I don't think there's a one-shoe-fits-all 
 solution.

If I add an argument to mm_init() to indicate whether this context
is for fork() or execve(), then the ELF header flag approach works.
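
As an aside, here is a minimal userspace model of that decision logic.
The extra argument and all names are hypothetical illustrations, not the
actual kernel interface; only the 2K-vs-4K rule follows the discussion
above (execve starts with 4K and may be downgraded once the ELF flag is
known, fork inherits the parent's layout because 2K-to-4K races in the mm):

#include <stdbool.h>
#include <stdio.h>

enum pgtable_size { PGTABLE_2K, PGTABLE_4K };

struct mm_model {
	enum pgtable_size pgtable;
};

/* for_execve models the proposed extra argument to mm_init(). */
static void mm_model_init(struct mm_model *mm, const struct mm_model *parent,
			  bool for_execve)
{
	if (for_execve) {
		/* execve: start with 4K tables; the ELF flag may allow a
		 * later downgrade to 2K. */
		mm->pgtable = PGTABLE_4K;
	} else {
		/* fork: the parent decides, since growing 2K tables to 4K
		 * later is not safe. */
		mm->pgtable = parent->pgtable;
	}
}

int main(void)
{
	struct mm_model parent = { .pgtable = PGTABLE_2K };
	struct mm_model forked, execed;

	mm_model_init(&forked, &parent, false);
	mm_model_init(&execed, &parent, true);
	printf("fork child: %s, execve child: %s\n",
	       forked.pgtable == PGTABLE_4K ? "4K" : "2K",
	       execed.pgtable == PGTABLE_4K ? "4K" : "2K");
	return 0;
}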

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH] KVM: s390: remove delayed reallocation of page tables for KVM

2015-04-23 Thread Martin Schwidefsky
On Thu, 23 Apr 2015 14:01:23 +0200
Alexander Graf ag...@suse.de wrote:

 As far as alternative approaches go, I don't have a great idea otoh.
 We could have an elf flag indicating that this process needs 4k page
 tables to limit the impact to a single process. In fact, could we
 maybe still limit the scope to non-global? A personality may work
 as well. Or ulimit?

I tried the ELF flag approach; it does not work. The trouble is that
allocate_mm() has to create the page tables as 4K tables if you
want to change the page table layout later on. We have learned the
hard way that going from 2K to 4K does not work due to races
in the mm.

Now there are two major cases: 1) fork + execve and 2) fork only.
The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
2) is required for apps that use lots of forking, e.g. database or
web servers. Same goes for the approach with a personality flag or
ulimit.

We would have to distinguish the two cases for allocate_mm():
if the new mm is allocated for a fork, the current mm decides
2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
with 4K and do the downgrade after the ELF flag has been evaluated.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: compiler bug gcc4.6/4.7 with ACCESS_ONCE and workarounds

2014-11-12 Thread Martin Schwidefsky
On Tue, 11 Nov 2014 16:36:06 -0800
Linus Torvalds torva...@linux-foundation.org wrote:

 On Tue, Nov 11, 2014 at 4:33 PM, Linus Torvalds
 torva...@linux-foundation.org wrote:
 
  I guess as a workaround it is fine, as long as we don't lose sight of
  trying to eventually do a better job.
 
 Oh, and when it comes to the actual gcc bug - do you have any reason
 to believe that it's somehow triggered more easily by something
 particular in the arch/s390/kvm/gaccess.c code?
 
 IOW, why does this problem not hit the x86 spinlocks that also use
 volatile pointers to aggregate types? Or does it?

This looks similar to what we had on s390:

old.tickets = ACCESS_ONCE(lock->tickets)

In theory x86 should be affected as well. On s390 we have lots of
instructions that operate on memory, and the cost model of gcc makes
the compiler more inclined to access memory multiple times. My
guess would be that once the value is cached in a register, the
cost model for x86 will usually make sure that the value is not
read a second time. But there is no guarantee.
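
To make the pattern concrete, here is a small compilable example of the
construct under discussion. The struct and function names are made up;
only the ACCESS_ONCE-on-an-aggregate usage mirrors the spinlock code,
and the second helper shows one way to avoid the problem (reading the
scalar members individually):

#include <stdio.h>

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

struct tickets {
	unsigned short head, tail;
};

struct ticketlock {
	struct tickets tickets;
};

/* Affected pattern: ACCESS_ONCE applied to a non-scalar type.  With the
 * buggy gcc 4.6/4.7 the volatile cast may not be honoured for the
 * aggregate copy, so lock->tickets can be read from memory twice. */
int lock_is_contended(struct ticketlock *lock)
{
	struct tickets old = ACCESS_ONCE(lock->tickets);

	return old.head != old.tail;
}

/* Possible workaround: apply ACCESS_ONCE to the scalar members, so each
 * field is read exactly once. */
int lock_is_contended_scalar(struct ticketlock *lock)
{
	unsigned short head = ACCESS_ONCE(lock->tickets.head);
	unsigned short tail = ACCESS_ONCE(lock->tickets.tail);

	return head != tail;
}

int main(void)
{
	struct ticketlock lock = { { 0, 1 } };

	printf("%d %d\n", lock_is_contended(&lock),
	       lock_is_contended_scalar(&lock));
	return 0;
}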

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process

2014-10-23 Thread Martin Schwidefsky
On Wed, 22 Oct 2014 13:09:26 +0200
Dominik Dingel din...@linux.vnet.ibm.com wrote:

 s390 has the special notion of storage keys, which are a sort of page flag
 associated with physical pages and live outside of directly addressable memory.
 These storage keys can be queried and changed with a special set of
 instructions. The mentioned instructions behave quite nicely under
 virtualization. If there is:
 - an invalid pte, the instructions will work on memory in the host page table
 - a valid pte, the instructions will work with the real storage key
 
 Thanks to Martin's software reference and dirty bit tracking, the kernel
 itself no longer issues any storage key instructions, as a software based
 approach is taken instead. Distributions in the wild, however, are
 currently still using them.
 
 However, for virtualized guests we still have a problem with guest pages
 mapped to zero pages and with kernel samepage merging (KSM).
 With either one, multiple guest pages will point to the same physical page
 and share the same storage key.
 
 Let's fix this by introducing a new function which s390 will define to
 forbid new zero page mappings.  If the guest issues a storage key related 
 instruction we flag the mm_struct, drop existing zero page mappings
 and unmerge the guest memory.
 
 v2 -> v3:
  - cleared up the patch description of patch 3/4
  - removed the unnecessary flag in mmu_context (Paolo)

 v1 -> v2:
  - following Dave's and Paolo's suggestion, removed the vma flag
 
 Dominik Dingel (4):
   s390/mm: recfactor global pgste updates
   mm: introduce mm_forbids_zeropage function
   s390/mm: prevent and break zero page mappings in case of storage keys
   s390/mm: disable KSM for storage key enabled pages
 
  arch/s390/include/asm/pgalloc.h |   2 -
  arch/s390/include/asm/pgtable.h |   8 +-
  arch/s390/kvm/kvm-s390.c        |   2 +-
  arch/s390/kvm/priv.c            |  17 ++--
  arch/s390/mm/pgtable.c          | 180 ++--
  include/linux/mm.h              |   4 +
  mm/huge_memory.c                |   2 +-
  mm/memory.c                     |   2 +-
  8 files changed, 106 insertions(+), 111 deletions(-)
 
The patches look good to me, and as nobody seems to disagree with the proposed
solution, I will add the code to the features branch of the s390 tree.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH 2/4] mm: introduce new VM_NOZEROPAGE flag

2014-10-21 Thread Martin Schwidefsky
On Mon, 20 Oct 2014 20:14:53 +0200
Paolo Bonzini pbonz...@redhat.com wrote:

 On 10/18/2014 06:28 PM, Dave Hansen wrote:
   Currently it is an all or nothing thing, but for a future change we might
   want to just tag the guest memory instead of the complete user address space.
 
  I think it's a bad idea to reserve a flag for potential future use.  If
  you _need_ it in the future, let's have the discussion then.  For now, I
  think it should probably just be stored in the mm somewhere.
 
 I agree with Dave (I thought I disagreed, but I changed my mind while
 writing down my thoughts).  Just define mm_forbids_zeropage in
 arch/s390/include/asm, and make it return mm->context.use_skey, with a
 comment explaining how this is only for processes that use KVM, and then
 only for guests that use storage keys.

mm_forbids_zeropage() will surely work for now, but I think a vma flag
is the better solution. This is analogous to VM_MERGEABLE or VM_NOHUGEPAGE;
the best solution would be to only mark those vmas that are mapped to
the guest. That we have not yet found a sensible way to do that
does not change the fact that no-zero-page is a per-vma property, no?

But if you insist, we can go with mm_forbids_zeropage() until we find a
clever way to distinguish the guest vmas from the qemu ones.
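
For reference, a small standalone model of the mm_forbids_zeropage() idea
Paolo describes. The struct layout and the exact form of the override are
assumptions for illustration, not the merged kernel code:

#include <stdio.h>

struct mm_context {
	int use_skey;	/* set once a guest of this process uses storage keys */
};

struct mm_struct {
	struct mm_context context;
};

/* Generic code would default this to 0 ("never forbid"); an s390
 * override ties the decision to the storage-key flag in the mm context. */
#define mm_forbids_zeropage(mm)	((mm)->context.use_skey)

/* Fault-handler side: map the shared zero page only if the architecture
 * does not forbid it for this mm. */
static int may_use_zero_page(const struct mm_struct *mm)
{
	return !mm_forbids_zeropage(mm);
}

int main(void)
{
	struct mm_struct plain = { .context = { .use_skey = 0 } };
	struct mm_struct skeys = { .context = { .use_skey = 1 } };

	printf("plain mm maps zero page: %d\n", may_use_zero_page(&plain));
	printf("skey mm maps zero page:  %d\n", may_use_zero_page(&skeys));
	return 0;
}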

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH 0/5] cmma and lazy sske

2014-04-02 Thread Martin Schwidefsky
On Wed,  2 Apr 2014 11:24:43 +0200
Christian Borntraeger borntrae...@de.ibm.com wrote:

 Shall we send out patch 1 via your tree or kvm? It should still be in time for 3.15.
 
 Can you ACK patches 2-5?
 
 
 Dominik Dingel (5):
   KVM: s390: make cmma usage conditionally
   KVM: s390: Adding skey bit to mmu context
   KVM: s390: Clear storage keys
   KVM: s390: Allow skeys to be enabled for the current process
   KVM: s390: Don't enable skeys by default
 
  arch/s390/include/asm/kvm_host.h    |  4 ++
  arch/s390/include/asm/mmu.h         |  2 +
  arch/s390/include/asm/mmu_context.h |  1 +
  arch/s390/include/asm/pgalloc.h     |  3 +-
  arch/s390/include/asm/pgtable.h     | 42 +++--
  arch/s390/kvm/diag.c                |  6 ---
  arch/s390/kvm/kvm-s390.c            | 61 +++-
  arch/s390/kvm/kvm-s390.h            |  6 ++-
  arch/s390/kvm/priv.c                | 94 +
  arch/s390/kvm/trace.h               | 14 ++
  arch/s390/mm/pgtable.c              | 61 +++-
  11 files changed, 199 insertions(+), 95 deletions(-)
 
This is all related to kvm, and the majority of changes are for files
in the kvm directory. The changes in the mm headers only affect the
code for CONFIG_PGSTE=y. I think this should go through the kvm tree.

And Acked-by: Martin Schwidefsky schwidef...@de.ibm.com for
patches 2-5.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH 1/2] mm: add support for discard of unused ptes

2013-07-31 Thread Martin Schwidefsky
On Tue, 30 Jul 2013 13:44:22 -0700
Andrew Morton a...@linux-foundation.org wrote:

 On Thu, 25 Jul 2013 10:54:20 +0200 Martin Schwidefsky 
 schwidef...@de.ibm.com wrote:
 
  From: Konstantin Weitz konstantin.we...@gmail.com
  
  In a virtualized environment and given an appropriate interface the guest
  can mark pages as unused while they are free (for the s390 implementation
  see git commit 45e576b1c3d00206 guest page hinting light). For the host
  the unused state is a property of the pte.
  
  This patch adds the primitive 'pte_unused' and code to the host swap out
  handler so that pages marked as unused by all mappers are not swapped out
  but discarded instead, thus saving one IO for swap out and potentially
  another one for swap in.
  
  ...
 
  --- a/include/asm-generic/pgtable.h
  +++ b/include/asm-generic/pgtable.h
  @@ -193,6 +193,19 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b)
   }
   #endif
   
  +#ifndef __HAVE_ARCH_PTE_UNUSED
  +/*
  + * Some architectures provide facilities to virtualization guests
  + * so that they can flag allocated pages as unused. This allows the
  + * host to transparently reclaim unused pages. This function returns
  + * whether the pte's page is unused.
  + */
  +static inline int pte_unused(pte_t pte)
  +{
  +   return 0;
  +}
  +#endif
  +
   #ifndef __HAVE_ARCH_PMD_SAME
   #ifdef CONFIG_TRANSPARENT_HUGEPAGE
   static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
  diff --git a/mm/rmap.c b/mm/rmap.c
  index cd356df..2291f25 100644
  --- a/mm/rmap.c
  +++ b/mm/rmap.c
  @@ -1234,6 +1234,16 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
  }
  set_pte_at(mm, address, pte,
 swp_entry_to_pte(make_hwpoison_entry(page)));
  +   } else if (pte_unused(pteval)) {
  +   /*
  +* The guest indicated that the page content is of no
  +* interest anymore. Simply discard the pte, vmscan
  +* will take care of the rest.
  +*/
  +   if (PageAnon(page))
  +   dec_mm_counter(mm, MM_ANONPAGES);
  +   else
  +   dec_mm_counter(mm, MM_FILEPAGES);
  } else if (PageAnon(page)) {
  swp_entry_t entry = { .val = page_private(page) };
 
 Obviously harmless.  Please include this in whatever tree carries
 [PATCH 2/2] s390/kvm: support collaborative memory management.
 
Cool, thanks. This will go out via the KVM tree then.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



[PATCH 2/2] s390/kvm: support collaborative memory management

2013-07-25 Thread Martin Schwidefsky
From: Konstantin Weitz konstantin.we...@gmail.com

This patch enables Collaborative Memory Management (CMM) for kvm
on s390. CMM allows the guest to inform the host about page usage
(see arch/s390/mm/cmm.c). The host uses this information to avoid
swapping in unused pages in the page fault handler. Further, a CPU
provided list of unused invalid pages is processed to reclaim swap
space of not yet accessed unused pages.

[ Martin Schwidefsky: patch reordering and cleanup ]

Signed-off-by: Konstantin Weitz konstantin.we...@gmail.com
Signed-off-by: Martin Schwidefsky schwidef...@de.ibm.com
---
 arch/s390/include/asm/kvm_host.h |5 ++-
 arch/s390/include/asm/pgtable.h  |   24 
 arch/s390/kvm/kvm-s390.c |   25 +
 arch/s390/kvm/kvm-s390.h |2 +
 arch/s390/kvm/priv.c |   41 
 arch/s390/mm/pgtable.c   |   77 ++
 6 files changed, 173 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 3238d40..de6450e 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -113,7 +113,9 @@ struct kvm_s390_sie_block {
 	__u64	gbea;			/* 0x0180 */
 	__u8	reserved188[24];	/* 0x0188 */
 	__u32	fac;			/* 0x01a0 */
-	__u8	reserved1a4[92];	/* 0x01a4 */
+	__u8	reserved1a4[20];	/* 0x01a4 */
+	__u64	cbrlo;			/* 0x01b8 */
+	__u8	reserved1c0[64];	/* 0x01c0 */
 } __attribute__((packed));
 
 struct kvm_vcpu_stat {
@@ -149,6 +151,7 @@ struct kvm_vcpu_stat {
u32 instruction_stsi;
u32 instruction_stfl;
u32 instruction_tprot;
+   u32 instruction_essa;
u32 instruction_sigp_sense;
u32 instruction_sigp_sense_running;
u32 instruction_sigp_external_call;
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 75fb726..65d48b8 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -227,6 +227,7 @@ extern unsigned long MODULES_END;
 #define _PAGE_SWR  0x008   /* SW pte referenced bit */
 #define _PAGE_SWW  0x010   /* SW pte write bit */
 #define _PAGE_SPECIAL  0x020   /* SW associated with special page */
+#define _PAGE_UNUSED   0x040   /* SW bit for ptep_clear_flush() */
 #define __HAVE_ARCH_PTE_SPECIAL
 
 /* Set of bits not changed in pte_modify */
@@ -375,6 +376,12 @@ extern unsigned long MODULES_END;
 
 #endif /* CONFIG_64BIT */
 
+/* Guest Page State used for virtualization */
+#define _PGSTE_GPS_ZERO		0x0000000080000000UL
+#define _PGSTE_GPS_USAGE_MASK	0x0000000003000000UL
+#define _PGSTE_GPS_USAGE_STABLE	0x0000000000000000UL
+#define _PGSTE_GPS_USAGE_UNUSED	0x0000000001000000UL
+
 /*
  * A user page table pointer has the space-switch-event bit, the
  * private-space-control bit and the storage-alteration-event-control
@@ -590,6 +597,12 @@ static inline int pte_file(pte_t pte)
 	return (pte_val(pte) & mask) == _PAGE_TYPE_FILE;
 }
 
+static inline int pte_swap(pte_t pte)
+{
+   unsigned long mask = _PAGE_RO | _PAGE_INVALID | _PAGE_SWT | _PAGE_SWX;
+	return (pte_val(pte) & mask) == _PAGE_TYPE_SWAP;
+}
+
 static inline int pte_special(pte_t pte)
 {
 	return (pte_val(pte) & _PAGE_SPECIAL);
@@ -794,6 +807,7 @@ unsigned long gmap_translate(unsigned long address, struct gmap *);
 unsigned long __gmap_fault(unsigned long address, struct gmap *);
 unsigned long gmap_fault(unsigned long address, struct gmap *);
 void gmap_discard(unsigned long from, unsigned long to, struct gmap *);
+void __gmap_zap(unsigned long address, struct gmap *);
 
 void gmap_register_ipte_notifier(struct gmap_notifier *);
 void gmap_unregister_ipte_notifier(struct gmap_notifier *);
@@ -825,6 +839,7 @@ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
 
if (mm_has_pgste(mm)) {
pgste = pgste_get_lock(ptep);
+		pgste_val(pgste) &= ~_PGSTE_GPS_ZERO;
pgste_set_key(ptep, pgste, entry);
pgste_set_pte(ptep, entry);
pgste_set_unlock(ptep, pgste);
@@ -858,6 +873,12 @@ static inline int pte_young(pte_t pte)
return 0;
 }
 
+#define __HAVE_ARCH_PTE_UNUSED
+static inline int pte_unused(pte_t pte)
+{
+	return pte_val(pte) & _PAGE_UNUSED;
+}
+
 /*
  * pgd/pmd/pte modification functions
  */
@@ -1142,6 +1163,9 @@ static inline pte_t ptep_clear_flush(struct vm_area_struct *vma,
pte_val(*ptep) = _PAGE_TYPE_EMPTY;
 
 	if (mm_has_pgste(vma->vm_mm)) {
+		if ((pgste_val(pgste) & _PGSTE_GPS_USAGE_MASK) ==
+		    _PGSTE_GPS_USAGE_UNUSED)
+			pte_val(pte) |= _PAGE_UNUSED;
 		pgste = pgste_update_all(&pte, pgste);
pgste_set_unlock(ptep, pgste);
}
diff --git a/arch/s390/kvm/kvm

[PATCH 1/2] mm: add support for discard of unused ptes

2013-07-25 Thread Martin Schwidefsky
From: Konstantin Weitz konstantin.we...@gmail.com

In a virtualized environment and given an appropriate interface the guest
can mark pages as unused while they are free (for the s390 implementation
see git commit 45e576b1c3d00206 guest page hinting light). For the host
the unused state is a property of the pte.

This patch adds the primitive 'pte_unused' and code to the host swap out
handler so that pages marked as unused by all mappers are not swapped out
but discarded instead, thus saving one IO for swap out and potentially
another one for swap in.

[ Martin Schwidefsky: patch reordering and simplification ]

Signed-off-by: Konstantin Weitz konstantin.we...@gmail.com
Signed-off-by: Martin Schwidefsky schwidef...@de.ibm.com
---
 include/asm-generic/pgtable.h |   13 +
 mm/rmap.c |   10 ++
 2 files changed, 23 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 2f47ade..ec540c5 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -193,6 +193,19 @@ static inline int pte_same(pte_t pte_a, pte_t pte_b)
 }
 #endif
 
+#ifndef __HAVE_ARCH_PTE_UNUSED
+/*
+ * Some architectures provide facilities to virtualization guests
+ * so that they can flag allocated pages as unused. This allows the
+ * host to transparently reclaim unused pages. This function returns
+ * whether the pte's page is unused.
+ */
+static inline int pte_unused(pte_t pte)
+{
+   return 0;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PMD_SAME
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b)
diff --git a/mm/rmap.c b/mm/rmap.c
index cd356df..2291f25 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1234,6 +1234,16 @@ int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
}
set_pte_at(mm, address, pte,
   swp_entry_to_pte(make_hwpoison_entry(page)));
+   } else if (pte_unused(pteval)) {
+   /*
+* The guest indicated that the page content is of no
+* interest anymore. Simply discard the pte, vmscan
+* will take care of the rest.
+*/
+   if (PageAnon(page))
+   dec_mm_counter(mm, MM_ANONPAGES);
+   else
+   dec_mm_counter(mm, MM_FILEPAGES);
} else if (PageAnon(page)) {
swp_entry_t entry = { .val = page_private(page) };
 
-- 
1.7.9.5



[RFC][PATCH 0/2] s390/kvm: add kvm support for guest page hinting v2

2013-07-25 Thread Martin Schwidefsky
v1 -> v2:
 - found a way to simplify the common code patch

Linux on s390 as a guest under z/VM has been using the guest page
hinting interface (alias collaborative memory management) for a long
time. The full version with volatile states has been deemed to be too
complicated (see the old discussion about guest page hinting e.g. on
http://marc.info/?l=linux-mm&m=123816662017742&w=2).
What is currently implemented for the guest is the unused and stable
states to mark unallocated pages as freely available to the host.
This works just fine with z/VM as the host.

The two patches in this series implement the guest page hinting
interface for the unused and stable states in the KVM host.
Most of the code is specific to s390, but there is a common memory
management part as well; see patch #1.

The code is working stably now; from my point of view this is ready
for prime time.

Konstantin Weitz (2):
  mm: add support for discard of unused ptes
  s390/kvm: support collaborative memory management

 arch/s390/include/asm/kvm_host.h |5 ++-
 arch/s390/include/asm/pgtable.h  |   24 
 arch/s390/kvm/kvm-s390.c |   25 +
 arch/s390/kvm/kvm-s390.h |2 +
 arch/s390/kvm/priv.c |   41 
 arch/s390/mm/pgtable.c   |   77 ++
 include/asm-generic/pgtable.h|   13 +++
 mm/rmap.c|   10 +
 8 files changed, 196 insertions(+), 1 deletion(-)

-- 
1.7.9.5



Re: [PATCH 0/8] s390/kvm fixes

2013-05-21 Thread Martin Schwidefsky
On Sun, 19 May 2013 11:49:43 +0300
Gleb Natapov g...@redhat.com wrote:

 Hi Christian,
 
 On Fri, May 17, 2013 at 02:41:30PM +0200, Christian Borntraeger wrote:
  Gleb, Paolo, Marcelo,
  
  here are some low level changes to kvm on s390 that we have been
  cooking for a while now.
  
  Patch "s390/pgtable: fix ipte notify bit" will go via Martin's
  tree into 3.10, but is included to reduce the amount of merge
  conflicts.

  Patch "s390: fix gmap_ipte_notifier vs. software dirty pages"
  will also go via Martin's tree into 3.10, and it fixes a hang with
  heavy host paging and KVM. This is optional for merging, but
  makes testing on kvm/next easier.
  
 Can I add Martin's ACKs to those two then?

Yes.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [GIT PULL] KVM updates for the 3.9 merge window

2013-02-25 Thread Martin Schwidefsky
On Sun, 24 Feb 2013 16:05:31 -0800
Linus Torvalds torva...@linux-foundation.org wrote:

 On Wed, Feb 20, 2013 at 5:17 PM, Marcelo Tosatti mtosa...@redhat.com wrote:
 
  Please pull from
 
  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/kvm-3.9-1
 
  to receive the KVM updates for the 3.9 merge window [..]
 
 Ok, particularly the s390 people should check my resolution of the
 conflicts, since they include the renaming of IOINT_VIR to IRQIO_VIR.
 But the uapi header file move should be double-checked by people who
 use this too.

IRQIO_VIR is now at the end of the IRQIO_xxx entries, while we had
it between the IRQIO_CSC and the IRQIO_PCI entries. No worries, we will
just change our code and use the upstream version. I just completed the
double-check and pushed my branches to the linux-s390 tree. The code
is in sync.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH 0/6] kvm/s390: sigp related changes for 3.6

2012-07-02 Thread Martin Schwidefsky
On Fri, 29 Jun 2012 20:19:46 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:

 On Tue, Jun 26, 2012 at 04:06:35PM +0200, Cornelia Huck wrote:
  Avi, Marcelo,
  
  here are some more s390 patches for the next release.
  
  Patches 1 and 2 are included for dependency reasons; they will also
  be sent through Martin's s390 tree.
 
 I don't see why patch 1 is a dependency for merging in the kvm 
 tree, and why patch 2 should go through both trees?
 
 That is, patch 1 can go through S390 tree, patches 2-6 through 
 KVM tree. No?

Patch 3 has a dependency on patch 2 and patch 2 has a dependency
on patch 1. The hunk in pcpu_running would cause a reject:

@@ -155,8 +131,8 @@ static inline int pcpu_stopped(struct pcpu *pcpu)
 
 static inline int pcpu_running(struct pcpu *pcpu)
 {
-	if (__pcpu_sigp(pcpu->address, sigp_sense_running,
-			0, &pcpu->status) != sigp_status_stored)
+	if (__pcpu_sigp(pcpu->address, SIGP_SENSE_RUNNING,
+			0, &pcpu->status) != SIGP_CC_STATUS_STORED)
return 1;
/* Status stored condition code is equivalent to cpu not running. */
return 0;


-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-15 Thread Martin Schwidefsky
On Thu, 15 Dec 2011 11:28:03 +0100
Carsten Otte co...@de.ibm.com wrote:

 +	case KVM_S390_KEYOP_SSKE:
 +		if (!(vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
 +			r = -EACCES;
 +			break;
 +		}

Unfortunately I just realized while discussing with Heiko that a check
for VM_WRITE is not enough. We could still have a read-only pte that
points to a file backed page which is purely read-only. A write access
is allowed but would cause copy-on-write. But we set the storage key
of the original page which would make the read-only page dirty.
We need to solve the race on the dirty bit in a clean way, otherwise
there will always be a corner case.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [patch 10/12] [PATCH] kvm-s390: storage key interface

2011-12-12 Thread Martin Schwidefsky
On Sat, 10 Dec 2011 13:35:39 +0100
Carsten Otte co...@de.ibm.com wrote:

 --- a/arch/s390/mm/pgtable.c
 +++ b/arch/s390/mm/pgtable.c
 @@ -393,6 +393,33 @@ out_unmap:
  }
  EXPORT_SYMBOL_GPL(gmap_map_segment);
 
 +static pmd_t *__pmdp_for_addr(struct mm_struct *mm, unsigned long addr)
 +{
 + struct vm_area_struct *vma;
 + pgd_t *pgd;
 + pud_t *pud;
 + pmd_t *pmd;
 +
 + vma = find_vma(mm, addr);
 +	if (!vma || (vma->vm_start > addr))
 + return ERR_PTR(-EFAULT);
 +
 + pgd = pgd_offset(mm, addr);
 + pud = pud_alloc(mm, pgd, addr);
 + if (!pud)
 + return ERR_PTR(-ENOMEM);
 +
 + pmd = pmd_alloc(mm, pud, addr);
 + if (!pmd)
 + return ERR_PTR(-ENOMEM);
 +
 +	if (!pmd_present(*pmd) &&
 +	    __pte_alloc(mm, vma, pmd, addr))
 + return ERR_PTR(-ENOMEM);
 +
 + return pmd;
 +}
 +
  /*
   * this function is assumed to be called with mmap_sem held
   */

The __pmdp_for_addr function is fine for the usage in __gmap_fault.

 @@ -806,6 +820,26 @@ int s390_enable_sie(void)
  }
  EXPORT_SYMBOL_GPL(s390_enable_sie);
 
 +pte_t *ptep_for_addr(unsigned long addr)
 +{
 + pmd_t *pmd;
 + pte_t *pte;
 +
 +	down_read(&current->mm->mmap_sem);
 +
 +	pmd = __pmdp_for_addr(current->mm, addr);
 + if (IS_ERR(pmd)) {
 + pte = (pte_t *)pmd;
 + goto up_out;
 + }
 +
 + pte = pte_offset(pmd, addr);
 +up_out:
 +	up_read(&current->mm->mmap_sem);
 + return pte;
 +}
 +EXPORT_SYMBOL_GPL(ptep_for_addr);
 +
  #if defined(CONFIG_DEBUG_PAGEALLOC) && defined(CONFIG_HIBERNATION)
  bool kernel_page_present(struct page *page)
  {

There is a fundamental locking snafu. The pointer to the page table
entry is only valid until the mmap_sem is released. The down_read/up_read
has to be done in the caller.
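
For illustration, a sketch of the calling convention this implies; the
_locked name is hypothetical and the body simply reuses the helpers from
the quoted patch, with the mmap_sem handling moved to the caller:

	/* Caller must hold down_read(&mm->mmap_sem) and may only use the
	 * returned pointer while the semaphore is held. */
	pte_t *ptep_for_addr_locked(struct mm_struct *mm, unsigned long addr)
	{
		pmd_t *pmd;

		pmd = __pmdp_for_addr(mm, addr);
		if (IS_ERR(pmd))
			return (pte_t *) pmd;
		return pte_offset(pmd, addr);
	}

	/* Caller side:
	 *	down_read(&current->mm->mmap_sem);
	 *	pte = ptep_for_addr_locked(current->mm, addr);
	 *	if (!IS_ERR(pte))
	 *		... work on the pte ...
	 *	up_read(&current->mm->mmap_sem);
	 */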

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [patch 01/12] [PATCH] kvm-s390: ioctl to switch to user controlled virtual machines

2011-12-01 Thread Martin Schwidefsky
On Thu, 01 Dec 2011 15:15:03 +0200
Avi Kivity a...@redhat.com wrote:

  +
  +	if (kvm->arch.gmap)
  +		gmap_free(kvm->arch.gmap);
  +
  +	kvm->arch.gmap = NULL;
 
 Locking?
 
 What happens if a vcpu is created afterwards?
 
 I guess you don't mind too much since this is a privileged interface for
 a single purpose.

That is indeed a race. A malicious user space could create a new VCPU with
KVM_CREATE_VCPU on another thread after the for loop has checked that there
are no VCPUs. The new VCPU could then pick up kvm->arch.gmap and use it
while the caller of KVM_S390_ENABLE_UCONTROL frees the structure.
The kvm_s390_enable_ucontrol function needs to lock with the kvm->lock mutex.
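
A sketch of what that locking could look like; this is illustrative only
and simplified from the discussion (checking online_vcpus stands in for
the original loop over the vcpu array), not the actual patch:

	static int kvm_s390_enable_ucontrol(struct kvm *kvm)
	{
		int r = 0;

		mutex_lock(&kvm->lock);
		if (atomic_read(&kvm->online_vcpus)) {
			/* a vcpu already exists or raced in: refuse */
			r = -EBUSY;
		} else {
			if (kvm->arch.gmap)
				gmap_free(kvm->arch.gmap);
			kvm->arch.gmap = NULL;
		}
		mutex_unlock(&kvm->lock);
		return r;
	}

Since KVM_CREATE_VCPU also takes kvm->lock, a newly created VCPU can no
longer pick up the gmap while it is being freed.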

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [patch 3/4] [PATCH] kvm: Fix tprot locking

2011-11-17 Thread Martin Schwidefsky
On Thu, 17 Nov 2011 12:27:41 +0200
Avi Kivity a...@redhat.com wrote:

 On 11/17/2011 12:00 PM, Carsten Otte wrote:
  From: Christian Borntraeger borntrae...@de.ibm.com 
 
  There is a potential host deadlock in the tprot intercept handling.
  We must not hold the mmap semaphore while resolving the guest
  address. If userspace is remapping, then the memory detection in
  the guest is broken anyway so we can safely separate the 
  address translation from walking the vmas.
 
  Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com 
  Signed-off-by: Carsten Otte co...@de.ibm.com
  ---
 
   arch/s390/kvm/priv.c |   10 --
   1 file changed, 8 insertions(+), 2 deletions(-)
 
  diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
  --- linux-2.6/arch/s390/kvm/priv.c	2011-10-24 09:10:05.0 +0200
  +++ linux-2.6-patched/arch/s390/kvm/priv.c	2011-11-17 10:03:53.0 +0100
  @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
  	u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
  	u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
  struct vm_area_struct *vma;
  +   unsigned long user_address;
   
  	vcpu->stat.instruction_tprot++;
   
  @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
  return -EOPNOTSUPP;
   
   
  +   /* we must resolve the address without holding the mmap semaphore.
  +* This is ok since the userspace hypervisor is not supposed to change
  +* the mapping while the guest queries the memory. Otherwise the guest
  +* might crash or get wrong info anyway. */
  +   user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
  +
  	down_read(&current->mm->mmap_sem);
  -	vma = find_vma(current->mm,
  -		       (unsigned long) __guestaddr_to_user(vcpu, address1));
  +	vma = find_vma(current->mm, user_address);
  	if (!vma) {
  		up_read(&current->mm->mmap_sem);
  return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
 
 
 Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
 dereferences the guest page table?  How can it assume that it is mapped?

The gmap code does not assume that the code is mapped. If the individual
MB has not been mapped in the guest address space or the target memory
is not mapped in the process address space __gmap_fault() returns -EFAULT. 

 I'm probably misreading the code.
 
 A little closer to the patch, x86 handles the same issue by calling
 get_user_pages_fast().  This should be more scalable than bouncing
 mmap_sem, something to consider.

I don't think that the frequency of asynchronous page faults will make
it necessary to use get_user_pages_fast(). We are talking about the
case where I/O is necessary to provide the page that the guest accessed.

The advantage of the way s390 does things is that after __gmap_fault
translated the guest address to a user space address we can just do a
standard page fault for the user space process. Only if that requires
I/O we go the long way. Makes sense?

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [patch 3/4] [PATCH] kvm: Fix tprot locking

2011-11-17 Thread Martin Schwidefsky
On Thu, 17 Nov 2011 12:15:52 +0100
Martin Schwidefsky schwidef...@de.ibm.com wrote:

 On Thu, 17 Nov 2011 12:27:41 +0200
 Avi Kivity a...@redhat.com wrote:
 
  On 11/17/2011 12:00 PM, Carsten Otte wrote:
   From: Christian Borntraeger borntrae...@de.ibm.com 
  
   There is a potential host deadlock in the tprot intercept handling.
   We must not hold the mmap semaphore while resolving the guest
   address. If userspace is remapping, then the memory detection in
   the guest is broken anyway so we can safely separate the 
   address translation from walking the vmas.
  
   Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com 
   Signed-off-by: Carsten Otte co...@de.ibm.com
   ---
  
arch/s390/kvm/priv.c |   10 --
1 file changed, 8 insertions(+), 2 deletions(-)
  
    diff -urpN linux-2.6/arch/s390/kvm/priv.c linux-2.6-patched/arch/s390/kvm/priv.c
    --- linux-2.6/arch/s390/kvm/priv.c	2011-10-24 09:10:05.0 +0200
    +++ linux-2.6-patched/arch/s390/kvm/priv.c	2011-11-17 10:03:53.0 +0100
   @@ -336,6 +336,7 @@ static int handle_tprot(struct kvm_vcpu
  	u64 address1 = disp1 + base1 ? vcpu->arch.guest_gprs[base1] : 0;
  	u64 address2 = disp2 + base2 ? vcpu->arch.guest_gprs[base2] : 0;
 struct vm_area_struct *vma;
   + unsigned long user_address;

  	vcpu->stat.instruction_tprot++;

   @@ -349,9 +350,14 @@ static int handle_tprot(struct kvm_vcpu
 return -EOPNOTSUPP;


   + /* we must resolve the address without holding the mmap semaphore.
   +  * This is ok since the userspace hypervisor is not supposed to change
   +  * the mapping while the guest queries the memory. Otherwise the guest
   +  * might crash or get wrong info anyway. */
   + user_address = (unsigned long) __guestaddr_to_user(vcpu, address1);
   +
  	down_read(&current->mm->mmap_sem);
    -	vma = find_vma(current->mm,
    -		       (unsigned long) __guestaddr_to_user(vcpu, address1));
    +	vma = find_vma(current->mm, user_address);
  	if (!vma) {
  		up_read(&current->mm->mmap_sem);
 return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
  
  
  Unrelated to the patch, but I'm curious: it looks like __gmap_fault()
  dereferences the guest page table?  How can it assume that it is mapped?
 
 The gmap code does not assume that the code is mapped. If the individual
 MB has not been mapped in the guest address space or the target memory
 is not mapped in the process address space __gmap_fault() returns -EFAULT. 
 
  I'm probably misreading the code.
  
  A little closer to the patch, x86 handles the same issue by calling
  get_user_pages_fast().  This should be more scalable than bouncing
  mmap_sem, something to consider.
 
 I don't think that the frequency of asynchronous page faults will make
 it necessary to use get_user_pages_fast(). We are talking about the
 case where I/O is necessary to provide the page that the guest accessed.
 
 The advantage of the way s390 does things is that after __gmap_fault
 translated the guest address to a user space address we can just do a
 standard page fault for the user space process. Only if that requires
 I/O we go the long way. Makes sense?

Hmm, Carsten just made me aware that your question is not about pfault,
it is about the standard case of a guest fault.

For normal guest faults we use a cool trick that the s390 hardware
allows us. We have the paging table for the kvm process and we have the
guest page table for execution in the virtualized context. The trick is
that the guest page table reuses the lowest level of the process page
table. A fault that sets a pte in the process page table will
automatically make that pte visible in the guest page table as well
if the memory region has been mapped in the higher order page tables.
Even the invalidation of a pte will automatically (!!) remove the
referenced page from the guest page table as well, including the TLB
entries on all cpus. The IPTE instruction is your friend :-)
That is why we don't need mm notifiers.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.



Re: [PATCH 2/3] S390: Add virtio hotplug add support

2010-09-13 Thread Martin Schwidefsky
On Mon, 13 Sep 2010 13:05:57 +0930
Rusty Russell ru...@rustcorp.com.au wrote:

 On Sun, 12 Sep 2010 06:30:43 pm Avi Kivity wrote:
On 09/12/2010 02:42 AM, Alexander Graf wrote:
   On 24.08.2010, at 15:48, Alexander Graf wrote:
  
    The one big missing feature in s390-virtio was hotplugging. This is no more.
    This patch implements hotplug add support, so you can on the fly add new
    devices in the guest.
  
   Keep in mind that this needs a patch for qemu to actually leverage the
   functionality.
  
    Signed-off-by: Alexander Graf ag...@suse.de
   ping (on the patch set)?
  
  
  Actually Marcelo applied it.  But the natural place for it is Rusty's 
  virtio tree.  Rusty, if you want to take it, let me know and I'll drop 
  it from kvm.git.
 
 I thought it would be in the s390 tree, which is why I didn't take it...
 
 But I'm *always* happy to let others do the work!

I didn't pick them up after I saw that Marcelo took them. If others want
to do the work, be my guest.

-- 
blue skies,
   Martin.

Reality continues to ruin my life. - Calvin.
