Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-05 Thread Xiao Guangrong


[ CCed Zhang Yang ]

On 06/04/2015 04:36 PM, Paolo Bonzini wrote:



On 04/06/2015 10:23, Xiao Guangrong wrote:


So, why do you need to always use IPAT=0?  Can patch 15 keep the current
logic for RAM, like this:

 if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu-kvm))
 ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) 
   VMX_EPT_MT_EPTE_SHIFT;
 else
 ret = (MTRR_TYPE_WRBACK  VMX_EPT_MT_EPTE_SHIFT)
 | VMX_EPT_IPAT_BIT;


Yeah, it's okay, actually we considered this way, however
- it's light enough, it did not hurt guest performance based on our
   benchmark.
- the logic has always used for noncherent_dma case, extend it to
   normal case should have low risk and also help us to check the logic.


But noncoherent_dma is not the common case, so it's not necessarily true
that the risk is low.


I thought noncoherent_dma exists on 1st generation(s) IOMMU, it should
be fully tested at that time.




- completely follow MTRRS spec would be better than host hides it.


We are a virtualization platform, we know well when MTRRs are necessary.

Tis a risk from blindly obeying the guest MTRRs: userspace can see stale
data if the guest's accesses bypass the cache.  AMD bypasses this by
enabling snooping even in cases that ordinarily wouldn't snoop; for
Intel the solution is that RAM-backed areas should always use IPAT.


Not sure if UC and other cacheable type combinations on guest and host
will cause problem. The SMD mentioned that snoop is not required only when
The UC attribute comes from the MTRRs and the processors are not required
 to snoop their caches since the data could never have been cached.
(Vol 3. 11.5.2.2)
VMX do not touch hardware MTRR MSRs and i guess snoop works under this case.

I also noticed if SS (self-snooping) is supported we need not to invalidate
cache when programming memory type (Vol 3. 11.11.8), so that means CPU works
well on the page which has different cache types i guess.

After think it carefully, we (Zhang Yang) doubt if always set WB for DMA
memory is really a good idea because we can not assume WB DMA works well for
all devices. One example is that audio DMA (not a MMIO region) is required WC
to improve its performance.

However, we think the SDM is not clear enough so let's do full vMTRR on MMIO
and noncoherent_dma first. :)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Paolo Bonzini


On 04/06/2015 10:26, Xiao Guangrong wrote:

 CR0.CD is always 0 in both host and guest, i guess it's why we cleared
 CR0.CD and CR0.NW in vmx_set_cr0().
 
 It reminds me that we should check guest CR0.CD before check guest MTRR
 and disable guest PAT if guest CR0.CD = 1.

I think it can be done separately, on top.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Xiao Guangrong



On 06/03/2015 03:55 PM, Paolo Bonzini wrote:



On 03/06/2015 04:56, Xiao Guangrong wrote:



On 06/01/2015 05:36 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently guest MTRR is completely prohibited if cache snoop is
supported on
IOMMU (!noncoherent_dma) and host does the emulation based on the
knowledge
from host side, however, host side is not the good point to know
what the purpose of guest is. A good example is that pass-throughed VGA
frame buffer is not always UC as host expected


Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.


Yes, frame-buffer is always UC in current code, however, UC for
frame-buffer causes bad performance.


Understood now, thanks.


So that guest will configure the range to MTRR, this patchset follows
guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
emulate guest cache type as guest expects.


Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
coherency.  AMD, has special logic to do this, for example:

- if guest PAT says UC and host MTRR says WB, the processor will not
cache the memory but will snoop the cache as if CR0.CD=1

- if guest PAT says WC and host (nested page table) PAT says WB and
host MTRR says WB, the processor will still do write combining but
also snoop the cache as if CR0.CD=1

I am worried that the lack of this feature could cause problems if
guests map QEMU's VGA framebuffer as uncached.  We have this problem on
ARM, so it's not 100% theoretical.


CR0.CD is always 0 in both host and guest, i guess it's why we cleared
CR0.CD and CR0.NW in vmx_set_cr0().



So, why do you need to always use IPAT=0?  Can patch 15 keep the current
logic for RAM, like this:

if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu-kvm))
ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) 
  VMX_EPT_MT_EPTE_SHIFT;
else
ret = (MTRR_TYPE_WRBACK  VMX_EPT_MT_EPTE_SHIFT)
| VMX_EPT_IPAT_BIT;


Yeah, it's okay, actually we considered this way, however
- it's light enough, it did not hurt guest performance based on our
  benchmark.
- the logic has always used for noncherent_dma case, extend it to
  normal case should have low risk and also help us to check the logic.
- completely follow MTRRS spec would be better than host hides it.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Paolo Bonzini


On 04/06/2015 10:23, Xiao Guangrong wrote:

 So, why do you need to always use IPAT=0?  Can patch 15 keep the current
 logic for RAM, like this:

 if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu-kvm))
 ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) 
   VMX_EPT_MT_EPTE_SHIFT;
 else
 ret = (MTRR_TYPE_WRBACK  VMX_EPT_MT_EPTE_SHIFT)
 | VMX_EPT_IPAT_BIT;
 
 Yeah, it's okay, actually we considered this way, however
 - it's light enough, it did not hurt guest performance based on our
   benchmark.
 - the logic has always used for noncherent_dma case, extend it to
   normal case should have low risk and also help us to check the logic.

But noncoherent_dma is not the common case, so it's not necessarily true
that the risk is low.

 - completely follow MTRRS spec would be better than host hides it.

We are a virtualization platform, we know well when MTRRs are necessary.

Tis a risk from blindly obeying the guest MTRRs: userspace can see stale
data if the guest's accesses bypass the cache.  AMD bypasses this by
enabling snooping even in cases that ordinarily wouldn't snoop; for
Intel the solution is that RAM-backed areas should always use IPAT.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Xiao Guangrong



On 06/04/2015 04:23 PM, Xiao Guangrong wrote:



On 06/03/2015 03:55 PM, Paolo Bonzini wrote:



On 03/06/2015 04:56, Xiao Guangrong wrote:



On 06/01/2015 05:36 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently guest MTRR is completely prohibited if cache snoop is
supported on
IOMMU (!noncoherent_dma) and host does the emulation based on the
knowledge
from host side, however, host side is not the good point to know
what the purpose of guest is. A good example is that pass-throughed VGA
frame buffer is not always UC as host expected


Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.


Yes, frame-buffer is always UC in current code, however, UC for
frame-buffer causes bad performance.


Understood now, thanks.


So that guest will configure the range to MTRR, this patchset follows
guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
emulate guest cache type as guest expects.


Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
coherency.  AMD, has special logic to do this, for example:

- if guest PAT says UC and host MTRR says WB, the processor will not
cache the memory but will snoop the cache as if CR0.CD=1

- if guest PAT says WC and host (nested page table) PAT says WB and
host MTRR says WB, the processor will still do write combining but
also snoop the cache as if CR0.CD=1

I am worried that the lack of this feature could cause problems if
guests map QEMU's VGA framebuffer as uncached.  We have this problem on
ARM, so it's not 100% theoretical.


CR0.CD is always 0 in both host and guest, i guess it's why we cleared
CR0.CD and CR0.NW in vmx_set_cr0().


It reminds me that we should check guest CR0.CD before check guest MTRR
and disable guest PAT if guest CR0.CD = 1.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-03 Thread Paolo Bonzini


On 03/06/2015 04:56, Xiao Guangrong wrote:
 
 
 On 06/01/2015 05:36 PM, Paolo Bonzini wrote:


 On 30/05/2015 12:59, Xiao Guangrong wrote:
 Currently guest MTRR is completely prohibited if cache snoop is
 supported on
 IOMMU (!noncoherent_dma) and host does the emulation based on the
 knowledge
 from host side, however, host side is not the good point to know
 what the purpose of guest is. A good example is that pass-throughed VGA
 frame buffer is not always UC as host expected

 Can you explain how?  The original idea was that such a framebuffer
 would be kvm_is_reserved_pfn and thus be unconditionally UC.
 
 Yes, frame-buffer is always UC in current code, however, UC for
 frame-buffer causes bad performance.

Understood now, thanks.

 So that guest will configure the range to MTRR, this patchset follows
 guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
 emulate guest cache type as guest expects.

Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
coherency.  AMD, has special logic to do this, for example:

- if guest PAT says UC and host MTRR says WB, the processor will not
cache the memory but will snoop the cache as if CR0.CD=1

- if guest PAT says WC and host (nested page table) PAT says WB and
host MTRR says WB, the processor will still do write combining but
also snoop the cache as if CR0.CD=1

I am worried that the lack of this feature could cause problems if
guests map QEMU's VGA framebuffer as uncached.  We have this problem on
ARM, so it's not 100% theoretical.

So, why do you need to always use IPAT=0?  Can patch 15 keep the current
logic for RAM, like this:

if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu-kvm))
ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) 
  VMX_EPT_MT_EPTE_SHIFT;
else
ret = (MTRR_TYPE_WRBACK  VMX_EPT_MT_EPTE_SHIFT)
| VMX_EPT_IPAT_BIT;

?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-02 Thread Xiao Guangrong



On 06/01/2015 05:36 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently guest MTRR is completely prohibited if cache snoop is supported on
IOMMU (!noncoherent_dma) and host does the emulation based on the knowledge
from host side, however, host side is not the good point to know
what the purpose of guest is. A good example is that pass-throughed VGA
frame buffer is not always UC as host expected


Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.


Yes, frame-buffer is always UC in current code, however, UC for frame-buffer
causes bad performance. It's quoted from Documentation/x86/mtrr.txt:

| This is most useful when you have a video (VGA) card on a PCI or AGP bus.
| Enabling write-combining allows bus write transfers to be combined into a
| larger transfer before bursting over the PCI/AGP bus. This can increase
| performance of image write operations 2.5 times or more.

So that guest will configure the range to MTRR, this patchset follows
guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to emulate
guest cache type as guest expects.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-01 Thread Paolo Bonzini


On 01/06/2015 11:36, Paolo Bonzini wrote:
 Does this have a performance impact on shadow?  Perhaps we could cache
 in struct kvm_arch_memory_slot whether the memslot is covered by MTRRs?

Nevermind, patch 15 answers my question.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-01 Thread Paolo Bonzini


On 30/05/2015 12:59, Xiao Guangrong wrote:
 Currently guest MTRR is completely prohibited if cache snoop is supported on
 IOMMU (!noncoherent_dma) and host does the emulation based on the knowledge
 from host side, however, host side is not the good point to know
 what the purpose of guest is. A good example is that pass-throughed VGA
 frame buffer is not always UC as host expected

Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.

 +bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 +   int page_num)
 +{
 + struct mtrr_looker looker;
 + struct kvm_mtrr *mtrr_state = vcpu-arch.mtrr_state;
 + u64 start = gfn_to_gpa(gfn), end = gfn_to_gpa(gfn + page_num);
 + int type = -1;
 +
 + mtrr_for_each_mem_type(looker, mtrr_state, start, end) {
 + if (type == -1) {
 + type = looker.mem_type;
 + continue;
 + }
 +
 + if (type != looker.mem_type)
 + return false;
 + }
 +
 + if ((type != -1)  looker.partial_map 
 +   (mtrr_state-def_type != type))
 + return false;
 +

No Pascal-like parentheses.

Does this have a performance impact on shadow?  Perhaps we could cache
in struct kvm_arch_memory_slot whether the memslot is covered by MTRRs?

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-05-30 Thread Xiao Guangrong
Based on Intel's SDM, mapping huge page which do not have consistent
memory cache for each 4k page will cause undefined behavior

In order to avoiding this kind of undefined behavior, we force to use
4k pages under this case

Signed-off-by: Xiao Guangrong guangrong.x...@linux.intel.com
---
 arch/x86/kvm/mmu.c  | 20 +++-
 arch/x86/kvm/mtrr.c | 25 +
 arch/x86/kvm/x86.h  |  2 ++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7462c57..c8c2a90 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3437,6 +3437,16 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool 
prefault, gfn_t gfn,
return false;
 }
 
+static bool
+check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+{
+   int page_num = KVM_PAGES_PER_HPAGE(level);
+
+   gfn = ~(page_num - 1);
+
+   return kvm_mtrr_check_gfn_range_consistency(vcpu, gfn, page_num);
+}
+
 static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
  bool prefault)
 {
@@ -3462,9 +3472,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
if (r)
return r;
 
-   force_pt_level = mapping_level_dirty_bitmap(vcpu, gfn);
+   if (mapping_level_dirty_bitmap(vcpu, gfn) ||
+   !check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL))
+   force_pt_level = 1;
+   else
+   force_pt_level = 0;
+
if (likely(!force_pt_level)) {
level = mapping_level(vcpu, gfn);
+   if (level  PT_DIRECTORY_LEVEL 
+   !check_hugepage_cache_consistency(vcpu, gfn, level))
+   level = PT_DIRECTORY_LEVEL;
gfn = ~(KVM_PAGES_PER_HPAGE(level) - 1);
} else
level = PT_PAGE_TABLE_LEVEL;
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index bc90834..703a66b 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -645,3 +645,28 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, 
gfn_t gfn)
return type;
 }
 EXPORT_SYMBOL_GPL(kvm_mtrr_get_guest_memory_type);
+
+bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
+ int page_num)
+{
+   struct mtrr_looker looker;
+   struct kvm_mtrr *mtrr_state = vcpu-arch.mtrr_state;
+   u64 start = gfn_to_gpa(gfn), end = gfn_to_gpa(gfn + page_num);
+   int type = -1;
+
+   mtrr_for_each_mem_type(looker, mtrr_state, start, end) {
+   if (type == -1) {
+   type = looker.mem_type;
+   continue;
+   }
+
+   if (type != looker.mem_type)
+   return false;
+   }
+
+   if ((type != -1)  looker.partial_map 
+ (mtrr_state-def_type != type))
+   return false;
+
+   return true;
+}
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index a3dae49..7c30ec8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -166,6 +166,8 @@ void kvm_vcpu_mtrr_init(struct kvm_vcpu *vcpu);
 bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_set_msr(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
+bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
+ int page_num);
 
 #define KVM_SUPPORTED_XCR0 (XSTATE_FP | XSTATE_SSE | XSTATE_YMM \
| XSTATE_BNDREGS | XSTATE_BNDCSR \
-- 
2.1.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html