Re: [PATCH 1/1] virtio_console: perf-test fix [FIX] read-out all data after perf-test [FIX] code clean-up

2010-09-28 Thread Amit Shah
On (Thu) Sep 23 2010 [14:11:52], Lukas Doktor wrote:
 @@ -829,6 +832,11 @@ def run_virtio_console(test, params, env):
  exit_event.set()
  thread.join()
  
 +# Let the guest read-out all the remaining data
 +while not _on_guest("virt.poll('%s', %s)"
 +% (port[1], select.POLLIN), vm, 2)[0]:
 +time.sleep(1)

This is just polling the guest, not reading out any data?

(BTW POLLIN will be set if host is disconnected and there's no data to
read, so ensure you don't enter an infinite loop here.)
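For reference, a minimal sketch of what "reading out" with a bounded loop looks like; this is plain POSIX C, not the autotest guest helper the test actually calls, and the buffer size and idle timeout are arbitrary:

    #include <poll.h>
    #include <unistd.h>

    /* Drain whatever is pending on fd; give up after max_idle_ms without
     * data so a disconnected peer cannot hang us in an infinite loop. */
    static void drain_port(int fd, int max_idle_ms)
    {
        char buf[4096];
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        while (poll(&pfd, 1, max_idle_ms) > 0 && (pfd.revents & POLLIN)) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n <= 0)     /* EOF or error: nothing left to read out */
                break;
        }
    }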

Amit
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


pci passthrough with KVM

2010-09-28 Thread Inigo Losada
We are using pci passthrough with an SCSI Adapter card. The system is:

-  O.S:  Ubuntu 10.04.1 LTS
-  KVM Packages: 

kvm1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.2
kvm-pxe   5.4.4-1ubuntu1
qemu-kvm 0.12.3+noroms-0ubuntu9.2 
libvirt-bin  0.7.5-5ubuntu27.2
python-libvirt0.7.5-5ubuntu27.2
libvirt00.7.5-5ubuntu27.2

- Kernel 2.6.32.15+drm33.5.iommu recompiled with the following options:

CONFIG_DMAR=y
CONFIG_INTR_REMAP=y

- Apparmor is stopped

When we start the virtual machine we obtain the following error: 

char device redirected to /dev/pts/3 
device: 04:04.0: driver=pci-assign host=04:04.0 
Failed to assign irq for 04:04.0: Operation not permitted 
Perhaps you are assigning a device that shares an IRQ with another device? 


Thanks in advance,


Inigo 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn()

2010-09-28 Thread Sheng Yang
Eliminate:
arch/x86/kvm/emulate.c:801: warning: ‘sv’ may be used uninitialized in this
function

on gcc 4.1.2

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/emulate.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index aead72e..d0df25d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -798,7 +798,7 @@ done:
 
 static void fetch_bit_operand(struct decode_cache *c)
 {
-   long sv, mask;
+   long sv = 0, mask;
 
	if (c->dst.type == OP_MEM && c->src.type == OP_REG) {
		mask = ~(c->dst.bytes * 8 - 1);
-- 
1.7.0.1
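For context, this is the classic old-gcc false positive where a variable is assigned only on branches the caller guarantees are taken; a minimal stand-alone illustration (not the kernel code):

    /* gcc 4.1.2 warns "'sv' may be used uninitialized" because it cannot
     * prove that one of the assigning branches is always taken. */
    long demo(int bytes, long val)
    {
        long sv;

        if (bytes == 2)
            sv = val & 0xffff;
        else if (bytes == 4)
            sv = val & 0xffffffff;
        /* callers only ever pass 2 or 4, but the compiler cannot know that */
        return sv;
    }

Initializing sv to 0, as the patch does, silences the warning without restructuring the code.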

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] qemu-kvm: device assignment: Reflect assigned device's MSI data/address modification to KVM

2010-09-28 Thread Sheng Yang
A device driver can modify the MSI data/address while MSI is enabled. This patch
reflects the modification to KVM.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 87f7418..1e83495 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1283,7 +1283,8 @@ static void 
assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
 #ifdef KVM_CAP_DEVICE_MSI
 if (assigned_dev->cap.available & ASSIGNED_DEVICE_CAP_MSI) {
 ctrl_pos = pos + PCI_MSI_FLAGS;
-if (address <= ctrl_pos && address + len > ctrl_pos)
+/* Cover control, MSI data/addr fields */
+if (pos <= address && address + len < pos + PCI_MSI_DATA_32 + 2)
 assigned_dev_update_msi(pci_dev, ctrl_pos);
 pos += PCI_CAPABILITY_CONFIG_MSI_LENGTH;
 }
-- 
1.7.0.1
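The exact bounds in the diff aside, the intent of the widened check is simply that a config-space write touching the MSI control, address or data bytes should re-sync KVM; a generic form of such a test (illustrative only, not the patch's expression):

    /* Does the write [addr, addr+len) touch the capability bytes
     * [start, start+size)?  Plain interval-overlap test. */
    static inline int write_overlaps(uint32_t addr, uint32_t len,
                                     uint32_t start, uint32_t size)
    {
        return addr < start + size && addr + len > start;
    }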

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86, nmi: workaround sti; hlt race vs nmi; intr

2010-09-28 Thread Avi Kivity

 On 09/27/2010 11:55 PM, H. Peter Anvin wrote:

On 09/27/2010 02:27 AM, Avi Kivity wrote:

  I don't have write permissions for the spec.  If you can verify that all
  existing and future hardware will mask NMI after STI and convince the
  spec owners to update the specifications, I'm all for it; it's certainly
  a cleaner solution.


I'm trying to see if this is doable.  As you can well imagine, though,
it takes time.


Joerg, can you look at this from the AMD side?

First, do AMD processors block NMI after an IF-enabling STI?  The 
documentation explicitly states that they don't:


Sets the interrupt flag (IF) in the rFLAGS register to 1, thereby 
allowing external interrupts received on
the INTR input. Interrupts received on the non-maskable interrupt 
(NMI) input are not affected by this

instruction.


though that may refer to the general effect of EFLAGS.IF, not STI.

Second, if NMIs are blocked, is it possible to retro-document this?

Personally I think the safer route is to take the patch.  There are 
other processors besides Intel and AMD and we can't test all of them, 
not to mention various emulators and virtual machine monitors out there.
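For context, a sketch of the idiom at issue, essentially the kernel's native_safe_halt(); the comment summarizes the race as discussed in this thread:

    /* STI's one-instruction interrupt shadow is meant to guarantee that a
     * wakeup interrupt cannot be delivered between the two instructions.
     * If an NMI is *not* blocked in that shadow, its IRET drops the shadow,
     * a pending interrupt can be taken before HLT, and HLT then sleeps
     * after the wakeup has already been consumed. */
    static inline void safe_halt(void)
    {
        asm volatile("sti; hlt" ::: "memory");
    }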


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/7] KVM: MMU: cleanup for error mask set while walk guest page table

2010-09-28 Thread Xiao Guangrong
On 09/27/2010 10:30 PM, Avi Kivity wrote:

   r = FNAME(walk_addr)(&walker, vcpu, vaddr,
 - !!(access & PFERR_WRITE_MASK),
 - !!(access & PFERR_USER_MASK),
 - !!(access & PFERR_FETCH_MASK));
 + access & PFERR_WRITE_MASK,
 + access & PFERR_USER_MASK,
 + access & PFERR_FETCH_MASK);

   if (r) {
   gpa = gfn_to_gpa(walker.gfn);
 
 Interesting.  Maybe a next step is to pass the page-fault error code
 instead of the various bits? 

Yeah, it's a good idea, i'll post a patch to do it.

 Not sure how that interacts with nested
 ept (which has a different permission model).
 

Umm, we just move the error code parsing from the caller site to the 
FNAME(walk_addr)
function; I don't think it will cause trouble for implementing nested EPT.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] fix kvmclock bug

2010-09-28 Thread Jan Kiszka
Am 27.09.2010 21:00, Zachary Amsden wrote:
 On 09/25/2010 11:54 PM, Jan Kiszka wrote:
 That only leaves us with the likely wrong unstable declaration of the
 TSC after resume. And that raises the question for me if KVM is actually
 that much smarter than the Linux kernel in detecting TSC jumps. If
 something is missing, can't we improve the kernel's detection mechanism
 which already has suspend/resume support?

 
 Linux must make the conservative choice about the TSC being declared
 unstable; if it is possible that it has become unstable, it is
 unstable.  Unfortunately, this does not bode well for us, as most of the
 finer points of accuracy depend on having a stable TSC.
 
 There's a bunch of places that declare TSC unstable, and where in the
 suspend / resume cycle that happens would depend on your actual hardware.

It's absolutely clear where this happens: kvm_arch_vcpu_load. And it
seems to happen as the TSC is reset due to suspend-to-RAM.

Again: Linux recovers from this and continues to use the TSC. KVM is
more picky, so my question is if this is really required.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: MMU: move access code parsing to FNAME(walk_addr) function

2010-09-28 Thread Xiao Guangrong
Move access code parsing from caller site to FNAME(walk_addr) function

Signed-off-by: Xiao Guangrong xiaoguangr...@cn.fujitsu.com
---
 arch/x86/kvm/paging_tmpl.h |   40 
 1 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index a83ff37..9a5f7bb 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -116,16 +116,18 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, 
pt_element_t gpte)
  */
 static int FNAME(walk_addr_generic)(struct guest_walker *walker,
struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-   gva_t addr, int write_fault,
-   int user_fault, int fetch_fault)
+   gva_t addr, u32 access)
 {
pt_element_t pte;
gfn_t table_gfn;
unsigned index, pt_access, uninitialized_var(pte_access);
gpa_t pte_gpa;
bool eperm, present, rsvd_fault;
-   int offset;
-   u32 access = 0;
+   int offset, write_fault, user_fault, fetch_fault;
+
+	write_fault = access & PFERR_WRITE_MASK;
+	user_fault = access & PFERR_USER_MASK;
+	fetch_fault = access & PFERR_FETCH_MASK;
 
trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
 fetch_fault);
@@ -215,6 +217,7 @@ walk:
	int lvl = walker->level;
	gpa_t real_gpa;
	gfn_t gfn;
+	u32 ac;
 
	gfn = gpte_to_gfn_lvl(pte, lvl);
	gfn += (addr & PT_LVL_OFFSET_MASK(lvl)) >> PAGE_SHIFT;
@@ -224,10 +227,10 @@ walk:
	is_cpuid_PSE36())
	gfn += pse36_gfn_delta(pte);
 
-	access |= write_fault | fetch_fault | user_fault;
+	ac = write_fault | fetch_fault | user_fault;
 
	real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(gfn),
- access);
+ ac);
if (real_gpa == UNMAPPED_GVA)
return 0;
 
@@ -282,21 +285,18 @@ error:
 }
 
 static int FNAME(walk_addr)(struct guest_walker *walker,
-   struct kvm_vcpu *vcpu, gva_t addr,
-   int write_fault, int user_fault, int fetch_fault)
+   struct kvm_vcpu *vcpu, gva_t addr, u32 access)
 {
	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.mmu, addr,
-   write_fault, user_fault, fetch_fault);
+   access);
 }
 
 static int FNAME(walk_addr_nested)(struct guest_walker *walker,
   struct kvm_vcpu *vcpu, gva_t addr,
-  int write_fault, int user_fault,
-  int fetch_fault)
+  u32 access)
 {
	return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu,
-   addr, write_fault, user_fault,
-   fetch_fault);
+   addr, access);
 }
 
 static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
@@ -532,7 +532,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr,
 {
	int write_fault = error_code & PFERR_WRITE_MASK;
	int user_fault = error_code & PFERR_USER_MASK;
-   int fetch_fault = error_code  PFERR_FETCH_MASK;
struct guest_walker walker;
u64 *sptep;
int write_pt = 0;
@@ -550,8 +549,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t 
addr,
/*
 * Look up the guest pte for the faulting address.
 */
-	r = FNAME(walk_addr)(&walker, vcpu, addr, write_fault, user_fault,
-			     fetch_fault);
+	r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
 
/*
 * The page is not mapped by the guest.  Let the guest handle it.
@@ -669,10 +667,7 @@ static gpa_t FNAME(gva_to_gpa)(struct kvm_vcpu *vcpu, 
gva_t vaddr, u32 access,
gpa_t gpa = UNMAPPED_GVA;
int r;
 
-	r = FNAME(walk_addr)(&walker, vcpu, vaddr,
-			     access & PFERR_WRITE_MASK,
-			     access & PFERR_USER_MASK,
-			     access & PFERR_FETCH_MASK);
+	r = FNAME(walk_addr)(&walker, vcpu, vaddr, access);
 
if (r) {
gpa = gfn_to_gpa(walker.gfn);
@@ -690,10 +685,7 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu 
*vcpu, gva_t vaddr,
gpa_t gpa = UNMAPPED_GVA;
int r;
 
-	r = FNAME(walk_addr_nested)(&walker, vcpu, vaddr,
-   access  

Re: [PATCH 2/7] KVM: MMU: cleanup for error mask set while walk guest page table

2010-09-28 Thread Avi Kivity

 On 09/28/2010 10:58 AM, Xiao Guangrong wrote:

Yeah, it's a good idea, i'll post a patch to do it.

  Not sure how that interacts with nested
  ept (which has a different permission model).


Umm, we just move the error code parsing from the caller site to the 
FNAME(walk_addr)
function; I don't think it will cause trouble for implementing nested EPT.


It means the nested-EPT code will have to take the EPT error code and convert 
it to the standard error code.  Not too difficult.
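A hedged sketch (not actual KVM code) of one plausible way such a conversion could look, folding an EPT-violation exit qualification into the x86 page-fault error-code bits that walk_addr() now consumes; the bit positions mirror the Intel SDM and KVM's PFERR_* masks:

    #define EPT_VIOLATION_READ   (1u << 0)
    #define EPT_VIOLATION_WRITE  (1u << 1)
    #define EPT_VIOLATION_FETCH  (1u << 2)

    #define PFERR_WRITE_MASK     (1u << 1)
    #define PFERR_USER_MASK      (1u << 2)
    #define PFERR_FETCH_MASK     (1u << 4)

    static unsigned int ept_to_pferr(unsigned long exit_qualification)
    {
        unsigned int error_code = 0;

        if (exit_qualification & EPT_VIOLATION_WRITE)
            error_code |= PFERR_WRITE_MASK;
        if (exit_qualification & EPT_VIOLATION_FETCH)
            error_code |= PFERR_FETCH_MASK;
        /* EPT has no user/supervisor bit; whether and how to set
         * PFERR_USER_MASK is a convention the nested walker would have to
         * choose (assumption, not settled in this thread). */
        return error_code;
    }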


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] KVM: MMU: update 'root_hpa' out of loop in PAE shadow path

2010-09-28 Thread Avi Kivity

 On 09/28/2010 04:43 AM, Xiao Guangrong wrote:

On 09/27/2010 10:39 PM, Avi Kivity wrote:
   On 09/27/2010 12:14 PM, Xiao Guangrong wrote:
  On 09/27/2010 06:02 PM, Xiao Guangrong wrote:
 The value of 'vcpu->arch.mmu.pae_root' is not modified, so we can
  update 'root_hpa'
 out of the loop
  
 Xiao Guangrong xiaoguangr...@cn.fujitsu.com

  Hi Avi, Marcelo,

  I'm so sorry that I missed the Signed-off-by: string in all patches; I'll
  repost this
  patchset after your review. :-(

  I fixed up the signoffs and applied.

Thanks Avi.

I noticed you applied "KVM: MMU: rename 'sp->root_count' to 'sp->active_count'"
(commit 55d80c448e7e3417) instead of this patch :-)


Thanks for noticing!  Fixed.

No idea how it happened.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] x86, nmi: workaround sti; hlt race vs nmi; intr

2010-09-28 Thread Roedel, Joerg
On Tue, Sep 28, 2010 at 04:50:24AM -0400, Avi Kivity wrote:
   On 09/27/2010 11:55 PM, H. Peter Anvin wrote:
  On 09/27/2010 02:27 AM, Avi Kivity wrote:
  
I don't have write permissions for the spec.  If you can verify that all
existing and future hardware will mask NMI after STI and convince the
spec owners to update the specifications, I'm all for it; it's certainly
a cleaner solution.
  
 
  I'm trying to see if this is doable.  As you can well imagine, though,
  it takes time.
 
 Joerg, can you look at this from the AMD side?

Yes, I will discuss this with the hardware people. It will take some time
here too.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: ia64: define kvm_lapic_enabled() to fix a compile error

2010-09-28 Thread Avi Kivity

 On 09/27/2010 12:01 PM, Zhang, Xiantao wrote:


  Maybe we should make ia64 kvm depend on CONFIG_BROKEN.

  It has been experimental for quite a while.

I don't think the kvm/ia64 is broken in the upstream Linux, and it should work 
according to our last try.


When was that?


  The big issue is userspace support. The latest qemu may not work with 
the latest kernel, but if you choose an old qemu, it should work!


There's always the chance that changes in the host kernel or 
architecture independent kvm code will cause breakage.  We need regular 
use or testing.



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Add svm cpuid features

2010-09-28 Thread Roedel, Joerg
On Mon, Sep 27, 2010 at 12:22:13PM -0400, Avi Kivity wrote:
   On 09/27/2010 05:40 PM, Roedel, Joerg wrote:
  On Mon, Sep 27, 2010 at 11:02:23AM -0400, Avi Kivity wrote:
  On 09/27/2010 04:58 PM, Avi Kivity wrote:
   On 09/27/2010 03:16 PM, Joerg Roedel wrote:
  This patch adds the svm cpuid feature flags to the qemu
initialization path. It also adds the svm features available
  on phenom to its cpu-definition and extends the host cpu
  type to support all svm features KVM can provide.


  This should really be all svm features since we'll later mask them
  with kvm capabilities.  This way, if we later add more capabilities,
  we automatically gain support in userspace.
 
  Yes, that is what -cpu host does with these patches applied. The
  svm_features variable is set to -1 in this case and masked out later
  with the KVM capabilities.
 
 
 Well, running current uq/master I get:
 
has_svm: can't execute cpuid 0x8000000a
kvm: no hardware support
 
 ?

Weird, it worked here as I tested it. I had it on qemu/master and with
all three patches. But patch 1 should not make the difference. I take a
look, have you pushed the failing uq/master? What was your command line?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] Add svm cpuid features

2010-09-28 Thread Avi Kivity

 On 09/28/2010 11:28 AM, Roedel, Joerg wrote:

On Mon, Sep 27, 2010 at 12:22:13PM -0400, Avi Kivity wrote:
On 09/27/2010 05:40 PM, Roedel, Joerg wrote:
On Mon, Sep 27, 2010 at 11:02:23AM -0400, Avi Kivity wrote:
 On 09/27/2010 04:58 PM, Avi Kivity wrote:
   On 09/27/2010 03:16 PM, Joerg Roedel wrote:
  This patch adds the svm cpuid feature flags to the qemu
  initialization path. It also adds the svm features available
  on phenom to its cpu-definition and extends the host cpu
  type to support all svm features KVM can provide.
   
   
  This should really be all svm features since we'll later mask 
them
  with kvm capabilities.  This way, if we later add more 
capabilities,
  we automatically gain support in userspace.
  
Yes, that is what -cpu host does with these patches applied. The
svm_features variable is set to -1 in this case and masked out later
with the KVM capabilities.
  

  Well, running current uq/master I get:

 has_svm: can't execute cpuid 0x8000000a
 kvm: no hardware support

  ?

Weird, it worked here as I tested it. I had it on qemu/master and with
all three patches. But patch 1 should not make the difference. I take a
look, have you pushed the failing uq/master?


Yes, 8fe6a21c76.


What was your command line?


qemu-system-x86_64 -m 2G -cpu kvm64,+svm,+npt -enable-kvm ...

Note this is qemu.git, so -enable-kvm is needed.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] Emulate MSI-X mask bits for assigned devices

2010-09-28 Thread Sheng Yang
Hi Avi  Marcelo

This patchset adds emulation of the MSI-X mask bit for assigned devices in
QEmu.

BTW: We also propose an acceleration of the MSI-X mask bit for KVM - to get
it done in the kernel. That's because the MSI-X mask bit is sometimes accessed
very frequently by the guest, and emulating it in QEmu costs a lot (tens to
hundreds of percent of CPU utilization at times), e.g. with guests running
Linux 2.6.18 or under some heavy workloads. The method has been proved
efficient in Xen, but it would require the VMM to handle the MSI-X table. Is
that OK with you?
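For reference, the per-vector mask bit lives in the device's MSI-X table; the layout below follows the PCI spec and matches the "va + i * 16 + ..." arithmetic used in the patches that follow (declaration only, for illustration):

    #include <stdint.h>

    /* One MSI-X table entry is 16 bytes; bit 0 of vector_ctrl is the
     * per-vector mask bit that guests toggle so frequently. */
    struct msix_table_entry {
        uint32_t msg_addr_lo;   /* offset 0  */
        uint32_t msg_addr_hi;   /* offset 4  */
        uint32_t msg_data;      /* offset 8  */
        uint32_t vector_ctrl;   /* offset 12, bit 0 = mask */
    };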

--
regards
Yang, Sheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices

2010-09-28 Thread Sheng Yang
This patch enables the per-vector mask for assigned devices using MSI-X.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 arch/x86/kvm/x86.c   |1 +
 include/linux/kvm.h  |9 -
 include/linux/kvm_host.h |1 +
 virt/kvm/assigned-dev.c  |   39 +++
 4 files changed, 49 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8412c91..e6933e6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_DEBUGREGS:
case KVM_CAP_X86_ROBUST_SINGLESTEP:
case KVM_CAP_XSAVE:
+   case KVM_CAP_DEVICE_MSIX_MASK:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 919ae53..f2b7cdc 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
 #endif
 #define KVM_CAP_PPC_GET_PVINFO 57
 #define KVM_CAP_PPC_IRQ_LEVEL 58
+#ifdef __KVM_HAVE_MSIX
+#define KVM_CAP_DEVICE_MSIX_MASK 59
+#endif
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
 };
 
 #define KVM_MAX_MSIX_PER_DEV   256
+
+#define KVM_MSIX_FLAG_MASK 1
+
 struct kvm_assigned_msix_entry {
__u32 assigned_dev_id;
__u32 gsi;
__u16 entry; /* The index of entry in the MSI-X table */
-   __u16 padding[3];
+   __u16 flags;
+   __u16 padding[2];
 };
 
 #endif /* __LINUX_KVM_H */
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0b89d00..a43405c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
 };
 
 #define KVM_ASSIGNED_MSIX_PENDING  0x1
+#define KVM_ASSIGNED_MSIX_MASK 0x2
 struct kvm_guest_msix_entry {
u32 vector;
u16 entry;
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 7c98928..15b8c32 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -17,6 +17,8 @@
 #include <linux/pci.h>
 #include <linux/interrupt.h>
 #include <linux/slab.h>
+#include <linux/irqnr.h>
+
 #include "irq.h"
 
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head 
*head,
@@ -666,6 +668,30 @@ msix_nr_out:
return r;
 }
 
+static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
+int index)
+{
+   int irq;
+
+   if (!assigned_dev->dev->msix_enabled ||
+   !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
+   return;
+
+   irq = assigned_dev->host_msix_entries[index].vector;
+
+   ASSERT(irq != 0);
+
+   if (assigned_dev->guest_msix_entries[index].flags &
+   KVM_ASSIGNED_MSIX_MASK)
+   disable_irq(irq);
+   else {
+   enable_irq(irq);
+   if (assigned_dev->guest_msix_entries[index].flags &
+   KVM_ASSIGNED_MSIX_PENDING)
+   schedule_work(&assigned_dev->interrupt_work);
+   }
+}
+
 static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
   struct kvm_assigned_msix_entry *entry)
 {
@@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
	adev->guest_msix_entries[i].entry = entry->entry;
	adev->guest_msix_entries[i].vector = entry->gsi;
	adev->host_msix_entries[i].entry = entry->entry;
+   if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
+   !(adev->guest_msix_entries[i].flags &
+   KVM_ASSIGNED_MSIX_MASK)) {
+   adev->guest_msix_entries[i].flags |=
+   KVM_ASSIGNED_MSIX_MASK;
+   update_msix_mask(adev, i);
+   } else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
+   (adev->guest_msix_entries[i].flags &
+   KVM_ASSIGNED_MSIX_MASK)) {
+   adev->guest_msix_entries[i].flags &=
+   ~KVM_ASSIGNED_MSIX_MASK;
+   update_msix_mask(adev, i);
+   }
	break;
	}
	if (i == adev->entries_nr) {
-- 
1.7.0.1
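A hedged userspace sketch of how QEmu might push a guest's mask-bit write down to KVM once this patch is applied; KVM_ASSIGN_SET_MSIX_ENTRY is the existing ioctl, the flags field is the extension added above, and vm_fd, dev_id, gsi, index and masked are placeholder names, not code from this series:

    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int set_guest_msix_mask(int vm_fd, __u32 dev_id, __u32 gsi,
                                   __u16 index, int masked)
    {
        struct kvm_assigned_msix_entry e = {
            .assigned_dev_id = dev_id,
            .gsi             = gsi,
            .entry           = index,
            .flags           = masked ? KVM_MSIX_FLAG_MASK : 0,
        };

        if (ioctl(vm_fd, KVM_ASSIGN_SET_MSIX_ENTRY, &e) < 0) {
            perror("KVM_ASSIGN_SET_MSIX_ENTRY");
            return -1;
        }
        return 0;
    }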

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] qemu-kvm: device assignment: Some clean up for MSI-X code

2010-09-28 Thread Sheng Yang

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |   75 ---
 1 files changed, 51 insertions(+), 24 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 87f7418..4edae52 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1129,15 +1129,10 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev, 
unsigned int ctrl_pos)
 #endif
 
 #ifdef KVM_CAP_DEVICE_MSIX
-static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev)
+static int get_msix_entries_max_nr(AssignedDevice *adev)
 {
-AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
-uint16_t entries_nr = 0, entries_max_nr;
-int pos = 0, i, r = 0;
-uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
-struct kvm_assigned_msix_nr msix_nr;
-struct kvm_assigned_msix_entry msix_entry;
-void *va = adev->msix_table_page;
+int pos, entries_max_nr;
+PCIDevice *pci_dev = &adev->dev;
 
 if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
 pos = pci_dev->cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
@@ -1148,6 +1143,17 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 entries_max_nr &= PCI_MSIX_TABSIZE;
 entries_max_nr += 1;
 
+return entries_max_nr;
+}
+
+static int get_msix_valid_entries_nr(AssignedDevice *adev,
+uint16_t entries_max_nr)
+{
+void *va = adev->msix_table_page;
+uint32_t msg_data, msg_ctrl;
+uint16_t entries_nr = 0;
+int i;
+
 /* Get the usable entry number for allocating */
 for (i = 0; i < entries_max_nr; i++) {
 memcpy(&msg_ctrl, va + i * 16 + 12, 4);
@@ -1157,11 +1163,20 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 continue;
 entries_nr ++;
 }
+return entries_nr;
+}
+
+static int assigned_dev_update_msix_mmio(PCIDevice *pci_dev,
+ uint16_t entries_nr,
+ uint16_t entries_max_nr)
+{
+AssignedDevice *adev = container_of(pci_dev, AssignedDevice, dev);
+int i, r = 0;
+uint32_t msg_addr, msg_upper_addr, msg_data, msg_ctrl;
+struct kvm_assigned_msix_nr msix_nr;
+struct kvm_assigned_msix_entry msix_entry;
+void *va = adev->msix_table_page;
 
-if (entries_nr == 0) {
-fprintf(stderr, MSI-X entry number is zero!\n);
-return -EINVAL;
-}
 msix_nr.assigned_dev_id = calc_assigned_dev_id(adev->h_segnr, 
adev->h_busnr,
   (uint8_t)adev->h_devfn);
 msix_nr.entry_nr = entries_nr;
@@ -1203,8 +1218,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 adev->entry[entries_nr].u.msi.address_lo = msg_addr;
 adev->entry[entries_nr].u.msi.address_hi = msg_upper_addr;
 adev->entry[entries_nr].u.msi.data = msg_data;
-DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n!", msg_data, msg_addr);
-   kvm_add_routing_entry(kvm_context, &adev->entry[entries_nr]);
+DEBUG("MSI-X data 0x%x, MSI-X addr_lo 0x%x\n", msg_data, msg_addr);
+kvm_add_routing_entry(kvm_context, &adev->entry[entries_nr]);
 
 msix_entry.gsi = adev->entry[entries_nr].gsi;
 msix_entry.entry = i;
@@ -1226,12 +1241,12 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 return r;
 }
 
-static void assigned_dev_update_msix(PCIDevice *pci_dev, unsigned int ctrl_pos)
+static void assigned_dev_update_msix(PCIDevice *pci_dev, int enable_msix)
 {
 struct kvm_assigned_irq assigned_irq_data;
 AssignedDevice *assigned_dev = container_of(pci_dev, AssignedDevice, dev);
-uint16_t *ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
 int r;
+uint16_t entries_nr, entries_max_nr;
 
 memset(&assigned_irq_data, 0, sizeof assigned_irq_data);
 assigned_irq_data.assigned_dev_id  =
@@ -1242,8 +1257,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, 
unsigned int ctrl_pos)
  * try to catch this by only deassigning irqs if the guest is using
  * MSIX or intends to start. */
 if ((assigned_dev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) ||
-(*ctrl_word & PCI_MSIX_ENABLE)) {
-
+enable_msix) {
 assigned_irq_data.flags = assigned_dev->irq_requested_type;
 free_dev_irq_entries(assigned_dev);
 r = kvm_deassign_irq(kvm_context, &assigned_irq_data);
@@ -1254,14 +1268,25 @@ static void assigned_dev_update_msix(PCIDevice 
*pci_dev, unsigned int ctrl_pos)
 assigned_dev->irq_requested_type = 0;
 }
 
-if (*ctrl_word & PCI_MSIX_ENABLE) {
-assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
-  KVM_DEV_IRQ_GUEST_MSIX;
-
-if (assigned_dev_update_msix_mmio(pci_dev) < 0) {
+entries_max_nr = get_msix_entries_max_nr(assigned_dev);
+if (entries_max_nr == 0) {
+fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
+

[PATCH 3/3] qemu-kvm: device assignment: emulate MSI-X mask bits

2010-09-28 Thread Sheng Yang
This patch emulates the MSI-X per-vector mask bit on assigned devices.

Signed-off-by: Sheng Yang sh...@linux.intel.com
---
 hw/device-assignment.c |  127 +++-
 1 files changed, 126 insertions(+), 1 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 4edae52..564c5ab 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1146,6 +1146,7 @@ static int get_msix_entries_max_nr(AssignedDevice *adev)
 return entries_max_nr;
 }
 
+#define MSIX_VECTOR_MASK 0x1
 static int get_msix_valid_entries_nr(AssignedDevice *adev,
 uint16_t entries_max_nr)
 {
@@ -1159,7 +1160,11 @@ static int get_msix_valid_entries_nr(AssignedDevice 
*adev,
 memcpy(&msg_ctrl, va + i * 16 + 12, 4);
 memcpy(&msg_data, va + i * 16 + 8, 4);
 /* Ignore unused entry even it's unmasked */
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+if (msg_data == 0 || (msg_ctrl & MSIX_VECTOR_MASK))
+#else
 if (msg_data == 0)
+#endif
 continue;
 entries_nr ++;
 }
@@ -1188,6 +1193,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev,
 }
 
 free_dev_irq_entries(adev);
+memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+sizeof(*pci_dev->msix_entry_used));
 adev->irq_entries_nr = entries_nr;
 adev->entry = calloc(entries_nr, sizeof(struct kvm_irq_routing_entry));
 if (!adev->entry) {
@@ -1223,6 +1230,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev,
 
 msix_entry.gsi = adev->entry[entries_nr].gsi;
 msix_entry.entry = i;
+pci_dev->msix_entry_used[i] = 1;
 r = kvm_assign_set_msix_entry(kvm_context, &msix_entry);
 if (r) {
 fprintf(stderr, "fail to set MSI-X entry! %s\n", strerror(-r));
@@ -1266,6 +1274,8 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev, 
int enable_msix)
 perror(assigned_dev_update_msix: deassign irq);
 
 assigned_dev-irq_requested_type = 0;
+memset(pci_dev->msix_entry_used, 0, KVM_MAX_MSIX_PER_DEV *
+sizeof(*pci_dev->msix_entry_used));
 }
 
 entries_max_nr = get_msix_entries_max_nr(assigned_dev);
@@ -1273,10 +1283,16 @@ static void assigned_dev_update_msix(PCIDevice 
*pci_dev, int enable_msix)
 fprintf(stderr, "assigned_dev_update_msix: MSI-X entries_max_nr == 0");
 return;
 }
+/*
+ * Guest may try to enable MSI-X before setting MSI-X entry done, so
+ * let's wait until guest unmask the entries.
+ */
 entries_nr = get_msix_valid_entries_nr(assigned_dev, entries_max_nr);
 if (entries_nr == 0) {
+#ifndef KVM_CAP_DEVICE_MSIX_MASK
 if (enable_msix)
 fprintf(stderr, "MSI-X entry number is zero!\n");
+#endif
 return;
 }
 if (enable_msix) {
@@ -1320,7 +1336,8 @@ static void 
assigned_device_pci_cap_write_config(PCIDevice *pci_dev, uint32_t ad
 if (address <= ctrl_pos && address + len > ctrl_pos) {
 ctrl_pos--; /* control is word long */
 ctrl_word = (uint16_t *)(pci_dev->config + ctrl_pos);
-assigned_dev_update_msix(pci_dev, (*ctrl_word & PCI_MSIX_ENABLE));
+assigned_dev_update_msix(pci_dev,
+(*ctrl_word & PCI_MSIX_ENABLE) && !(*ctrl_word & 
PCI_MSIX_MASK));
}
 pos += PCI_CAPABILITY_CONFIG_MSIX_LENGTH;
 }
@@ -1412,6 +1429,104 @@ static uint32_t msix_mmio_readw(void *opaque, 
target_phys_addr_t addr)
 (8 * (addr  3)))  0x;
 }
 
+#ifdef KVM_CAP_DEVICE_MSIX_MASK
+static void msix_mmio_access_mask_bit(AssignedDevice *adev, int index)
+{
+void *page = adev-msix_table_page;
+uint32_t msg_ctrl, msg_data, msg_upper_addr, msg_addr;
+struct kvm_assigned_msix_entry msix_entry;
+int r = 0, pos, ctrl_word, entry_idx, entries_max_nr;
+struct kvm_irq_routing_entry old_entry = {};
+
+memcpy(&msg_addr, (char *)page + index * 16, 4);
+memcpy(&msg_upper_addr, (char *)page + index * 16 + 4, 4);
+memcpy(&msg_data, (char *)page + index * 16 + 8, 4);
+memcpy(&msg_ctrl, (char *)page + index * 16 + 12, 4);
+DEBUG("Access mask bit: MSI-X entries index %d: "
+"msg_addr 0x%x, msg_upper_addr 0x%x, msg_data 0x%x, vec_ctl 0x%x\n",
+index, msg_addr, msg_upper_addr, msg_data, msg_ctrl);
+
+if (adev->cap.available & ASSIGNED_DEVICE_CAP_MSI)
+pos = adev->dev.cap.start + PCI_CAPABILITY_CONFIG_MSI_LENGTH;
+else
+pos = adev->dev.cap.start;
+
+ctrl_word = *(uint16_t *)(adev->dev.config + pos + 2);
+
+if (!((ctrl_word & PCI_MSIX_ENABLE) && !(ctrl_word & PCI_MSIX_MASK)))
+return;
+
+if (!adev->dev.msix_entry_used[index] && (msg_ctrl & MSIX_VECTOR_MASK) == 0) {
+DEBUG("Try to modify unenabled MSI-X entry %d's mask. "
+"Reenable MSI-X.\n",
+index);
+assigned_dev_update_msix(&adev->dev, 1);
+

Re: [PATCH v11 17/17] add two new ioctls for mp device.

2010-09-28 Thread Michael S. Tsirkin
On Sat, Sep 25, 2010 at 12:27:35PM +0800, xiaohui@intel.com wrote:
 From: Xin Xiaohui xiaohui@intel.com
 
 The patch adds two ioctls for the mp device.
 One is for userspace to query how much memory should be locked to make the
 mp device run smoothly. The other is for userspace to set how much memory
 it really wants locked.
 
 ---
  drivers/vhost/mpassthru.c |  103 
 +++--
  include/linux/mpassthru.h |2 +
  2 files changed, 54 insertions(+), 51 deletions(-)
 
 diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c
 index d86d94c..e3a0199 100644
 --- a/drivers/vhost/mpassthru.c
 +++ b/drivers/vhost/mpassthru.c
 @@ -67,6 +67,8 @@ static int debug;
  #define COPY_THRESHOLD (L1_CACHE_BYTES * 4)
  #define COPY_HDR_LEN   (L1_CACHE_BYTES  64 ? 64 : L1_CACHE_BYTES)
  
 +#define DEFAULT_NEED ((8192*2*2)*4096)
 +
  struct frag {
   u16 offset;
   u16 size;
 @@ -110,7 +112,8 @@ struct page_ctor {
   int rq_len;
   spinlock_t  read_lock;

try documenting what fields are protected by which lock, btw.

   /* record the locked pages */
 - int lock_pages;
 + int locked_pages;
 + int cur_pages;
   struct rlimit   o_rlim;

unused now?

   struct net_device   *dev;
   struct mpassthru_port   port;

This structure name should start with mp_ to avoid namespace pollution.
Also ctor implies a constructor function: see pgtable_page_ctor - it is
not a very good name for a structure.


 @@ -122,6 +125,7 @@ struct mp_struct {
   struct net_device   *dev;
   struct page_ctor*ctor;
   struct socket   socket;
 + struct task_struct  *user;
  
  #ifdef MPASSTHRU_DEBUG
   int debug;
 @@ -231,7 +235,8 @@ static int page_ctor_attach(struct mp_struct *mp)
  ctor->port.ctor = page_ctor;
  ctor->port.sock = &mp->socket;
  ctor->port.hash = mp_lookup;
 - ctor->lock_pages = 0;
 + ctor->locked_pages = 0;
 + ctor->cur_pages = 0;
  
   /* locked by mp_mutex */
   dev->mp_port = &ctor->port;
 @@ -264,37 +269,6 @@ struct page_info *info_dequeue(struct page_ctor *ctor)
   return info;
  }
  
 -static int set_memlock_rlimit(struct page_ctor *ctor, int resource,
 -   unsigned long cur, unsigned long max)
 -{
 - struct rlimit new_rlim, *old_rlim;
 - int retval;
 -
 - if (resource != RLIMIT_MEMLOCK)
 - return -EINVAL;
 - new_rlim.rlim_cur = cur;
 - new_rlim.rlim_max = max;
 -
 - old_rlim = current->signal->rlim + resource;
 -
 - /* remember the old rlimit value when backend enabled */
 - ctor->o_rlim.rlim_cur = old_rlim->rlim_cur;
 - ctor->o_rlim.rlim_max = old_rlim->rlim_max;
 -
 - if ((new_rlim.rlim_max > old_rlim->rlim_max) &&
 - !capable(CAP_SYS_RESOURCE))
 - return -EPERM;
 -
 - retval = security_task_setrlimit(resource, &new_rlim);
 - if (retval)
 - return retval;
 -
 - task_lock(current->group_leader);
 - *old_rlim = new_rlim;
 - task_unlock(current->group_leader);
 - return 0;
 -}
 -
  static void relinquish_resource(struct page_ctor *ctor)
  {
   if (!(ctor->dev->flags & IFF_UP) &&
 @@ -323,7 +297,7 @@ static void mp_ki_dtor(struct kiocb *iocb)
   } else
   info->ctor->wq_len--;
   /* Decrement the number of locked pages */
 - info->ctor->lock_pages -= info->pnum;
 + info->ctor->cur_pages -= info->pnum;
   kmem_cache_free(ext_page_info_cache, info);
   relinquish_resource(info->ctor);
  
 @@ -357,6 +331,7 @@ static int page_ctor_detach(struct mp_struct *mp)
  {
   struct page_ctor *ctor;
   struct page_info *info;
 + struct task_struct *tsk = mp->user;
   int i;
  
   /* locked by mp_mutex */
 @@ -375,9 +350,9 @@ static int page_ctor_detach(struct mp_struct *mp)
  
   relinquish_resource(ctor);
  
 - set_memlock_rlimit(ctor, RLIMIT_MEMLOCK,
 -ctor->o_rlim.rlim_cur,
 -ctor->o_rlim.rlim_max);
 + down_write(&tsk->mm->mmap_sem);
 + tsk->mm->locked_vm -= ctor->locked_pages;
 + up_write(&tsk->mm->mmap_sem);
  
   /* locked by mp_mutex */
   ctor->dev->mp_port = NULL;
 @@ -514,7 +489,6 @@ static struct page_info *mp_hash_delete(struct page_ctor 
 *ctor,
  {
   key_mp_t key = mp_hash(info->pages[0], HASH_BUCKETS);
   struct page_info *tmp = NULL;
 - int i;
  
   tmp = ctor->hash_table[key];
   while (tmp) {
 @@ -565,14 +539,11 @@ static struct page_info *alloc_page_info(struct 
 page_ctor *ctor,
   int rc;
   int i, j, n = 0;
   int len;
 - unsigned long base, lock_limit;
 + unsigned long base;
   struct page_info *info = NULL;
  
 - lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
 - lock_limit >>= PAGE_SHIFT;
 -
 - if (ctor->lock_pages + count > lock_limit && npages) {
 - 

Re: [PATCH 3/3] Add svm cpuid features

2010-09-28 Thread Roedel, Joerg
On Tue, Sep 28, 2010 at 05:37:58AM -0400, Avi Kivity wrote:
   On 09/28/2010 11:28 AM, Roedel, Joerg wrote:

  Weird, it worked here as I tested it. I had it on qemu/master and with
  all three patches. But patch 1 should not make the difference. I take a
  look, have you pushed the failing uq/master?
 
 Yes, 8fe6a21c76.
 
  What was your command line?
 
 qemu-system-x86_64 -m 2G -cpu kvm64,+svm,+npt -enable-kvm ...
 
 Note this is qemu.git, so -enable-kvm is needed.

Ok, I apparently forgot to force the CPUID xlevel to be 0x8000000A when
SVM is enabled, probably because I only tested CPUID models where xlevel
already defaults to 0x8000000A. Attached is a fix, thanks for catching
this.

Joerg


From d18d973d037f1af1869e114aea1f076ea3a1ee7a Mon Sep 17 00:00:00 2001
From: Joerg Roedel joerg.roe...@amd.com
Date: Tue, 28 Sep 2010 11:58:49 +0200
Subject: [PATCH 1/1] qemu: Force xlevel to at least 0x8000000A if SVM is enabled

This patch forces the extended CPUID level to be at least
0x8000000A if SVM is enabled in the CPU model.

Signed-off-by: Joerg Roedel joerg.roe...@amd.com
---
 target-i386/cpuid.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 0e0bf60..0630fe1 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -714,6 +714,11 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 x86_cpu_def->ext3_features &= ~minus_ext3_features;
 x86_cpu_def->kvm_features &= ~minus_kvm_features;
 x86_cpu_def->svm_features &= ~minus_svm_features;
+if ((x86_cpu_def->ext3_features & CPUID_EXT3_SVM) &&
+(x86_cpu_def->xlevel < 0x8000000A)) {
+/* Force xlevel to at least 0x8000000A if SVM enabled */
+x86_cpu_def->xlevel = 0x8000000A;
+}
 if (check_cpuid) {
 if (check_features_against_host(x86_cpu_def) && enforce_cpuid)
 goto error;
-- 
1.7.0.4
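To see why the forced xlevel matters, a small self-contained check (using GCC's <cpuid.h> helper; illustrative, not qemu code): leaf 0x8000000A is only visible if CPUID 0x80000000 reports a maximum extended leaf at least that high, which is exactly what the CPU model's xlevel advertises to the guest.

    #include <cpuid.h>

    static unsigned int svm_feature_edx(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx))
            return 0;
        if (eax < 0x8000000A)      /* SVM feature leaf not reported */
            return 0;
        if (!__get_cpuid(0x8000000A, &eax, &ebx, &ecx, &edx))
            return 0;
        return edx;                /* SVM feature flags */
    }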


-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-28 Thread Michael S. Tsirkin
On Mon, Sep 27, 2010 at 02:46:32PM -0600, Alex Williamson wrote:
 On Wed, 2010-09-22 at 14:18 -0700, Tom Lyon wrote:
  +ssize_t vfio_mem_readwrite(
  +   int write,
  +   struct vfio_dev *vdev,
  +   char __user *buf,
  +   size_t count,
  +   loff_t *ppos)
  +{
  +   struct pci_dev *pdev = vdev->pdev;
  +   resource_size_t end;
  +   void __iomem *io;
  +   loff_t pos;
  +   int pci_space;
  +
  +   pci_space = vfio_offset_to_pci_space(*ppos);
  +   pos = vfio_offset_to_pci_offset(*ppos);
  +
  +   if (!pci_resource_start(pdev, pci_space))
  +   return -EINVAL;
  +   end = pci_resource_len(pdev, pci_space);
  +   if (vdev->barmap[pci_space] == NULL)
  +   vdev->barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
  +   io = vdev->barmap[pci_space];
  +
 
 So we do a pci_iomap, but never do corresponding pci_iounmap.  This also
 only works for the first 6 BARs since the ROM BAR needs pci_map_rom.


An issue with ROM is that I think it can not be enabled together
with BARs. This is why pci_read_rom/pci_write_rom do what
they do.

  I
 wonder if we should be doing all the BAR mapping at open and unmap at
 close so that we can fail if the device can't get basic resources.

I believe we should do this on ioctl so that e.g. hotunplug
can reset the device and clean it up. An unused device should also not
consume resources.


 I
 believe we should also be calling pci_request_regions in here somewhere.
 Perhaps something like this:
 
 diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
 index a18e39a..d3886d9 100644
 --- a/drivers/vfio/vfio_main.c
 +++ b/drivers/vfio/vfio_main.c
 @@ -85,6 +85,53 @@ static inline int overlap(int a1, int b1, int a2, int b2)
   return !(b2 <= a1 || b1 <= a2);
  }
  
 +static int vfio_setup_pci(struct vfio_dev *vdev)
 +{
 + int ret, bar;
 +
 + ret = pci_enable_device(vdev->pdev);
 + if (ret)
 + return ret;
 + 
 + ret = pci_request_regions(vdev->pdev, "VFIO");
 + if (ret) {
 + pci_disable_device(vdev->pdev);
 + return ret;
 + }
 +
 + for (bar = PCI_STD_RESOURCES; bar <= PCI_ROM_RESOURCE; bar++) {
 + if (!pci_resource_len(vdev->pdev, bar))
 + continue;
 + if (bar != PCI_ROM_RESOURCE) {
 + if (!pci_resource_start(vdev->pdev, bar))
 + continue;
 + vdev->barmap[bar] = pci_iomap(vdev->pdev, bar, 0);
 + } else {
 + size_t size;
 + vdev->barmap[bar] = pci_map_rom(vdev->pdev, &size);
 + }
 + }
 + return ret;
 +}
 +
 +static void vfio_disable_pci(struct vfio_dev *vdev)
 +{
 + int bar;
 +
 + for (bar = PCI_STD_RESOURCES; bar <= PCI_ROM_RESOURCE; bar++) {
 + if (!vdev->barmap[bar])
 + continue;
 + if (bar != PCI_ROM_RESOURCE)
 + pci_iounmap(vdev->pdev, vdev->barmap[bar]);
 + else
 + pci_unmap_rom(vdev->pdev, vdev->barmap[bar]);
 + vdev->barmap[bar] = NULL;
 + }
 +
 + pci_release_regions(vdev->pdev);
 + pci_disable_device(vdev->pdev);
 +}
 +
  static int vfio_open(struct inode *inode, struct file *filep)
  {
   struct vfio_dev *vdev;
 @@ -110,7 +157,7 @@ static int vfio_open(struct inode *inode, struct file 
 *filep)
   INIT_LIST_HEAD(listener-dm_list);
   filep-private_data = listener;
   if (vdev->listeners == 0)
 - ret = pci_enable_device(vdev->pdev);
 + ret = vfio_setup_pci(vdev);
   if (ret == 0)
   vdev->listeners++;
   mutex_unlock(&vdev->lgate);
 @@ -151,7 +198,7 @@ static int vfio_release(struct inode *inode, struct file 
 *filep)
   vdev->vconfig = NULL;
   kfree(vdev->pci_config_map);
   vdev->pci_config_map = NULL;
 - pci_disable_device(vdev->pdev);
 + vfio_disable_pci(vdev);
   vfio_domain_unset(vdev);
   wake_up(&vdev->dev_idle_q);
   }
 diff --git a/drivers/vfio/vfio_rdwr.c b/drivers/vfio/vfio_rdwr.c
 index 1fd50a6..7705b45 100644
 --- a/drivers/vfio/vfio_rdwr.c
 +++ b/drivers/vfio/vfio_rdwr.c
 @@ -64,7 +64,7 @@ ssize_t vfio_io_readwrite(
   if (pos + count > end)
   return -EINVAL;
   if (vdev->barmap[pci_space] == NULL)
 - vdev->barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
 + return -EINVAL;
   io = vdev->barmap[pci_space];
  
   while (count > 0) {
 @@ -137,7 +137,12 @@ ssize_t vfio_mem_readwrite(
   return -EINVAL;
   end = pci_resource_len(pdev, pci_space);
   if (vdev->barmap[pci_space] == NULL)
 - vdev->barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
 + return -EINVAL;
 + if (pci_space == PCI_ROM_RESOURCE) {
 + u32 rom = *(u32 *)(vdev->vconfig + PCI_ROM_ADDRESS);
 + if (!(rom & PCI_ROM_ADDRESS_ENABLE))
 

Re: [PATCH 3/3] Add svm cpuid features

2010-09-28 Thread Avi Kivity

 On 09/28/2010 12:05 PM, Roedel, Joerg wrote:

On Tue, Sep 28, 2010 at 05:37:58AM -0400, Avi Kivity wrote:
On 09/28/2010 11:28 AM, Roedel, Joerg wrote:

Weird, it worked here as I tested it. I had it on qemu/master and with
all three patches. But patch 1 should not make the difference. I take a
look, have you pushed the failing uq/master?

  Yes, 8fe6a21c76.

What was your command line?

  qemu-system-x86_64 -m 2G -cpu kvm64,+svm,+npt -enable-kvm ...

  Note this is qemu.git, so -enable-kvm is needed.

Ok, I apparently forgot to force the CPUID xlevel to be 0x8000000A when
SVM is enabled, probably because I only tested CPUID models where xlevel
already defaults to 0x8000000A. Attached is a fix, thanks for catching
this.

Joerg


 From d18d973d037f1af1869e114aea1f076ea3a1ee7a Mon Sep 17 00:00:00 2001
From: Joerg Roedeljoerg.roe...@amd.com
Date: Tue, 28 Sep 2010 11:58:49 +0200
Subject: [PATCH 1/1] qemu: Force xlevel to at least 0x8000000A if SVM is enabled

This patch forces the extended CPUID level to be at least
0x8000000A if SVM is enabled in the CPU model.

Signed-off-by: Joerg Roedeljoerg.roe...@amd.com
---
  target-i386/cpuid.c |5 +
  1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 0e0bf60..0630fe1 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -714,6 +714,11 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
  x86_cpu_def->ext3_features &= ~minus_ext3_features;
  x86_cpu_def->kvm_features &= ~minus_kvm_features;
  x86_cpu_def->svm_features &= ~minus_svm_features;
+if ((x86_cpu_def->ext3_features & CPUID_EXT3_SVM) &&
+(x86_cpu_def->xlevel < 0x8000000A)) {
+/* Force xlevel to at least 0x8000000A if SVM enabled */
+x86_cpu_def->xlevel = 0x8000000A;
+}
  if (check_cpuid) {
  if (check_features_against_host(x86_cpu_def) && enforce_cpuid)
  goto error;


I can't say I like the interdependency, but saying something like -cpu 
kvm64,+svm,xlevel=0x8000000a is much worse.


If no one has a better idea, I'll apply the patch.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-28 Thread Michael S. Tsirkin
On Mon, Sep 27, 2010 at 02:57:49PM -0600, Alex Williamson wrote:
 On Mon, 2010-09-27 at 13:54 +0200, Michael S. Tsirkin wrote:
  On Wed, Sep 22, 2010 at 02:18:24PM -0700, Tom Lyon wrote:
   
   Signed-off-by: Tom Lyon p...@cisco.com
  
  Some comments on the pci bits:
  
  After going over them for the Nth time - something needs to be done
  with the rvirt/write tables. I doubt anyone besides me and you
  has gone over them: 
 
 /me bites tongue...

Ouch
I figured if anyone looked at them he'd have complained :)

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.35-rc1 regression with pvclock and smp guests

2010-09-28 Thread Michael Tokarev
Arjan Koers 0h61vkll2ly8 at xutrox.com writes:

[]
  I've attached the printk patches for 2.6.34.1 and 2.6.35, in case
 anyone needs them...
 
 
 Move a printk that's using the clock before it's ready
 
 Fix a hang during SMP kernel boot on KVM that showed up
 after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8
 (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0
 (2.6.34.1). The problem only occurs when
 CONFIG_PRINTK_TIME is set.
 
 Signed-off-by: Arjan Koers 0h61vkll2ly8 at xutrox.com
 
 diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
 index feaeb0d..71bf2df 100644
 --- a/arch/x86/kernel/kvmclock.c
 +++ b/arch/x86/kernel/kvmclock.c
 @@ -125,12 +125,15 @@ static struct clocksource kvm_clock = {
  static int kvm_register_clock(char *txt)
  {
   int cpu = smp_processor_id();
 - int low, high;
 + int low, high, ret;
 +
   low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1;
   high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
 + ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
   printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
  cpu, high, low, txt);
 - return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high);
 +
 + return ret;
  }
 
  #ifdef CONFIG_X86_LOCAL_APIC


Folks, should this be sent to the -stable kernel?  It is not in any
upstream kernel as far as I can see (not in Linus' tree either), but
this is quite an issue and it is hitting people.

The discussion stalled quite a while ago too -- this email has
Date: Mon, 02 Aug 2010 23:35:28 +0200.

Thanks!

/mjt

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 13/17] Add mp(mediate passthru) device.

2010-09-28 Thread Michael S. Tsirkin
On Mon, Sep 27, 2010 at 10:23:33PM +0100, Ben Hutchings wrote:
  +/* The main function to transform the guest user space address
  + * to host kernel address via get_user_pages(). Thus the hardware
  + * can do DMA directly to the external buffer address.
  + */
  +static struct page_info *alloc_page_info(struct page_ctor *ctor,
  +   struct kiocb *iocb, struct iovec *iov,
  +   int count, struct frag *frags,
  +   int npages, int total)
  +{
  +   int rc;
  +   int i, j, n = 0;
  +   int len;
  +   unsigned long base, lock_limit;
  +   struct page_info *info = NULL;
  +
  +   lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
  +   lock_limit >>= PAGE_SHIFT;
  +
  +   if (ctor->lock_pages + count > lock_limit && npages) {
  +   printk(KERN_INFO "exceed the locked memory rlimit.");
  +   return NULL;
  +   }
 
 What if the process is locking pages with mlock() as well?  Doesn't this
 allow it to lock twice as many pages as it should be able to?

No, since locked_vm is incremented by both mp and mlock.
Or at least, that's the idea :)
In any case, twice as many would not be a big deal: admin can control
which processes can do this by controlling access to the device.
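For reference, a hedged sketch of the accounting approach being discussed (charging against RLIMIT_MEMLOCK via mm->locked_vm, the way mlock() does); this is not the mp patch itself, just the general pattern:

    static int charge_locked_pages(struct mm_struct *mm, unsigned long npages)
    {
        unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        int ret = 0;

        down_write(&mm->mmap_sem);
        if (mm->locked_vm + npages > limit && !capable(CAP_IPC_LOCK))
            ret = -ENOMEM;          /* over the memlock limit */
        else
            mm->locked_vm += npages;
        up_write(&mm->mmap_sem);
        return ret;
    }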

  +   info = kmem_cache_alloc(ext_page_info_cache, GFP_KERNEL);
  +   
  +   if (!info)
  +   return NULL;
  +   info->skb = NULL;
  +   info->next = info->prev = NULL;
  +
  +   for (i = j = 0; i < count; i++) {
  +   base = (unsigned long)iov[i].iov_base;
  +   len = iov[i].iov_len;
  +
  +   if (!len)
  +   continue;
  +   n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
  +
  +   rc = get_user_pages_fast(base, n, npages ? 1 : 0,
  +   &info->pages[j]);
  +   if (rc != n)
  +   goto failed;
  +
  +   while (n--) {
  +   frags[j].offset = base & ~PAGE_MASK;
  +   frags[j].size = min_t(int, len,
  +   PAGE_SIZE - frags[j].offset);
  +   len -= frags[j].size;
  +   base += frags[j].size;
  +   j++;
  +   }
  +   }
  +
  +#ifdef CONFIG_HIGHMEM
  +   if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
  +   for (i = 0; i < j; i++) {
  +   if (PageHighMem(info->pages[i]))
  +   goto failed;
  +   }
  +   }
  +#endif
 
 Shouldn't you try to allocate lowmem pages explicitly, rather than
 failing at this point?

We don't allocate pages, we lock given pages. Once this is
integrated in macvtap presumably we'll fall back on data copy
for such devices.

...

  +   skb_reserve(skb, NET_IP_ALIGN);
  +   skb_put(skb, len);
  +
  +   if (skb_copy_datagram_from_iovec(skb, 0, iov, 0, len)) {
  +   kfree_skb(skb);
  +   return -EAGAIN;
  +   }
  +
  +   skb->protocol = eth_type_trans(skb, mp->dev);
 
 Why are you calling eth_type_trans() on transmit?

So that GSO can work.  BTW macvtap does:

skb_set_network_header(skb, ETH_HLEN);
skb_reset_mac_header(skb);
skb->protocol = eth_hdr(skb)->h_proto;

and I think this is broken for vlans. Arnd?

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v11 17/17] add two new ioctls for mp device.

2010-09-28 Thread Michael S. Tsirkin
On Mon, Sep 27, 2010 at 10:36:15PM +0100, Ben Hutchings wrote:
 On Sat, 2010-09-25 at 12:27 +0800, xiaohui@intel.com wrote:
  From: Xin Xiaohui xiaohui@intel.com
  
  The patch adds two ioctls for the mp device.
  One is for userspace to query how much memory should be locked to make the
  mp device run smoothly. The other is for userspace to set how much memory
  it really wants locked.
 [...]
  diff --git a/include/linux/mpassthru.h b/include/linux/mpassthru.h
  index ba8f320..083e9f7 100644
  --- a/include/linux/mpassthru.h
  +++ b/include/linux/mpassthru.h
  @@ -7,6 +7,8 @@
   /* ioctl defines */
   #define MPASSTHRU_BINDDEV  _IOW('M', 213, int)
   #define MPASSTHRU_UNBINDDEV_IO('M', 214)
  +#define MPASSTHRU_SET_MEM_LOCKED   _IOW('M', 215, unsigned long)
  +#define MPASSTHRU_GET_MEM_LOCKED_NEED  _IOR('M', 216, unsigned long)
 [...]
 
 These will need compat handling.  You can avoid that by defining them to
 use a parameter type of u32 or u64.
 
 Ben.

No need to introduce compat mess unless we have to.  I guess int is
sufficient: locking >= 2G just for guest networking is insane.
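For reference, a sketch of Ben's suggestion (not the posted patch): using a fixed-width type makes the ioctl argument layout and number identical for 32-bit and 64-bit userspace, so no compat_ioctl translation is needed.

    #include <linux/ioctl.h>
    #include <linux/types.h>

    /* Hypothetical variants of the two ioctls with a fixed-width parameter. */
    #define MPASSTHRU_SET_MEM_LOCKED       _IOW('M', 215, __u32)
    #define MPASSTHRU_GET_MEM_LOCKED_NEED  _IOR('M', 216, __u32)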

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] virtio_balloon: disable oom killer when fill balloon

2010-09-28 Thread Dave Young
Ballooning could cause guest memory OOM killing and a panic. If we disable the OOM
killer it will be better; at least we avoid a guest panic.

If the allocation fails we can just adjust the balloon target to be equal to the
current number by calling vdev->config->set.

But during testing I found that config->set of num_pages does not actually change
the config. Should I do hacks in userspace as well? If so, where should I start to
hack?

Thanks

--- linux-2.6.orig/drivers/virtio/virtio_balloon.c  2010-09-25 
20:58:14.19001 +0800
+++ linux-2.6/drivers/virtio/virtio_balloon.c   2010-09-28 21:05:42.20675 
+0800
@@ -25,6 +25,7 @@
 #include <linux/freezer.h>
 #include <linux/delay.h>
 #include <linux/slab.h>
+#include <linux/oom.h>
 
 struct virtio_balloon
 {
@@ -97,8 +98,22 @@ static void tell_host(struct virtio_ball
	wait_for_completion(&vb->acked);
 }
 
-static void fill_balloon(struct virtio_balloon *vb, size_t num)
+static int cblimit(int times)
 {
+   static int t;
+
+	if (t < times)
+   t++;
+   else
+   t = 0;
+
+   return !t;
+}
+
+static int fill_balloon(struct virtio_balloon *vb, size_t num)
+{
+   int ret = 0;
+
/* We can only do one array worth at a time. */
	num = min(num, ARRAY_SIZE(vb->pfns));
 
@@ -106,10 +121,13 @@ static void fill_balloon(struct virtio_b
struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
__GFP_NOMEMALLOC | __GFP_NOWARN);
if (!page) {
-   if (printk_ratelimit())
+   if (cblimit(5)) {
 dev_printk(KERN_INFO, &vb->vdev->dev,
    "Out of puff! Can't get %zu pages\n",
   num);
+   ret = -ENOMEM;
+   goto out;
+   }
/* Sleep for at least 1/5 of a second before retry. */
msleep(200);
break;
@@ -120,11 +138,11 @@ static void fill_balloon(struct virtio_b
	list_add(&page->lru, &vb->pages);
}
 
-   /* Didn't get any?  Oh well. */
-	if (vb->num_pfns == 0)
-	return;
+out:
+	if (vb->num_pfns)
+	tell_host(vb, vb->inflate_vq);
 
-	tell_host(vb, vb->inflate_vq);
+   return ret;
 }
 
 static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -251,6 +269,14 @@ static void update_balloon_size(struct v
  &actual, sizeof(actual));
 }
 
+static void update_balloon_target(struct virtio_balloon *vb)
+{
+	__le32 num_pages = cpu_to_le32(vb->num_pages);
+	vb->vdev->config->set(vb->vdev,
+		offsetof(struct virtio_balloon_config, num_pages),
+		&num_pages, sizeof(num_pages));
+}
+
 static int balloon(void *_vballoon)
 {
struct virtio_balloon *vb = _vballoon;
@@ -267,9 +293,14 @@ static int balloon(void *_vballoon)
 || freezing(current));
	if (vb->need_stats_update)
	stats_handle_request(vb);
-	if (diff > 0)
-	fill_balloon(vb, diff);
-	else if (diff < 0)
+	if (diff > 0) {
+	int oom;
+	oom_killer_disable();
+	oom = fill_balloon(vb, diff);
+	oom_killer_enable();
+	if (oom)
+	update_balloon_target(vb);
+	} else if (diff < 0)
leak_balloon(vb, -diff);
update_balloon_size(vb);
}
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 08/18] KVM test: Add basic file transfer test

2010-09-28 Thread Michael S. Tsirkin
On Mon, Sep 27, 2010 at 06:43:54PM -0400, Lucas Meneghel Rodrigues wrote:
 From: Amos Kong ak...@redhat.com
 
 This test is the basic test of transferring a file between host and guest.
 Try to transfer a large file from host to guest, and transfer it back
 to host, then compare the files by calculating their md5 hashes.
 
 The default file size is 4000M and the scp timeout is 1000s. It means that if
 the average speed is less than 4M/s, this test will fail.
 We can extend
 this test by using another disk later, then we can transfer larger
 files without the limit of the first disk's size.
 
 Changes from v1:
 - Use md5 to verify the integrity of files
 - Try to use autotest API, such as, utils.system()
 
 Signed-off-by: Amos Kong ak...@redhat.com

Why scp_timeout? Not transfer_timeout?
Is this really only scp file transfer to/from linux guest?
Need to either name it so or generalize. Other things that
need testing are NFS for linux guest, scp from windows, samba
for linux and windows guests.

 ---
  client/tests/kvm/tests/file_transfer.py |   58 
 +++
  client/tests/kvm/tests_base.cfg.sample  |7 +++-
  2 files changed, 64 insertions(+), 1 deletions(-)
  create mode 100644 client/tests/kvm/tests/file_transfer.py
 
 diff --git a/client/tests/kvm/tests/file_transfer.py 
 b/client/tests/kvm/tests/file_transfer.py
 new file mode 100644
 index 000..a0c6cff
 --- /dev/null
 +++ b/client/tests/kvm/tests/file_transfer.py
 @@ -0,0 +1,58 @@
 +import logging, commands, re
 +from autotest_lib.client.common_lib import error
 +from autotest_lib.client.bin import utils
 +import kvm_utils, kvm_test_utils
 +
 +def run_file_transfer(test, params, env):
 +    """
 +    Test file transfer between host and guest.
 +
 +    1) Boot up a VM.
 +    2) Create a large file by dd on host.
 +    3) Copy this file from host to guest.
 +    4) Copy this file from guest to host.
 +    5) Check if file transfers ended good.
 +
 +    @param test: KVM test object.
 +    @param params: Dictionary with the test parameters.
 +    @param env: Dictionary with test environment.
 +    """
 +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 +    timeout = int(params.get("login_timeout", 360))
 +
 +    logging.info("Trying to log into guest '%s' by serial", vm.name)
 +    session = kvm_utils.wait_for(lambda: vm.serial_login(),
 +                                 timeout, 0, step=2)
 +    if not session:
 +        raise error.TestFail("Could not log into guest '%s'" % vm.name)
 +
 +    dir = test.tmpdir
 +    scp_timeout = int(params.get("scp_timeout"))
 +    cmd = "dd if=/dev/urandom of=%s/a.out bs=1M count=%d" % (dir, int(
 +          params.get("filesize", 4000)))
 +
 +    try:
 +        logging.info("Create file by dd command on host, cmd: %s" % cmd)
 +        utils.run(cmd)
 +
 +        logging.info("Transfer file from host to guest")
 +        if not vm.copy_files_to("%s/a.out" % dir, "/tmp/b.out",
 +                                timeout=scp_timeout):

/tmp/b.out won't work on windows guest, will it?
maybe let guest select dest dir?

 +            raise error.TestFail("Fail to transfer file from host to guest")
 +
 +        logging.info("Transfer file from guest to host")
 +        if not vm.copy_files_from("/tmp/b.out", "%s/c.out" % dir,
 +                                  timeout=scp_timeout):
 +            raise error.TestFail("Fail to transfer file from guest to host")
 +
 +        logging.debug(commands.getoutput("ls -l %s/[ac].out" % dir))
 +        md5_orig = utils.hash_file("%s/a.out" % dir, method="md5")
 +        md5_new = utils.hash_file("%s/c.out" % dir, method="md5")
 +
 +        if md5_orig != md5_new:
 +            raise error.TestFail("File changed after transfer")
 +
 +    finally:
 +        session.get_command_status("rm -f /tmp/b.out")
 +        utils.run("rm -f %s/[ac].out" % dir)
 +        session.close()
 diff --git a/client/tests/kvm/tests_base.cfg.sample 
 b/client/tests/kvm/tests_base.cfg.sample
 index cf1deaf..bc014c8 100644
 --- a/client/tests/kvm/tests_base.cfg.sample
 +++ b/client/tests/kvm/tests_base.cfg.sample
 @@ -476,6 +476,11 @@ variants:
  - jumbo: install setup unattended_install.cdrom
  type = jumbo
  
 +- file_transfer: install setup unattended_install.cdrom
 +type = file_transfer
 +filesize = 4000
 +scp_timeout = 1000
 +
  - physical_resources_check: install setup unattended_install.cdrom
  type = physical_resources_check
  catch_uuid_cmd = dmidecode | awk -F: '/UUID/ {print $2}'
 @@ -1250,7 +1255,7 @@ variants:
  
  # Windows section
  - @Windows:
 -no autotest linux_s3 vlan_tag ioquit 
 unattended_install.(url|nfs|remote_ks) jumbo
 +no autotest linux_s3 vlan_tag ioquit 
 unattended_install.(url|nfs|remote_ks) jumbo file_transfer
  shutdown_command = shutdown /s /f /t 0
  reboot_command = shutdown /r /f /t 0
  status_test_command = echo %errorlevel%
 -- 
 1.7.1
 

Re: [RFC] virtio_balloon: disable oom killer when fill balloon

2010-09-28 Thread Anthony Liguori

On 09/28/2010 08:19 AM, Dave Young wrote:

Balloon could cause guest memory oom killing and panic. If we disable the oom 
killer it will be better at least avoid guest panic.

If alloc failed we can just adjust the balloon target to be equal to current number 
by call vdev-config-set

But during test I found the config-set num_pages does not change the config 
actually, Should I do hacks in userspace as well? If so where should I start to 
hack?
   


The guest is not supposed to set the target field in its config.  This 
is a host read/write, guest read-only field.
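
For reference, the config layout in question (as defined in
include/linux/virtio_balloon.h, comments paraphrased) is just two 32-bit
fields, one owned by each side:

struct virtio_balloon_config
{
	/* Number of pages the host wants the guest to give up (host writes). */
	__le32 num_pages;
	/* Number of pages the guest actually has in the balloon (guest writes). */
	__le32 actual;
};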


The problem with your approach, generally speaking, is that it's unclear 
whether this is the right policy.  For instance, what if another 
application held a very large allocation which caused the fill request 
to stop, but then shortly afterwards released that allocation?  If 
instead of just stopping, we paused and tried again later, we could 
potentially succeed.


I think a better approach might be a graceful back off.  For instance, 
when you hit this condition, deflate the balloon by 10% based on the 
assumption that you probably already gone too far.  Before you attempt 
to allocate to the target again, set a timeout that increases in 
duration exponentially until you reach some maximum (say 10s).


This isn't the only approach, but hopefully it conveys the idea of 
gracefully backing off without really giving up.
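
A rough sketch of what that could look like inside the guest's balloon()
loop, reusing the -ENOMEM return from the patch above (retry_ms is a
hypothetical new field of struct virtio_balloon, purely illustrative):

	if (diff > 0) {
		if (fill_balloon(vb, diff) == -ENOMEM) {
			/* Assume we already overshot: give roughly 10% back. */
			leak_balloon(vb, vb->num_pages / 10);
			/* Back off; double the delay up to a 10s cap. */
			msleep(vb->retry_ms);
			if (vb->retry_ms < 10000)
				vb->retry_ms *= 2;
		} else {
			vb->retry_ms = 200;	/* reset after a clean fill */
		}
	} else if (diff < 0)
		leak_balloon(vb, -diff);
	update_balloon_size(vb);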


Regards,

Anthony Liguori


Thanks

--- linux-2.6.orig/drivers/virtio/virtio_balloon.c  2010-09-25 
20:58:14.19001 +0800
+++ linux-2.6/drivers/virtio/virtio_balloon.c   2010-09-28 21:05:42.20675 
+0800
@@ -25,6 +25,7 @@
  #includelinux/freezer.h
  #includelinux/delay.h
  #includelinux/slab.h
+#includelinux/oom.h

  struct virtio_balloon
  {
@@ -97,8 +98,22 @@ static void tell_host(struct virtio_ball
wait_for_completion(vb-acked);
  }

-static void fill_balloon(struct virtio_balloon *vb, size_t num)
+static int cblimit(int times)
  {
+   static int t;
+
+   if (t  times)
+   t++;
+   else
+   t = 0;
+
+   return !t;
+}
+
+static int fill_balloon(struct virtio_balloon *vb, size_t num)
+{
+   int ret = 0;
+
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb-pfns));

@@ -106,10 +121,13 @@ static void fill_balloon(struct virtio_b
struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
__GFP_NOMEMALLOC | __GFP_NOWARN);
if (!page) {
-   if (printk_ratelimit())
+   if (cblimit(5)) {
dev_printk(KERN_INFO,vb-vdev-dev,
   Out of puff! Can't get %zu pages\n,
   num);
+   ret = -ENOMEM;
+   goto out;
+   }
/* Sleep for at least 1/5 of a second before retry. */
msleep(200);
break;
@@ -120,11 +138,11 @@ static void fill_balloon(struct virtio_b
list_add(page-lru,vb-pages);
}

-   /* Didn't get any?  Oh well. */
-   if (vb-num_pfns == 0)
-   return;
+out:
+   if (vb-num_pfns)
+   tell_host(vb, vb-inflate_vq);

-   tell_host(vb, vb-inflate_vq);
+   return ret;
  }

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -251,6 +269,14 @@ static void update_balloon_size(struct v
actual, sizeof(actual));
  }

+static void update_balloon_target(struct virtio_balloon *vb)
+{
+   __le32 num_pages = cpu_to_le32(vb-num_pages);
+   vb-vdev-config-set(vb-vdev,
+ offsetof(struct virtio_balloon_config, num_pages),
+   num_pages, sizeof(num_pages));
+}
+
  static int balloon(void *_vballoon)
  {
struct virtio_balloon *vb = _vballoon;
@@ -267,9 +293,14 @@ static int balloon(void *_vballoon)
 || freezing(current));
if (vb-need_stats_update)
stats_handle_request(vb);
-   if (diff  0)
-   fill_balloon(vb, diff);
-   else if (diff  0)
+   if (diff  0) {
+   int oom;
+   oom_killer_disable();
+   oom = fill_balloon(vb, diff);
+   oom_killer_enable();
+   if (oom)
+   update_balloon_target(vb);
+   } else if (diff  0)
leak_balloon(vb, -diff);
update_balloon_size(vb);
}

Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices

2010-09-28 Thread Michael S. Tsirkin
On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
 This patch enable per-vector mask for assigned devices using MSI-X.
 
 Signed-off-by: Sheng Yang sh...@linux.intel.com
 ---
  arch/x86/kvm/x86.c   |1 +
  include/linux/kvm.h  |9 -
  include/linux/kvm_host.h |1 +
  virt/kvm/assigned-dev.c  |   39 +++
  4 files changed, 49 insertions(+), 1 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 8412c91..e6933e6 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_DEBUGREGS:
   case KVM_CAP_X86_ROBUST_SINGLESTEP:
   case KVM_CAP_XSAVE:
 + case KVM_CAP_DEVICE_MSIX_MASK:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 919ae53..f2b7cdc 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
  #endif
  #define KVM_CAP_PPC_GET_PVINFO 57
  #define KVM_CAP_PPC_IRQ_LEVEL 58
 +#ifdef __KVM_HAVE_MSIX
 +#define KVM_CAP_DEVICE_MSIX_MASK 59
 +#endif
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
  };
  
  #define KVM_MAX_MSIX_PER_DEV 256
 +
 +#define KVM_MSIX_FLAG_MASK   1
 +
  struct kvm_assigned_msix_entry {
   __u32 assigned_dev_id;
   __u32 gsi;
   __u16 entry; /* The index of entry in the MSI-X table */
 - __u16 padding[3];
 + __u16 flags;
 + __u16 padding[2];
  };
  
  #endif /* __LINUX_KVM_H */
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index 0b89d00..a43405c 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
  };
  
  #define KVM_ASSIGNED_MSIX_PENDING0x1
 +#define KVM_ASSIGNED_MSIX_MASK   0x2
  struct kvm_guest_msix_entry {
   u32 vector;
   u16 entry;
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index 7c98928..15b8c32 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -17,6 +17,8 @@
  #include <linux/pci.h>
  #include <linux/interrupt.h>
  #include <linux/slab.h>
 +#include <linux/irqnr.h>
 +
  #include irq.h
  
  static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct 
 list_head *head,
 @@ -666,6 +668,30 @@ msix_nr_out:
   return r;
  }
  
 +static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
 +			     int index)
 +{
 +	int irq;
 +
 +	if (!assigned_dev->dev->msix_enabled ||
 +	    !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
 +		return;
 +
 +	irq = assigned_dev->host_msix_entries[index].vector;
 +
 +	ASSERT(irq != 0);
 +
 +	if (assigned_dev->guest_msix_entries[index].flags &
 +			KVM_ASSIGNED_MSIX_MASK)
 +		disable_irq(irq);
 +	else {
 +		enable_irq(irq);
 +		if (assigned_dev->guest_msix_entries[index].flags &
 +				KVM_ASSIGNED_MSIX_PENDING)
 +			schedule_work(&assigned_dev->interrupt_work);
 +	}

Hmm, won't this lose interrupts which were sent while the bit was pending?
It is also pretty heavy if, as you say, guests touch the mask a lot.
I think we must keep the interrupt disabled, just set a bit
and delay interrupt injection until vector is unmasked
or deleted. The interface to do this will need more thought:
e.g. how can userspace clear this bit then?
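
A sketch of the delayed-injection idea (names are illustrative only, not
part of the patch): masking just sets a flag, and a pending interrupt is
replayed when the vector is unmasked:

static void msix_guest_mask(struct kvm_assigned_dev_kernel *adev, int i,
			    bool masked)
{
	struct kvm_guest_msix_entry *e = &adev->guest_msix_entries[i];

	if (masked) {
		e->flags |= KVM_ASSIGNED_MSIX_MASK;
		return;
	}

	e->flags &= ~KVM_ASSIGNED_MSIX_MASK;
	if (e->flags & KVM_ASSIGNED_MSIX_PENDING) {
		e->flags &= ~KVM_ASSIGNED_MSIX_PENDING;
		/* Inject the interrupt held back while the vector was masked. */
		kvm_set_irq(adev->kvm, adev->irq_source_id, e->vector, 1);
	}
}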


 +}
 +
  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
  struct kvm_assigned_msix_entry *entry)
  {
 @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
 		adev->guest_msix_entries[i].entry = entry->entry;
 		adev->guest_msix_entries[i].vector = entry->gsi;
 		adev->host_msix_entries[i].entry = entry->entry;
 +		if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
 +		    !(adev->guest_msix_entries[i].flags &
 +				KVM_ASSIGNED_MSIX_MASK)) {
 +			adev->guest_msix_entries[i].flags |=
 +				KVM_ASSIGNED_MSIX_MASK;
 +			update_msix_mask(adev, i);
 +		} else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
 +			   (adev->guest_msix_entries[i].flags &
 +				KVM_ASSIGNED_MSIX_MASK)) {
 +			adev->guest_msix_entries[i].flags &=
 +				~KVM_ASSIGNED_MSIX_MASK;
 +			update_msix_mask(adev, i);
 +		}
 		break;
 	}
 	if (i == adev->entries_nr) {
 -- 
 1.7.0.1
 

Re: [RFC] virtio_balloon: disable oom killer when fill balloon

2010-09-28 Thread Dave Young
On Tue, Sep 28, 2010 at 9:34 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 09/28/2010 08:19 AM, Dave Young wrote:

 Balloon could cause guest memory oom killing and panic. If we disable the
 oom killer it will be better at least avoid guest panic.

 If alloc failed we can just adjust the balloon target to be equal to
 current number by call vdev-config-set

 But during test I found the config-set num_pages does not change the
 config actually, Should I do hacks in userspace as well? If so where should
 I start to hack?



Hi,

Thanks for your comments.

 The guest is not supposed to set the target field in it's config.  This is a
 host read/write, guest read-only field.

Could you tell me where it is set? If so, IMHO the set config API should
fail, shouldn't it?


 The problem with your approach generally speaking is that it's unclear
 whether this is the right policy.  For instance, what if another application
 held a very large allocation which caused the fill request to stop but then
 shortly afterwards, the aforementioned application released that allocation.
  If instead of just stopping, we paused and tried again later, we could
 potentially succeed.

Yes, it is possible. But maybe it is better to do the ballooning from the qemu monitor later?


 I think a better approach might be a graceful back off.  For instance, when
 you hit this condition, deflate the balloon by 10% based on the assumption
 that you probably already gone too far.  Before you attempt to allocate to
 the target again, set a timeout that increases in duration exponentially
 until you reach some maximum (say 10s).

I'm afraid that most of the time it will just keep inflating and deflating in a cycle.


 This isn't the only approach, but hopefully it conveys the idea of
 gracefully backing off without really giving up.

 Regards,

 Anthony Liguori

 Thanks

 --- linux-2.6.orig/drivers/virtio/virtio_balloon.c      2010-09-25
 20:58:14.19001 +0800
 +++ linux-2.6/drivers/virtio/virtio_balloon.c   2010-09-28
 21:05:42.20675 +0800
 @@ -25,6 +25,7 @@
  #includelinux/freezer.h
  #includelinux/delay.h
  #includelinux/slab.h
 +#includelinux/oom.h

  struct virtio_balloon
  {
 @@ -97,8 +98,22 @@ static void tell_host(struct virtio_ball
        wait_for_completion(vb-acked);
  }

 -static void fill_balloon(struct virtio_balloon *vb, size_t num)
 +static int cblimit(int times)
  {
 +       static int t;
 +
 +       if (t  times)
 +               t++;
 +       else
 +               t = 0;
 +
 +       return !t;
 +}
 +
 +static int fill_balloon(struct virtio_balloon *vb, size_t num)
 +{
 +       int ret = 0;
 +
        /* We can only do one array worth at a time. */
        num = min(num, ARRAY_SIZE(vb-pfns));

 @@ -106,10 +121,13 @@ static void fill_balloon(struct virtio_b
                struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY
 |
                                        __GFP_NOMEMALLOC | __GFP_NOWARN);
                if (!page) {
 -                       if (printk_ratelimit())
 +                       if (cblimit(5)) {
                                dev_printk(KERN_INFO,vb-vdev-dev,
                                           Out of puff! Can't get %zu
 pages\n,
                                           num);
 +                               ret = -ENOMEM;
 +                               goto out;
 +                       }
                        /* Sleep for at least 1/5 of a second before retry.
 */
                        msleep(200);
                        break;
 @@ -120,11 +138,11 @@ static void fill_balloon(struct virtio_b
                list_add(page-lru,vb-pages);
        }

 -       /* Didn't get any?  Oh well. */
 -       if (vb-num_pfns == 0)
 -               return;
 +out:
 +       if (vb-num_pfns)
 +               tell_host(vb, vb-inflate_vq);

 -       tell_host(vb, vb-inflate_vq);
 +       return ret;
  }

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 @@ -251,6 +269,14 @@ static void update_balloon_size(struct v
                        actual, sizeof(actual));
  }

 +static void update_balloon_target(struct virtio_balloon *vb)
 +{
 +       __le32 num_pages = cpu_to_le32(vb-num_pages);
 +       vb-vdev-config-set(vb-vdev,
 +                             offsetof(struct virtio_balloon_config,
 num_pages),
 +                       num_pages, sizeof(num_pages));
 +}
 +
  static int balloon(void *_vballoon)
  {
        struct virtio_balloon *vb = _vballoon;
 @@ -267,9 +293,14 @@ static int balloon(void *_vballoon)
                                         || freezing(current));
                if (vb-need_stats_update)
                        stats_handle_request(vb);
 -               if (diff  0)
 -                       fill_balloon(vb, diff);
 -               else if (diff  0)
 +               if (diff  0) {
 +                       int oom;
 +                       oom_killer_disable();
 +                       oom = fill_balloon(vb, diff);
 +                       oom_killer_enable();
 +                       if 

Re: [RFC] virtio_balloon: disable oom killer when fill balloon

2010-09-28 Thread Dave Young
On Tue, Sep 28, 2010 at 9:49 PM, Dave Young hidave.darks...@gmail.com wrote:
 On Tue, Sep 28, 2010 at 9:34 PM, Anthony Liguori anth...@codemonkey.ws 
 wrote:
 On 09/28/2010 08:19 AM, Dave Young wrote:

 Balloon could cause guest memory oom killing and panic. If we disable the
 oom killer it will be better at least avoid guest panic.

 If alloc failed we can just adjust the balloon target to be equal to
 current number by call vdev-config-set

 But during test I found the config-set num_pages does not change the
 config actually, Should I do hacks in userspace as well? If so where should
 I start to hack?



 Hi,

 Thanks your comments.

 The guest is not supposed to set the target field in it's config.  This is a
 host read/write, guest read-only field.

 Could you tell where to set it? If so, IMHO set config api should
 fail, isn't it?

Got it: in hw/virtio-balloon.c, function virtio_balloon_set_config.
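
For context, that handler boils down to latching only the guest-written
'actual' field; a guest write to the host-owned num_pages target is simply
dropped.  Schematically (a sketch, not the verbatim qemu code):

static void balloon_set_config_sketch(VirtIOBalloon *dev,
				      const uint8_t *config_data)
{
	struct virtio_balloon_config config;

	memcpy(&config, config_data, sizeof(config));
	dev->actual = config.actual;	/* config.num_pages from the guest is ignored */
}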



 The problem with your approach generally speaking is that it's unclear
 whether this is the right policy.  For instance, what if another application
 held a very large allocation which caused the fill request to stop but then
 shortly afterwards, the aforementioned application released that allocation.
  If instead of just stopping, we paused and tried again later, we could
 potentially succeed.

 Yes, it is possible. But maybe better to do balloon from qemu monitor later?


 I think a better approach might be a graceful back off.  For instance, when
 you hit this condition, deflate the balloon by 10% based on the assumption
 that you probably already gone too far.  Before you attempt to allocate to
 the target again, set a timeout that increases in duration exponentially
 until you reach some maximum (say 10s).

 I'm afraid most times it will keep doing inflate/deflate circularly.


 This isn't the only approach, but hopefully it conveys the idea of
 gracefully backing off without really giving up.

 Regards,

 Anthony Liguori

 Thanks

 --- linux-2.6.orig/drivers/virtio/virtio_balloon.c      2010-09-25
 20:58:14.19001 +0800
 +++ linux-2.6/drivers/virtio/virtio_balloon.c   2010-09-28
 21:05:42.20675 +0800
 @@ -25,6 +25,7 @@
  #includelinux/freezer.h
  #includelinux/delay.h
  #includelinux/slab.h
 +#includelinux/oom.h

  struct virtio_balloon
  {
 @@ -97,8 +98,22 @@ static void tell_host(struct virtio_ball
        wait_for_completion(vb-acked);
  }

 -static void fill_balloon(struct virtio_balloon *vb, size_t num)
 +static int cblimit(int times)
  {
 +       static int t;
 +
 +       if (t  times)
 +               t++;
 +       else
 +               t = 0;
 +
 +       return !t;
 +}
 +
 +static int fill_balloon(struct virtio_balloon *vb, size_t num)
 +{
 +       int ret = 0;
 +
        /* We can only do one array worth at a time. */
        num = min(num, ARRAY_SIZE(vb-pfns));

 @@ -106,10 +121,13 @@ static void fill_balloon(struct virtio_b
                struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY
 |
                                        __GFP_NOMEMALLOC | __GFP_NOWARN);
                if (!page) {
 -                       if (printk_ratelimit())
 +                       if (cblimit(5)) {
                                dev_printk(KERN_INFO,vb-vdev-dev,
                                           Out of puff! Can't get %zu
 pages\n,
                                           num);
 +                               ret = -ENOMEM;
 +                               goto out;
 +                       }
                        /* Sleep for at least 1/5 of a second before retry.
 */
                        msleep(200);
                        break;
 @@ -120,11 +138,11 @@ static void fill_balloon(struct virtio_b
                list_add(page-lru,vb-pages);
        }

 -       /* Didn't get any?  Oh well. */
 -       if (vb-num_pfns == 0)
 -               return;
 +out:
 +       if (vb-num_pfns)
 +               tell_host(vb, vb-inflate_vq);

 -       tell_host(vb, vb-inflate_vq);
 +       return ret;
  }

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 @@ -251,6 +269,14 @@ static void update_balloon_size(struct v
                        actual, sizeof(actual));
  }

 +static void update_balloon_target(struct virtio_balloon *vb)
 +{
 +       __le32 num_pages = cpu_to_le32(vb-num_pages);
 +       vb-vdev-config-set(vb-vdev,
 +                             offsetof(struct virtio_balloon_config,
 num_pages),
 +                       num_pages, sizeof(num_pages));
 +}
 +
  static int balloon(void *_vballoon)
  {
        struct virtio_balloon *vb = _vballoon;
 @@ -267,9 +293,14 @@ static int balloon(void *_vballoon)
                                         || freezing(current));
                if (vb-need_stats_update)
                        stats_handle_request(vb);
 -               if (diff  0)
 -                       fill_balloon(vb, diff);
 -               else if (diff  0)
 +               if (diff  0) {
 +                       int oom;
 +                    

Re: KVM call agenda for Sept 28

2010-09-28 Thread Avi Kivity

 On 09/28/2010 12:14 AM, Chris Wright wrote:

Please send in any agenda items you are interested in covering.



no agenda - no call.

--
error compiling committee.c: too many arguments to function



Re: [RFC] virtio_balloon: disable oom killer when fill balloon

2010-09-28 Thread Anthony Liguori

On 09/28/2010 08:49 AM, Dave Young wrote:

On Tue, Sep 28, 2010 at 9:34 PM, Anthony Liguorianth...@codemonkey.ws  wrote:
   

On 09/28/2010 08:19 AM, Dave Young wrote:
 

Balloon could cause guest memory oom killing and panic. If we disable the
oom killer it will be better at least avoid guest panic.

If alloc failed we can just adjust the balloon target to be equal to
current number by call vdev-config-set

But during test I found the config-set num_pages does not change the
config actually, Should I do hacks in userspace as well? If so where should
I start to hack?

   
 

Hi,

Thanks your comments.

   

The guest is not supposed to set the target field in it's config.  This is a
host read/write, guest read-only field.
 

Could you tell where to set it? If so, IMHO set config api should
fail, isn't it?

   

The problem with your approach generally speaking is that it's unclear
whether this is the right policy.  For instance, what if another application
held a very large allocation which caused the fill request to stop but then
shortly afterwards, the aforementioned application released that allocation.
  If instead of just stopping, we paused and tried again later, we could
potentially succeed.
 

Yes, it is possible. But maybe better to do balloon from qemu monitor later?
   


It's part of the specification, not something that's enforced or even 
visible within the APIs.



I think a better approach might be a graceful back off.  For instance, when
you hit this condition, deflate the balloon by 10% based on the assumption
that you probably already gone too far.  Before you attempt to allocate to
the target again, set a timeout that increases in duration exponentially
until you reach some maximum (say 10s).
 

I'm afraid most times it will keep doing inflate/deflate circularly.
   


With sufficiently large timeouts, does it matter?

The other side of the argument is that the host should be more careful 
about doing balloon requests to the guest.  Alternatively, you can argue 
that the guest should be able to balloon itself and that's where the 
logic should be.


But I think splitting policy across the guest and host would prove to be 
prohibitively complex to deal with.


Regards,

Anthony Liguori

   

This isn't the only approach, but hopefully it conveys the idea of
gracefully backing off without really giving up.

Regards,

Anthony Liguori

 

Thanks

--- linux-2.6.orig/drivers/virtio/virtio_balloon.c  2010-09-25
20:58:14.19001 +0800
+++ linux-2.6/drivers/virtio/virtio_balloon.c   2010-09-28
21:05:42.20675 +0800
@@ -25,6 +25,7 @@
  #includelinux/freezer.h
  #includelinux/delay.h
  #includelinux/slab.h
+#includelinux/oom.h

  struct virtio_balloon
  {
@@ -97,8 +98,22 @@ static void tell_host(struct virtio_ball
wait_for_completion(vb-acked);
  }

-static void fill_balloon(struct virtio_balloon *vb, size_t num)
+static int cblimit(int times)
  {
+   static int t;
+
+   if (ttimes)
+   t++;
+   else
+   t = 0;
+
+   return !t;
+}
+
+static int fill_balloon(struct virtio_balloon *vb, size_t num)
+{
+   int ret = 0;
+
/* We can only do one array worth at a time. */
num = min(num, ARRAY_SIZE(vb-pfns));

@@ -106,10 +121,13 @@ static void fill_balloon(struct virtio_b
struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY
|
__GFP_NOMEMALLOC | __GFP_NOWARN);
if (!page) {
-   if (printk_ratelimit())
+   if (cblimit(5)) {
dev_printk(KERN_INFO,vb-vdev-dev,
   Out of puff! Can't get %zu
pages\n,
   num);
+   ret = -ENOMEM;
+   goto out;
+   }
/* Sleep for at least 1/5 of a second before retry.
*/
msleep(200);
break;
@@ -120,11 +138,11 @@ static void fill_balloon(struct virtio_b
list_add(page-lru,vb-pages);
}

-   /* Didn't get any?  Oh well. */
-   if (vb-num_pfns == 0)
-   return;
+out:
+   if (vb-num_pfns)
+   tell_host(vb, vb-inflate_vq);

-   tell_host(vb, vb-inflate_vq);
+   return ret;
  }

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
@@ -251,6 +269,14 @@ static void update_balloon_size(struct v
actual, sizeof(actual));
  }

+static void update_balloon_target(struct virtio_balloon *vb)
+{
+   __le32 num_pages = cpu_to_le32(vb-num_pages);
+   vb-vdev-config-set(vb-vdev,
+ offsetof(struct virtio_balloon_config,
num_pages),
+num_pages, sizeof(num_pages));
+}
+
  static int balloon(void *_vballoon)
  {
struct virtio_balloon *vb = _vballoon;
@@ -267,9 +293,14 @@ static int 

Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-28 Thread Alex Williamson
On Tue, 2010-09-28 at 12:09 +0200, Michael S. Tsirkin wrote:
 On Mon, Sep 27, 2010 at 02:46:32PM -0600, Alex Williamson wrote:
  On Wed, 2010-09-22 at 14:18 -0700, Tom Lyon wrote:
   +ssize_t vfio_mem_readwrite(
   + int write,
   + struct vfio_dev *vdev,
   + char __user *buf,
   + size_t count,
   + loff_t *ppos)
   +{
   +	struct pci_dev *pdev = vdev->pdev;
   + resource_size_t end;
   + void __iomem *io;
   + loff_t pos;
   + int pci_space;
   +
   + pci_space = vfio_offset_to_pci_space(*ppos);
   + pos = vfio_offset_to_pci_offset(*ppos);
   +
   + if (!pci_resource_start(pdev, pci_space))
   + return -EINVAL;
   + end = pci_resource_len(pdev, pci_space);
   +	if (vdev->barmap[pci_space] == NULL)
   +		vdev->barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
   +	io = vdev->barmap[pci_space];
   +
  
  So we do a pci_iomap, but never do corresponding pci_iounmap.  This also
  only works for the first 6 BARs since the ROM BAR needs pci_map_rom.
 
 
 An issue with ROM is that I think it can not be enabled together
 with BARs. This is why pci_read_rom/pci_write_rom do what
 they do.

I don't think that's an issue.  pci_read/write_rom doesn't do anything
about disabling BARs when the ROM is read.  The reason that it maps and
unmaps around the read is to be resource friendly.  For vfio, since
we've assigned a device to vfio and it's open, I think it's valid to
assign all the resources and sit on them.  Otherwise I don't think we
can guarantee that a read that worked last time will still get resources
and work this time.

   I
  wonder if we should be doing all the BAR mapping at open and unmap at
  close so that we can fail if the device can't get basic resources.
 
 I belive we should do this on ioctl so that e.g. hotunplug
 can reset the device clean it up. Unused device should also not
 consume resources.

If it's assigned to vfio and opened, then it's not unused.  A hotunplug
would involve a close of the vfio device, which could then reset the
device.  I think we probably are still missing a device reset, but I'm
not sure if it needs an ioctl or if a reset on open would be sufficient.

Alex

  I
  believe we should also be calling pci_request_regions in here somewhere.
  Perhaps something like this:
  
  diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
  index a18e39a..d3886d9 100644
  --- a/drivers/vfio/vfio_main.c
  +++ b/drivers/vfio/vfio_main.c
  @@ -85,6 +85,53 @@ static inline int overlap(int a1, int b1, int a2, int b2)
   	return !(b2 <= a1 || b1 <= a2);
    }
    
   +static int vfio_setup_pci(struct vfio_dev *vdev)
   +{
   +	int ret, bar;
   +
   +	ret = pci_enable_device(vdev->pdev);
   +	if (ret)
   +		return ret;
   +
   +	ret = pci_request_regions(vdev->pdev, "VFIO");
   +	if (ret) {
   +		pci_disable_device(vdev->pdev);
   +		return ret;
   +	}
   +
   +	for (bar = PCI_STD_RESOURCES; bar <= PCI_ROM_RESOURCE; bar++) {
   +		if (!pci_resource_len(vdev->pdev, bar))
   +			continue;
   +		if (bar != PCI_ROM_RESOURCE) {
   +			if (!pci_resource_start(vdev->pdev, bar))
   +				continue;
   +			vdev->barmap[bar] = pci_iomap(vdev->pdev, bar, 0);
   +		} else {
   +			size_t size;
   +			vdev->barmap[bar] = pci_map_rom(vdev->pdev, &size);
   +		}
   +	}
   +	return ret;
   +}
   +
   +static void vfio_disable_pci(struct vfio_dev *vdev)
   +{
   +	int bar;
   +
   +	for (bar = PCI_STD_RESOURCES; bar <= PCI_ROM_RESOURCE; bar++) {
   +		if (!vdev->barmap[bar])
   +			continue;
   +		if (bar != PCI_ROM_RESOURCE)
   +			pci_iounmap(vdev->pdev, vdev->barmap[bar]);
   +		else
   +			pci_unmap_rom(vdev->pdev, vdev->barmap[bar]);
   +		vdev->barmap[bar] = NULL;
   +	}
   +
   +	pci_release_regions(vdev->pdev);
   +	pci_disable_device(vdev->pdev);
   +}
   +
    static int vfio_open(struct inode *inode, struct file *filep)
    {
   	struct vfio_dev *vdev;
   @@ -110,7 +157,7 @@ static int vfio_open(struct inode *inode, struct file 
   *filep)
   	INIT_LIST_HEAD(&listener->dm_list);
   	filep->private_data = listener;
   	if (vdev->listeners == 0)
   -		ret = pci_enable_device(vdev->pdev);
   +		ret = vfio_setup_pci(vdev);
   	if (ret == 0)
   		vdev->listeners++;
   	mutex_unlock(&vdev->lgate);
   @@ -151,7 +198,7 @@ static int vfio_release(struct inode *inode, struct 
   file *filep)
   	vdev->vconfig = NULL;
   	kfree(vdev->pci_config_map);
   	vdev->pci_config_map = NULL;
   -	pci_disable_device(vdev->pdev);
   +	vfio_disable_pci(vdev);
   	vfio_domain_unset(vdev);
   	wake_up(&vdev->dev_idle_q);
   	}
  diff --git a/drivers/vfio/vfio_rdwr.c b/drivers/vfio/vfio_rdwr.c
  index 1fd50a6..7705b45 100644
  --- a/drivers/vfio/vfio_rdwr.c
  +++ 

KVM: VMX: Add AX to list of registers clobbered by guest switch

2010-09-28 Thread Jan Kiszka
By chance this caused no harm so far. We overwrite AX during switch
to/from guest context, so we must declare this.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 arch/x86/kvm/vmx.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1a5ecfd..e315930 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4013,7 +4013,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
 #endif
 		[cr2]"i"(offsetof(struct vcpu_vmx, vcpu.arch.cr2))
 	      : "cc", "memory"
-		, R"bx", R"di", R"si"
+		, R"ax", R"bx", R"di", R"si"
 #ifdef CONFIG_X86_64
 		, "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"
 #endif
-- 
1.7.1


Re: [PATCH v11 13/17] Add mp(mediate passthru) device.

2010-09-28 Thread Arnd Bergmann
On Tuesday 28 September 2010, Michael S. Tsirkin wrote:
   +   skb_reserve(skb, NET_IP_ALIGN);
   +   skb_put(skb, len);
   +
   +   if (skb_copy_datagram_from_iovec(skb, 0, iov, 0, len)) {
   +   kfree_skb(skb);
   +   return -EAGAIN;
   +   }
   +
   +   skb-protocol = eth_type_trans(skb, mp-dev);
  
  Why are you calling eth_type_trans() on transmit?
 
 So that GSO can work.  BTW macvtap does:
 
 skb_set_network_header(skb, ETH_HLEN);
 skb_reset_mac_header(skb);
 skb-protocol = eth_hdr(skb)-h_proto;
 
 and I think this is broken for vlans. Arnd?

Hmm, that code (besides set_network_header) was added by Sridhar
for GSO support. I believe I originally did eth_type_trans but
had to change it before that time because it broke something.
Unfortunately, my memory on that is not very good any more.

Can you be more specific what the problem is? Do you think
it breaks when a guest sends VLAN tagged frames or when macvtap
is connected to a VLAN interface that adds another tag (or
only the combination)?

Arnd


Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-28 Thread Michael S. Tsirkin
On Tue, Sep 28, 2010 at 08:26:21AM -0600, Alex Williamson wrote:
 On Tue, 2010-09-28 at 12:09 +0200, Michael S. Tsirkin wrote:
  On Mon, Sep 27, 2010 at 02:46:32PM -0600, Alex Williamson wrote:
   On Wed, 2010-09-22 at 14:18 -0700, Tom Lyon wrote:
+ssize_t vfio_mem_readwrite(
+   int write,
+   struct vfio_dev *vdev,
+   char __user *buf,
+   size_t count,
+   loff_t *ppos)
+{
+   struct pci_dev *pdev = vdev-pdev;
+   resource_size_t end;
+   void __iomem *io;
+   loff_t pos;
+   int pci_space;
+
+   pci_space = vfio_offset_to_pci_space(*ppos);
+   pos = vfio_offset_to_pci_offset(*ppos);
+
+   if (!pci_resource_start(pdev, pci_space))
+   return -EINVAL;
+   end = pci_resource_len(pdev, pci_space);
+   if (vdev-barmap[pci_space] == NULL)
+   vdev-barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
+   io = vdev-barmap[pci_space];
+
   
   So we do a pci_iomap, but never do corresponding pci_iounmap.  This also
   only works for the first 6 BARs since the ROM BAR needs pci_map_rom.
  
  
  An issue with ROM is that I think it can not be enabled together
  with BARs. This is why pci_read_rom/pci_write_rom do what
  they do.
 
 I don't think that's an issue.  pci_read/write_rom doesn't do anything
 about disabling BARs when the ROM is read.

Oh yes it does: it calls pci_map_rom, which calls pci_enable_rom.
And this enables the ROM space decoder which, the PCI spec says,
can disable the BAR decoder in the device.

See it now?

  The reason that it maps and
 unmaps around the read is to be resource friendly.

Not only that: you cannot use both the ROM and the BARs
at the same time. If you enable the ROM, you must not
access the BARs.

  For vfio, since
 we've assigned a device to vfio and it's open, I think it's valid to
 assign all the resources and sit on them.  Otherwise I don't think we
 can guarantee that a read that worked last time will still get resources
 and work this time.

The issue is that BARs are not accessible while ROM is enabled.  It
might be a security hole to try to do that from an app, and I do know
there is hardware where it will not work, and PCI spec says as much:

6.2.5.2:
In order to minimize the number of address decoders needed, a device may
share a decoder between the Expansion ROM Base Address register and
other Base Address registers.47 When expansion ROM decode is enabled,
the decoder is used for accesses to the expansion ROM and device
independent software must not access the device through any other Base
Address registers.

Thus I think applications must shadow the ROM and kernel must prevent
ROM access after mapping BARs. We could shadow in kernel as well but this
would be expensive in memory as ROMs are sometimes quite large.
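
A userspace sketch of that shadowing, assuming the driver exposes the ROM
somewhere in its read() offset space (the fd and rom_offset handling here
are placeholders, not the posted interface):

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static void *shadow_rom(int vfio_fd, off_t rom_offset, size_t rom_size)
{
	void *copy = malloc(rom_size);

	/* Copy the ROM once, before any BAR is mmap()ed or read. */
	if (copy && pread(vfio_fd, copy, rom_size, rom_offset) != (ssize_t)rom_size) {
		free(copy);
		return NULL;
	}
	return copy;	/* caller frees; BAR access can start afterwards */
}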

I
   wonder if we should be doing all the BAR mapping at open and unmap at
   close so that we can fail if the device can't get basic resources.
  
  I belive we should do this on ioctl so that e.g. hotunplug
  can reset the device clean it up. Unused device should also not
  consume resources.
 
 If it's assigned to vfio and opened, then it's not unused.  A hotunplug
 would involve a close of the vfio device, which could then reset the
 device.  I think we probably are still missing a device reset, but I'm
 not sure if it needs an ioctl or if a reset on open would be sufficient.
 
 Alex

You are only looking at qemu, but look at the whole stack:

First, I think we want reset on close, so the device is in sane
state when it's returned to the OS.

Second, we typically will want libvirt to open the device and pass it to qemu,
qemu might start using it much later, and it might keep it around,
unused, for as long as it wants. So we do want a way to reset without close.

Finally, we will want a way for libvirt to verify that qemu has
closed the device, so it's safe to pass it to another guest.
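
To make the reset-without-close point concrete, a purely hypothetical ioctl
sketch (the name and number are illustrative, nothing the posted driver
defines):

/* Hypothetical: return the device to a clean state while the fd stays open. */
#define VFIO_DEVICE_RESET	_IO(';', 20)

	/* libvirt or qemu side: */
	if (ioctl(vfio_fd, VFIO_DEVICE_RESET) < 0)
		perror("VFIO_DEVICE_RESET");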

   I
   believe we should also be calling pci_request_regions in here somewhere.
   Perhaps something like this:
   
   diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
   index a18e39a..d3886d9 100644
   --- a/drivers/vfio/vfio_main.c
   +++ b/drivers/vfio/vfio_main.c
   @@ -85,6 +85,53 @@ static inline int overlap(int a1, int b1, int a2, int 
   b2)
 return !(b2 = a1 || b1 = a2);
}

   +static int vfio_setup_pci(struct vfio_dev *vdev)
   +{
   + int ret, bar;
   +
   + ret = pci_enable_device(vdev-pdev);
   + if (ret)
   + return ret;
   + 
   + ret = pci_request_regions(vdev-pdev, VFIO);
   + if (ret) {
   + pci_disable_device(vdev-pdev);
   + return ret;
   + }
   +
   + for (bar = PCI_STD_RESOURCES; bar = PCI_ROM_RESOURCE; bar++) {
   + if (!pci_resource_len(vdev-pdev, bar))
   + continue;
   + if (bar != PCI_ROM_RESOURCE) {
   + 

Re: [PATCH v11 13/17] Add mp(mediate passthru) device.

2010-09-28 Thread Michael S. Tsirkin
On Tue, Sep 28, 2010 at 04:39:59PM +0200, Arnd Bergmann wrote:
 On Tuesday 28 September 2010, Michael S. Tsirkin wrote:
+   skb_reserve(skb, NET_IP_ALIGN);
+   skb_put(skb, len);
+
+   if (skb_copy_datagram_from_iovec(skb, 0, iov, 0, len)) {
+   kfree_skb(skb);
+   return -EAGAIN;
+   }
+
+   skb-protocol = eth_type_trans(skb, mp-dev);
   
   Why are you calling eth_type_trans() on transmit?
  
  So that GSO can work.  BTW macvtap does:
  
  skb_set_network_header(skb, ETH_HLEN);
  skb_reset_mac_header(skb);
  skb-protocol = eth_hdr(skb)-h_proto;
  
  and I think this is broken for vlans. Arnd?
 
 Hmm, that code (besides set_network_header) was added by Sridhar
 for GSO support. I believe I originally did eth_type_trans but
 had to change it before that time because it broke something.
 Unfortunately, my memory on that is not very good any more.
 
 Can you be more specific what the problem is? Do you think
 it breaks when a guest sends VLAN tagged frames or when macvtap
 is connected to a VLAN interface that adds another tag (or
 only the combination)?
 
   Arnd

I expect the protocol value to be wrong when guest sends vlan tagged
frames as 802.1q frames have a different format.
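
For illustration, the kind of fixup such code would need for a tagged frame
(helpers from linux/if_vlan.h; the function name is made up, and it assumes
the mac header has already been set on the skb):

#include <linux/if_ether.h>
#include <linux/if_vlan.h>

static __be16 frame_protocol(struct sk_buff *skb)
{
	struct ethhdr *eth = eth_hdr(skb);

	/* For 802.1q frames the EtherType slot holds ETH_P_8021Q and the
	 * real protocol follows the 4-byte tag. */
	if (eth->h_proto == htons(ETH_P_8021Q))
		return vlan_eth_hdr(skb)->h_vlan_encapsulated_proto;
	return eth->h_proto;
}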

-- 
MST


Re: [PATCH v11 13/17] Add mp(mediate passthru) device.

2010-09-28 Thread Arnd Bergmann
On Tuesday 28 September 2010, Michael S. Tsirkin wrote:
 On Tue, Sep 28, 2010 at 04:39:59PM +0200, Arnd Bergmann wrote:
  Can you be more specific what the problem is? Do you think
  it breaks when a guest sends VLAN tagged frames or when macvtap
  is connected to a VLAN interface that adds another tag (or
  only the combination)?

 I expect the protocol value to be wrong when guest sends vlan tagged
 frames as 802.1q frames have a different format.

Ok, I see. Would that be fixed by using eth_type_trans()? I don't
see any code in there that tries to deal with the VLAN tag, so
do we have the same problem in the tun/tap driver?

Also, I wonder how we handle the case where both the guest and
the host do VLAN tagging. Does the host transparently override
the guest tag, or does it add a nested tag? More importantly,
what should it do?

Arnd


Re: [PATCH] x86, nmi: workaround sti; hlt race vs nmi; intr

2010-09-28 Thread H. Peter Anvin
On 09/28/2010 01:50 AM, Avi Kivity wrote:
 
 Personally I think the safer route is to take the patch.  There are
 other processors besides Intel and AMD and we can't test all of them,
 not to mention various emulators and virtual machine monitors out there.
 

Speaking for the smoldering crater that used to be *Transmeta*, I'm
(from memory) quite certain they blocked NMI and that this was
intentional behavior.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH] x86, nmi: workaround sti; hlt race vs nmi; intr

2010-09-28 Thread Avi Kivity

 On 09/28/2010 05:34 PM, H. Peter Anvin wrote:

On 09/28/2010 01:50 AM, Avi Kivity wrote:

  Personally I think the safer route is to take the patch.  There are
  other processors besides Intel and AMD and we can't test all of them,
  not to mention various emulators and virtual machine monitors out there.


Speaking for the smoltering crater that used to be *Transmeta*, I'm
(from memory) quite certain they blocked NMI and that this was
intentional behavior.



We'll need:

- Intel
- AMD
- Geode
- Via
- kvm (currently no, but plan to)
- qemu
- vmware
- others?

It should be relatively simple to write a small test case to test this.

--
error compiling committee.c: too many arguments to function



Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-28 Thread Alex Williamson
On Tue, 2010-09-28 at 16:40 +0200, Michael S. Tsirkin wrote:
 On Tue, Sep 28, 2010 at 08:26:21AM -0600, Alex Williamson wrote:
  On Tue, 2010-09-28 at 12:09 +0200, Michael S. Tsirkin wrote:
   On Mon, Sep 27, 2010 at 02:46:32PM -0600, Alex Williamson wrote:
On Wed, 2010-09-22 at 14:18 -0700, Tom Lyon wrote:
 +ssize_t vfio_mem_readwrite(
 + int write,
 + struct vfio_dev *vdev,
 + char __user *buf,
 + size_t count,
 + loff_t *ppos)
 +{
 + struct pci_dev *pdev = vdev-pdev;
 + resource_size_t end;
 + void __iomem *io;
 + loff_t pos;
 + int pci_space;
 +
 + pci_space = vfio_offset_to_pci_space(*ppos);
 + pos = vfio_offset_to_pci_offset(*ppos);
 +
 + if (!pci_resource_start(pdev, pci_space))
 + return -EINVAL;
 + end = pci_resource_len(pdev, pci_space);
 + if (vdev-barmap[pci_space] == NULL)
 + vdev-barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
 + io = vdev-barmap[pci_space];
 +

So we do a pci_iomap, but never do corresponding pci_iounmap.  This also
only works for the first 6 BARs since the ROM BAR needs pci_map_rom.
   
   
   An issue with ROM is that I think it can not be enabled together
   with BARs. This is why pci_read_rom/pci_write_rom do what
   they do.
  
  I don't think that's an issue.  pci_read/write_rom doesn't do anything
  about disabling BARs when the ROM is read.
 
 Oh yes it does: it calls pci_map rom which calls pci_enable_rom.
 And this enables ROM space decoder, which, PCI spec says,
 can disable BAR decoder in device.
 
 See it now?

Ok, I see the comment there now, thanks.

   The reason that it maps and
  unmaps around the read is to be resource friendly.
 
 Not only that. you can not use both ROM and BARs
 at the same time: if you enable ROM you must not
 access BARs.

So we either need to shadow the ROM if we want to support an mmap to it
or map/unmap around reads.

   For vfio, since
  we've assigned a device to vfio and it's open, I think it's valid to
  assign all the resources and sit on them.  Otherwise I don't think we
  can guarantee that a read that worked last time will still get resources
  and work this time.
 
 The issue is that BARs are not accessible while ROM is enabled.  It
 might be a security hole to try to do that from an app, and I do know
 there is hardware where it will not work, and PCI spec says as much:
 
 6.2.5.2:
   In order to minimize the number of address decoders needed, a device may
   share a decoder between the Expansion ROM Base Address register and
   other Base Address registers.47 When expansion ROM decode is enabled,
   the decoder is used for accesses to the expansion ROM and device
   independent software must not access the device through any other Base
   Address registers.
 
 Thus I think applications must shadow the ROM and kernel must prevent
 ROM access after mapping BARs. We could shadow in kernel as well but this
 would be expensive in memory as ROMs are sometimes quite large.

Shadowing in the VFIO kernel driver seems too resource intensive;
probably best to disable mmap for the ROM and only allow read/write,
letting the app shadow if it wants.  Preventing BAR access is
troublesome since we support mmaps.  I don't like the idea of overly
restrictive ordering, but should we shut down ROM access after the domain
is set(?)

 I
wonder if we should be doing all the BAR mapping at open and unmap at
close so that we can fail if the device can't get basic resources.
   
   I belive we should do this on ioctl so that e.g. hotunplug
   can reset the device clean it up. Unused device should also not
   consume resources.
  
  If it's assigned to vfio and opened, then it's not unused.  A hotunplug
  would involve a close of the vfio device, which could then reset the
  device.  I think we probably are still missing a device reset, but I'm
  not sure if it needs an ioctl or if a reset on open would be sufficient.
  
  Alex
 
 You are only looking at qemu, but look at the whole stack:
 
 First, I think we want reset on close, so the device is in sane
 state when it's returned to the OS.

Yep, I'd agree with that.

 Second, we typically will want libvirt to open the device and pass it to qemu,
 qemu might start using it much later, and it might keep it around,
 unused, for as long as it wants. So we do want a way to reset without close.

I'm fine with that, but we also need a reset on open to prevent leaking
host state to guest/userspace.

 Finally, we will want a way for libvirt to verify that qemu has
 closed the device, so it's safe to pass it to another guest.

This gets tricky since we'd really like a revoke to insure there isn't a
mmap still lingering.  What do you propose?

I
believe we should also be calling pci_request_regions in here somewhere.
Perhaps 

Re: [PATCH 3/3] VFIO V4: VFIO driver: Non-privileged user level PCI drivers

2010-09-28 Thread Michael S. Tsirkin
On Tue, Sep 28, 2010 at 11:10:42AM -0600, Alex Williamson wrote:
 On Tue, 2010-09-28 at 16:40 +0200, Michael S. Tsirkin wrote:
  On Tue, Sep 28, 2010 at 08:26:21AM -0600, Alex Williamson wrote:
   On Tue, 2010-09-28 at 12:09 +0200, Michael S. Tsirkin wrote:
On Mon, Sep 27, 2010 at 02:46:32PM -0600, Alex Williamson wrote:
 On Wed, 2010-09-22 at 14:18 -0700, Tom Lyon wrote:
  +ssize_t vfio_mem_readwrite(
  +   int write,
  +   struct vfio_dev *vdev,
  +   char __user *buf,
  +   size_t count,
  +   loff_t *ppos)
  +{
  +   struct pci_dev *pdev = vdev-pdev;
  +   resource_size_t end;
  +   void __iomem *io;
  +   loff_t pos;
  +   int pci_space;
  +
  +   pci_space = vfio_offset_to_pci_space(*ppos);
  +   pos = vfio_offset_to_pci_offset(*ppos);
  +
  +   if (!pci_resource_start(pdev, pci_space))
  +   return -EINVAL;
  +   end = pci_resource_len(pdev, pci_space);
  +   if (vdev-barmap[pci_space] == NULL)
  +   vdev-barmap[pci_space] = pci_iomap(pdev, pci_space, 0);
  +   io = vdev-barmap[pci_space];
  +
 
 So we do a pci_iomap, but never do corresponding pci_iounmap.  This 
 also
 only works for the first 6 BARs since the ROM BAR needs pci_map_rom.


An issue with ROM is that I think it can not be enabled together
with BARs. This is why pci_read_rom/pci_write_rom do what
they do.
   
   I don't think that's an issue.  pci_read/write_rom doesn't do anything
   about disabling BARs when the ROM is read.
  
  Oh yes it does: it calls pci_map rom which calls pci_enable_rom.
  And this enables ROM space decoder, which, PCI spec says,
  can disable BAR decoder in device.
  
  See it now?
 
 Ok, I see the comment there now, thanks.
 
The reason that it maps and
   unmaps around the read is to be resource friendly.
  
  Not only that. you can not use both ROM and BARs
  at the same time: if you enable ROM you must not
  access BARs.
 
 So we either need to shadow the ROM if we want to support an mmap to it
 or map/unmap around reads.
For vfio, since
   we've assigned a device to vfio and it's open, I think it's valid to
   assign all the resources and sit on them.  Otherwise I don't think we
   can guarantee that a read that worked last time will still get resources
   and work this time.
  
  The issue is that BARs are not accessible while ROM is enabled.  It
  might be a security hole to try to do that from an app, and I do know
  there is hardware where it will not work, and PCI spec says as much:
  
  6.2.5.2:
  In order to minimize the number of address decoders needed, a device may
  share a decoder between the Expansion ROM Base Address register and
  other Base Address registers.47 When expansion ROM decode is enabled,
  the decoder is used for accesses to the expansion ROM and device
  independent software must not access the device through any other Base
  Address registers.
  
  Thus I think applications must shadow the ROM and kernel must prevent
  ROM access after mapping BARs. We could shadow in kernel as well but this
  would be expensive in memory as ROMs are sometimes quite large.
 
 Shadowing in the VFIO kernel driver seems too resource intensive,
 probably best to disable mmap for the ROM and only allow read write,
 letting the app shadow if it wants.  Preventing BAR access is
 troublesome since we support mmaps.  I don't like the idea of overly
 restrictive ordering, but should we shutdown ROM access after the domain
 is set(?)

We probably must disable it on BAR mmap.  Further, the existing sysfs
lacks locking, so I think it's racy wrt multiple readers, etc.


  I
 wonder if we should be doing all the BAR mapping at open and unmap at
 close so that we can fail if the device can't get basic resources.

I belive we should do this on ioctl so that e.g. hotunplug
can reset the device clean it up. Unused device should also not
consume resources.
   
   If it's assigned to vfio and opened, then it's not unused.  A hotunplug
   would involve a close of the vfio device, which could then reset the
   device.  I think we probably are still missing a device reset, but I'm
   not sure if it needs an ioctl or if a reset on open would be sufficient.
   
   Alex
  
  You are only looking at qemu, but look at the whole stack:
  
  First, I think we want reset on close, so the device is in sane
  state when it's returned to the OS.
 
 Yep, I'd agree with that.
 
  Second, we typically will want libvirt to open the device and pass it to 
  qemu,
  qemu might start using it much later, and it might keep it around,
  unused, for as long as it wants. So we do want a way to reset without close.
 
 I'm fine with that, but we also need a reset on open to prevent leaking
 host state to guest/userspace.

Can't hurt ...

  Finally, we will want a way for libvirt to verify that 

Support for nested vmx

2010-09-28 Thread Vandeir Eduardo
Hi,

I work at a university computer lab and would like
to create Linux virtual machines using kvm on a big
Linux Intel-VT server we have.

Each Linux virtual machine would be assigned to a group of
students, and they will use those VMs to study and to create more
Linux virtual machines using kvm inside of them.

Searching the list archive, I saw nested kvm VMs are possible
on AMD processors but not yet on Intel ones.

Is there some place where I could get patches/instructions
to test nested kvm VMs on Intel-VT processors?

Thanks.

__
Vandeir Eduardo
Laboratório de Computação e Informática (LCI) - Campus III
Fundacao Universidade Regional de Blumenau (FURB)
Blumenau, SC, Brasil.

Re: pci passthrough with KVM

2010-09-28 Thread Alex Williamson
On Tue, Sep 28, 2010 at 2:27 AM, Inigo Losada ilos...@ibex.es wrote:

 We are using pci passthrough with an SCSI Adapter card. The system is:

 -  O.S:  Ubuntu 10.04.1 LTS
 -  KVM Packages:

 kvm    1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.2
 kvm-pxe   5.4.4-1ubuntu1
 qemu-kvm     0.12.3+noroms-0ubuntu9.2
 libvirt-bin      0.7.5-5ubuntu27.2
 python-libvirt    0.7.5-5ubuntu27.2
 libvirt0        0.7.5-5ubuntu27.2

 - Kernel  2.6.32.15+drm33.5.iommu    recompiled with following options :

 CONFIG_DMAR=y
 CONFIG_INTR_REMAP=y

 - Apparmor is stopped

 When we started the virtual machine we obtain the following error:

 char device redirected to /dev/pts/3
 device: 04:04.0: driver=pci-assign host=04:04.0
 Failed to assign irq for 04:04.0: Operation not permitted
 Perhaps you are assigning a device that shares an IRQ with another device?

Does the device share an IRQ with another device?  If not, does it
work if you launch qemu manually with sudo?   It might be that the
kernel and tool set aren't new enough to fully support running a
de-privileged guest with device assignment.  Thanks,

Alex


Re: how to debug unhandled vm exit: 0x11?

2010-09-28 Thread Neo Jia
On Tue, Jul 27, 2010 at 3:04 AM, Avi Kivity a...@redhat.com wrote:
  On 07/26/2010 08:58 PM, ewheeler wrote:

 O
 n 07/26/2010 07:01 PM, Neo Jia wrote:

 hi,

 I am seeing an unhandled vm exit: 0x11 on Win7 with KVM-88 release and
 wondering if I am still able to dump the code from guest OS when this
 happens. But it looks that all instructions are 0s after adding one
 more print code after dumping the guest registers.

 And it is very likely that this problem is fixed in the latest qemu
 code base but I still would like to know how to debug and investigate
 this kind of problem. BTW, I am using 32-bit qemu + 64-bit KVM kernel
 module.

 unhandled vm exit: 0x11

Avi,

I found the instruction that caused this problem:

emulation failed (failure) rip 71f14651 66 0f 7f 07

And according to Intel, this is a MOVDQA. So, is this instruction
already emulated? I am using a pretty old version of KVM
(release 88). If yes, could you point me to the file I need to look at
for that specific patch?

Currently, I am trying to use coalesced_mmio as you suggested in
another thread:
http://www.mail-archive.com/kvm@vger.kernel.org/msg25695.html

Thanks,
Neo


 I happened to be in the intel docs today:
   0x11 (17) RSM. Guest software attempted to execute RSM in SMM.

 As it happens, the 0x11 should be looked up as a KVM_EXIT_REASON (kvm.h),
 not in the manuals as I said.  It's a kvm internal error.

 What is RSM and SMM?

 These are documented in the manuals.

 --
 error compiling committee.c: too many arguments to function





-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


Re: how to debug unhandled vm exit: 0x11?

2010-09-28 Thread Neo Jia
On Tue, Sep 28, 2010 at 11:40 AM, Neo Jia neo...@gmail.com wrote:
 On Tue, Jul 27, 2010 at 3:04 AM, Avi Kivity a...@redhat.com wrote:
  On 07/26/2010 08:58 PM, ewheeler wrote:

 On 07/26/2010 07:01 PM, Neo Jia wrote:

 hi,

 I am seeing an unhandled vm exit: 0x11 on Win7 with KVM-88 release and
 wondering if I am still able to dump the code from guest OS when this
 happens. But it looks that all instructions are 0s after adding one
 more print code after dumping the guest registers.

 And it is very likely that this problem is fixed in the latest qemu
 code base but I still would like to know how to debug and investigate
 this kind of problem. BTW, I am using 32-bit qemu + 64-bit KVM kernel
 module.

 unhandled vm exit: 0x11

 Avi,

 I found the instruction that caused this problem:

 emulation failed (failure) rip 71f14651 66 0f 7f 07

 And according to Intel, this is a MOVDQA. So, do we already have this
 instruction emulated as I am using a pretty old version of KVM
 (release 88)? If yes, could you point me to the file I need to look at
 for that specific patch?

 Currently, I am trying to use coalesced_mmio as you suggested in
 another thread:
 http://www.mail-archive.com/kvm@vger.kernel.org/msg25695.html

Just found out that coalesced_mmio doesn't help in my case.

Thanks,
Neo


 Thanks,
 Neo


 I happened to be in the intel docs today:
   0x11 (17) RSM. Guest software attempted to execute RSM in SMM.

 As it happens, the 0x11 should be looked up as a KVM_EXIT_REASON (kvm.h),
 not in the manuals as I said.  It's a kvm internal error.

 What is RSM and SMM?

 These are documented in the manuals.

 --
 error compiling committee.c: too many arguments to function





 --
 I would remember that if researchers were not ambitious
 probably today we haven't the technology we are using!




-- 
I would remember that if researchers were not ambitious
probably today we haven't the technology we are using!


Re: [PATCH v11 13/17] Add mp(mediate passthru) device.

2010-09-28 Thread Sridhar Samudrala

 On 9/28/2010 8:18 AM, Arnd Bergmann wrote:

On Tuesday 28 September 2010, Michael S. Tsirkin wrote:

On Tue, Sep 28, 2010 at 04:39:59PM +0200, Arnd Bergmann wrote:

Can you be more specific what the problem is? Do you think
it breaks when a guest sends VLAN tagged frames or when macvtap
is connected to a VLAN interface that adds another tag (or
only the combination)?

I expect the protocol value to be wrong when guest sends vlan tagged
frames as 802.1q frames have a different format.

Ok, I see. Would that be fixed by using eth_type_trans()? I don't
see any code in there that tries to deal with the VLAN tag, so
do we have the same problem in the tun/tap driver?
tun_get_user() does call eth_type_trans(). Not sure why I didn't use it
in the macvtap code.

Need to test it with guest VLAN tagging to make sure it works.
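
Just to illustrate the point about the protocol value, here is a rough sketch
(not the macvtap code; guess_skb_protocol() is a made-up name) of how a receive
path could pick the L3 protocol of a frame that may carry an 802.1q tag instead
of blindly using the outer ethertype:

    #include <linux/if_ether.h>
    #include <linux/if_vlan.h>
    #include <linux/skbuff.h>

    /* assumes a linear skb that starts with the Ethernet header */
    static __be16 guess_skb_protocol(const struct sk_buff *skb)
    {
            const struct ethhdr *eth = (const struct ethhdr *)skb->data;

            if (skb->len >= sizeof(struct vlan_ethhdr) &&
                eth->h_proto == htons(ETH_P_8021Q)) {
                    const struct vlan_ethhdr *veth =
                            (const struct vlan_ethhdr *)skb->data;
                    /* 802.1q: the real ethertype sits after the 4-byte tag */
                    return veth->h_vlan_encapsulated_proto;
            }

            return eth->h_proto;    /* untagged frame */
    }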

Also, I wonder how we handle the case where both the guest and
the host do VLAN tagging. Does the host transparently override
the guest tag, or does it add a nested tag? More importantly,
what should it do?

I would think that if both the guest and the host do VLAN tagging, the tags
will be nested.


Thanks
Sridhar





disk image snapshot functionality

2010-09-28 Thread Peter Doherty

Hi,

I'm using Centos5.5 on the host, and the KVM that's available in the  
repos.  I'm using linux VMs too.  My disk images are qcow2 files.

Here's what I want:
To be able to, on the host, create a snapshot of the guest's disk  
image, without shutting down the guest, so that I can then restore  
back to a point in time for the guest.

I thought I could do this with the qcow2 images.
I've used:
qemu-img snapshot -c snapname disk_image.qcow2
to create the snapshot.

It doesn't work.  The snapshots claim to be created, but if I shut  
down the guest, apply the snapshot

( qemu-img snapshot -a snapname disk_image.qcow2 )
the guest either:
a.) no longer boots (No bootable disk found)
b.) boots, but is just how it was when I shut it down (it hasn't  
reverted back to what it was like when the snapshot was made)



It makes no sense.  I can sometimes apply the first snapshot, and it  
has worked...but subsequent snapshots are a no go.

One thing that is suspicious is that the VM SIZE and CLOCK are zero:
# qemu-img snapshot -l test1.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         with100mb                 0 2010-09-28 11:48:23   00:00:00.000
2         with200mb                 0 2010-09-28 11:50:53   00:00:00.000
3         with300mb                 0 2010-09-28 11:52:49   00:00:00.000
4         whenoff                   0 2010-09-28 11:56:41   00:00:00.000


# /usr/libexec/qemu-kvm --help
QEMU PC emulator version 0.9.1 (kvm-83-maint-snapshot-20090205),  
Copyright (c) 2003-2008 Fabrice Bellard



I can't find much info about using qcow2 images when I search.  Any  
help would be appreciated.


Thanks.

Peter


Re: Support for nested vmx

2010-09-28 Thread Nadav Har'El
On Tue, Sep 28, 2010, Vandeir Eduardo wrote about Support for nested vmx:
 Is there some place where I could get patches/instructions
 to test nested KVM VMs on Intel VT processors?

Hi,

The latest patches that I submitted on this mailing list around two months
ago apply to the then-current trunk of KVM, but do not apply cleanly to today's
trunk. So I started today the effort to rebase them to the current trunk.

Unfortunately (but not too unexpectedly) the rebase did not go smoothly,
and the resulting code doesn't work (L2 hangs). But I'll debug it, and expect
that I'll be able to release a new version of the patches within a few days.

Please note that while nested VMX should work with these patches, it still
has a few limitations. Currently it was only tested with KVM as a nested
(L1) hypervisor - we got VMWare Server working, but the necessary patches are
not yet included in this patch set. Also, only Linux is sure to work as L2
nested guest - Windows *did* work at some point, but it didn't seem to work
in this patch set last time I tried - and some debugging work is still needed
to figure out why.

Nadav.


-- 
Nadav Har'El| Tuesday, Sep 28 2010, 21 Tishri 5771
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |You do not need a parachute to skydive.
http://nadav.harel.org.il   |You only need one to skydive twice.


Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices

2010-09-28 Thread Sheng Yang
On Tuesday 28 September 2010 21:36:00 Michael S. Tsirkin wrote:
 On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
  This patch enable per-vector mask for assigned devices using MSI-X.
  
  Signed-off-by: Sheng Yang sh...@linux.intel.com
  ---
  
   arch/x86/kvm/x86.c   |1 +
   include/linux/kvm.h  |9 -
   include/linux/kvm_host.h |1 +
   virt/kvm/assigned-dev.c  |   39 
+++
   4 files changed, 49 insertions(+), 1 deletions(-)
  
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 8412c91..e6933e6 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  
  case KVM_CAP_DEBUGREGS:
  case KVM_CAP_X86_ROBUST_SINGLESTEP:
  
  case KVM_CAP_XSAVE:
  +   case KVM_CAP_DEVICE_MSIX_MASK:
  r = 1;
  break;
  
  case KVM_CAP_COALESCED_MMIO:
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index 919ae53..f2b7cdc 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
  
   #endif
   #define KVM_CAP_PPC_GET_PVINFO 57
   #define KVM_CAP_PPC_IRQ_LEVEL 58
  
  +#ifdef __KVM_HAVE_MSIX
  +#define KVM_CAP_DEVICE_MSIX_MASK 59
  +#endif
  
   #ifdef KVM_CAP_IRQ_ROUTING
  
  @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
  
   };
   
   #define KVM_MAX_MSIX_PER_DEV   256
  
  +
  +#define KVM_MSIX_FLAG_MASK 1
  +
  
   struct kvm_assigned_msix_entry {
   
  __u32 assigned_dev_id;
  __u32 gsi;
  __u16 entry; /* The index of entry in the MSI-X table */
  
  -   __u16 padding[3];
  +   __u16 flags;
  +   __u16 padding[2];
  
   };
   
   #endif /* __LINUX_KVM_H */
  
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index 0b89d00..a43405c 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
  
   };
   
   #define KVM_ASSIGNED_MSIX_PENDING  0x1
  
  +#define KVM_ASSIGNED_MSIX_MASK 0x2
  
   struct kvm_guest_msix_entry {
   
  u32 vector;
  u16 entry;
  
  diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
  index 7c98928..15b8c32 100644
  --- a/virt/kvm/assigned-dev.c
  +++ b/virt/kvm/assigned-dev.c
  @@ -17,6 +17,8 @@
  
   #include <linux/pci.h>
   #include <linux/interrupt.h>
   #include <linux/slab.h>
  
  +#include <linux/irqnr.h>
  +
  
   #include irq.h
   
   static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
   list_head *head,
  
  @@ -666,6 +668,30 @@ msix_nr_out:
  return r;
   
   }
  
  +static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
  +                             int index)
  +{
  +   int irq;
  +
  +   if (!assigned_dev->dev->msix_enabled ||
  +       !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
  +           return;
  +
  +   irq = assigned_dev->host_msix_entries[index].vector;
  +
  +   ASSERT(irq != 0);
  +
  +   if (assigned_dev->guest_msix_entries[index].flags &
  +       KVM_ASSIGNED_MSIX_MASK)
  +           disable_irq(irq);
  +   else {
  +           enable_irq(irq);
  +           if (assigned_dev->guest_msix_entries[index].flags &
  +               KVM_ASSIGNED_MSIX_PENDING)
  +                   schedule_work(&assigned_dev->interrupt_work);
  +   }
 
 Hmm, won't this lose interrupts which were sent while the bit was pending?
 It is also pretty heavy if, as you say, guests touch the mask a lot.
 I think we must keep the interrupt disabled, just set a bit
 and delay interrupt injection until the vector is unmasked
 or deleted. The interface to do this will need more thought:
 e.g. how can userspace clear this bit then?

I think it's fine. Because we don't modify the pending bit here, the interrupt
handler would still schedule a work item to check it regardless of whether the
IRQ is disabled. That way, no interrupt would be lost.

But checking the pending bit on the unmask action seems unnecessary: if the
bit is set, then a work item has definitely already been scheduled to deal
with it.
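
A rough sketch of the flow being described, with made-up names (vec_state,
vec_hard_irq, vec_mask, vec_unmask); it is not the KVM assigned-device code.
The hard IRQ handler only records that the vector fired and queues work, so
masking with disable_irq() and unmasking with enable_irq() cannot drop an
event; at worst the already-queued work item runs after the unmask:

    #include <linux/bitops.h>
    #include <linux/interrupt.h>
    #include <linux/workqueue.h>

    #define VEC_PENDING 0   /* the vector fired and injection is still owed */
    #define VEC_MASKED  1   /* guest has masked this MSI-X vector */

    struct vec_state {                      /* hypothetical per-vector state */
            unsigned long flags;
            int host_irq;
            struct work_struct inject_work; /* does the actual guest injection */
    };

    static irqreturn_t vec_hard_irq(int irq, void *arg)
    {
            struct vec_state *v = arg;

            set_bit(VEC_PENDING, &v->flags);
            schedule_work(&v->inject_work); /* injection happens from the work item */
            return IRQ_HANDLED;
    }

    static void vec_mask(struct vec_state *v)
    {
            set_bit(VEC_MASKED, &v->flags);
            disable_irq(v->host_irq);       /* no further hard IRQs until unmask */
    }

    static void vec_unmask(struct vec_state *v)
    {
            clear_bit(VEC_MASKED, &v->flags);
            enable_irq(v->host_irq);
            /* arguably redundant, as argued above: if PENDING is set, the
             * work item was already queued by vec_hard_irq() */
            if (test_bit(VEC_PENDING, &v->flags))
                    schedule_work(&v->inject_work);
    }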

--
regards
Yang, Sheng

 
  +}
  +
  
   static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
   
 struct kvm_assigned_msix_entry *entry)
   
   {
  
  @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
  
  adev->guest_msix_entries[i].entry = entry->entry;
  adev->guest_msix_entries[i].vector = entry->gsi;
  adev->host_msix_entries[i].entry = entry->entry;
  
  +   if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
  +       !(adev->guest_msix_entries[i].flags &
  +         KVM_ASSIGNED_MSIX_MASK)) {
  +           adev->guest_msix_entries[i].flags |=
  +                   KVM_ASSIGNED_MSIX_MASK;
  +

Re: [RFC] virtio_balloon: disable oom killer when fill balloon

2010-09-28 Thread Dave Young
On Tue, Sep 28, 2010 at 10:03 PM, Anthony Liguori anth...@codemonkey.ws wrote:
 On 09/28/2010 08:49 AM, Dave Young wrote:

 On Tue, Sep 28, 2010 at 9:34 PM, Anthony Liguori anth...@codemonkey.ws wrote:


 On 09/28/2010 08:19 AM, Dave Young wrote:


 Balloon could cause guest memory OOM killing and panic. If we disable the
 OOM killer, it will be better; at least it avoids a guest panic.

 If the allocation fails we can just adjust the balloon target to be equal to
 the current number by calling vdev->config->set.

 But during testing I found that config->set of num_pages does not actually
 change the config. Should I do hacks in userspace as well? If so, where
 should I start to hack?





 Hi,

 Thanks your comments.



 The guest is not supposed to set the target field in its config.  This is a
 host read/write, guest read-only field.
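
For reference, a sketch of that config contract, loosely modeled on my
recollection of the existing virtio_balloon driver (so treat the function
names and details as approximate): the guest only reads num_pages, the target
set by the host, and only writes actual to report how far it has got.

    #include <linux/kernel.h>
    #include <linux/stddef.h>
    #include <linux/virtio.h>
    #include <linux/virtio_balloon.h>

    struct virtio_balloon {         /* trimmed-down view of the driver's state */
            struct virtio_device *vdev;
            unsigned int num_pages; /* pages we have given back so far */
    };

    /* how far we still are from the host-requested target (read-only for us) */
    static s64 towards_target(struct virtio_balloon *vb)
    {
            u32 v;

            vb->vdev->config->get(vb->vdev,
                                  offsetof(struct virtio_balloon_config, num_pages),
                                  &v, sizeof(v));
            return (s64)v - vb->num_pages;
    }

    /* the only config field the guest writes: report the size we reached */
    static void update_balloon_size(struct virtio_balloon *vb)
    {
            __le32 actual = cpu_to_le32(vb->num_pages);

            vb->vdev->config->set(vb->vdev,
                                  offsetof(struct virtio_balloon_config, actual),
                                  &actual, sizeof(actual));
    }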


 Could you tell me where it gets set? If so, IMHO the set config API should
 fail, shouldn't it?



 The problem with your approach generally speaking is that it's unclear
 whether this is the right policy.  For instance, what if another
 application
 held a very large allocation which caused the fill request to stop but
 then
 shortly afterwards, the aforementioned application released that
 allocation.
  If instead of just stopping, we paused and tried again later, we could
 potentially succeed.


 Yes, it is possible. But maybe it is better to do the ballooning from the
 qemu monitor later?


 It's part of the specification, not something that's enforced or even
 visible within the APIs.

 I think a better approach might be a graceful back off.  For instance, when
 you hit this condition, deflate the balloon by 10% based on the assumption
 that you have probably already gone too far.  Before you attempt to allocate
 to the target again, set a timeout that increases in duration exponentially
 until you reach some maximum (say 10s).


 I'm afraid that most of the time it will keep inflating and deflating in a loop.


 With sufficiently large timeouts, does it matter?

 The other side of the argument is that the host should be more careful about
 doing balloon requests to the guest.  Alternatively, you can argue that the
 guest should be able to balloon itself and that's where the logic should be.

 But I think split policy across the guest and host would prove to be
 prohibitively complex to deal with.

Thanks.

What do you think about adding an option like:

-balloon virtio,nofail
With the nofail option we stop ballooning on OOM.

The default behavior without nofail would be as you said, i.e. retry
every 5 minutes.
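
As a purely illustrative sketch of the retry policy being discussed (made-up
names, not actual qemu or virtio_balloon code): the helper doubles the delay
after each failed fill attempt, caps it at the 10s ceiling mentioned above,
and resets once an allocation succeeds again.

    #include <linux/kernel.h>
    #include <linux/types.h>

    #define BALLOON_BACKOFF_MIN_MS  200     /* first retry delay */
    #define BALLOON_BACKOFF_MAX_MS  10000   /* cap suggested earlier in the thread */

    /* returns how long the caller should msleep() before the next fill attempt */
    static unsigned int balloon_backoff_ms(bool alloc_failed)
    {
            static unsigned int delay_ms = BALLOON_BACKOFF_MIN_MS;

            if (!alloc_failed) {
                    delay_ms = BALLOON_BACKOFF_MIN_MS;  /* back to fast retries */
                    return 0;
            }

            delay_ms = min_t(unsigned int, delay_ms * 2, BALLOON_BACKOFF_MAX_MS);
            return delay_ms;
    }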


 Regards,

 Anthony Liguori



 This isn't the only approach, but hopefully it conveys the idea of
 gracefully backing off without really giving up.

 Regards,

 Anthony Liguori



 Thanks

 --- linux-2.6.orig/drivers/virtio/virtio_balloon.c      2010-09-25 20:58:14.19001 +0800
 +++ linux-2.6/drivers/virtio/virtio_balloon.c   2010-09-28 21:05:42.20675 +0800
 @@ -25,6 +25,7 @@
  #include <linux/freezer.h>
  #include <linux/delay.h>
  #include <linux/slab.h>
 +#include <linux/oom.h>

  struct virtio_balloon
  {
 @@ -97,8 +98,22 @@ static void tell_host(struct virtio_ball
        wait_for_completion(&vb->acked);
  }

 -static void fill_balloon(struct virtio_balloon *vb, size_t num)
 +static int cblimit(int times)
  {
 +       static int t;
 +
 +       if (t < times)
 +               t++;
 +       else
 +               t = 0;
 +
 +       return !t;
 +}
 +
 +static int fill_balloon(struct virtio_balloon *vb, size_t num)
 +{
 +       int ret = 0;
 +
        /* We can only do one array worth at a time. */
        num = min(num, ARRAY_SIZE(vb->pfns));

 @@ -106,10 +121,13 @@ static void fill_balloon(struct virtio_b
                struct page *page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY |
                                        __GFP_NOMEMALLOC | __GFP_NOWARN);
                if (!page) {
 -                       if (printk_ratelimit())
 +                       if (cblimit(5)) {
                                dev_printk(KERN_INFO, &vb->vdev->dev,
                                           "Out of puff! Can't get %zu pages\n",
                                           num);
 +                               ret = -ENOMEM;
 +                               goto out;
 +                       }
                        /* Sleep for at least 1/5 of a second before retry. */
                        msleep(200);
                        break;
 @@ -120,11 +138,11 @@ static void fill_balloon(struct virtio_b
                list_add(&page->lru, &vb->pages);
        }

 -       /* Didn't get any?  Oh well. */
 -       if (vb->num_pfns == 0)
 -               return;
 +out:
 +       if (vb->num_pfns)
 +               tell_host(vb, vb->inflate_vq);

 -       tell_host(vb, vb->inflate_vq);
 +       return ret;
  }

  static void release_pages_by_pfn(const u32 pfns[], unsigned int num)
 @@ -251,6 +269,14 @@ static void update_balloon_size(struct v
                        actual, sizeof(actual));
  }

 +static void update_balloon_target(struct virtio_balloon *vb)
 +{
 +

IOMMU in guest OS

2010-09-28 Thread Thawan Kooburat
Hi,

Can someone tell me if my understanding is correct or not. Currently,
KVM only uses the IOMMU to allow a PCI/PCI Express device to be assigned as a
PCI device inside the guest OS. So right now the guest OS does not see an IOMMU
as a device, but this may change when IOMMU emulation (I saw some
patches on the list) is merged into the development branch.


-- 
Thawan Kooburat

Graduate Student
Department of Computer Science
UW-Madison


Re: [RFC PATCH 2/2] macvtap: TX zero copy between guest and host kernel

2010-09-28 Thread Shirley Ma
Hello Michael,

On Wed, 2010-09-15 at 07:52 -0700, Shirley Ma wrote:
Don't you think once I address vhost_add_used_and_signal update
   issue, it is a simple and complete patch for macvtap TX zero copy?
   
   Thanks
   Shirley
  
  I like the fact that the patch is simple. Unfortunately
  I suspect it'll stop being simple by the time it's complete :) 
 
 I can make a try. :)

I compared several approaches for addressing the issue raised here
about how/when to update vhost_add_used_and_signal. The simplest approach I
have found is:

1. add a completion field in struct virtqueue;
2. when it is a zero-copy packet, make the vhost thread wait for the
completion before updating vhost_add_used_and_signal;
3. pass the vq from vhost to macvtap as the skb destruct_arg;
4. when the skb is freed on its last reference, signal the vq completion.

The test results show the same performance as the original patch. What do
you think? If it sounds good to you, I will resubmit this revised patch.
The patch stays as simple as it was before. :)
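
A very rough sketch of steps 1-4 above, with made-up names (zc_tx_info,
zc_skb_destructor, zc_send_and_wait); it is not the actual vhost/macvtap
patch, only an illustration of signalling completion from the skb destructor
once the last reference to the zero-copy buffer is dropped:

    #include <linux/completion.h>
    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    struct zc_tx_info {                     /* hypothetical per-packet context */
            struct completion done;
    };

    static void zc_skb_destructor(struct sk_buff *skb)
    {
            /* destructor_arg is the field the thread calls destruct_arg */
            struct zc_tx_info *info = skb_shinfo(skb)->destructor_arg;

            complete(&info->done);          /* last reference gone: buffers reusable */
    }

    /* caller side (e.g. the vhost worker thread) */
    static void zc_send_and_wait(struct sk_buff *skb, struct zc_tx_info *info)
    {
            init_completion(&info->done);
            skb_shinfo(skb)->destructor_arg = info;
            skb->destructor = zc_skb_destructor;

            dev_queue_xmit(skb);            /* hand the frame to the lower device */

            wait_for_completion(&info->done);
            /* only now is it safe to vhost_add_used_and_signal() for this buffer */
    }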

Thanks
Shirley




Re: IOMMU in guest OS

2010-09-28 Thread Muli Ben-Yehuda
On Tue, Sep 28, 2010 at 08:56:57PM -0500, Thawan Kooburat wrote:

 Can someone tell me if my understanding is correct or
 not. Currently, KVM only uses the IOMMU to allow a PCI/PCI Express device
 to be assigned as a PCI device inside the guest OS. So right now the guest
 OS does not see an IOMMU as a device, but this may change when IOMMU
 emulation (I saw some patches on the list) is merged into
 the development branch.

Correct.

Cheers,
Muli