Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.15

2015-03-18 Thread Paolo Bonzini


On 18/03/2015 09:46, Stefan Bader wrote:
 
> Regardless of that, I wonder whether the below (this version untested) sound
> acceptable for upstream? At least it would make debugging much simpler. :)
>
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>          ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
>
>          /* Ensure minimum (required) set of control bits are supported. */
> -        if (ctl_min & ~ctl)
> +        if (ctl_min & ~ctl) {
> +                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
> +                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>                  return -EIO;
> +        }
>
>          *result = ctl;
>          return 0;

Yes, this is nice.  Maybe -ENODEV.

Also, a minimal patch for Ubuntu would probably be:

@@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
                         vmx_capability.ept, vmx_capability.vpid);
         }

-        min = 0;
+        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
 #ifdef CONFIG_X86_64
         min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
 #endif

but I don't think it's a good idea to add it to stable kernels.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.15

2015-03-18 Thread Stefan Bader
On 18.03.2015 11:27, Paolo Bonzini wrote:
 
 
> On 18/03/2015 10:59, Stefan Bader wrote:
>>> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>                          vmx_capability.ept, vmx_capability.vpid);
>>>          }
>>>
>>> -        min = 0;
>>> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>>>  #ifdef CONFIG_X86_64
>>>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>>  #endif
>>>
>>> but I don't think it's a good idea to add it to stable kernels.
>>
>> Why is that? Because it has a risk of causing the module failing to
>> load on L0 where it did work before?
>
> Because if we wanted to make 3.14 nested VMX stable-ish we would need
> several more, at least these:
>
>   KVM: nVMX: fix lifetime issues for vmcs02
>   KVM: nVMX: clean up nested_release_vmcs12 and code around it
>   KVM: nVMX: Rework interception of IRQs and NMIs
>   KVM: nVMX: Do not inject NMI vmexits when L2 has a pending
>              interrupt
>   KVM: nVMX: Disable preemption while reading from shadow VMCS
>
> and for 3.13:
>
>   KVM: nVMX: Leave VMX mode on clearing of feature control MSR
>
> There are also several L2-crash-L1 bugs too in Nadav Amit's patches.
>
> Basically, nested VMX was never considered stable-worthy.  Perhaps
> that can change soon---but not retroactively.
>
> So I'd rather avoid giving false impressions of the stability of nVMX
> in 3.14.
>
> Even if we considered nVMX stable, I'd _really_ not want to consider
> the L1-L2 boundary a secure one for a longer time.
>
>> Which would be something I would rather avoid. Generally I think it
>> would be good to have something that can be generally applied.
>> Given the speed that cloud service providers tend to move forward
>> (ok they may not actively push the ability to go nested).
>
> And if they did, I'd really not want them to do it with a 3.14 kernel.

3.14... you are optimistic. :) But thanks a lot for the detailed info.

-Stefan

>
> Paolo
 




signature.asc
Description: OpenPGP digital signature


regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.15

2015-03-18 Thread Stefan Bader
Someone reported[1] that some of their L1 guests fail to load the kvm-intel
module (without much details). Turns out that this was (at least) caused by

KVM: vmx: Allow the guest to run with dirty debug registers

as this adds VM_EXIT_SAVE_DEBUG_CONTROLS to the required MSR_IA32_VMX_EXIT_CTLS
bits. Not sure this should be fixed up in pre 3.15 kernels or the other way
round. Maybe naively asked but would it be sufficient to add this as required to
older kernels vmcs setup (without the code to make any use of it)?

Regardless of that, I wonder whether the below (this version untested) sound
acceptable for upstream? At least it would make debugging much simpler. :)

--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
         ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */

         /* Ensure minimum (required) set of control bits are supported. */
-        if (ctl_min & ~ctl)
+        if (ctl_min & ~ctl) {
+                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
+                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
                 return -EIO;
+        }

         *result = ctl;
         return 0;

Thanks,
-Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1431473
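
The failure described in this mail can be modelled in userspace. The sketch below mirrors the shape of adjust_vmx_controls(): required and optional bits are combined, masked by the "allowed to be one" word, OR-ed with the "must be one" word, and the call fails if any required bit did not survive. It rests on several assumptions: the two capability words are plain parameters instead of the result of rdmsr(), the masking step with the high word is not visible in the quoted hunk and is included from the general shape of that function, and the bit value and all names prefixed with SKETCH_ or ending in _model are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative value for this sketch only; check the kernel headers. */
#define SKETCH_SAVE_DEBUG_CONTROLS 0x00000004u

/*
 * Userspace model of the required-bits check. The kernel obtains the two
 * capability words with rdmsr(); here they are passed in (assumption).
 * Returns 0 and stores the usable controls, or -1 if a required bit
 * cannot be set.
 */
static int adjust_controls_model(uint32_t ctl_min, uint32_t ctl_opt,
                                 uint32_t msr_low, uint32_t msr_high,
                                 uint32_t *result)
{
    uint32_t ctl = ctl_min | ctl_opt;

    ctl &= msr_high; /* bit == 0 in high word ==> must be zero */
    ctl |= msr_low;  /* bit == 1 in low word  ==> must be one  */

    /* Any required bit that was masked away makes the setup fail. */
    if (ctl_min & ~ctl) {
        fprintf(stderr, "required=%08x usable=%08x missing=%08x\n",
                (unsigned)ctl_min, (unsigned)ctl,
                (unsigned)(ctl_min & ~ctl));
        return -1;
    }

    *result = ctl;
    return 0;
}
```

With a high word that lacks the newly required bit (as an older L0 would present it to L1), the model fails just as the module load does; once the bit is advertised, the same call succeeds.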





Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.15

2015-03-18 Thread Stefan Bader
On 18.03.2015 10:18, Paolo Bonzini wrote:
 
 
> On 18/03/2015 09:46, Stefan Bader wrote:
>>
>> Regardless of that, I wonder whether the below (this version untested) sound
>> acceptable for upstream? At least it would make debugging much simpler. :)
>>
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2953,8 +2953,11 @@ static __init int adjust_vmx_controls(u32 ctl_min, u32 ct
>>          ctl |= vmx_msr_low;  /* bit == 1 in low word  ==> must be one  */
>>
>>          /* Ensure minimum (required) set of control bits are supported. */
>> -        if (ctl_min & ~ctl)
>> +        if (ctl_min & ~ctl) {
>> +                printk(KERN_ERR "vmx: msr(%08x) does not match requirements. "
>> +                       "req=%08x cur=%08x\n", msr, ctl_min, ctl);
>>                  return -EIO;
>> +        }
>>
>>          *result = ctl;
>>          return 0;
>
> Yes, this is nice.  Maybe -ENODEV.

Maybe, though I did not change that. Just added to give some kind of hint when
the module would otherwise fail with just an IO error.

>
> Also, a minimal patch for Ubuntu would probably be:
>
> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>                          vmx_capability.ept, vmx_capability.vpid);
>          }
>
> -        min = 0;
> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>  #ifdef CONFIG_X86_64
>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>  #endif
>
> but I don't think it's a good idea to add it to stable kernels.

Why is that? Because it has a risk of causing the module failing to load on L0
where it did work before? Which would be something I would rather avoid.
Generally I think it would be good to have something that can be generally
applied. Given the speed that cloud service providers tend to move forward (ok
they may not actively push the ability to go nested).

-Stefan
>
> Paolo
 






Re: regression: nested: L1 3.15+ fails to load kvm-intel on L0 3.15

2015-03-18 Thread Paolo Bonzini


On 18/03/2015 10:59, Stefan Bader wrote:
>> @@ -2850,7 +2851,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>                          vmx_capability.ept, vmx_capability.vpid);
>>          }
>>
>> -        min = 0;
>> +        min = VM_EXIT_SAVE_DEBUG_CONTROLS;
>>  #ifdef CONFIG_X86_64
>>          min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
>>  #endif
>>
>> but I don't think it's a good idea to add it to stable kernels.
>
> Why is that? Because it has a risk of causing the module failing to
> load on L0 where it did work before?

Because if we wanted to make 3.14 nested VMX stable-ish we would need
several more, at least these:

  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: nVMX: clean up nested_release_vmcs12 and code around it
  KVM: nVMX: Rework interception of IRQs and NMIs
  KVM: nVMX: Do not inject NMI vmexits when L2 has a pending
             interrupt
  KVM: nVMX: Disable preemption while reading from shadow VMCS

and for 3.13:

  KVM: nVMX: Leave VMX mode on clearing of feature control MSR

There are also several L2-crash-L1 bugs too in Nadav Amit's patches.

Basically, nested VMX was never considered stable-worthy.  Perhaps
that can change soon---but not retroactively.

So I'd rather avoid giving false impressions of the stability of nVMX
in 3.14.

Even if we considered nVMX stable, I'd _really_ not want to consider
the L1-L2 boundary a secure one for a longer time.

> Which would be something I would rather avoid. Generally I think it
> would be good to have something that can be generally applied.
> Given the speed that cloud service providers tend to move forward
> (ok they may not actively push the ability to go nested).

And if they did, I'd really not want them to do it with a 3.14 kernel.

Paolo


[GIT PULL 04/11] KVM: s390: Guest's memory access functions get access registers

2015-03-18 Thread Christian Borntraeger
From: Alexander Yarygin yary...@linux.vnet.ibm.com

In access register mode, the write_guest() read_guest() and other
functions will invoke the access register translation, which
requires an ar, designated by one of the instruction fields.
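
How an instruction field can designate the access register may be easier to see in a small userspace sketch. Everything here is an assumption made for illustration: the helper body is not part of the visible diff, the layout (a 4-bit base register number in the top bits of a 32-bit instruction word, followed by a 12-bit displacement) is the usual RS-format arrangement, and the names ending in _sketch are hypothetical. The optional output pointer mirrors the call in the diag.c hunk of this patch, where NULL is passed because the caller does not need the register number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute base + displacement for an RS-format operand and, if requested,
 * report which register number was used as base. In access register mode
 * that same number would select the access register (assumption).
 */
static uint64_t base_disp_rs_sketch(uint32_t ipb, const uint64_t gprs[16],
                                    uint8_t *ar)
{
    uint32_t base2 = ipb >> 28;                  /* top 4 bits: base register */
    uint32_t disp2 = (ipb & 0x0fff0000u) >> 16;  /* next 12 bits: displacement */

    if (ar)
        *ar = (uint8_t)base2;

    /* Base register 0 means "no base", so only the displacement counts. */
    return (base2 ? gprs[base2] : 0) + disp2;
}
```

Callers that go on to access guest memory would keep the returned register number and hand it to read_guest() or write_guest(); callers that only want the computed value can pass NULL.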

Signed-off-by: Alexander Yarygin yary...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/diag.c  |  4 +--
 arch/s390/kvm/gaccess.c   |  4 +--
 arch/s390/kvm/gaccess.h   | 14 +
 arch/s390/kvm/intercept.c |  4 +--
 arch/s390/kvm/kvm-s390.c  |  2 +-
 arch/s390/kvm/kvm-s390.h  | 25 +---
 arch/s390/kvm/priv.c  | 72 ---
 arch/s390/kvm/sigp.c  |  4 +--
 8 files changed, 81 insertions(+), 48 deletions(-)

diff --git a/arch/s390/kvm/diag.c b/arch/s390/kvm/diag.c
index 9254aff..89140dd 100644
--- a/arch/s390/kvm/diag.c
+++ b/arch/s390/kvm/diag.c
@@ -77,7 +77,7 @@ static int __diag_page_ref_service(struct kvm_vcpu *vcpu)

         if (vcpu->run->s.regs.gprs[rx] & 7)
                 return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
-        rc = read_guest(vcpu, vcpu->run->s.regs.gprs[rx], &parm, sizeof(parm));
+        rc = read_guest(vcpu, vcpu->run->s.regs.gprs[rx], rx, &parm, sizeof(parm));
         if (rc)
                 return kvm_s390_inject_prog_cond(vcpu, rc);
         if (parm.parm_version != 2 || parm.parm_len < 5 || parm.code != 0x258)
@@ -230,7 +230,7 @@ static int __diag_virtio_hypercall(struct kvm_vcpu *vcpu)

 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu)
 {
-        int code = kvm_s390_get_base_disp_rs(vcpu) & 0xffff;
+        int code = kvm_s390_get_base_disp_rs(vcpu, NULL) & 0xffff;

         if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
                 return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index c230904..494131e 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -578,7 +578,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga,
         return 0;
 }

-int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
+int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data,
                  unsigned long len, int write)
 {
         psw_t *psw = &vcpu->arch.sie_block->gpsw;
@@ -652,7 +652,7 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  * Note: The IPTE lock is not taken during this function, so the caller
  * has to take care of this.
  */
-int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
+int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar,
                             unsigned long *gpa, int write)
 {
         struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 20de77e..7c2866b 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -156,9 +156,9 @@ int read_guest_lc(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
 }
 
 int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
-   unsigned long *gpa, int write);
+   ar_t ar, unsigned long *gpa, int write);
 
-int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
+int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data,
 unsigned long len, int write);
 
 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
@@ -168,6 +168,7 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  * write_guest - copy data from kernel space to guest space
  * @vcpu: virtual cpu
  * @ga: guest address
+ * @ar: access register
  * @data: source address in kernel space
  * @len: number of bytes to copy
  *
@@ -210,16 +211,17 @@ int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra,
  *  if data has been changed in guest space in case of an exception.
  */
 static inline __must_check
-int write_guest(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
+int write_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data,
unsigned long len)
 {
-   return access_guest(vcpu, ga, data, len, 1);
+   return access_guest(vcpu, ga, ar, data, len, 1);
 }
 
 /**
  * read_guest - copy data from guest space to kernel space
  * @vcpu: virtual cpu
  * @ga: guest address
+ * @ar: access register
  * @data: destination address in kernel space
  * @len: number of bytes to copy
  *
@@ -229,10 +231,10 @@ int write_guest(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
  * data will be copied from guest space to kernel space.
  */
 static inline __must_check
-int read_guest(struct kvm_vcpu *vcpu, unsigned long ga, void *data,
+int read_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data,
   unsigned long len)
 {
-   

[GIT PULL 07/11] KVM: s390: Add MEMOP ioctls for reading/writing guest memory

2015-03-18 Thread Christian Borntraeger
From: Thomas Huth th...@linux.vnet.ibm.com

On s390, we've got to make sure to hold the IPTE lock while accessing
logical memory. So let's add an ioctl for reading and writing logical
memory to provide this feature for userspace, too.
The maximum transfer size of this call is limited to 64kB to prevent
that the guest can trigger huge copy_from/to_user transfers. QEMU
currently only requests up to one or two pages so far, so 16*4kB seems
to be a reasonable limit here.
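
A consequence of the 64kB cap is that a userspace caller wanting to move more than that has to split the transfer into several requests. The sketch below shows one way to prepare such requests. It is an illustration under assumptions: the structure is redefined locally from the api.txt hunk in this patch so that it builds without kernel headers, the operation code is supplied by the caller, no ioctl is actually issued, and the names ending in _sketch plus the SKETCH_MAX_XFER macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field layout taken from the documentation hunk of this patch. */
struct mem_op_sketch {
    uint64_t gaddr;        /* the guest address */
    uint64_t flags;        /* flags */
    uint32_t size;         /* amount of bytes */
    uint32_t op;           /* type of operation */
    uint64_t buf;          /* buffer in userspace */
    uint8_t  ar;           /* the access register number */
    uint8_t  reserved[31]; /* should be set to 0 */
};

#define SKETCH_MAX_XFER (16u * 4096u) /* 64kB, per the commit message */

/*
 * Fill in the request for the next piece of a transfer with `remaining`
 * bytes left. Returns how many bytes this request covers, so the caller
 * can advance gaddr and buf by that amount before the next round.
 */
static uint32_t next_mem_op_sketch(struct mem_op_sketch *op, uint64_t gaddr,
                                   void *buf, uint64_t remaining,
                                   uint32_t opcode, uint8_t ar)
{
    uint32_t chunk = remaining > SKETCH_MAX_XFER ? SKETCH_MAX_XFER
                                                 : (uint32_t)remaining;

    memset(op, 0, sizeof(*op)); /* clears flags and the reserved bytes */
    op->gaddr = gaddr;
    op->size = chunk;
    op->op = opcode;
    op->buf = (uint64_t)(uintptr_t)buf;
    op->ar = ar;
    return chunk;
}
```

A caller would loop until the remaining length reaches zero, issuing the vcpu ioctl once per prepared request and stopping early on a non-zero return value.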

Signed-off-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt | 46 
 arch/s390/kvm/gaccess.c   | 22 
 arch/s390/kvm/gaccess.h   |  2 ++
 arch/s390/kvm/kvm-s390.c  | 74 +++
 include/uapi/linux/kvm.h  | 21 +++
 5 files changed, 165 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ee47998e..281179d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2716,6 +2716,52 @@ The fields in each entry are defined as follows:
eax, ebx, ecx, edx: the values returned by the cpuid instruction for
  this function/index combination
 
+4.89 KVM_S390_MEM_OP
+
+Capability: KVM_CAP_S390_MEM_OP
+Architectures: s390
+Type: vcpu ioctl
+Parameters: struct kvm_s390_mem_op (in)
+Returns: = 0 on success,
+         < 0 on generic error (e.g. -EFAULT or -ENOMEM),
+         > 0 if an exception occurred while walking the page tables
+
+Read or write data from/to the logical (virtual) memory of a VPCU.
+
+Parameters are specified via the following structure:
+
+struct kvm_s390_mem_op {
+   __u64 gaddr;/* the guest address */
+   __u64 flags;/* flags */
+   __u32 size; /* amount of bytes */
+   __u32 op;   /* type of operation */
+   __u64 buf;  /* buffer in userspace */
+   __u8 ar;/* the access register number */
+   __u8 reserved[31];  /* should be set to 0 */
+};
+
+The type of operation is specified in the op field. It is either
+KVM_S390_MEMOP_LOGICAL_READ for reading from logical memory space or
+KVM_S390_MEMOP_LOGICAL_WRITE for writing to logical memory space. The
+KVM_S390_MEMOP_F_CHECK_ONLY flag can be set in the flags field to check
+whether the corresponding memory access would create an access exception
+(without touching the data in the memory at the destination). In case an
+access exception occurred while walking the MMU tables of the guest, the
+ioctl returns a positive error number to indicate the type of exception.
+This exception is also raised directly at the corresponding VCPU if the
+flag KVM_S390_MEMOP_F_INJECT_EXCEPTION is set in the flags field.
+
+The start address of the memory region has to be specified in the gaddr
+field, and the length of the region in the size field. buf is the buffer
+supplied by the userspace application where the read data should be written
+to for KVM_S390_MEMOP_LOGICAL_READ, or where the data that should be written
+is stored for a KVM_S390_MEMOP_LOGICAL_WRITE. buf is unused and can be NULL
+when KVM_S390_MEMOP_F_CHECK_ONLY is specified. ar designates the access
+register number to be used.
+
+The reserved field is meant for future extensions. It is not used by
+KVM with the currently defined set of flags.
+
 5. The kvm_run structure
 
 
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index ea38d71..a7559f7 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -864,6 +864,28 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar,
 }
 
 /**
+ * check_gva_range - test a range of guest virtual addresses for accessibility
+ */
+int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar,
+                    unsigned long length, int is_write)
+{
+        unsigned long gpa;
+        unsigned long currlen;
+        int rc = 0;
+
+        ipte_lock(vcpu);
+        while (length > 0 && !rc) {
+                currlen = min(length, PAGE_SIZE - (gva % PAGE_SIZE));
+                rc = guest_translate_address(vcpu, gva, ar, &gpa, is_write);
+                gva += currlen;
+                length -= currlen;
+        }
+        ipte_unlock(vcpu);
+
+        return rc;
+}
+
+/**
  * kvm_s390_check_low_addr_prot_real - check for low-address protection
  * @gra: Guest real address
  *
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 835e557..ef03726 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -157,6 +157,8 @@ int read_guest_lc(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
 
 int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
ar_t ar, unsigned long *gpa, int write);
+int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar,
+  
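
The loop in check_gva_range() from the gaccess.c hunk above visits an address range one page at a time: each step covers at most the bytes left in the current page, so an unaligned start is handled by a short first step. The userspace sketch below reproduces only that stepping logic. The page size constant, the callback type standing in for guest_translate_address(), and all names are assumptions made for illustration; locking and the real translation are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical per-page check; returns 0 if the page is accessible. */
typedef int (*page_check_fn)(unsigned long gva, void *ctx);

/* Call `check` once for every page touched by [gva, gva + length). */
static int walk_range_sketch(unsigned long gva, unsigned long length,
                             page_check_fn check, void *ctx)
{
    int rc = 0;

    while (length > 0 && !rc) {
        /* Bytes from gva up to the end of its page. */
        unsigned long room = SKETCH_PAGE_SIZE - (gva % SKETCH_PAGE_SIZE);
        unsigned long currlen = length < room ? length : room;

        rc = check(gva, ctx);
        gva += currlen;
        length -= currlen;
    }
    return rc;
}

/* Example callback: counts how many pages were visited. */
static int count_pages_sketch(unsigned long gva, void *ctx)
{
    (void)gva;
    ++*(unsigned long *)ctx;
    return 0;
}
```

A 10-byte range starting 6 bytes before a page boundary therefore leads to two checks, while a zero-length range leads to none.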

[GIT PULL 00/11] KVM: s390: Features and fixes for 4.1 (kvm/next)

2015-03-18 Thread Christian Borntraeger
Paolo, Marcelo,

here is the followup pull request. As Marcelo has not yet pushed out
queue or next to git.kernel.org, this request is based on the previous
s390 pull request and should merge without conflicts.

For details see tag description.

Christian

The following changes since commit 13211ea7b47db3d8ee2ff258a9a973a6d3aa3d43:

  KVM: s390: Enable vector support for capable guest (2015-03-06 13:49:35 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-next-20150318

for you to fetch changes up to 18280d8b4bcd4a2b174ee3cd748166c6190acacb:

  KVM: s390: represent SIMD cap in kvm facility (2015-03-17 16:33:14 +0100)


KVM: s390: Features and fixes for 4.1 (kvm/next)

1. Fixes
2. Implement access register mode in KVM
3. Provide a userspace post handler for the STSI instruction
4. Provide an interface for compliant memory accesses
5. Provide an interface for getting/setting the guest storage key
6. Fixup for the vector facility patches: do not announce the
   vector facility in the guest for old QEMUs.

1-5 were initially shown as RFC in

http://www.spinics.net/lists/kvm/msg114720.html

some small review changes
- added some ACKs
- have the AR mode patches first
- get rid of unnecessary AR_INVAL define
- typos and language

6. two new patches
The two new patches fixup the vector support patches that were
introduced in the last pull request for QEMU versions that dont
know about vector support and guests that do. (We announce the
facility bit, but dont enable the facility so vector aware guests
will crash on vector instructions).


Alexander Yarygin (4):
  KVM: s390: Fix low-address protection for real addresses
  KVM: s390: Guest's memory access functions get access registers
  KVM: s390: Optimize paths where get_vcpu_asce() is invoked
  KVM: s390: Add access register mode

Dominik Dingel (1):
  KVM: s390: cleanup jump lables in kvm_arch_init_vm

Ekaterina Tumanova (1):
  KVM: s390: introduce post handlers for STSI

Geert Uytterhoeven (1):
  KVM: s390: Spelling s/intance/instance/

Jason J. Herne (1):
  KVM: s390: Create ioctl for Getting/Setting guest storage keys

Michael Mueller (2):
  KVM: s390: drop SIMD bit from kvm_s390_fac_list_mask
  KVM: s390: represent SIMD cap in kvm facility

Thomas Huth (1):
  KVM: s390: Add MEMOP ioctls for reading/writing guest memory

 Documentation/virtual/kvm/api.txt | 132 +
 arch/s390/include/asm/kvm_host.h  |   2 +-
 arch/s390/kvm/diag.c  |   4 +-
 arch/s390/kvm/gaccess.c   | 294 +++---
 arch/s390/kvm/gaccess.h   |  21 +--
 arch/s390/kvm/intercept.c |   4 +-
 arch/s390/kvm/kvm-s390.c  | 238 +++---
 arch/s390/kvm/kvm-s390.h  |  38 -
 arch/s390/kvm/priv.c  |  93 +++-
 arch/s390/kvm/sigp.c  |   4 +-
 include/uapi/linux/kvm.h  |  46 ++
 11 files changed, 752 insertions(+), 124 deletions(-)



[GIT PULL 09/11] KVM: s390: Create ioctl for Getting/Setting guest storage keys

2015-03-18 Thread Christian Borntraeger
From: Jason J. Herne jjhe...@linux.vnet.ibm.com

Provide the KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS ioctl which can be used
to get/set guest storage keys. This functionality is needed for live migration
of s390 guests that use storage keys.
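
The argument checks that both ioctls apply before touching any keys can be sketched in userspace as follows. This is a simplified illustration, not the kernel code: the structure is redefined locally from the documentation hunk in this patch, the upper bound is a placeholder value chosen for the sketch (the real limit is whatever the uapi header defines), and the check for whether the guest uses storage keys at all is omitted because it needs kernel state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Field layout taken from the documentation hunk of this patch. */
struct skeys_sketch {
    uint64_t start_gfn;     /* first guest frame */
    uint64_t count;         /* number of consecutive frames */
    uint64_t skeydata_addr; /* userspace buffer, one byte per frame */
    uint32_t flags;
    uint32_t reserved[9];
};

/* Placeholder limit for this sketch only (assumption). */
#define SKETCH_SKEYS_MAX 1048576ULL

/* Returns 0 if the arguments pass the basic checks, -EINVAL otherwise. */
static int skeys_args_valid_sketch(const struct skeys_sketch *args)
{
    /* No flags are defined yet, so any set bit is rejected. */
    if (args->flags != 0)
        return -EINVAL;

    /* At least one frame, and not more than the allocation limit. */
    if (args->count < 1 || args->count > SKETCH_SKEYS_MAX)
        return -EINVAL;

    return 0;
}
```

Since each frame has one key byte, the buffer behind skeydata_addr needs room for exactly count bytes once these checks have passed.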

Signed-off-by: Jason J. Herne jjhe...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt |  58 ++
 arch/s390/kvm/kvm-s390.c  | 123 ++
 include/uapi/linux/kvm.h  |  14 +
 3 files changed, 195 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index c1fcb7a..0d7fc66 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2762,6 +2762,64 @@ register number to be used.
 The reserved field is meant for future extensions. It is not used by
 KVM with the currently defined set of flags.
 
+4.90 KVM_S390_GET_SKEYS
+
+Capability: KVM_CAP_S390_SKEYS
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_skeys
+Returns: 0 on success, KVM_S390_GET_KEYS_NONE if guest is not using storage
+ keys, negative value on error
+
+This ioctl is used to get guest storage key values on the s390
+architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
+
+struct kvm_s390_skeys {
+   __u64 start_gfn;
+   __u64 count;
+   __u64 skeydata_addr;
+   __u32 flags;
+   __u32 reserved[9];
+};
+
+The start_gfn field is the number of the first guest frame whose storage keys
+you want to get.
+
+The count field is the number of consecutive frames (starting from start_gfn)
+whose storage keys to get. The count field must be at least 1 and the maximum
+allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
+will cause the ioctl to return -EINVAL.
+
+The skeydata_addr field is the address to a buffer large enough to hold count
+bytes. This buffer will be filled with storage key data by the ioctl.
+
+4.91 KVM_S390_SET_SKEYS
+
+Capability: KVM_CAP_S390_SKEYS
+Architectures: s390
+Type: vm ioctl
+Parameters: struct kvm_s390_skeys
+Returns: 0 on success, negative value on error
+
+This ioctl is used to set guest storage key values on the s390
+architecture. The ioctl takes parameters via the kvm_s390_skeys struct.
+See section on KVM_S390_GET_SKEYS for struct definition.
+
+The start_gfn field is the number of the first guest frame whose storage keys
+you want to set.
+
+The count field is the number of consecutive frames (starting from start_gfn)
+whose storage keys to get. The count field must be at least 1 and the maximum
+allowed value is defined as KVM_S390_SKEYS_ALLOC_MAX. Values outside this range
+will cause the ioctl to return -EINVAL.
+
+The skeydata_addr field is the address to a buffer containing count bytes of
+storage keys. Each byte in the buffer will be set as the storage key for a
+single frame starting at start_gfn for count frames.
+
+Note: If any architecturally invalid key value is found in the given data then
+the ioctl will return -EINVAL.
+
 5. The kvm_run structure
 
 
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fdfa106..0dc22ba 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -179,6 +179,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MP_STATE:
case KVM_CAP_S390_USER_SIGP:
case KVM_CAP_S390_USER_STSI:
+   case KVM_CAP_S390_SKEYS:
r = 1;
break;
case KVM_CAP_S390_MEM_OP:
@@ -729,6 +730,108 @@ static int kvm_s390_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
return ret;
 }
 
+static long kvm_s390_get_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
+{
+        uint8_t *keys;
+        uint64_t hva;
+        unsigned long curkey;
+        int i, r = 0;
+
+        if (args->flags != 0)
+                return -EINVAL;
+
+        /* Is this guest using storage keys? */
+        if (!mm_use_skey(current->mm))
+                return KVM_S390_GET_SKEYS_NONE;
+
+        /* Enforce sane limit on memory allocation */
+        if (args->count < 1 || args->count > KVM_S390_SKEYS_MAX)
+                return -EINVAL;
+
+        keys = kmalloc_array(args->count, sizeof(uint8_t),
+                             GFP_KERNEL | __GFP_NOWARN);
+        if (!keys)
+                keys = vmalloc(sizeof(uint8_t) * args->count);
+        if (!keys)
+                return -ENOMEM;
+
+        for (i = 0; i < args->count; i++) {
+                hva = gfn_to_hva(kvm, args->start_gfn + i);
+                if (kvm_is_error_hva(hva)) {
+                        r = -EFAULT;
+                        goto out;
+                }
+
+                curkey = get_guest_storage_key(current->mm, hva);
+                if (IS_ERR_VALUE(curkey)) {
+                        r = curkey;
+                        goto out;
+  

[GIT PULL 01/11] KVM: s390: Spelling s/intance/instance/

2015-03-18 Thread Christian Borntraeger
From: Geert Uytterhoeven geert+rene...@glider.be

Signed-off-by: Geert Uytterhoeven geert+rene...@glider.be
Message-Id: 1425932832-6244-1-git-send-email-geert+rene...@glider.be
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index fda3f31..83f32a1 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -125,7 +125,7 @@ static inline void kvm_s390_set_psw_cc(struct kvm_vcpu *vcpu, unsigned long cc)
         vcpu->arch.sie_block->gpsw.mask |= cc << 44;
 }

-/* test availability of facility in a kvm intance */
+/* test availability of facility in a kvm instance */
 static inline int test_kvm_facility(struct kvm *kvm, unsigned long nr)
 {
         return __test_facility(nr, kvm->arch.model.fac->mask) &&
-- 
2.3.0



Re: [PATCH v14 19/20] vfio: initialize the virqfd workqueue in VFIO generic code

2015-03-18 Thread Baptiste Reynal
Hello Alex,

The module solution seems fine for me, I have no argument against it.
I used your patch on my tests, they are running ok.

Regards,
Baptiste

On Wed, Mar 18, 2015 at 12:04 AM, Alex Williamson
alex.william...@redhat.com wrote:
> On Tue, 2015-03-17 at 16:29 -0600, Alex Williamson wrote:
>> On Mon, 2015-03-02 at 17:59 +0100, Baptiste Reynal wrote:
>>> From: Antonios Motakis a.mota...@virtualopensystems.com
>>>
>>> Now we have finally completely decoupled virqfd from VFIO_PCI. We can
>>> initialize it from the VFIO generic code, in order to safely use it from
>>> multiple independent VFIO bus drivers.
>>>
>>> Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
>>> Signed-off-by: Baptiste Reynal b.rey...@virtualopensystems.com
>>> ---
>>>  drivers/vfio/Makefile       | 4 +++-
>>>  drivers/vfio/pci/Makefile   | 3 +--
>>>  drivers/vfio/pci/vfio_pci.c | 8
>>>  drivers/vfio/vfio.c         | 8
>>>  4 files changed, 12 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
>>> index dadf0ca..d798b09 100644
>>> --- a/drivers/vfio/Makefile
>>> +++ b/drivers/vfio/Makefile
>>> @@ -1,4 +1,6 @@
>>> -obj-$(CONFIG_VFIO) += vfio.o
>>> +vfio_core-y := vfio.o virqfd.o
>>> +
>>> +obj-$(CONFIG_VFIO) += vfio_core.o
>>>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>>>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>>>  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
>>
>> This inadvertently (I assume) renames the main vfio module to vfio_core.
>> That potentially breaks numerous userspace scripts that might try to
>> load the vfio module.  I don't think that's acceptable.  A brute force
>> way to fix this would be to rename vfio.c to vfio_core.c and change the
>> Makefile to:
>>
>> vfio-y := vfio_core.o virqfd.o
>> obj-$(CONFIG_VFIO) += vfio.o
>>
>> Is there any other trickery available to us that could include virqfd.o
>> in vfio.o w/o source file renaming?  Thanks,

> Maybe a better option, we could let virqfd be its own support module
> for bus drivers that need it.  Then we keep it out of vfio core.
> Something like this:
>
> commit f4d91ec4b72ce11e9dba861d6bf2dba93b72f0ba
> Author: Alex Williamson alex.william...@redhat.com
> Date:   Tue Mar 17 08:33:38 2015 -0600
>
>     vfio: Split virqfd into a separate module for vfio bus drivers
>
>     An unintended consequence of splitting virqfd support to be shared
>     by bus drivers is renaming the core vfio module to vfio_core.  This
>     is not very friendly to user scripts that may try to load the vfio
>     module.  To resolve that and to make it clear that virqfd is a bus
>     driver service and not a dependency of vfio core, move this to a
>     separate module on which the bus drivers will depend.
>
>     Signed-off-by: Alex Williamson alex.william...@redhat.com
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index d5322a4..7d092dd 100644
> --- a/drivers/vfio/Kconfig
> +++ b/drivers/vfio/Kconfig
> @@ -13,6 +13,11 @@ config VFIO_SPAPR_EEH
>          depends on EEH && VFIO_IOMMU_SPAPR_TCE
>          default n
>
> +config VFIO_VIRQFD
> +        tristate
> +        depends on VFIO && EVENTFD
> +        default n
> +
>  menuconfig VFIO
>          tristate "VFIO Non-Privileged userspace driver framework"
>          depends on IOMMU_API
> diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> index d798b09..7b8a31f 100644
> --- a/drivers/vfio/Makefile
> +++ b/drivers/vfio/Makefile
> @@ -1,6 +1,7 @@
> -vfio_core-y := vfio.o virqfd.o
> +vfio_virqfd-y := virqfd.o
>
> -obj-$(CONFIG_VFIO) += vfio_core.o
> +obj-$(CONFIG_VFIO) += vfio.o
> +obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
>  obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
>  obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
>  obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
> index c6bb5da..579d83b 100644
> --- a/drivers/vfio/pci/Kconfig
> +++ b/drivers/vfio/pci/Kconfig
> @@ -1,6 +1,7 @@
>  config VFIO_PCI
>          tristate "VFIO support for PCI devices"
>          depends on VFIO && PCI && EVENTFD
> +        select VFIO_VIRQFD
>          help
>            Support for the PCI VFIO bus driver.  This is required to make
>            use of PCI drivers using the VFIO framework.
> diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
> index c0a3bff..4ec14af 100644
> --- a/drivers/vfio/platform/Kconfig
> +++ b/drivers/vfio/platform/Kconfig
> @@ -1,6 +1,7 @@
>  config VFIO_PLATFORM
>          tristate "VFIO support for platform devices"
>          depends on VFIO && EVENTFD && ARM
> +        select VFIO_VIRQFD
>          help
>            Support for platform devices with VFIO. This is required to make
>            use of platform devices present on the system using the VFIO
> @@ -11,6 +12,7 @@ config VFIO_PLATFORM
>  config VFIO_AMBA
>          tristate "VFIO support for AMBA devices"
>          depends on VFIO_PLATFORM && ARM_AMBA
> +        select VFIO_VIRQFD
>          help
>            Support for ARM AMBA devices with VFIO. This is required to make
>            use of ARM AMBA devices present

[GIT PULL 06/11] KVM: s390: Add access register mode

2015-03-18 Thread Christian Borntraeger
From: Alexander Yarygin yary...@linux.vnet.ibm.com

Access register mode is one of the modes that control dynamic address
translation. In this mode the address space is specified by values of
the access registers. The effective address-space-control element is
obtained from the result of the access register translation. See
the "Access-Register Introduction" section of chapter 5, "Program
Execution", in the Principles of Operation for more details.
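
The first step of the translation described above can be illustrated
outside the kernel. The following is only a sketch: the names alet_kind,
alet_fields, alet_decode and alet_classify are made up for illustration,
and the field positions are taken from the union alet bit-fields added by
this patch (7 reserved bits, p, an 8-bit alesn, a 16-bit alen, most
significant bit first).

```c
#include <assert.h>
#include <stdint.h>

/* Which address space an ALET designates (illustrative names). */
enum alet_kind {
	ALET_PRIMARY,	/* AR 0 or ALET 0: primary ASCE (control register 1) */
	ALET_SECONDARY,	/* ALET 1: secondary ASCE (control register 7) */
	ALET_INVALID,	/* reserved bits set: ALET-specification exception */
	ALET_LOOKUP	/* anything else needs an access-list lookup */
};

struct alet_fields {
	unsigned int reserved;	/* bits 0-6, expected to be zero */
	unsigned int p;		/* bit 7: picks CR5 (set) or CR2 (clear) */
	unsigned int alesn;	/* bits 8-15: access-list-entry sequence number */
	unsigned int alen;	/* bits 16-31: access-list-entry number */
};

/* Split a 32-bit ALET with shifts instead of compiler-specific bit-fields. */
static struct alet_fields alet_decode(uint32_t val)
{
	struct alet_fields f = {
		.reserved = val >> 25,
		.p = (val >> 24) & 0x1,
		.alesn = (val >> 16) & 0xff,
		.alen = val & 0xffff,
	};
	return f;
}

/* Mirror the early exits of ar_translation() before any table is read. */
static enum alet_kind alet_classify(unsigned int ar, uint32_t val)
{
	if (ar == 0 || val == 0)
		return ALET_PRIMARY;
	if (val == 1)
		return ALET_SECONDARY;
	if (alet_decode(val).reserved)
		return ALET_INVALID;
	return ALET_LOOKUP;
}
```

Only the ALET_LOOKUP case goes on to read the access-list designation,
the access-list entry and the ASTE from guest memory.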

Signed-off-by: Alexander Yarygin yary...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/gaccess.c | 234 +---
 arch/s390/kvm/gaccess.h |   3 +-
 2 files changed, 202 insertions(+), 35 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index c74462a..ea38d71 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -10,6 +10,7 @@
 #include <asm/pgtable.h>
 #include "kvm-s390.h"
 #include "gaccess.h"
+#include <asm/switch_to.h>
 
 union asce {
unsigned long val;
@@ -207,6 +208,54 @@ union raddress {
unsigned long pfra : 52; /* Page-Frame Real Address */
 };
 
+union alet {
+   u32 val;
+   struct {
+   u32 reserved : 7;
+   u32 p: 1;
+   u32 alesn: 8;
+   u32 alen : 16;
+   };
+};
+
+union ald {
+   u32 val;
+   struct {
+   u32 : 1;
+   u32 alo : 24;
+   u32 all : 7;
+   };
+};
+
+struct ale {
+   unsigned long i  : 1; /* ALEN-Invalid Bit */
+   unsigned long: 5;
+   unsigned long fo : 1; /* Fetch-Only Bit */
+   unsigned long p  : 1; /* Private Bit */
+   unsigned long alesn  : 8; /* Access-List-Entry Sequence Number */
+   unsigned long aleax  : 16; /* Access-List-Entry Authorization Index */
+   unsigned long: 32;
+   unsigned long: 1;
+   unsigned long asteo  : 25; /* ASN-Second-Table-Entry Origin */
+   unsigned long: 6;
+   unsigned long astesn : 32; /* ASTE Sequence Number */
+} __packed;
+
+struct aste {
+   unsigned long i  : 1; /* ASX-Invalid Bit */
+   unsigned long ato: 29; /* Authority-Table Origin */
+   unsigned long: 1;
+   unsigned long b  : 1; /* Base-Space Bit */
+   unsigned long ax : 16; /* Authorization Index */
+   unsigned long atl: 12; /* Authority-Table Length */
+   unsigned long: 2;
+   unsigned long ca : 1; /* Controlled-ASN Bit */
+   unsigned long ra : 1; /* Reusable-ASN Bit */
+   unsigned long asce   : 64; /* Address-Space-Control Element */
+   unsigned long ald: 32;
+   unsigned long astesn : 32;
+   /* .. more fields there */
+} __packed;
 
 int ipte_lock_held(struct kvm_vcpu *vcpu)
 {
@@ -307,15 +356,157 @@ void ipte_unlock(struct kvm_vcpu *vcpu)
ipte_unlock_simple(vcpu);
 }
 
-static unsigned long get_vcpu_asce(struct kvm_vcpu *vcpu)
+static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, ar_t ar,
+			  int write)
+{
+	union alet alet;
+	struct ale ale;
+	struct aste aste;
+	unsigned long ald_addr, authority_table_addr;
+	union ald ald;
+	int eax, rc;
+	u8 authority_table;
+
+	if (ar >= NUM_ACRS)
+		return -EINVAL;
+
+	save_access_regs(vcpu->run->s.regs.acrs);
+	alet.val = vcpu->run->s.regs.acrs[ar];
+
+	if (ar == 0 || alet.val == 0) {
+		asce->val = vcpu->arch.sie_block->gcr[1];
+		return 0;
+	} else if (alet.val == 1) {
+		asce->val = vcpu->arch.sie_block->gcr[7];
+		return 0;
+	}
+
+	if (alet.reserved)
+		return PGM_ALET_SPECIFICATION;
+
+	if (alet.p)
+		ald_addr = vcpu->arch.sie_block->gcr[5];
+	else
+		ald_addr = vcpu->arch.sie_block->gcr[2];
+	ald_addr &= 0x7fffffc0;
+
+	rc = read_guest_real(vcpu, ald_addr + 16, &ald.val, sizeof(union ald));
+	if (rc)
+		return rc;
+
+	if (alet.alen / 8 > ald.all)
+		return PGM_ALEN_TRANSLATION;
+
+	if (0x7fffffff - ald.alo * 128 < alet.alen * 16)
+		return PGM_ADDRESSING;
+
+	rc = read_guest_real(vcpu, ald.alo * 128 + alet.alen * 16, &ale,
+			     sizeof(struct ale));
+	if (rc)
+		return rc;
+
+	if (ale.i == 1)
+		return PGM_ALEN_TRANSLATION;
+	if (ale.alesn != alet.alesn)
+		return PGM_ALE_SEQUENCE;
+
+	rc = read_guest_real(vcpu, ale.asteo * 64, &aste, sizeof(struct aste));
+	if (rc)
+		return rc;
+
+	if (aste.i)
+		return PGM_ASTE_VALIDITY;
+	if (aste.astesn != ale.astesn)
+		return PGM_ASTE_SEQUENCE;
+
+	if (ale.p == 1) {
+   eax = 

[GIT PULL 11/11] KVM: s390: represent SIMD cap in kvm facility

2015-03-18 Thread Christian Borntraeger
From: Michael Mueller m...@linux.vnet.ibm.com

The patch represents the capability KVM_CAP_S390_VECTOR_REGISTERS by means
of the SIMD facility bit. This allows a) disabling the use of SIMD when
running with a QEMU that is not SIMD-aware, b) enabling SIMD when running
with a SIMD-aware version of QEMU, and c) eventually controlling it from a
QEMU version that uses the future cpu model ioctls.

Signed-off-by: Michael Mueller m...@linux.vnet.ibm.com
Reviewed-by: Eric Farman far...@linux.vnet.ibm.com
Tested-by: Eric Farman far...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/include/asm/kvm_host.h |  1 -
 arch/s390/kvm/kvm-s390.c | 19 +++
 arch/s390/kvm/kvm-s390.h | 11 +++
 3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 2356a8c..b8d1e97 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -562,7 +562,6 @@ struct kvm_arch{
int css_support;
int use_irqchip;
int use_cmma;
-   int use_vectors;
int user_cpu_state_ctrl;
int user_sigp;
int user_stsi;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 42b8a25..9072127 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -278,8 +278,12 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		r = 0;
 		break;
 	case KVM_CAP_S390_VECTOR_REGISTERS:
-		kvm->arch.use_vectors = MACHINE_HAS_VX;
-		r = MACHINE_HAS_VX ? 0 : -EINVAL;
+		if (MACHINE_HAS_VX) {
+			set_kvm_facility(kvm->arch.model.fac->mask, 129);
+			set_kvm_facility(kvm->arch.model.fac->list, 129);
+			r = 0;
+		} else
+			r = -EINVAL;
 		break;
 	case KVM_CAP_S390_USER_STSI:
 		kvm->arch.user_stsi = 1;
@@ -1084,7 +1088,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm->arch.css_support = 0;
 	kvm->arch.use_irqchip = 0;
-	kvm->arch.use_vectors = 0;
 	kvm->arch.epoch = 0;
 
 	spin_lock_init(&kvm->arch.start_stop_lock);
@@ -1186,12 +1189,12 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	save_fp_ctl(&vcpu->arch.host_fpregs.fpc);
-	if (vcpu->kvm->arch.use_vectors)
+	if (test_kvm_facility(vcpu->kvm, 129))
 		save_vx_regs((__vector128 *)&vcpu->arch.host_vregs->vrs);
 	else
 		save_fp_regs(vcpu->arch.host_fpregs.fprs);
 	save_access_regs(vcpu->arch.host_acrs);
-	if (vcpu->kvm->arch.use_vectors) {
+	if (test_kvm_facility(vcpu->kvm, 129)) {
 		restore_fp_ctl(&vcpu->run->s.regs.fpc);
 		restore_vx_regs((__vector128 *)&vcpu->run->s.regs.vrs);
 	} else {
@@ -1207,7 +1210,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	atomic_clear_mask(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
 	gmap_disable(vcpu->arch.gmap);
-	if (vcpu->kvm->arch.use_vectors) {
+	if (test_kvm_facility(vcpu->kvm, 129)) {
 		save_fp_ctl(&vcpu->run->s.regs.fpc);
 		save_vx_regs((__vector128 *)&vcpu->run->s.regs.vrs);
 	} else {
@@ -1216,7 +1219,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	}
 	save_access_regs(vcpu->run->s.regs.acrs);
 	restore_fp_ctl(&vcpu->arch.host_fpregs.fpc);
-	if (vcpu->kvm->arch.use_vectors)
+	if (test_kvm_facility(vcpu->kvm, 129))
 		restore_vx_regs((__vector128 *)&vcpu->arch.host_vregs->vrs);
 	else
 		restore_fp_regs(vcpu->arch.host_fpregs.fprs);
@@ -1316,7 +1319,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
 		vcpu->arch.sie_block->eca |= 1;
 	if (sclp_has_sigpif())
 		vcpu->arch.sie_block->eca |= 0x10000000U;
-	if (vcpu->kvm->arch.use_vectors) {
+	if (test_kvm_facility(vcpu->kvm, 129)) {
 		vcpu->arch.sie_block->eca |= 0x00020000;
 		vcpu->arch.sie_block->ecd |= 0x20000000;
 	}
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 5d54191..c5aefef 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -149,6 +149,17 @@ static inline int test_kvm_facility(struct kvm *kvm, unsigned long nr)
 		__test_facility(nr, kvm->arch.model.fac->list);
 }
 
+static inline int set_kvm_facility(u64 *fac_list, unsigned long nr)
+{
+	unsigned char *ptr;
+
+	if (nr >= MAX_FACILITY_BIT)
+		return -EINVAL;
+	ptr = (unsigned char *) fac_list + (nr >> 3);
+	*ptr |= (0x80UL >> (nr & 7));
+	return 0;
+}
+
 /* are cpu states controlled by user space */
 static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm)
 {
-- 
2.3.0
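
For readers unfamiliar with the facility list layout: facility bits are
counted from the most significant bit of byte 0, which is why the helper
added above shifts 0x80 to the right. Below is a standalone sketch of the
same arithmetic on a plain byte array. The sketch_ names and the limit of
256 bits are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_FACILITY_BIT 256	/* hypothetical limit for this sketch */

/* Set facility bit nr; bit 0 is the most significant bit of byte 0. */
static int sketch_set_facility(unsigned char *fac, unsigned long nr)
{
	if (nr >= SKETCH_MAX_FACILITY_BIT)
		return -EINVAL;
	fac[nr >> 3] |= 0x80 >> (nr & 7);
	return 0;
}

/* Return 1 if facility bit nr is set, 0 otherwise (or if out of range). */
static int sketch_test_facility(const unsigned char *fac, unsigned long nr)
{
	if (nr >= SKETCH_MAX_FACILITY_BIT)
		return 0;
	return (fac[nr >> 3] & (0x80 >> (nr & 7))) != 0;
}
```

Facility 129 lands in byte 16 with mask 0x40. On a big-endian machine
that is the second-highest bit of the third 64-bit word of the list.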

--
To unsubscribe 

[GIT PULL 03/11] KVM: s390: Fix low-address protection for real addresses

2015-03-18 Thread Christian Borntraeger
From: Alexander Yarygin yary...@linux.vnet.ibm.com

The kvm_s390_check_low_addr_protection() function is used only with real
addresses. According to the POP (the Low-Address Protection
paragraph in chapter 3), if the effective address is real or absolute,
the low-address protection procedure should raise a PROTECTION exception
only when the low-address protection is enabled in the control register
0 and the address is low.
This patch removes ASCE checks from the function and renames it to
better reflect its behavior.
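
The resulting check is small enough to sketch in isolation. The
following is an illustration under stated assumptions, not the kernel
code: the sketch_ names are made up, the low-address mask 0x11ff is the
one used by is_low_address() in gaccess.c, and the CR0 control is passed
in as a boolean so that no bit position has to be assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Low addresses are 0-511 and 4096-4607: every bit outside 0x11ff is zero. */
static bool sketch_is_low_address(unsigned long addr)
{
	return (addr & ~0x11fful) == 0;
}

/*
 * For a real address, a store is rejected only when low-address
 * protection is enabled in control register 0 and the address is low.
 * No ASCE is consulted.
 */
static bool sketch_low_addr_prot_real(bool lap_enabled, unsigned long gra)
{
	return lap_enabled && sketch_is_low_address(gra);
}
```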

Cc: Thomas Huth th...@linux.vnet.ibm.com
Signed-off-by: Alexander Yarygin yary...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/gaccess.c | 11 ++-
 arch/s390/kvm/gaccess.h |  2 +-
 arch/s390/kvm/priv.c|  4 ++--
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 633fe9b..c230904 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -697,28 +697,29 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva,
 }
 
 /**
- * kvm_s390_check_low_addr_protection - check for low-address protection
- * @ga: Guest address
+ * kvm_s390_check_low_addr_prot_real - check for low-address protection
+ * @gra: Guest real address
  *
  * Checks whether an address is subject to low-address protection and set
  * up vcpu->arch.pgm accordingly if necessary.
  *
  * Return: 0 if no protection exception, or PGM_PROTECTION if protected.
  */
-int kvm_s390_check_low_addr_protection(struct kvm_vcpu *vcpu, unsigned long ga)
+int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
 {
 	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
 	struct trans_exc_code_bits *tec_bits;
+	union ctlreg0 ctlreg0 = {.val = vcpu->arch.sie_block->gcr[0]};
 
-	if (!is_low_address(ga) || !low_address_protection_enabled(vcpu))
+	if (!ctlreg0.lap || !is_low_address(gra))
 		return 0;
 
 	memset(pgm, 0, sizeof(*pgm));
 	tec_bits = (struct trans_exc_code_bits *)&pgm->trans_exc_code;
 	tec_bits->fsi = FSI_STORE;
 	tec_bits->as = psw_bits(*psw).as;
-	tec_bits->addr = ga >> PAGE_SHIFT;
+	tec_bits->addr = gra >> PAGE_SHIFT;
 	pgm->code = PGM_PROTECTION;
 
 	return pgm->code;
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index 0149cf1..20de77e 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -330,6 +330,6 @@ int read_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, void *data,
 void ipte_lock(struct kvm_vcpu *vcpu);
 void ipte_unlock(struct kvm_vcpu *vcpu);
 int ipte_lock_held(struct kvm_vcpu *vcpu);
-int kvm_s390_check_low_addr_protection(struct kvm_vcpu *vcpu, unsigned long ga);
+int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
 
 #endif /* __KVM_S390_GACCESS_H */
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index b982fbc..5f26425 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -207,7 +207,7 @@ static int handle_test_block(struct kvm_vcpu *vcpu)
 	kvm_s390_get_regs_rre(vcpu, NULL, &reg2);
 	addr = vcpu->run->s.regs.gprs[reg2] & PAGE_MASK;
 	addr = kvm_s390_logical_to_effective(vcpu, addr);
-	if (kvm_s390_check_low_addr_protection(vcpu, addr))
+	if (kvm_s390_check_low_addr_prot_real(vcpu, addr))
 		return kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm);
 	addr = kvm_s390_real_to_abs(vcpu, addr);
 
@@ -680,7 +680,7 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 	}
 
 	if (vcpu->run->s.regs.gprs[reg1] & PFMF_CF) {
-		if (kvm_s390_check_low_addr_protection(vcpu, start))
+		if (kvm_s390_check_low_addr_prot_real(vcpu, start))
 			return kvm_s390_inject_prog_irq(vcpu, &vcpu->arch.pgm);
 	}
}
 
-- 
2.3.0

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL 10/11] KVM: s390: drop SIMD bit from kvm_s390_fac_list_mask

2015-03-18 Thread Christian Borntraeger
From: Michael Mueller m...@linux.vnet.ibm.com

Setting the SIMD bit in the KVM mask is an issue because it makes the
facility visible but not usable to the guest, thus it needs to be
removed again.

Signed-off-by: Michael Mueller m...@linux.vnet.ibm.com
Reviewed-by: Eric Farman far...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 0dc22ba..42b8a25 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -107,7 +107,6 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 unsigned long kvm_s390_fac_list_mask[] = {
0xff82fffbf4fc2000UL,
 	0x005c000000000000UL,
-	0x4000000000000000UL,
 };
 
 unsigned long kvm_s390_fac_list_mask_size(void)
-- 
2.3.0



[GIT PULL 02/11] KVM: s390: cleanup jump labels in kvm_arch_init_vm

2015-03-18 Thread Christian Borntraeger
From: Dominik Dingel din...@linux.vnet.ibm.com

As all cleanup functions can handle their respective NULL case
there is no need to have more than one error jump label.
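
The pattern relies on every resource pointer starting out as NULL and on
each release function accepting NULL. A user-space sketch of the same
idea, with malloc()/free() standing in for the kernel allocators; the
struct, the function names and the fail_at parameter are made up for
illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_vm {
	void *dbf;
	void *fac;
	void *crycb;
};

/* free(NULL) is a no-op, so this is safe at any stage of initialization. */
static void sketch_destroy_vm(struct sketch_vm *vm)
{
	free(vm->crycb);
	free(vm->fac);
	free(vm->dbf);
	vm->crycb = vm->fac = vm->dbf = NULL;
}

/* fail_at simulates an allocation failure at step 1, 2 or 3 (0: none). */
static int sketch_init_vm(struct sketch_vm *vm, int fail_at)
{
	vm->dbf = vm->fac = vm->crycb = NULL;

	vm->dbf = fail_at == 1 ? NULL : malloc(16);
	if (!vm->dbf)
		goto out_err;
	vm->fac = fail_at == 2 ? NULL : malloc(16);
	if (!vm->fac)
		goto out_err;
	vm->crycb = fail_at == 3 ? NULL : malloc(16);
	if (!vm->crycb)
		goto out_err;
	return 0;
out_err:
	/* One label is enough: whatever was not allocated is still NULL. */
	sketch_destroy_vm(vm);
	return -ENOMEM;
}
```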

Signed-off-by: Dominik Dingel din...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 02e03c8..4075acb 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -897,7 +897,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 	kvm->arch.dbf = debug_register(debug_name, 8, 2, 8 * sizeof(long));
 	if (!kvm->arch.dbf)
-		goto out_nodbf;
+		goto out_err;
 
 	/*
 	 * The architectural maximum amount of facilities is 16 kbit. To store
@@ -909,7 +909,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.model.fac =
 		(struct kvm_s390_fac *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
 	if (!kvm->arch.model.fac)
-		goto out_nofac;
+		goto out_err;
 
 	/* Populate the facility mask initially. */
 	memcpy(kvm->arch.model.fac->mask, S390_lowcore.stfle_fac_list,
@@ -929,7 +929,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.model.ibc = sclp_get_ibc() & 0x0fff;
 
 	if (kvm_s390_crypto_init(kvm) < 0)
-		goto out_crypto;
+		goto out_err;
 
 	spin_lock_init(&kvm->arch.float_int.lock);
 	INIT_LIST_HEAD(&kvm->arch.float_int.list);
@@ -944,7 +944,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	} else {
 		kvm->arch.gmap = gmap_alloc(current->mm, (1UL << 44) - 1);
 		if (!kvm->arch.gmap)
-			goto out_nogmap;
+			goto out_err;
 		kvm->arch.gmap->private = kvm;
 		kvm->arch.gmap->pfault_enabled = 0;
 	}
@@ -957,15 +957,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	spin_lock_init(&kvm->arch.start_stop_lock);
 
 	return 0;
-out_nogmap:
+out_err:
 	kfree(kvm->arch.crypto.crycb);
-out_crypto:
 	free_page((unsigned long)kvm->arch.model.fac);
-out_nofac:
 	debug_unregister(kvm->arch.dbf);
-out_nodbf:
 	free_page((unsigned long)(kvm->arch.sca));
-out_err:
 	return rc;
 }
 
-- 
2.3.0



[GIT PULL 08/11] KVM: s390: introduce post handlers for STSI

2015-03-18 Thread Christian Borntraeger
From: Ekaterina Tumanova tuman...@linux.vnet.ibm.com

The Store System Information (STSI) instruction currently collects all
information it relays to the caller in the kernel. Some information,
however, is only available in user space. An example of this is the
guest name: The kernel always sets KVMGuest, but user space knows the
actual guest name.

This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a
capability that can be enabled by user space if it wants to be able to
insert such data. User space will be provided with the target buffer
and the requested STSI function code.
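
To make the data flow concrete, here is a user-space style sketch of
what a consumer of the new exit might look at. It is self-contained and
does not use the real kvm_run structure: sketch_stsi_exit mirrors the
s390_stsi layout documented by this patch, and the sketch_ helpers are
hypothetical. That only the 3.2.2 block (the one carrying VM names) is
post-processed is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the s390_stsi fields documented in api.txt. */
struct sketch_stsi_exit {
	uint64_t addr;		/* guest address of the STSI SYSIB */
	uint8_t ar;		/* access register number */
	uint8_t reserved;
	uint8_t fc;		/* function code */
	uint8_t sel1;		/* selector 1 */
	uint16_t sel2;		/* selector 2 */
};

/* The function code is the top 4 bits of the low word of gpr 0. */
static uint8_t sketch_stsi_fc(uint64_t gpr0)
{
	return (gpr0 & 0xf0000000) >> 28;
}

/* Package the values the kernel hands to user space on the exit. */
static struct sketch_stsi_exit sketch_fill_exit(uint64_t addr, uint8_t ar,
						uint8_t fc, uint8_t sel1,
						uint16_t sel2)
{
	struct sketch_stsi_exit e = {
		.addr = addr, .ar = ar, .fc = fc, .sel1 = sel1, .sel2 = sel2,
	};
	return e;
}

/* Assumption: user space only wants to patch the guest name in 3.2.2. */
static int sketch_wants_guest_name(const struct sketch_stsi_exit *e)
{
	return e->fc == 3 && e->sel1 == 2 && e->sel2 == 2;
}
```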

Reviewed-by: Eric Farman far...@linux.vnet.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Ekaterina Tumanova tuman...@linux.vnet.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 Documentation/virtual/kvm/api.txt | 28 
 arch/s390/include/asm/kvm_host.h  |  1 +
 arch/s390/kvm/kvm-s390.c  |  5 +
 arch/s390/kvm/priv.c  | 17 -
 include/uapi/linux/kvm.h  | 11 +++
 5 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 281179d..c1fcb7a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3304,3 +3304,31 @@ Returns: 0 on success, negative value on error
 Allows use of the vector registers introduced with z13 processor, and
 provides for the synchronization between host and user space.  Will
 return -EINVAL if the machine does not support vectors.
+
+7.4 KVM_CAP_S390_USER_STSI
+
+Architectures: s390
+Parameters: none
+
+This capability allows post-handlers for the STSI instruction. After
+initial handling in the kernel, KVM exits to user space with
+KVM_EXIT_S390_STSI to allow user space to insert further data.
+
+Before exiting to userspace, kvm handlers should fill in s390_stsi field of
+vcpu->run:
+struct {
+   __u64 addr;
+   __u8 ar;
+   __u8 reserved;
+   __u8 fc;
+   __u8 sel1;
+   __u16 sel2;
+} s390_stsi;
+
+@addr - guest address of STSI SYSIB
+@fc   - function code
+@sel1 - selector 1
+@sel2 - selector 2
+@ar   - access register number
+
+KVM handlers should exit to userspace with rc = -EREMOTE.
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 347a333..2356a8c 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -565,6 +565,7 @@ struct kvm_arch{
int use_vectors;
int user_cpu_state_ctrl;
int user_sigp;
+   int user_stsi;
struct s390_io_adapter *adapters[MAX_S390_IO_ADAPTERS];
wait_queue_head_t ipte_wq;
int ipte_lock_count;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b7ecef9..fdfa106 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -178,6 +178,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VM_ATTRIBUTES:
case KVM_CAP_MP_STATE:
case KVM_CAP_S390_USER_SIGP:
+   case KVM_CAP_S390_USER_STSI:
r = 1;
break;
case KVM_CAP_S390_MEM_OP:
@@ -280,6 +281,10 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		kvm->arch.use_vectors = MACHINE_HAS_VX;
 		r = MACHINE_HAS_VX ? 0 : -EINVAL;
 		break;
+	case KVM_CAP_S390_USER_STSI:
+		kvm->arch.user_stsi = 1;
+		r = 0;
+		break;
default:
r = -EINVAL;
break;
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f4fe02e..5e4658d 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -496,6 +496,17 @@ static void handle_stsi_3_2_2(struct kvm_vcpu *vcpu, struct sysinfo_3_2_2 *mem)
 	ASCEBC(mem->vm[0].cpi, 16);
 }
 
+static void insert_stsi_usr_data(struct kvm_vcpu *vcpu, u64 addr, ar_t ar,
+				 u8 fc, u8 sel1, u16 sel2)
+{
+	vcpu->run->exit_reason = KVM_EXIT_S390_STSI;
+	vcpu->run->s390_stsi.addr = addr;
+	vcpu->run->s390_stsi.ar = ar;
+	vcpu->run->s390_stsi.fc = fc;
+	vcpu->run->s390_stsi.sel1 = sel1;
+	vcpu->run->s390_stsi.sel2 = sel2;
+}
+
 static int handle_stsi(struct kvm_vcpu *vcpu)
 {
 	int fc = (vcpu->run->s.regs.gprs[0] & 0xf0000000) >> 28;
@@ -556,11 +567,15 @@ static int handle_stsi(struct kvm_vcpu *vcpu)
 		rc = kvm_s390_inject_prog_cond(vcpu, rc);
 		goto out;
 	}
+	if (vcpu->kvm->arch.user_stsi) {
+		insert_stsi_usr_data(vcpu, operand2, ar, fc, sel1, sel2);
+		rc = -EREMOTE;
+	}
 	trace_kvm_s390_handle_stsi(vcpu, fc, sel1, sel2, operand2);
 	free_page(mem);
 	kvm_s390_set_psw_cc(vcpu, 0);
 	vcpu->run->s.regs.gprs[0] = 0;
-	return 0;
+	return rc;
 out_no_data:
 	kvm_s390_set_psw_cc(vcpu, 3);
 

[GIT PULL 05/11] KVM: s390: Optimize paths where get_vcpu_asce() is invoked

2015-03-18 Thread Christian Borntraeger
From: Alexander Yarygin yary...@linux.vnet.ibm.com

During dynamic address translation the get_vcpu_asce()
function can be invoked several times. It's ok for usual modes, but will
be slow if CPUs are in AR mode. Let's call the get_vcpu_asce() once and
pass the result to the called functions.
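
The shape of the change, reduced to a toy example: look the value up
once in the caller and hand it down by value. Everything here is a
stand-in for illustration (the sketch_ names, the placeholder ASCE
value, the fake "translation"); only the call structure reflects the
patch.

```c
#include <assert.h>

static int sketch_asce_lookups;	/* counts how often the costly lookup runs */

/* Stand-in for get_vcpu_asce(), which may be expensive in AR mode. */
static unsigned long sketch_get_asce(void)
{
	sketch_asce_lookups++;
	return 0x1000;	/* arbitrary placeholder value */
}

/* Stand-in for guest_translate(): receives the ASCE from its caller. */
static unsigned long sketch_translate(unsigned long gva, unsigned long asce)
{
	return asce + (gva & 0xfff);
}

/* Stand-in for access_guest(): one lookup, reused for every page. */
static unsigned long sketch_access(const unsigned long *gvas, int n)
{
	unsigned long asce = sketch_get_asce();
	unsigned long sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += sketch_translate(gvas[i], asce);
	return sum;
}
```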

Signed-off-by: Alexander Yarygin yary...@linux.vnet.ibm.com
Reviewed-by: Thomas Huth th...@linux.vnet.ibm.com
Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/gaccess.c | 25 -
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 494131e..c74462a 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -330,6 +330,7 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
  * @vcpu: virtual cpu
  * @gva: guest virtual address
  * @gpa: points to where guest physical (absolute) address should be stored
+ * @asce: effective asce
  * @write: indicates if access is a write access
  *
  * Translate a guest virtual address into a guest absolute address by means
@@ -345,7 +346,8 @@ static int deref_table(struct kvm *kvm, unsigned long gpa, unsigned long *val)
  *	      by the architecture
  */
 static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
-				     unsigned long *gpa, int write)
+				     unsigned long *gpa, const union asce asce,
+				     int write)
 {
 	union vaddress vaddr = {.addr = gva};
 	union raddress raddr = {.addr = gva};
@@ -354,12 +356,10 @@ static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva,
 	union ctlreg0 ctlreg0;
 	unsigned long ptr;
 	int edat1, edat2;
-	union asce asce;
 
 	ctlreg0.val = vcpu->arch.sie_block->gcr[0];
 	edat1 = ctlreg0.edat && test_kvm_facility(vcpu->kvm, 8);
 	edat2 = edat1 && test_kvm_facility(vcpu->kvm, 78);
-	asce.val = get_vcpu_asce(vcpu);
 	if (asce.r)
 		goto real_address;
 	ptr = asce.origin * 4096;
@@ -506,15 +506,14 @@ static inline int is_low_address(unsigned long ga)
 	return (ga & ~0x11fful) == 0;
 }
 
-static int low_address_protection_enabled(struct kvm_vcpu *vcpu)
+static int low_address_protection_enabled(struct kvm_vcpu *vcpu,
+					  const union asce asce)
 {
 	union ctlreg0 ctlreg0 = {.val = vcpu->arch.sie_block->gcr[0]};
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
-	union asce asce;
 
 	if (!ctlreg0.lap)
 		return 0;
-	asce.val = get_vcpu_asce(vcpu);
 	if (psw_bits(*psw).t && asce.p)
 		return 0;
 	return 1;
@@ -536,7 +535,7 @@ enum {
 
 static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga,
 			    unsigned long *pages, unsigned long nr_pages,
-			    int write)
+			    const union asce asce, int write)
 {
 	struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm;
 	psw_t *psw = &vcpu->arch.sie_block->gpsw;
@@ -547,7 +546,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga,
 	tec_bits = (struct trans_exc_code_bits *)&pgm->trans_exc_code;
 	tec_bits->fsi = write ? FSI_STORE : FSI_FETCH;
 	tec_bits->as = psw_bits(*psw).as;
-	lap_enabled = low_address_protection_enabled(vcpu);
+	lap_enabled = low_address_protection_enabled(vcpu, asce);
 	while (nr_pages) {
 		ga = kvm_s390_logical_to_effective(vcpu, ga);
 		tec_bits->addr = ga >> PAGE_SHIFT;
@@ -557,7 +556,7 @@ static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga,
 		}
 		ga &= PAGE_MASK;
 		if (psw_bits(*psw).t) {
-			rc = guest_translate(vcpu, ga, pages, write);
+			rc = guest_translate(vcpu, ga, pages, asce, write);
 			if (rc < 0)
 				return rc;
 			if (rc == PGM_PROTECTION)
@@ -604,7 +603,7 @@ int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data,
 	need_ipte_lock = psw_bits(*psw).t && !asce.r;
 	if (need_ipte_lock)
 		ipte_lock(vcpu);
-	rc = guest_page_range(vcpu, ga, pages, nr_pages, write);
+	rc = guest_page_range(vcpu, ga, pages, nr_pages, asce, write);
 	for (idx = 0; idx < nr_pages && !rc; idx++) {
 		gpa = *(pages + idx) + (ga & ~PAGE_MASK);
 		_len = min(PAGE_SIZE - (gpa & ~PAGE_MASK), len);
@@ -671,16 +670,16 @@ int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar,
 	tec->as = psw_bits(*psw).as;
 	tec->fsi = write ? FSI_STORE : FSI_FETCH;
 	tec->addr = gva >> PAGE_SHIFT;
-   if 

Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-18 Thread Bandan Das
[Ccing netdev and Stefan]
Bandan Das b...@redhat.com writes:

 jacob jacob opstk...@gmail.com writes:

 On Mon, Mar 16, 2015 at 2:12 PM, Bandan Das b...@redhat.com wrote:
 jacob jacob opstk...@gmail.com writes:

 I also see the following in dmesg in the VM.

 [    0.095758] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
 [    0.096006] acpi PNP0A03:00: ACPI _OSC support notification failed, disabling PCIe ASPM
 [    0.096915] acpi PNP0A03:00: Unable to request _OSC control (_OSC support mask: 0x08)
 IIRC, for OSC control, once the BIOS is done with whatever initialization
 it needs to do, it clears a bit so that the OS can take over. The message
 you are getting is usually a sign of a bug in the BIOS, but I don't know
 if it is related to your problem. Does "dmesg | grep -e DMAR -e IOMMU"
 give anything useful ?

 Do not see anything useful in the output..

 Ok, Thanks. Can you please post the output as well ?

 [    0.097072] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.

 Does this indicate any issue related to PCI passthrough?

 Would really appreciate any input on how to debug this further.

 Did you get a chance to try a newer kernel ?
 Currently am using 3.18.7-200.fc21.x86_64 which is pretty recent.
 Are you suggesting trying the newer kernel just on the host? (or VM too?)
 Both, preferably to 3.19. But it's just a wild guess. I saw i40e related fixes,
 particularly "i40e: fix un-necessary Tx hangs" in 3.19-rc5. This is not
 exactly what you are seeing but I was still wondering if it could help.

Actually, Stefan suggests that support for this card is still sketchy
and your best bet is to try out net-next
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

Also, could you please post more information about your hardware setup
(chipset/processor/firmware version on the card etc) ?

Thanks,
Bandan

 Meanwhile, I am trying to get hold of a card myself to try and reproduce
 it at my end.

 Thanks,
 Bandan

 On Fri, Mar 13, 2015 at 10:08 AM, jacob jacob opstk...@gmail.com wrote:
 So, it could be the i40e driver then ? Because IIUC, VFs use a separate
 driver. Just to rule out the possibility that there might be some driver
 fixes that could help with this, it might be a good idea to try a 3.19 or
 later upstream kernel.


 I tried with the latest DPDK release too (dpdk-1.8.0) and see the same issue.
 As mentioned earlier, I do not see any issues at all when running
 tests using either i40e or dpdk on the host itself.
 This is the reason why I suspect it has something to do with KVM/libvirt.
 Both with regular PCI passthrough and VF passthrough I see issues. It
 always points to some issue with packet transmission. Receive
 seems to work ok.


 On Thu, Mar 12, 2015 at 8:02 PM, Bandan Das b...@redhat.com wrote:
 jacob jacob opstk...@gmail.com writes:

 On Thu, Mar 12, 2015 at 3:07 PM, Bandan Das b...@redhat.com wrote:
 jacob jacob opstk...@gmail.com writes:

  Hi,

  Seeing failures when trying to do PCI passthrough of Intel XL710 40G
 interface to KVM vm.
  0a:00.1 Ethernet controller: Intel Corporation Ethernet
 Controller XL710 for 40GbE QSFP+ (rev 01)

 You are assigning the PF right ? Does assigning VFs work or it's
 the same behavior ?

 Yes. Assigning VFs worked ok. But this had other issues while bringing down VMs.
 Interested in finding out if PCI passthrough of 40G intel XL710
 interface is qualified in some specific kernel/kvm release.

 So, it could be the i40e driver then ? Because IIUC, VFs use a separate
 driver. Just to rule out the possibility that there might be some driver
 fixes that could help with this, it might be a good idea to try a 3.19 or
 later upstream kernel.

 From dmesg on host:

 [80326.559674] kvm: zapping shadow pages for mmio generation wraparound
 [80327.271191] kvm [175994]: vcpu0 unhandled rdmsr: 0x1c9
 [80327.271689] kvm [175994]: vcpu0 unhandled rdmsr: 0x1a6
 [80327.272201] kvm [175994]: vcpu0 unhandled rdmsr: 0x1a7
 [80327.272681] kvm [175994]: vcpu0 unhandled rdmsr: 0x3f6
 [80327.376186] kvm [175994]: vcpu0 unhandled rdmsr: 0x606

 These are harmless and are related to unimplemented PMU msrs,
 not VFIO.

 Bandan


[GSoC] project proposal

2015-03-18 Thread Catalin Vasile
Hi,

My name is Catalin Vasile and I want to participate with a project for
qemu at GSoC.
From what I understand from the rules, I can participate with things I
could also use for my college projects.
This is my last bachelor year and I'm doing my diploma project, which
is related to virtualization, more specifically qemu-kvm.
I'm trying to do a paravirtualized device using virtio and vhost.
I've already done some work.
To be more exact, I want to make a virtio-crypto device to emulate a
virtual cryptographic offloading device that will send jobs from the
guest to a vhost that will process the jobs. This mechanism will link
CryptoAPI in the guest to the CryptoAPI on the host. This way,
whatever is beneath CryptoAPI on the host will be used as
offloading for the guest.
Is there a mentor interested in getting involved in this kind of project?


Re: [GSoC] project proposal

2015-03-18 Thread Paolo Bonzini


On 18/03/2015 17:01, Catalin Vasile wrote:
 To be more exact, I want to make a virtio-crypto device to emulate a
 virtual cryptographic offloading device that will send jobs from the
 guest to a vhost that will process the jobs. This mechanism will link
 CryptoAPI from the guest to the CryptoAPI from the host. This way,
 whatever it's beneath CryptoAPI from the host will be used as
 offloading for the guest.
 Is there a mentor interested in getting involved in this kind of project?

I think it's very likely that you'll find a mentor.  Please submit a
proposal, also detailing the advantage of vhost over a userspace
solution (using any of gnutls, AF_ALG, cryptodev).

Paolo


Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-18 Thread jacob jacob
On Wed, Mar 18, 2015 at 11:24 AM, Bandan Das b...@redhat.com wrote:
 [Ccing netdev and Stefan]
 Bandan Das b...@redhat.com writes:

 jacob jacob opstk...@gmail.com writes:

 On Mon, Mar 16, 2015 at 2:12 PM, Bandan Das b...@redhat.com wrote:
 jacob jacob opstk...@gmail.com writes:

 I also see the following in dmesg in the VM.

 [0.095758] ACPI: PCI Root Bridge [PCI0] (domain  [bus 00-ff])
 [0.096006] acpi PNP0A03:00: ACPI _OSC support notification failed,
 disabling PCIe ASPM
 [0.096915] acpi PNP0A03:00: Unable to request _OSC control (_OSC
 support mask: 0x08)
 IIRC, For OSC control, after BIOS is done with (whatever initialization
 it needs to do), it clears a bit so that the OS can take over. This 
 message,
 you are getting is a sign of a bug in the BIOS (usually). But I don't
 know if this is related to your problem. Does dmesg | grep -e DMAR -e 
 IOMMU
 give anything useful ?

 Do not see anything useful in the output..

 Ok, Thanks. Can you please post the output as well ?

 [0.097072] acpi PNP0A03:00: fail to add MMCONFIG information,
 can't access extended PCI configuration space under this bridge.

 Does this indicate any issue related to PCI passthrough?

 Would really appreciate any input on how to debug this further.

 Did you get a chance to try a newer kernel ?
 Currently am using 3.18.7-200.fc21.x86_64 which is pretty recent.
 Are you suggesting trying the newer kernel just on the host? (or VM too?)
 Both preferably to 3.19. But it's just a wild guess. I saw i40e related 
 fixes,
 particularly i40e: fix un-necessary Tx hangs in 3.19-rc5. This is not 
 exactly
 what you are seeing but I was still wondering if it could help.

 Actually, Stefan suggests that support for this card is still sketchy
 and your best bet is to try out net-next
 http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git

 Also, could you please post more information about your hardware setup
 (chipset/processor/firmware version on the card etc) ?

Host CPU : Model name:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

Manufacturer Part Number:  XL710QDA1BLK
Ethernet controller: Intel Corporation Ethernet Controller XL710 for
40GbE QSFP+ (rev 01)
 #ethtool -i enp9s0
driver: i40e
version: 1.2.6-k
firmware-version: f4.22 a1.1 n04.24 e800013fd
bus-info: :09:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no


 Thanks,
 Bandan

 Meanwhile, I am trying to get hold of a card myself to try and reproduce
 it at my end.

 Thanks,
 Bandan

 On Fri, Mar 13, 2015 at 10:08 AM, jacob jacob opstk...@gmail.com wrote:
 So, it could be the i40e driver then ? Because IIUC, VFs use a separate
 driver. Just to rule out the possibility that there might be some 
 driver fixes that
 could help with this, it might be a good idea to try a 3.19 or later 
 upstream
 kernel.


 I tried with the latest DPDK release too (dpdk-1.8.0) and see the same 
 issue.
 As mentioned earlier, i do not see any issues at all when running
 tests using either i40e or dpdk on the host itself.
 This is the reason why i am suspecting if it is anything to do with 
 KVM/libvirt.
 Both with regular PCI passthrough and VF passthrough i see issues. It
 is always pointing to some issue with packet transmission. Receive
 seems to work ok.


 On Thu, Mar 12, 2015 at 8:02 PM, Bandan Das b...@redhat.com wrote:
 jacob jacob opstk...@gmail.com writes:

 On Thu, Mar 12, 2015 at 3:07 PM, Bandan Das b...@redhat.com wrote:
 jacob jacob opstk...@gmail.com writes:

  Hi,

  Seeing failures when trying to do PCI passthrough of Intel XL710 40G
 interface to KVM vm.
  0a:00.1 Ethernet controller: Intel Corporation Ethernet
 Controller XL710 for 40GbE QSFP+ (rev 01)

 You are assigning the PF right ? Does assigning VFs work or it's
 the same behavior ?

 Yes. Assigning VFs worked ok. But this had other issues while bringing
 down VMs.
 Interested in finding out if PCI passthrough of 40G intel XL710
 interface is qualified in some specific kernel/kvm release.

 So, it could be the i40e driver then ? Because IIUC, VFs use a separate
 driver. Just to rule out the possibility that there might be some 
 driver fixes that
 could help with this, it might be a good idea to try a 3.19 or later 
 upstream
 kernel.

 From dmesg on host:

 [80326.559674] kvm: zapping shadow pages for mmio generation 
 wraparound
 [80327.271191] kvm [175994]: vcpu0 unhandled rdmsr: 0x1c9
 [80327.271689] kvm [175994]: vcpu0 unhandled rdmsr: 0x1a6
 [80327.272201] kvm [175994]: vcpu0 unhandled rdmsr: 0x1a7
 [80327.272681] kvm [175994]: vcpu0 unhandled rdmsr: 0x3f6
 [80327.376186] kvm [175994]: vcpu0 unhandled rdmsr: 0x606

 These are harmless and are related to unimplemented PMU msrs,
 not VFIO.

 Bandan

[PATCH] vfio: Split virqfd into a separate module for vfio bus drivers

2015-03-18 Thread Alex Williamson
An unintended consequence of commit 42ac9bd18d4f ("vfio: initialize
the virqfd workqueue in VFIO generic code") is that the vfio module
is renamed to vfio_core so that it can include both vfio and virqfd.
That's a user visible change that may break module loading scripts
and it imposes eventfd support as a dependency on the core vfio code,
which it's really not.  virqfd is intended to be provided as a service
to vfio bus drivers, so instead of wrapping it into vfio.ko, we can
make it a stand-alone module toggled by vfio bus drivers.  This has
the additional benefit of removing initialization and exit from the
core vfio code.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

Posted in reply to [PATCH v14 19/20] vfio: initialize the virqfd
workqueue in VFIO generic code, reposting as an official proposal.
I removed the Kconfig select from AMBA support.  AMBA depends on
PLATFORM, which does the select and therefore seems sufficient.
Commit log and vfio_virqfd driver description also reworded.

 drivers/vfio/Kconfig  |5 +
 drivers/vfio/Makefile |5 +++--
 drivers/vfio/pci/Kconfig  |1 +
 drivers/vfio/platform/Kconfig |1 +
 drivers/vfio/vfio.c   |8 
 drivers/vfio/virqfd.c |   17 +++--
 include/linux/vfio.h  |2 --
 7 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index d5322a4..7d092dd 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -13,6 +13,11 @@ config VFIO_SPAPR_EEH
    depends on EEH && VFIO_IOMMU_SPAPR_TCE
    default n
 
+config VFIO_VIRQFD
+   tristate
+   depends on VFIO && EVENTFD
+   default n
+
 menuconfig VFIO
    tristate "VFIO Non-Privileged userspace driver framework"
    depends on IOMMU_API
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index d798b09..7b8a31f 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -1,6 +1,7 @@
-vfio_core-y := vfio.o virqfd.o
+vfio_virqfd-y := virqfd.o
 
-obj-$(CONFIG_VFIO) += vfio_core.o
+obj-$(CONFIG_VFIO) += vfio.o
+obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
 obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index c6bb5da..579d83b 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -1,6 +1,7 @@
 config VFIO_PCI
    tristate "VFIO support for PCI devices"
    depends on VFIO && PCI && EVENTFD
+   select VFIO_VIRQFD
    help
      Support for the PCI VFIO bus driver.  This is required to make
      use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/platform/Kconfig b/drivers/vfio/platform/Kconfig
index c0a3bff..9a4403e 100644
--- a/drivers/vfio/platform/Kconfig
+++ b/drivers/vfio/platform/Kconfig
@@ -1,6 +1,7 @@
 config VFIO_PLATFORM
    tristate "VFIO support for platform devices"
    depends on VFIO && EVENTFD && ARM
+   select VFIO_VIRQFD
    help
      Support for platform devices with VFIO. This is required to make
      use of platform devices present on the system using the VFIO
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 86aac7e..0d33662 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1552,11 +1552,6 @@ static int __init vfio_init(void)
    if (ret)
        goto err_cdev_add;
 
-   /* Start the virqfd cleanup handler used by some VFIO bus drivers */
-   ret = vfio_virqfd_init();
-   if (ret)
-       goto err_virqfd;
-
    pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 
    /*
@@ -1569,8 +1564,6 @@ static int __init vfio_init(void)
 
    return 0;
 
-err_virqfd:
-   cdev_del(&vfio.group_cdev);
 err_cdev_add:
    unregister_chrdev_region(vfio.group_devt, MINORMASK);
 err_alloc_chrdev:
@@ -1585,7 +1578,6 @@ static void __exit vfio_cleanup(void)
 {
    WARN_ON(!list_empty(&vfio.group_list));
 
-   vfio_virqfd_exit();
    idr_destroy(&vfio.group_idr);
    cdev_del(&vfio.group_cdev);
    unregister_chrdev_region(vfio.group_devt, MINORMASK);
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 3d19aaf..27c89cd 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -13,12 +13,17 @@
 #include <linux/vfio.h>
 #include <linux/eventfd.h>
 #include <linux/file.h>
+#include <linux/module.h>
 #include <linux/slab.h>
 
+#define DRIVER_VERSION  "0.1"
+#define DRIVER_AUTHOR   "Alex Williamson <alex.william...@redhat.com>"
+#define DRIVER_DESC     "IRQFD support for VFIO bus drivers"
+
 static struct workqueue_struct *vfio_irqfd_cleanup_wq;
 static DEFINE_SPINLOCK(virqfd_lock);
 
-int __init vfio_virqfd_init(void)
+static int __init vfio_virqfd_init(void)
 {
    vfio_irqfd_cleanup_wq =
        create_singlethread_workqueue("vfio-irqfd-cleanup");
@@ -28,7 +33,7 @@ int __init 

Re: [PATCH 0/9] qspinlock stuff -v15

2015-03-18 Thread Waiman Long

On 03/16/2015 09:16 AM, Peter Zijlstra wrote:

Hi Waiman,

As promised; here is the paravirt stuff I did during the trip to BOS last week.

All the !paravirt patches are more or less the same as before (the only real
change is the copyright lines in the first patch).

The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more
convoluted and I've no real way to test that but it should be straightforward to
make work.

I ran this using the virtme tool (thanks Andy) on my laptop with a 4x
overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and
it both booted and survived a hackbench run (perf bench sched messaging -g 20
-l 5000).

So while the paravirt code isn't the most optimal code ever conceived it does 
work.

Also, the paravirt patching includes replacing the call with movb $0, %arg1
for the native case, which should greatly reduce the cost of having
CONFIG_PARAVIRT_SPINLOCKS enabled on actual hardware.

I feel that if someone were to do a Xen patch we can go ahead and merge this
stuff (finally!).

These patches do not implement the paravirt spinlock debug stats currently
implemented (separately) by KVM and Xen, but that should not be too hard to do
on top and in the 'generic' code -- no reason to duplicate all that.

Of course; once this lands people can look at improving the paravirt nonsense.



Thanks for sending this out. I have no problem with the !paravirt patch.
I do have some comments on the paravirt one, which I will reply to individually.


Cheers,
Longman


Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-18 Thread Bandan Das
Radim Krčmář rkrc...@redhat.com writes:

 kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
 We need to do that for irq notifiers.  (Like with edge interrupts.)

 Fix it by skipping EOI broadcast only.

 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/kvm/ioapic.c | 4 +++-
  arch/x86/kvm/lapic.c  | 3 +--
  2 files changed, 4 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
 index b1947e0f3e10..46d4449772bc 100644
 --- a/arch/x86/kvm/ioapic.c
 +++ b/arch/x86/kvm/ioapic.c
 @@ -422,6 +422,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
             struct kvm_ioapic *ioapic, int vector, int trigger_mode)
  {
     int i;
 +   struct kvm_lapic *apic = vcpu->arch.apic;
  
     for (i = 0; i < IOAPIC_NUM_PINS; i++) {
         union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
 @@ -443,7 +444,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
         kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
         spin_lock(&ioapic->lock);
  
 -       if (trigger_mode != IOAPIC_LEVEL_TRIG)
 +       if (trigger_mode != IOAPIC_LEVEL_TRIG ||
 +           kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
             continue;
  
         ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index bd4e34de24c7..4ee827d7bf36 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -833,8 +833,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
  
  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
  {
 -   if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
 -       kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
 +   if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
         int trigger_mode;
         if (apic_test_vector(vector, apic->regs + APIC_TMR))
             trigger_mode = IOAPIC_LEVEL_TRIG;

Works on my Xen 4.4 L1 setup with Intel E5 v2 host. Without this patch,
L1 panics as reported in the bug referenced above.

Tested-by: Bandan Das b...@redhat.com


[PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-18 Thread Radim Krčmář
kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
We need to do that for irq notifiers.  (Like with edge interrupts.)

Fix it by skipping EOI broadcast only.

Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
Signed-off-by: Radim Krčmář rkrc...@redhat.com
---
 arch/x86/kvm/ioapic.c | 4 +++-
 arch/x86/kvm/lapic.c  | 3 +--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index b1947e0f3e10..46d4449772bc 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -422,6 +422,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
            struct kvm_ioapic *ioapic, int vector, int trigger_mode)
 {
    int i;
+   struct kvm_lapic *apic = vcpu->arch.apic;
 
    for (i = 0; i < IOAPIC_NUM_PINS; i++) {
        union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
@@ -443,7 +444,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
        kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
        spin_lock(&ioapic->lock);
 
-       if (trigger_mode != IOAPIC_LEVEL_TRIG)
+       if (trigger_mode != IOAPIC_LEVEL_TRIG ||
+           kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
            continue;
 
        ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bd4e34de24c7..4ee827d7bf36 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -833,8 +833,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
 {
-   if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
-       kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
+   if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
        int trigger_mode;
        if (apic_test_vector(vector, apic->regs + APIC_TMR))
            trigger_mode = IOAPIC_LEVEL_TRIG;
-- 
2.3.3
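
The behavioural change in this patch can be summarised with a small userspace model. This is only a sketch: it collapses the per-pin loop of __kvm_ioapic_update_eoi() into a single decision, and the type and function names (eoi_result, eoi_before, eoi_after) are hypothetical. "notifier_called" stands for reaching kvm_notify_acked_irq(), "broadcast_done" for reaching the level-triggered handling after the trigger-mode check.

```c
#include <stdbool.h>

enum trigger_mode { EDGE_TRIG = 0, LEVEL_TRIG = 1 };

struct eoi_result {
    bool notifier_called;  /* kvm_notify_acked_irq() would be reached */
    bool broadcast_done;   /* level-triggered EOI handling would be reached */
};

/* Before the patch: directed EOI skipped the IOAPIC update entirely. */
struct eoi_result eoi_before(bool handles_vector, bool directed_eoi,
                             enum trigger_mode mode)
{
    struct eoi_result r = { false, false };

    if (!directed_eoi && handles_vector) {
        r.notifier_called = true;
        r.broadcast_done = (mode == LEVEL_TRIG);
    }
    return r;
}

/* After the patch: notifiers always run, only the broadcast part is skipped. */
struct eoi_result eoi_after(bool handles_vector, bool directed_eoi,
                            enum trigger_mode mode)
{
    struct eoi_result r = { false, false };

    if (handles_vector) {
        r.notifier_called = true;
        r.broadcast_done = (mode == LEVEL_TRIG) && !directed_eoi;
    }
    return r;
}
```

With directed EOI enabled and a level-triggered vector, eoi_before() reports no notifier call while eoi_after() reports the notifier call without the broadcast, which is the case the commit message describes.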



[Bug 82211] Cannot boot Xen under KVM with X2APIC enabled

2015-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=82211

--- Comment #11 from Radim Krčmář rkrc...@redhat.com ---
Should be fixed with "KVM: x86: call irq notifiers with directed EOI"
(http://www.spinics.net/lists/kernel/msg1949367.html).

Can you check if it is?

Thanks.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-18 Thread Marcelo Tosatti
On Fri, Mar 13, 2015 at 09:14:35AM -0600, James Sullivan wrote:
 This patch adds a check for RH=1 in kvm_set_msi_irq. Currently the
 DM bit is the only thing used to decide irq->dest_mode (logical when DM
 set, physical when unset). Documentation indicates that the DM bit will
 be 'ignored' when the RH bit is unset, and physical destination mode is
 used in this case.
 
 Fixed this to set irq->dest_mode to APIC_DEST_LOGICAL just in case both
 RH=1/DM=1.
 
 This patch doesn't completely handle RH=1; if RH=1 then the delivery will
 behave as in low priority mode (deliver the interrupt to only the lowest
 priority processor), but the delivery mode may still be used to specify the
 semantics of the delivery beyond its destination.
 
 I will be trying and comparing a few options to handle this fully (extension
 of struct kvm_lapic_irq, introduction of MSI specific delivery functions or
 helpers, etc) and hope to have some patches to show in the near future.
 
 
 Signed-off-by: James Sullivan sullivan.jame...@gmail.com

The documentation states the following:

* When RH is 0, the interrupt is directed to the processor listed in the
Destination ID field.

* If RH is 0, then the DM bit is ignored and the message is sent ahead
independent of whether the physical or logical destination mode is used.

However, from the POV of a device writing to memory to generate an MSI 
interrupt, there is no (or i can't see any) other information that 
can be used to infer logical or physical mode for the interrupt message.

Before your patch:

(dm, rh) = (0, 0) => irq->dest_mode = 0
(dm, rh) = (0, 1) => irq->dest_mode = 0
(dm, rh) = (1, 0) => irq->dest_mode = 1
(dm, rh) = (1, 1) => irq->dest_mode = 1

After your patch:

(dm, rh) = (0, 0) => irq->dest_mode = 0
(dm, rh) = (0, 1) => irq->dest_mode = 0
(dm, rh) = (1, 0) => irq->dest_mode = 0
(dm, rh) = (1, 1) => irq->dest_mode = 1


Am I missing some explicit documentation that refers
to (dm, rh) = (1, 0) => irq->dest_mode = 0 ?

See native_compose_msi_msg:

    msg->address_lo =
        MSI_ADDR_BASE_LO |
        ((apic->irq_dest_mode == 0) ?
            MSI_ADDR_DEST_MODE_PHYSICAL :
            MSI_ADDR_DEST_MODE_LOGICAL) |
        ((apic->irq_delivery_mode != dest_LowestPrio) ?
            MSI_ADDR_REDIRECTION_CPU :
            MSI_ADDR_REDIRECTION_LOWPRI) |
        MSI_ADDR_DEST_ID(dest);


So it does configure DM = MSI_ADDR_DEST_MODE_LOGICAL
and RH = MSI_ADDR_REDIRECTION_LOWPRI.
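
The two tables above can be written down as a short C sketch. The bit positions (DM in bit 2, RH in bit 3 of the low MSI address word) follow the MSI address layout that the MSI_ADDR_* macros encode; the function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Bit positions in the low MSI address word. */
#define MSI_DM_SHIFT 2  /* destination mode: 0 = physical, 1 = logical */
#define MSI_RH_SHIFT 3  /* redirection hint */

/* Extract the DM bit from a low MSI address word. */
unsigned msi_dm(uint32_t address_lo)
{
    return (address_lo >> MSI_DM_SHIFT) & 1u;
}

/* Extract the RH bit from a low MSI address word. */
unsigned msi_rh(uint32_t address_lo)
{
    return (address_lo >> MSI_RH_SHIFT) & 1u;
}

/* "Before your patch": dest_mode follows DM alone. */
unsigned dest_mode_dm_only(unsigned dm, unsigned rh)
{
    (void)rh;
    return dm;
}

/* "After your patch": logical mode only when both RH and DM are set. */
unsigned dest_mode_rh_gated(unsigned dm, unsigned rh)
{
    return dm & rh;
}
```

The two functions disagree only for (dm, rh) = (1, 0), which is the combination being questioned.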



Re: XP machine freeze

2015-03-18 Thread Marcelo Tosatti
On Mon, Mar 16, 2015 at 04:10:40PM +0100, Saso Slavicic wrote:
 Hi,
 
 I'm fairly experienced with KVM (Centos 5/6), running about a dozen servers
 with 20-30 different (Linux & MS platform) systems.
 I have one Windows XP machine that acts very strangely - it freezes. I get
 ping timeout for the VM from my monitoring and the machine spins 2 or 3
 cores using all the cpu. Now the interesting thing that happens is that once
 you open the console, it suddenly starts working again. You can see the
 clock catching up as it was frozen in time and everything works normally
 once the timer catches up. It usually happens probably about once a month,
 although it happened yesterday and today again.
 
 This machine is on Centos 6, qemu-kvm-0.12.1.2-2.448.el6_6, kernel
 2.6.32-504.3.3.el6.x86_64.
 I was able to do some debugging when the machine was frozen, so I got some
 things to work with:
 
 # virsh qemu-monitor-command --hmp DBserver 'info cpus'
 * CPU #0: pc=0x80501fdd thread_id=32595
   CPU #1: pc=0x806e7a9b thread_id=32596
   CPU #2: pc=0xba2da162 (halted) thread_id=32597
   CPU #3: pc=0xba2da162 (halted) thread_id=32598
 
 Now, in both yesterday's and today's event the CPU0 was stopped at
 0x80501fdd. I've disassembled the function and got this:
 
  0x80501fb5:  int3
  0x80501fb6:  mov    %edi,%edi
  0x80501fb8:  push   %ebp
  0x80501fb9:  mov    %esp,%ebp
  0x80501fbb:  push   %esi
  0x80501fbc:  mov    %fs:0x20,%eax
  0x80501fc2:  mov    0x8(%ebp),%ecx
  0x80501fc5:  lea    -0x1(%ecx),%esi
  0x80501fc8:  test   %esi,%ecx
  0x80501fca:  lea    0x7ec(%eax),%edx
  0x80501fd0:  pop    %esi
  0x80501fd1:  je     0x80501fdd
  0x80501fd3:  lea    0x7a0(%eax),%edx
  0x80501fd9:  jmp    0x80501fdd
  *0x80501fdb:  pause
  0x80501fdd:  cmpl   $0x0,(%edx)
  0x80501fe0:  jne    0x80501fdb
  0x80501fe2:  pop    %ebp
  0x80501fe3:  ret    $0x4
  0x80501fe6:  int3
 
 Mov %edi,%edi is clearly the start of some function. From what I've been
 able to understand, the code fetches _KPRCB structure (%fs:0x20) and then
 does a spinlock between fdb and fe0 checking for PacketBarrier (?) in EDX
 (0xffdff8c0). Now, $pc always shows fdd address, shouldn't it jump between
 fdb and fe0, it seems as if it was stuck at fdd?
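
Read as C, the listing above is roughly the following. This is only a model of the disassembly, not the Windows source: the two offsets come from the lea instructions, the names polled_address and spin_until_clear are hypothetical, and the max_spins bound is an addition so that the sketch terminates (the guest loop has no such bound). With the register values reported in this message (EAX=ffdff120, ECX=0xe), the polled address works out to 0xffdff8c0, which matches EDX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets taken from the two lea instructions in the listing. */
#define PRCB_OFF_POW2   0x7ecu  /* chosen when (arg & (arg - 1)) == 0 */
#define PRCB_OFF_OTHER  0x7a0u  /* chosen otherwise */

/* Address the loop polls, given %eax (loaded from %fs:0x20) and the argument in %ecx. */
uint32_t polled_address(uint32_t prcb, uint32_t arg)
{
    uint32_t off = ((arg & (arg - 1)) == 0) ? PRCB_OFF_POW2 : PRCB_OFF_OTHER;

    return prcb + off;
}

/*
 * Shape of the loop at 0x80501fdb..0x80501fe0: spin until the polled word
 * reads zero. Returns 1 if the word cleared, 0 if max_spins ran out first.
 */
int spin_until_clear(const volatile uint32_t *word, unsigned long max_spins)
{
    assert(word != NULL);

    while (*word != 0) {
        if (max_spins-- == 0)
            return 0;
        /* the guest executes `pause` here before re-reading the word */
    }
    return 1;
}
```

In this model the loop can only exit once something else writes zero to the polled word.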
 
 # virsh qemu-monitor-command --hmp DBserver 'info registers'
  EAX=ffdff120 EBX=c06ddf58 ECX=000e EDX=ffdff8c0
  ESI=be6e3921 EDI=c06ddf60 EBP=ba4ff708 ESP=ba4ff708
  EIP=80501fdd EFL=0202 [---] CPL=0 II=0 A20=1 SMM=0 HLT=0
  ES =0023   00c0f300 DPL=3 DS   [-WA]
  CS =0008   00c09b00 DPL=0 CS32 [-RA]
  SS =0010   00c09300 DPL=0 DS   [-WA]
  DS =0023   00c0f300 DPL=3 DS   [-WA]
  FS =0030 ffdff000 1fff 00c09300 DPL=0 DS   [-WA]
  GS =  000f 
  LDT=  000f 
  TR =0028 80042000 20ab 8b00 DPL=0 TSS32-busy
  GDT= 8003f000 03ff
  IDT= 8003f400 07ff
  CR0=8001003b CR2=dbbec000 CR3=0b3c0020 CR4=06f8
  DR0= DR1= DR2= DR3=
  DR6=0ff0 DR7=0400
  FCW=027f FSW=0020 [ST=0] FTW=00 MXCSR=1fa0
  FPR0=8053632b003c1658 c048 FPR1=e1e0c048bf80f6ab 76f8
  FPR2=e1e0 0023 FPR3=0b017c30003c1658 
  FPR4=003bba1a7604 1e64 FPR5=0007268c 003b
  FPR6=0202001b 2684 FPR7=e3e0a9b4e1b50de4 ca0b
  XMM00=00a1fc950020027f
 XMM01=1fa01c4c0001
  XMM02=c0488053632b003c1658
 XMM03=76f8e1e0c048bf80f6ab
  XMM04=0023e1e0
 XMM05=0b017c30003c1658
  XMM06=1e64003bba1a7604
 XMM07=003b0007268c
 
 Clearly, the address in EDX is not 0:
 
 [root@linux ~]# virsh qemu-monitor-command --hmp DBserver 'x/1xb 0xFFDFF8C0'
 ffdff8c0: 0x0e
 
 [root@linux ~]# virt-manager
 
 [root@linux ~]# virsh qemu-monitor-command --hmp DBserver 'x/1xb 0xFFDFF8C0'
 ffdff8c0: 0x00
 
 However as soon as the VM console is opened and machine starts, the address
 in EDX is set to 0 and the loop is broken.
 Does anybody recognize what function that is? What could possibly happen
 that opening the console and moving the mouse a little, unfreezes the
 machine?
 VM has .81 virtio drivers from Fedora repo at the moment.

Generate a Windows dump? 

https://support.microsoft.com/en-us/kb/254649

https://support.microsoft.com/en-us/kb/972110
Step 7: Generate a complete crash dump file or a kernel crash dump file
by using an NMI on a Windows-based system

(you can inject NMIs via QEMU monitor).

 
 The configuration of the machine is pretty standard:
 
 <!--
 WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
 OVERWRITTEN AND LOST. Changes to this xml configuration should be made
 using:
   virsh edit 

Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-18 Thread Marcelo Tosatti
On Wed, Mar 18, 2015 at 06:59:22PM -0600, James Sullivan wrote:
  
  The documentation states the following:
  
  * When RH is 0, the interrupt is directed to the processor listed in the
  Destination ID field.
  
  * If RH is 0, then the DM bit is ignored and the message is sent ahead
  independent of whether the physical or logical destination mode is used.
  
  However, from the POV of a device writing to memory to generate an MSI 
  interrupt, there is no (or i can't see any) other information that 
  can be used to infer logical or physical mode for the interrupt message.
  
  Before your patch:
  
  (dm, rh) = (0, 0) => irq->dest_mode = 0
  (dm, rh) = (0, 1) => irq->dest_mode = 0
  (dm, rh) = (1, 0) => irq->dest_mode = 1
  (dm, rh) = (1, 1) => irq->dest_mode = 1
  
  After your patch:
  
  (dm, rh) = (0, 0) => irq->dest_mode = 0
  (dm, rh) = (0, 1) => irq->dest_mode = 0
  (dm, rh) = (1, 0) => irq->dest_mode = 0
  (dm, rh) = (1, 1) => irq->dest_mode = 1
  
  
  Am I missing some explicit documentation that refers
  to (dm, rh) = (1, 0) => irq->dest_mode = 0 ?
 
 From the IA32 manual (Vol. 3, 10.11.2):
 
  * When RH is 0, the interrupt is directed to the processor listed
in the Destination ID field.
  * When RH is 1 and the physical destination mode is used, the Destination
ID field must not be set to FFH; it must point to a processor that is
present and enabled to receive the interrupt.
  * When RH is 1 and the logical destination mode is active in a system using
a flat addressing model, the Destination ID field must be set so that bits
set to 1 identify processors that are present and enabled to receive the
interrupt.
  * If RH is set to 1 and the logical destination mode is active in a system
using cluster addressing model, then Destination ID field must not be
set to FFH; the processors identified with this field must be present
and enabled to receive the interrupt.
 
 My interpretation of this is that RH=0 indicates that the Dest. ID field
 contains an APIC ID, and as such destination mode is physical. When RH=1,
 depending on the value of DM, we either use physical or logical dest mode.
 The result of this is that logical dest mode is set just when RH=1/DM=1,
 as far as I understand.
 
  
  See native_compose_msi_msg:
  
      msg->address_lo =
          MSI_ADDR_BASE_LO |
          ((apic->irq_dest_mode == 0) ?
              MSI_ADDR_DEST_MODE_PHYSICAL :
              MSI_ADDR_DEST_MODE_LOGICAL) |
          ((apic->irq_delivery_mode != dest_LowestPrio) ?
              MSI_ADDR_REDIRECTION_CPU :
              MSI_ADDR_REDIRECTION_LOWPRI) |
          MSI_ADDR_DEST_ID(dest);
  
  
  So it does configure DM = MSI_ADDR_DEST_MODE_LOGICAL
  and RH = MSI_ADDR_REDIRECTION_LOWPRI.
  
 
 ...and yet this is a good counterexample against my argument :)
 
 What I think I'll do is revert this particular change so that dest_mode is
 set independently of RH. While I'm not entirely convinced that this is the
 intended interpretation, I do think that consistency with the existing logic
 is probably desirable for the time being. If I can get closure on the matter
 I'll re-submit that change, but for the time being I will undo it.
 
 -James

Just write MSI-X table entries on real hardware (say: modify
native_compose_msi_msg or MSI-X equivalent), with all RH/DM
combinations, and see what behaviour comes up?



Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-18 Thread Marcelo Tosatti
On Wed, Mar 18, 2015 at 06:59:22PM -0600, James Sullivan wrote:
  
  The documentation states the following:
  
  * When RH is 0, the interrupt is directed to the processor listed in the
  Destination ID field.
  
  * If RH is 0, then the DM bit is ignored and the message is sent ahead
  independent of whether the physical or logical destination mode is used.
  
  However, from the POV of a device writing to memory to generate an MSI 
  interrupt, there is no (or i can't see any) other information that 
  can be used to infer logical or physical mode for the interrupt message.
  
  Before your patch:
  
  (dm, rh) = (0, 0) => irq->dest_mode = 0
  (dm, rh) = (0, 1) => irq->dest_mode = 0
  (dm, rh) = (1, 0) => irq->dest_mode = 1
  (dm, rh) = (1, 1) => irq->dest_mode = 1
  
  After your patch:
  
  (dm, rh) = (0, 0) => irq->dest_mode = 0
  (dm, rh) = (0, 1) => irq->dest_mode = 0
  (dm, rh) = (1, 0) => irq->dest_mode = 0
  (dm, rh) = (1, 1) => irq->dest_mode = 1
  
  
  Am I missing some explicit documentation that refers
  to (dm, rh) = (1, 0) => irq->dest_mode = 0 ?
 
 From the IA32 manual (Vol. 3, 10.11.2):
 
  * When RH is 0, the interrupt is directed to the processor listed
in the Destination ID field.
  * When RH is 1 and the physical destination mode is used, the Destination
ID field must not be set to FFH; it must point to a processor that is
present and enabled to receive the interrupt.
  * When RH is 1 and the logical destination mode is active in a system using
a flat addressing model, the Destination ID field must be set so that bits
set to 1 identify processors that are present and enabled to receive the
interrupt.
  * If RH is set to 1 and the logical destination mode is active in a system
using cluster addressing model, then Destination ID field must not be
set to FFH; the processors identified with this field must be present
and enabled to receive the interrupt.
 
 My interpretation of this is that RH=0 indicates that the Dest. ID field
 contains an APIC ID, and as such destination mode is physical. When RH=1,
 depending on the value of DM, we either use physical or logical dest mode.
 The result of this is that logical dest mode is set just when RH=1/DM=1,
 as far as I understand.
 
  
  See native_compose_msi_msg:
  
      msg->address_lo =
          MSI_ADDR_BASE_LO |
          ((apic->irq_dest_mode == 0) ?
              MSI_ADDR_DEST_MODE_PHYSICAL :
              MSI_ADDR_DEST_MODE_LOGICAL) |
          ((apic->irq_delivery_mode != dest_LowestPrio) ?
              MSI_ADDR_REDIRECTION_CPU :
              MSI_ADDR_REDIRECTION_LOWPRI) |
          MSI_ADDR_DEST_ID(dest);
  
  
  So it does configure DM = MSI_ADDR_DEST_MODE_LOGICAL
  and RH = MSI_ADDR_REDIRECTION_LOWPRI.
  
 
 ...and yet this is a good counterexample against my argument :)
 
 What I think I'll do is revert this particular change so that dest_mode is
 set independently of RH. While I'm not entirely convinced that this is the
 intended interpretation, 

Where would the logical/physical information come from, if not from the
DM bit ?

 I do think that consistency with the existing logic
 is probably desirable for the time being. If I can get closure on the matter
 I'll re-submit that change, but for the time being I will undo it.
 
 -James




Re: [PATCH] KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()

2015-03-18 Thread Marcelo Tosatti
On Tue, Mar 17, 2015 at 09:58:21AM +0100, Paolo Bonzini wrote:
 
 
 On 17/03/2015 08:19, Takuya Yoshikawa wrote:
  When all bits in mask are not set,
  kvm_arch_mmu_enable_log_dirty_pt_masked() has nothing to do.  But since
  it needs to be called from the generic code, it cannot be inlined, and
  a few function calls, two when PML is enabled, are wasted.
  
  Since it is common to see many pages remain clean, e.g. framebuffers can
  stay calm for a long time, it is worth eliminating this overhead.
  
  Signed-off-by: Takuya Yoshikawa yoshikawa_takuya...@lab.ntt.co.jp
  ---
   virt/kvm/kvm_main.c | 8 +---
   1 file changed, 5 insertions(+), 3 deletions(-)
  
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index a109370..420d8cf 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -1061,9 +1061,11 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
           mask = xchg(&dirty_bitmap[i], 0);
           dirty_bitmap_buffer[i] = mask;
   
  -        offset = i * BITS_PER_LONG;
  -        kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot, offset,
  -                                                mask);
  +        if (mask) {
  +            offset = i * BITS_PER_LONG;
  +            kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
  +                                                    offset, mask);
  +        }
       }
   
       spin_unlock(&kvm->mmu_lock);
  
 
 Good catch!
 
 Reviewed-by: Paolo Bonzini pbonz...@redhat.com

Applied, thanks.
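
The effect of the change can be illustrated with a small userspace sketch of the loop. It is a simplified model based only on the hunk quoted above: a plain read-and-clear stands in for the atomic xchg(), a callback stands in for kvm_arch_mmu_enable_log_dirty_pt_masked(), and the names harvest_dirty_words, protect_fn and count_dirty_pages are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for the per-arch write-protect hook. */
typedef void (*protect_fn)(size_t offset, unsigned long mask, void *ctx);

/*
 * Copy each dirty word into `buffer`, clear it in `dirty_bitmap`, and call
 * `protect` only for words that actually have bits set.
 * Returns how many times `protect` was called.
 */
size_t harvest_dirty_words(unsigned long *dirty_bitmap, unsigned long *buffer,
                           size_t nwords, protect_fn protect, void *ctx)
{
    size_t i, calls = 0;

    for (i = 0; i < nwords; i++) {
        unsigned long mask = dirty_bitmap[i];  /* kernel: atomic xchg() */

        dirty_bitmap[i] = 0;
        buffer[i] = mask;

        if (mask) {                            /* the check this patch adds */
            protect(i * WORD_BITS, mask, ctx);
            calls++;
        }
    }
    return calls;
}

/* Example hook: add the number of set bits in `mask` to *(size_t *)ctx. */
void count_dirty_pages(size_t offset, unsigned long mask, void *ctx)
{
    size_t *total = ctx;

    (void)offset;
    for (; mask; mask &= mask - 1)
        (*total)++;
}
```

For a bitmap in which most words are zero, the number of hook invocations drops from one per word to one per non-zero word.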



Re: [PATCH] KVM: SVM: Fix confusing message if no exit handlers are installed

2015-03-18 Thread Marcelo Tosatti
On Mon, Mar 16, 2015 at 05:18:25PM -0400, Bandan Das wrote:
 
 I hit this path on a AMD box and thought
 someone was playing a April Fool's joke on me.
 
 Signed-off-by: Bandan Das b...@redhat.com
 ---
  arch/x86/kvm/svm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.



randconfig build error with next-20150318, in drivers/vfio/virqfd.c

2015-03-18 Thread Jim Davis
This time, in plain text!

Building with the attached random configuration file,

drivers/vfio/virqfd.c: In function 'vfio_virqfd_enable':
drivers/vfio/virqfd.c:132:2: error: implicit declaration of function
'eventfd_ctx_fileget' [-Werror=implicit-function-declaration]
  ctx = eventfd_ctx_fileget(irqfd.file);
  ^
drivers/vfio/virqfd.c:132:6: warning: assignment makes pointer from
integer without a cast
  ctx = eventfd_ctx_fileget(irqfd.file);
  ^
  CC  drivers/tty/serial/serial_core.o
cc1: some warnings being treated as errors
scripts/Makefile.build:258: recipe for target 'drivers/vfio/virqfd.o' failed
HEAD is now at 78c876aef4f28... Add linux-next specific files for 20150318
  CLEAN   .
  CLEAN   arch/x86/kernel/cpu
  CLEAN   arch/x86/kernel
  CLEAN   arch/x86/purgatory
  CLEAN   arch/x86/realmode/rm
  CLEAN   arch/x86/vdso
  CLEAN   crypto/asymmetric_keys
  CLEAN   kernel/time
  CLEAN   kernel
  CLEAN   usr
  CLEAN   arch/x86/tools
  CLEAN   .tmp_versions
  CLEAN   scripts/basic
  CLEAN   scripts/genksyms
  CLEAN   scripts/kconfig
  CLEAN   scripts/mod
  CLEAN   scripts
  CLEAN   include/config include/generated arch/x86/include/generated
  CLEAN   .version signing_key.priv signing_key.x509 x509.genkey
Removing arch/x86/vdso/vdso-image-64.c
  HOSTCC  scripts/basic/fixdep
  HOSTCC  scripts/kconfig/conf.o
  SHIPPED scripts/kconfig/zconf.tab.c
  SHIPPED scripts/kconfig/zconf.lex.c
  SHIPPED scripts/kconfig/zconf.hash.c
  HOSTCC  scripts/kconfig/zconf.tab.o
  HOSTLD  scripts/kconfig/conf
scripts/kconfig/conf --randconfig Kconfig
KCONFIG_SEED=0x68B9BF68
#
# configuration written to .config
#
scripts/kconfig/conf --silentoldconfig Kconfig
  SYSTBL  arch/x86/syscalls/../include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/syscalls/../include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/syscalls/../include/generated/uapi/asm/unistd_x32.h
  SYSHDR  arch/x86/syscalls/../include/generated/uapi/asm/unistd_64.h
  HOSTCC  scripts/basic/bin2c
  CHK include/config/kernel.release
  WRAP    arch/x86/include/generated/asm/early_ioremap.h
  WRAP    arch/x86/include/generated/asm/dma-contiguous.h
  WRAP    arch/x86/include/generated/asm/mcs_spinlock.h
  WRAP    arch/x86/include/generated/asm/clkdev.h
  WRAP    arch/x86/include/generated/asm/scatterlist.h
  WRAP    arch/x86/include/generated/asm/cputime.h
  CHK include/generated/uapi/linux/version.h
  UPD include/generated/uapi/linux/version.h
  HOSTCC  scripts/recordmcount
  HOSTCC  scripts/kallsyms
  HOSTCC  scripts/pnmtologo
  HOSTCC  scripts/genksyms/genksyms.o
  CC  scripts/mod/empty.o
  UPD include/config/kernel.release
  HOSTCC  scripts/mod/mk_elfconfig
  CC  scripts/mod/devicetable-offsets.s
  SHIPPED scripts/genksyms/parse.tab.c
  MKELF   scripts/mod/elfconfig.h
  SHIPPED scripts/genksyms/lex.lex.c
  HOSTCC  scripts/mod/modpost.o
  SHIPPED scripts/genksyms/keywords.hash.c
  SHIPPED scripts/genksyms/parse.tab.h
  HOSTCC  scripts/sortextable
  HOSTCC  scripts/genksyms/parse.tab.o
  HOSTCC  scripts/mod/sumversion.o
  HOSTCC  scripts/genksyms/lex.lex.o
  HOSTCC  scripts/asn1_compiler
  GEN scripts/mod/devicetable-offsets.h
  HOSTCC  scripts/mod/file2alias.o
  HOSTLD  scripts/genksyms/genksyms
  HOSTLD  scripts/mod/modpost
  HOSTCC  arch/x86/tools/relocs_common.o
  HOSTCC  arch/x86/tools/relocs_64.o
  HOSTCC  arch/x86/tools/relocs_32.o
  CHK include/generated/utsrelease.h
  UPD include/generated/utsrelease.h
  HOSTLD  arch/x86/tools/relocs
  CC  kernel/bounds.s
  GEN include/generated/bounds.h
  CC  arch/x86/kernel/asm-offsets.s
  GEN include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
  CHK include/generated/compile.h
  CC  init/main.o
  CC  init/do_mounts.o
  CC  init/initramfs.o
  CC  init/do_mounts_initrd.o
  CC  init/init_task.o
  CC  init/calibrate.o
  UPD include/generated/compile.h
  HOSTCC  usr/gen_init_cpio
  CC  init/version.o
  GEN usr/initramfs_data.cpio.lz4
  CC  arch/x86/kernel/process_32.o
  AS  arch/x86/crypto/aes-i586-asm_32.o
  CC  kernel/fork.o
  CC  kernel/exec_domain.o
  AS  usr/initramfs_data.o
  CC  arch/x86/crypto/aes_glue.o
  CC  arch/x86/mm/init.o
  LD  usr/built-in.o
  CC  arch/x86/mm/init_32.o
  LD  init/mounts.o
  LD  init/built-in.o
  CC  arch/x86/mm/fault.o
  AS  arch/x86/crypto/aesni-intel_asm.o
  CC  mm/filemap.o
  CC  arch/x86/crypto/aesni-intel_glue.o
  CC  arch/x86/kernel/signal.o
  CC  kernel/panic.o
  CC  arch/x86/mm/ioremap.o
  CC  arch/x86/mm/extable.o
  CC  arch/x86/mm/pageattr.o
  CC  arch/x86/crypto/fpu.o
  CC  kernel/cpu.o
  CC  arch/x86/mm/mmap.o
  AS  arch/x86/kernel/entry_32.o
  CC  arch/x86/crypto/crc32c-intel_glue.o
  CC  arch/x86/kernel/traps.o
  CC  arch/x86/mm/pat.o
  CC  arch/x86/mm/pgtable.o
  CC  kernel/exit.o
  CC [M]  arch/x86/crypto/glue_helper.o
  CC  kernel/softirq.o
  CC

[PATCH v4 1/2] kvm: x86: Extended struct kvm_lapic_irq with msi_redir_hint for MSI delivery

2015-03-18 Thread James Sullivan
Extended struct kvm_lapic_irq with bool msi_redir_hint, which will
be used to determine if the delivery of the MSI should target only
the lowest priority CPU in the logical group specified for delivery.
(In physical dest mode, the RH bit is not relevant). Initialized the value
of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized
to false in all other cases.

Added value of msi_redir_hint to a debug message dump of an IRQ in
apic_send_ipi().

Signed-off-by: James Sullivan sullivan.jame...@gmail.com
---
Changes since v1:
* Squashed a number of smaller commits into this one commit,
which adds and initializes the msi_redir_hint variable
and extends existing debug messages to display its value.
Changes since v2:
* Added old patch (5502fedb.3030...@gmail.com) to set the value of
dest_mode in kvm_set_msi_irq() to be APIC_DEST_LOGICAL only when RH=1/DM=1,
and APIC_DEST_PHYSICAL otherwise. This decouples the dependency of 
this patch set on the previous submission and collects all efforts
to implement RH bit handling into one submission.
* Patch formatting
Changes since v3:
* Reverted logic for setting dest_mode to RH=1/DM=1 (see
20150318225225.ga8...@amt.cnet; this is for consistency with
the interpretation of DM and RH in native_compose_msi_msg())

 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/ioapic.c   | 1 +
 arch/x86/kvm/irq_comm.c | 3 ++-
 arch/x86/kvm/lapic.c| 6 --
 arch/x86/kvm/x86.c  | 1 +
 5 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a236e39..77feaf4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -685,6 +685,7 @@ struct kvm_lapic_irq {
u32 trig_mode;
u32 shorthand;
u32 dest_id;
+   bool msi_redir_hint;
 };
 
 struct kvm_x86_ops {
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index b1947e0..61f0874 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -347,6 +347,7 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int 
irq, bool line_status)
irqe.delivery_mode = entry->fields.delivery_mode << 8;
irqe.level = 1;
irqe.shorthand = 0;
+   irqe.msi_redir_hint = false;
 
if (irqe.trig_mode == IOAPIC_EDGE_TRIG)
ioapic->irr &= ~(1 << irq);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 72298b3..80c10af 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -106,9 +106,10 @@ static inline void kvm_set_msi_irq(struct 
kvm_kernel_irq_routing_entry *e,
irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
irq->delivery_mode = e->msi.data & 0x700;
+   irq->msi_redir_hint = ((e->msi.address_lo
+   & MSI_ADDR_REDIRECTION_LOWPRI) > 0);
irq->level = 1;
irq->shorthand = 0;
-   /* TODO Deal with RH bit of MSI message address */
 }
 
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bd4e34d..a15c444 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -892,6 +892,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
irq.level = icr_low & APIC_INT_ASSERT;
irq.trig_mode = icr_low & APIC_INT_LEVELTRIG;
irq.shorthand = icr_low & APIC_SHORT_MASK;
+   irq.msi_redir_hint = false;
if (apic_x2apic_mode(apic))
irq.dest_id = icr_high;
else
@@ -901,10 +902,11 @@ static void apic_send_ipi(struct kvm_lapic *apic)
 
apic_debug("icr_high 0x%x, icr_low 0x%x, "
   "short_hand 0x%x, dest 0x%x, trig_mode 0x%x, level 0x%x, "
-  "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x\n",
+  "dest_mode 0x%x, delivery_mode 0x%x, vector 0x%x, "
+  "msi_redir_hint 0x%x\n",
   icr_high, icr_low, irq.shorthand, irq.dest_id,
   irq.trig_mode, irq.level, irq.dest_mode, irq.delivery_mode,
-  irq.vector);
+  irq.vector, irq.msi_redir_hint);
 
kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd7a70b..03e9b09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5902,6 +5902,7 @@ static void kvm_pv_kick_cpu_op(struct kvm *kvm, unsigned 
long flags, int apicid)
lapic_irq.shorthand = 0;
lapic_irq.dest_mode = 0;
lapic_irq.dest_id = apicid;
+   lapic_irq.msi_redir_hint = false;
 
lapic_irq.delivery_mode = APIC_DM_REMRD;
kvm_irq_delivery_to_apic(kvm, 0, &lapic_irq, NULL);
-- 
2.3.3



[PATCH v4 2/2] kvm: x86: Deliver MSI IRQ to only lowest prio cpu if msi_redir_hint is true

2015-03-18 Thread James Sullivan
An MSI interrupt should only be delivered to the lowest priority CPU
when it has RH=1, regardless of the delivery mode. Modified
kvm_is_dm_lowest_prio() to check for either irq->delivery_mode == APIC_DM_LOWEST
or irq->msi_redir_hint.

Moved kvm_is_dm_lowest_prio() into lapic.h and renamed to
kvm_lowest_prio_delivery().

Changed a check in kvm_irq_delivery_to_apic_fast() from
irq->delivery_mode == APIC_DM_LOWEST to kvm_lowest_prio_delivery().

Signed-off-by: James Sullivan sullivan.jame...@gmail.com
---
Changes since v1:
* Squashed a number of smaller commits into this one commit,
which implements MSI delivery to only the lowest-priority
CPU whenever RH=1 using the above helper
kvm_lowest_prio_delivery().
Changes since v2:
* Patch formatting
Changes since v3:
N/A

 arch/x86/kvm/irq_comm.c | 11 ---
 arch/x86/kvm/lapic.c|  3 +--
 arch/x86/kvm/lapic.h|  6 ++
 3 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 80c10af..9efff9e 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -31,6 +31,8 @@
 
 #include "ioapic.h"
 
+#include "lapic.h"
+
 static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
   struct kvm *kvm, int irq_source_id, int level,
   bool line_status)
@@ -48,11 +50,6 @@ static int kvm_set_ioapic_irq(struct 
kvm_kernel_irq_routing_entry *e,
line_status);
 }
 
-inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq)
-{
-   return irq->delivery_mode == APIC_DM_LOWEST;
-}
-
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, unsigned long *dest_map)
 {
@@ -60,7 +57,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
struct kvm_vcpu *vcpu, *lowest = NULL;
 
if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
-   kvm_is_dm_lowest_prio(irq)) {
+   kvm_lowest_prio_delivery(irq)) {
printk(KERN_INFO "kvm: apic: phys broadcast and lowest prio\n");
irq->delivery_mode = APIC_DM_FIXED;
}
@@ -76,7 +73,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
irq->dest_id, irq->dest_mode))
continue;
 
-   if (!kvm_is_dm_lowest_prio(irq)) {
+   if (!kvm_lowest_prio_delivery(irq)) {
if (r < 0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq, dest_map);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a15c444..f8b21d5 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -701,8 +701,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct 
kvm_lapic *src,
dst = map->logical_map[cid];
 
bitmap = apic_logical_id(map, mda);
-
-   if (irq->delivery_mode == APIC_DM_LOWEST) {
+   if (kvm_lowest_prio_delivery(irq)) {
int l = -1;
for_each_set_bit(i, bitmap, 16) {
if (!dst[i])
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0bc6c65..ed7e2fa 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -168,6 +168,12 @@ static inline bool kvm_apic_has_events(struct kvm_vcpu 
*vcpu)
return vcpu->arch.apic->pending_events;
 }
 
+static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq)
+{
+   return (irq->delivery_mode == APIC_DM_LOWEST ||
+   irq->msi_redir_hint);
+}
+
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
-- 
2.3.3



[PATCH v4 0/2] kvm: x86: Implement handling of RH=1 for MSI delivery in KVM

2015-03-18 Thread James Sullivan
Changes Since v1:
* Reworked patches into two commits:
1) [Patch v2 1/2] Extended struct kvm_lapic_irq with bool
msi_redir_hint
* Initialize msi_redir_hint = true in kvm_set_msi_irq when RH=1
* Initialize msi_redir_hint = false otherwise
* Added value of msi_redir_hint to debug message dump of IRQ in
apic_send_ipi
2) [Patch v2 2/2] Deliver to only lowest prio CPU if msi_redir_hint 
   is true 
* Move kvm_is_dm_lowest_prio() -> lapic.h, rename to
kvm_lowest_prio_delivery, set condition to
(APIC_DM_LOWPRI || msi_redir_hint)
* Change check in kvm_irq_delivery_to_apic_fast() for
APIC_DM_LOWPRI or msi_redir_hint to a check for
kvm_is_dm_lowest_prio() 
Changes since v2:
* Extend Patch 1/2 (kvm: x86: Extended struct kvm_lapic_irq with 
msi_redir_hint for MSI delivery) with older patch to set the value
of dest_mode in kvm_set_msi_irq() to be APIC_DEST_LOGICAL only when 
RH=1/DM=1, and APIC_DEST_PHYSICAL otherwise. 
(5502fedb.3030...@gmail.com)
This was done to decouple the patch dependency and to collect all
efforts to implement RH bit handling into one submission.
* Patch formatting
Changes since v3:
* Revert logic for setting dest_mode; irq->dest_mode is now set
independently of RH=1. (See 20150318225225.ga8...@amt.cnet).
The reason for this is to maintain consistency with the interpretation
of MSI destination mode selection in native_compose_msi_msg().

This series of patches extends the KVM interrupt delivery mechanism
to correctly account for the MSI Redirection Hint bit. The RH bit is 
used in logical destination mode to indicate that the delivery of the
interrupt shall only be to the lowest priority candidate LAPIC.

Currently, there is no handling of the MSI RH bit in the KVM interrupt
delivery mechanism. This patch implements the following logic:

* DM=0, RH=*  : Physical destination mode. Interrupt is delivered to
the LAPIC with the matching APIC ID. (Subject to
the usual restrictions, i.e. no broadcast dest)
* DM=1, RH=0  : Logical destination mode without redirection. Interrupt
is delivered to all LAPICs in the logical group 
specified by the IRQ's destination map and delivery
mode.
* DM=1, RH=1  : Logical destination mode with redirection. Interrupt
is delivered only to the lowest priority LAPIC in the 
logical group specified by the dest map and the
delivery mode. Delivery semantics are otherwise
specified by the delivery_mode of the IRQ, which
is unchanged.

In other words, the RH bit is ignored in physical destination mode, and
when it is set in logical destination mode causes delivery to only apply
to the lowest priority processor in the logical group. The IA32 manual
is in slight contradiction with itself on this matter, but this patch
agrees with this interpretation of the RH bit:

https://software.intel.com/en-us/forums/topic/23
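
The three cases listed above can be written down as a small decision
function. This is a sketch of the described interpretation only, not code
from the series: the enum and classify_msi() are hypothetical names, and
the booleans stand for the DM and RH bits of the MSI address.

```c
#include <stdbool.h>

/* Hypothetical names for the three cases; not KVM identifiers. */
enum msi_delivery {
	MSI_PHYSICAL,		/* DM=0: RH ignored, deliver by APIC ID */
	MSI_LOGICAL_ALL,	/* DM=1, RH=0: every LAPIC in the logical group */
	MSI_LOGICAL_LOWEST,	/* DM=1, RH=1: only the lowest priority LAPIC */
};

static enum msi_delivery classify_msi(bool dm, bool rh)
{
	if (!dm)
		return MSI_PHYSICAL;
	return rh ? MSI_LOGICAL_LOWEST : MSI_LOGICAL_ALL;
}
```

Note that RH only changes the outcome when DM=1, which matches the
statement that the RH bit is ignored in physical destination mode.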

This patch has passed some rudimentary tests using an SMP QEMU guest and
virtio sourced MSIs, but I haven't done experiments with passing through 
PCI hardware (intend to start working on this).

-James



Re: [PATCH kvm-unit-tests] x86: fix build (macro R interpreted as raw string)

2015-03-18 Thread Marcelo Tosatti
On Fri, Mar 13, 2015 at 05:48:04PM +0100, Radim Krčmář wrote:
 GCC 5.0.0 enables raw strings by default and they have higher priority
 than macros, thus R[...] is interpreted incorrectly:
 
   lib/x86/isr.c:112:30: error: invalid character ')' in raw string delimiter
   lib/x86/isr.c:112:8: error: stray ‘R’ in program
   lib/x86/isr.c:112:26: error: expected ‘:’ or ‘)’ before string constant
   orl $0x200, (%%Rsp)\n\t
 
 Fix it by putting a space between macro R and a string literal.
 (We already do that somewhere.)
 
 Signed-off-by: Radim Krčmář rkrc...@redhat.com

Applied, thanks.
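
For reference, a minimal sketch of the pattern the fix relies on. The
definition of R as "r" is an assumption made for illustration (the real
macro lives in the kvm-unit-tests headers and depends on the target), and
orl_insn is a hypothetical name.

```c
/* Assumed definition: "r" selects 64-bit register names such as %rsp. */
#define R "r"

/*
 * With the space, the preprocessor sees three tokens (string, macro, string)
 * and concatenates "(%%" "r" "sp)" after expansion.  Without the space,
 * R"sp)..." may be lexed as the start of a raw string literal.
 */
static const char orl_insn[] = "orl $0x200, (%%" R "sp)\n\t";
```

The space costs nothing and keeps the source independent of whether the
compiler has raw string literals enabled.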



Re: [Patch v5] x86: irq_comm: Add check for RH bit in kvm_set_msi_irq

2015-03-18 Thread James Sullivan
 
 The documentation states the following:
 
 * When RH is 0, the interrupt is directed to the processor listed in the
 Destination ID field.
 
 * If RH is 0, then the DM bit is ignored and the message is sent ahead
 independent of whether the physical or logical destination mode is used.
 
 However, from the POV of a device writing to memory to generate an MSI 
 interrupt, there is no (or i can't see any) other information that 
 can be used to infer logical or physical mode for the interrupt message.
 
 Before your patch:
 
 (dm, rh) = (0, 0) => irq->dest_mode = 0
 (dm, rh) = (0, 1) => irq->dest_mode = 0
 (dm, rh) = (1, 0) => irq->dest_mode = 1
 (dm, rh) = (1, 1) => irq->dest_mode = 1
 
 After your patch:
 
 (dm, rh) = (0, 0) => irq->dest_mode = 0
 (dm, rh) = (0, 1) => irq->dest_mode = 0
 (dm, rh) = (1, 0) => irq->dest_mode = 0
 (dm, rh) = (1, 1) => irq->dest_mode = 1
 
 
 Am i missing some explicit documentation that refers 
 to (dm, rh) = (1, 0) => irq->dest_mode = 0 ?

From the IA32 manual (Vol. 3, 10.11.2):

 * When RH is 0, the interrupt is directed to the processor listed
   in the Destination ID field.
 * When RH is 1 and the physical destination mode is used, the Destination
   ID field must not be set to FFH; it must point to a processor that is
   present and enabled to receive the interrupt.
 * When RH is 1 and the logical destination mode is active in a system using
   a flat addressing model, the Destination ID field must be set so that bits
   set to 1 identify processors that are present and enabled to receive the
   interrupt.
 * If RH is set to 1 and the logical destination mode is active in a system
   using cluster addressing model, then Destination ID field must not be
   set to FFH; the processors identified with this field must be present
   and enabled to receive the interrupt.

My interpretation of this is that RH=0 indicates that the Dest. ID field
contains an APIC ID, and as such destination mode is physical. When RH=1,
depending on the value of DM, we either use physical or logical dest mode.
The result of this is that logical dest mode is set just when RH=1/DM=1,
as far as I understand.
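
To make the difference between the two readings concrete, here is a small
sketch. Both function names are hypothetical; a return value of true means
logical destination mode and false means physical.

```c
#include <stdbool.h>

/* Existing behaviour: dest_mode simply follows the DM bit. */
static bool dest_mode_from_dm(bool dm, bool rh)
{
	(void)rh;
	return dm;
}

/* Interpretation argued above: logical only when RH=1 and DM=1. */
static bool dest_mode_rh_gated(bool dm, bool rh)
{
	return dm && rh;
}
```

The two functions disagree only for (dm, rh) = (1, 0), which is exactly the
case questioned in the quoted tables.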

 
 See native_compose_msi_msg:
 
 msg->address_lo =
 MSI_ADDR_BASE_LO |
 ((apic->irq_dest_mode == 0) ?
 MSI_ADDR_DEST_MODE_PHYSICAL :
 MSI_ADDR_DEST_MODE_LOGICAL) |
 ((apic->irq_delivery_mode != dest_LowestPrio) ?
 MSI_ADDR_REDIRECTION_CPU :
 MSI_ADDR_REDIRECTION_LOWPRI) |
 MSI_ADDR_DEST_ID(dest);
 
 
 So it does configure DM = MSI_ADDR_DEST_MODE_LOGICAL
 and RH = MSI_ADDR_REDIRECTION_LOWPRI.
 

...and yet this is a good counterexample against my argument :)

What I think I'll do is revert this particular change so that dest_mode is
set independently of RH. While I'm not entirely convinced that this is the
intended interpretation, I do think that consistency with the existing logic
is probably desirable for the time being. If I can get closure on the matter
I'll re-submit that change, but for the time being I will undo it.

-James


Re: randconfig build error with next-20150318, in drivers/vfio/virqfd.c

2015-03-18 Thread Alex Williamson
On Wed, 2015-03-18 at 15:57 -0700, Jim Davis wrote:
 Building with the attached random configuration file,
 
 drivers/vfio/virqfd.c: In function 'vfio_virqfd_enable':
 drivers/vfio/virqfd.c:132:2: error: implicit declaration of function
 'eventfd_ctx_fileget' [-Werror=implicit-function-declaration]
   ctx = eventfd_ctx_fileget(irqfd.file);
   ^
 drivers/vfio/virqfd.c:132:6: warning: assignment makes pointer from integer
 without a cast
   ctx = eventfd_ctx_fileget(irqfd.file);
   ^
   CC  drivers/tty/serial/serial_core.o
 cc1: some warnings being treated as errors
 scripts/Makefile.build:258: recipe for target 'drivers/vfio/virqfd.o' failed

Thank you.  We've already got a patch posted on the mailing list that
will make this code dependent on CONFIG_EVENTFD.  Thanks,

Alex



Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-18 Thread Bandan Das
Radim Krčmář rkrc...@redhat.com writes:

 kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
 We need to do that for irq notifiers.  (Like with edge interrupts.)

Wow! It's interesting that this path is only hit with Xen as guest.
I always thought of directed EOI as a security feature since broadcast
could lead to interrupt storms (or something like that) :)

Bandan

 Fix it by skipping EOI broadcast only.

 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=82211
 Signed-off-by: Radim Krčmář rkrc...@redhat.com
 ---
  arch/x86/kvm/ioapic.c | 4 +++-
  arch/x86/kvm/lapic.c  | 3 +--
  2 files changed, 4 insertions(+), 3 deletions(-)

 diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
 index b1947e0f3e10..46d4449772bc 100644
 --- a/arch/x86/kvm/ioapic.c
 +++ b/arch/x86/kvm/ioapic.c
 @@ -422,6 +422,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
   struct kvm_ioapic *ioapic, int vector, int trigger_mode)
  {
   int i;
 + struct kvm_lapic *apic = vcpu->arch.apic;
  
   for (i = 0; i < IOAPIC_NUM_PINS; i++) {
   union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i];
 @@ -443,7 +444,8 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
   kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
   spin_lock(&ioapic->lock);
  
 - if (trigger_mode != IOAPIC_LEVEL_TRIG)
 + if (trigger_mode != IOAPIC_LEVEL_TRIG ||
 + kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
   continue;
  
   ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
 index bd4e34de24c7..4ee827d7bf36 100644
 --- a/arch/x86/kvm/lapic.c
 +++ b/arch/x86/kvm/lapic.c
 @@ -833,8 +833,7 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct 
 kvm_vcpu *vcpu2)
  
  static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
  {
 - if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
 - kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
 + if (kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
   int trigger_mode;
   if (apic_test_vector(vector, apic->regs + APIC_TMR))
   trigger_mode = IOAPIC_LEVEL_TRIG;


Re: [GSoC] project proposal

2015-03-18 Thread Catalin Vasile
cryptodev is not merged into upstream from what I know.
gnutls can use cryptodev and AF_ALG as crypto engines.
From some benchmarks (that can also be found on cryptodev's webpage)
you can see AF_ALG has a lot of overhead compared to a standalone misc/char
device.

On Wed, Mar 18, 2015 at 6:42 PM, Paolo Bonzini pbonz...@redhat.com wrote:


 On 18/03/2015 17:01, Catalin Vasile wrote:
 To be more exact, I want to make a virtio-crypto device to emulate a
 virtual cryptographic offloading device that will send jobs from the
 guest to a vhost that will process the jobs. This mechanism will link
 CryptoAPI from the guest to the CryptoAPI from the host. This way,
 whatever is beneath CryptoAPI on the host will be used as
 offloading for the guest.
 Is there a mentor interested in getting involved in this kind of project?

 I think it's very likely that you'll find a mentor.  Please submit a
 proposal, also detailing the advantage of vhost over a userspace
 solution (using any of gnutls, AF_ALG, cryptodev).

 Paolo


KVM live migration i/o error

2015-03-18 Thread Francesc Guasch
Hi.

I have three Ubuntu Server 14.04 (trusty) machines running KVM. Two of
them are HP servers and one is a Dell. Both brands run the KVM
virtual servers fine, and I can do live migration between
the HPs. But I get I/O errors on vda when I migrate to
or from the Dell server.

I have shared storage with NFS, mounted the same way in all
of them:

nfs.sever:/kvm /var/lib/libvirt/images nfs auto,vers=3

I checked the versions of all the packages to make sure they are
the same. I got:

kernel: 3.13.0-43-generic #72-Ubuntu SMP x86_64
libvirt: 1.2.2-0ubuntu13.1.9 
qemu-utils: 2.0.0+dfsg-2ubuntu1.10
qemu-kvm: 2.0.0+dfsg-2ubuntu1.10

I made sure the Cache in the Storage is set to None.

Disk bus: virtio Cache mode: none IO mode: default

I run this to do live migration:

virsh migrate --live virtual qemu+ssh://dellserver/system

As soon as it starts I see I/O error messages in the console on the
origin, and when it finishes I get them in the console on the
destination server. The file system goes read-only and I have to
shut the virtual server down hard.

end request I/O error, /dev/vda, sector 8790327

When I migrate to the other HP server the process runs fine.
I don't know what else to check, I wonder if such different
hardware could be a problem.

These are the CPU flags in the HP server:

  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
  cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
  pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts
  rep_good nopl xtopology nonstop_tsc aperfmperf pni dtes64
  monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1
  sse4_2 popcnt lahf_lm dtherm tpr_shadow vnmi flexpriority
  ept vpid
  
And those in the Dell server:

  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
  cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
  pbe syscall nx lm constant_tsc pebs bts nopl pni dtes64
  monitor ds_cpl vmx est cid cx16 xtpr pdcm lahf_lm tpr_shadow
  
I tried to check the log files in /var/log/libvirt but I
can't see any different message when I migrate from HP to HP
than when I do from HP to Dell.

What else can I try ? Thank you for your time


[Bug 92871] nested kvm - Warning in L0 kernel when trying to launch L2 guest in L1 guest

2015-03-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=92871

Radim Krčmář rkrc...@redhat.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rkrc...@redhat.com

--- Comment #1 from Radim Krčmář rkrc...@redhat.com ---
Fixed with "KVM: nVMX: mask unrestricted_guest if disabled on L0".
(https://lkml.org/lkml/2015/3/17/478)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [Qemu-devel] PCI passthrough of 40G ethernet interface (Openstack/KVM)

2015-03-18 Thread Shannon Nelson
On Wed, Mar 18, 2015 at 3:01 PM, Shannon Nelson
shannon.nel...@intel.com wrote:


 On Wed, Mar 18, 2015 at 8:40 AM, jacob jacob opstk...@gmail.com wrote:
 
  On Wed, Mar 18, 2015 at 11:24 AM, Bandan Das b...@redhat.com wrote:
  
   Actually, Stefan suggests that support for this card is still sketchy
   and your best bet is to try out net-next
   http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git
  
   Also, could you please post more information about your hardware setup
   (chipset/processor/firmware version on the card etc) ?
 
  Host CPU : Model name:Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz
 
  Manufacturer Part Number:  XL710QDA1BLK
  Ethernet controller: Intel Corporation Ethernet Controller XL710 for
  40GbE QSFP+ (rev 01)
   #ethtool -i enp9s0
  driver: i40e
  version: 1.2.6-k
  firmware-version: f4.22 a1.1 n04.24 e800013fd
  bus-info: :09:00.0
  supports-statistics: yes
  supports-test: yes
  supports-eeprom-access: yes
  supports-register-dump: yes
  supports-priv-flags: no
 

Jacob,

It looks like you're using a NIC with the e800013fd firmware from last
summer, and from a separate message that you saw these issues with
both the 1.2.2-k and the 1.2.37 version drivers.  I suggest the next
step would be to update the NIC firmware as there are some performance
and stability updates available that deal with similar issues.  Please
see the Intel Networking support webpage at
https://downloadcenter.intel.com/download/24769 and look for the
NVMUpdatePackage.zip.  This should take care of several of the things
Stefan might describe as sketchy :-).

sln
(Sent again, hopefully without gmail adding html and thereby getting
it blocked from the kernel mailing lists...)


[PATCH v2 5/6] vfio-pci: Remove warning if try-reset fails

2015-03-18 Thread Alex Williamson
As indicated in the comment, this is not entirely uncommon and
causes user concern for no reason.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/pci/vfio_pci.c |   10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 43517ce..d0f1e70 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -217,14 +217,8 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 * Try to reset the device.  The success of this is dependent on
 * being able to lock the device, which is not always possible.
 */
-   if (vdev->reset_works) {
-   int ret = pci_try_reset_function(pdev);
-   if (ret)
-   pr_warn("%s: Failed to reset device %s (%d)\n",
-   __func__, dev_name(&pdev->dev), ret);
-   else
-   vdev->needs_reset = false;
-   }
+   if (vdev->reset_works && !pci_try_reset_function(pdev))
+   vdev->needs_reset = false;
 
pci_restore_state(pdev);
 out:



[PATCH v2 4/6] vfio-pci: Allow PCI IDs to be specified as module options

2015-03-18 Thread Alex Williamson
This copies the same support from pci-stub for exactly the same
purpose, enabling a set of PCI IDs to be automatically added to the
driver's dynamic ID table at module load time.  The code here is
pretty simple and both vfio-pci and pci-stub are fairly unique in
being meta drivers, capable of attaching to any device, so there's no
attempt made to generalize the code into pci-core.
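
For anyone who wants to experiment with the ids string format outside
the kernel, here is a minimal userspace sketch of the same parsing idea.
It is not the patch code: struct id_entry, parse_ids() and ANY_ID are
names made up for illustration, ANY_ID only stands in for PCI_ANY_ID,
and strchr() is used to split the list so that the sketch sticks to
standard C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the kernel's PCI_ANY_ID wildcard; illustrative only. */
#define ANY_ID 0xffffffffu

/* Hypothetical container for one parsed entry (not a kernel type). */
struct id_entry {
	unsigned int vendor, device, subvendor, subdevice;
	unsigned int dev_class, class_mask;
};

/*
 * Split a comma-separated list in place and parse each entry of the form
 * vendor:device[:subvendor[:subdevice[:class[:class_mask]]]].
 * Returns the number of valid entries written to out[].
 */
static size_t parse_ids(char *ids, struct id_entry *out, size_t max)
{
	char *p = ids;
	size_t n = 0;

	assert(ids != NULL && out != NULL);

	while (p != NULL && n < max) {
		/* Optional fields default to "match anything" / zero. */
		struct id_entry e = { 0, 0, ANY_ID, ANY_ID, 0, 0 };
		char *id = p;
		char *comma = strchr(p, ',');
		int fields;

		/* Terminate this entry and advance to the next one. */
		if (comma != NULL) {
			*comma = '\0';
			p = comma + 1;
		} else {
			p = NULL;
		}

		if (id[0] == '\0')
			continue;	/* empty entry, e.g. "a,,b" */

		fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
				&e.vendor, &e.device, &e.subvendor,
				&e.subdevice, &e.dev_class, &e.class_mask);
		if (fields < 2) {
			/* vendor and device are mandatory */
			fprintf(stderr, "invalid id string \"%s\"\n", id);
			continue;
		}

		out[n++] = e;
	}

	return n;
}
```

Entries with fewer than two fields are rejected and the remaining
optional fields keep their defaults, which mirrors the "fields < 2"
check in vfio_pci_fill_ids().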

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |   49 +++
 1 file changed, 49 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 25aef05..43517ce 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -11,6 +11,8 @@
  * Author: Tom Lyon, p...@cisco.com
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/device.h>
 #include <linux/eventfd.h>
 #include <linux/file.h>
@@ -33,6 +35,10 @@
 #define DRIVER_AUTHOR   "Alex Williamson <alex.william...@redhat.com>"
 #define DRIVER_DESC     "VFIO PCI - User Level meta-driver"
 
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
+
 static bool nointxmask;
 module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(nointxmask,
@@ -1105,6 +1111,47 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
+static void __init vfio_pci_fill_ids(void)
+{
+	char *p, *id;
+	int rc;
+
+	/* no ids passed actually */
+	if (ids[0] == '\0')
+		return;
+
+	/* add ids specified in the module parameter */
+	p = ids;
+	while ((id = strsep(&p, ","))) {
+		unsigned int vendor, device, subvendor = PCI_ANY_ID,
+			subdevice = PCI_ANY_ID, class = 0, class_mask = 0;
+		int fields;
+
+		if (!strlen(id))
+			continue;
+
+		fields = sscanf(id, "%x:%x:%x:%x:%x:%x",
+				&vendor, &device, &subvendor, &subdevice,
+				&class, &class_mask);
+
+		if (fields < 2) {
+			pr_warn("invalid id string \"%s\"\n", id);
+			continue;
+		}
+
+		rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
+				   subvendor, subdevice, class, class_mask, 0);
+		if (rc)
+			pr_warn("failed to add dynamic id [%04hx:%04hx[%04hx:%04hx]] class %#08x/%08x (%d)\n",
+				vendor, device, subvendor, subdevice,
+				class, class_mask, rc);
+		else
+			pr_info("add [%04hx:%04hx[%04hx:%04hx]] class %#08x/%08x\n",
+				vendor, device, subvendor, subdevice,
+				class, class_mask);
+	}
+}
+
 static int __init vfio_pci_init(void)
 {
 	int ret;
@@ -1119,6 +1166,8 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
+	vfio_pci_fill_ids();
+
 	return 0;
 
 out_driver:



[PATCH v2 3/6] vfio-pci: Add VGA arbiter client

2015-03-18 Thread Alex Williamson
If VFIO VGA access is disabled for the user, either by CONFIG option
or module parameter, we can often opt out of VGA arbitration.  We can
do this when PCI bridge control of VGA routing is possible.  This
means that we must have a parent bridge and there must only be a
single VGA device below that bridge.  Fortunately this is the typical
case for discrete GPUs.

Doing this allows us to minimize the impact of additional GPUs, in
terms of VGA arbitration, when they are only used via vfio-pci for
non-VGA applications.
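
The decision described above can be modelled in plain C as follows.
This is only a sketch under stated assumptions: struct vga_dev,
vga_decodes() and the RSRC_* bit values are illustrative stand-ins
rather than the kernel definitions, and the list of "other" VGA devices
is passed in as an array (excluding the device itself) instead of being
walked with pci_get_class().

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative resource bits; stand-ins for the VGA_RSRC_* flags. */
enum {
	RSRC_LEGACY_IO  = 0x01,
	RSRC_LEGACY_MEM = 0x02,
	RSRC_NORMAL_IO  = 0x04,
	RSRC_NORMAL_MEM = 0x08,
};

/* Hypothetical, simplified view of a VGA device's position in the tree. */
struct vga_dev {
	int domain;		/* PCI domain (segment) */
	unsigned char bus;	/* number of the bus the device sits on */
	bool on_root_bus;	/* true if there is no parent bridge */
};

/*
 * Return the resources the device must be treated as decoding.
 * max_busnr is the highest bus number below the device's parent bridge.
 */
static unsigned int vga_decodes(const struct vga_dev *self,
				unsigned char max_busnr,
				bool vga_disabled, bool single_vga,
				const struct vga_dev *others, size_t n)
{
	unsigned int decodes = RSRC_NORMAL_IO | RSRC_NORMAL_MEM;
	size_t i;

	/* No opt-out unless VGA access is off and a bridge can do routing. */
	if (single_vga || !vga_disabled || self->on_root_bus)
		return decodes | RSRC_LEGACY_IO | RSRC_LEGACY_MEM;

	for (i = 0; i < n; i++) {
		if (others[i].domain != self->domain || others[i].on_root_bus)
			continue;

		/* Another VGA device shares the bridge: keep legacy decode. */
		if (others[i].bus >= self->bus && others[i].bus <= max_busnr)
			return decodes | RSRC_LEGACY_IO | RSRC_LEGACY_MEM;
	}

	return decodes;
}
```

Only when the device is the sole VGA device in the bus range behind its
bridge does the function drop the legacy IO and memory bits.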

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |   67 ---
 1 file changed, 63 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7053110..25aef05 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -25,6 +25,7 @@
 #include <linux/types.h>
 #include <linux/uaccess.h>
 #include <linux/vfio.h>
+#include <linux/vgaarb.h>
 
 #include "vfio_pci_private.h"
 
@@ -54,6 +55,50 @@ static inline bool vfio_vga_disabled(void)
 #endif
 }
 
+/*
+ * Our VGA arbiter participation is limited since we don't know anything
+ * about the device itself.  However, if the device is the only VGA device
+ * downstream of a bridge and VFIO VGA support is disabled, then we can
+ * safely return legacy VGA IO and memory as not decoded since the user
+ * has no way to get to it and routing can be disabled externally at the
+ * bridge.
+ */
+static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
+{
+	struct vfio_pci_device *vdev = opaque;
+	struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
+	unsigned char max_busnr;
+	unsigned int decodes;
+
+	if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus))
+		return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+		       VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+
+	max_busnr = pci_bus_max_busnr(pdev->bus);
+	decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM;
+
+	while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) {
+		if (tmp == pdev ||
+		    pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) ||
+		    pci_is_root_bus(tmp->bus))
+			continue;
+
+		if (tmp->bus->number >= pdev->bus->number &&
+		    tmp->bus->number <= max_busnr) {
+			pci_dev_put(tmp);
+			decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
+			break;
+		}
+	}
+
+	return decodes;
+}
+
+static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}
+
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
@@ -108,7 +153,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	} else
 		vdev->msix_bar = 0xFF;
 
-	if (!vfio_vga_disabled() && (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
 		vdev->has_vga = true;
 
 	return 0;
@@ -900,6 +945,12 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		kfree(vdev);
 	}
 
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+		vga_set_legacy_decoding(pdev,
+					vfio_pci_set_vga_decode(vdev, false));
+	}
+
 	return ret;
 }
 
@@ -908,9 +959,17 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 	struct vfio_pci_device *vdev;
 
 	vdev = vfio_del_group_dev(&pdev->dev);
-	if (vdev) {
-		iommu_group_put(pdev->dev.iommu_group);
-		kfree(vdev);
+	if (!vdev)
+		return;
+
+	iommu_group_put(pdev->dev.iommu_group);
+	kfree(vdev);
+
+	if (vfio_pci_is_vga(pdev)) {
+		vga_client_register(pdev, NULL, NULL, NULL);
+		vga_set_legacy_decoding(pdev,
+				VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+				VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
 	}
 }
 



[PATCH v2 0/6] vfio-pci: Misc enhancements

2015-03-18 Thread Alex Williamson
v2:
 - Incorporate comments from Bandan and Bjorn for vfio-pci.ids option
 - Include necessary vgaarb change, already Ack'd by Dave
 - Rebase on top of my current next branch
 - Rename D3 disable parameter

There are really 3 separate features added in this series.  The first
is to opt VGA devices out of VGA arbitration if a) vfio VGA support
is disabled, either via config or the (new) module option, and b) bridge
control of VGA resource routing is possible.  This means that if
multiple VGA devices are installed with the intention of using them
for device assignment without VGA, we can eliminate the effect they
have on host graphics.

The second feature is the addition of the ids module option, which
acts just like the option of the same name on pci-stub.  This makes it
easier to configure vfio-pci to statically claim certain devices.  By
either building vfio support into the kernel or using softdeps to
load vfio-pci before native drivers, this can make it much easier to
bind to devices which are only intended to be used through vfio, such
as those additional graphics cards.

Finally, when devices are bound to vfio-pci and unused, we can try to
put them into a low-power state.  This again feeds into that idea that
devices may be installed on the system only for use through vfio, and
that use may not be continuous.  This saves a few watts for some GPUs.
Thanks,

Alex

---

Alex Williamson (6):
  vgaarb: Stub vga_set_legacy_decoding()
  vfio-pci: Add module option to disable VGA region access
  vfio-pci: Add VGA arbiter client
  vfio-pci: Allow PCI IDs to be specified as module options
  vfio-pci: Remove warning if try-reset fails
  vfio-pci: Move idle devices to D3hot power state


 drivers/vfio/pci/vfio_pci.c |  179 +++
 include/linux/vgaarb.h  |5 +
 2 files changed, 167 insertions(+), 17 deletions(-)


[PATCH v2 2/6] vfio-pci: Add module option to disable VGA region access

2015-03-18 Thread Alex Williamson
Add a module option so that we don't require a CONFIG change and
kernel rebuild to disable VGA support.  Not only can VGA support be
troublesome in itself, but by disabling it we can reduce the impact
to host devices by doing a VGA arbitration opt-out.

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |   19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2f865d07..7053110 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -37,8 +37,23 @@ module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(nointxmask,
 		  "Disable support for PCI 2.3 style INTx masking.  If this resolves problems for specific devices, report lspci -vvvxxx to linux-...@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
 
+#ifdef CONFIG_VFIO_PCI_VGA
+static bool disable_vga;
+module_param(disable_vga, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci");
+#endif
+
 static DEFINE_MUTEX(driver_lock);
 
+static inline bool vfio_vga_disabled(void)
+{
+#ifdef CONFIG_VFIO_PCI_VGA
+   return disable_vga;
+#else
+   return true;
+#endif
+}
+
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
@@ -93,10 +108,8 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	} else
 		vdev->msix_bar = 0xFF;
 
-#ifdef CONFIG_VFIO_PCI_VGA
-	if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
+	if (!vfio_vga_disabled() && (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
 		vdev->has_vga = true;
-#endif
 
 	return 0;
 }



Re: [PATCH] KVM: x86: call irq notifiers with directed EOI

2015-03-18 Thread Radim Krčmář
2015-03-18 15:37-0400, Bandan Das:
> Radim Krčmář <rkrc...@redhat.com> writes:
>> kvm_ioapic_update_eoi() wasn't called if directed EOI was enabled.
>> We need to do that for irq notifiers.  (Like with edge interrupts.)
>
> Wow! It's interesting that this path is only hit with Xen as guest.

Linux doesn't use directed EOI ... KVM should fail with anything that
depends on PIT, so probably only Xen bothered to implement it :)

> I always thought of directed EOI as a security feature since broadcast
> could lead to interrupt storms (or something like that) :)

I think it is just an unpopular optimization for large systems.

(With multiple IO-APICs: IRQ handler knows which ones need the EOI, but
 LAPIC doesn't, hence we avoid some useless poking if OS does it ...
 EOI interrupt storm happens because right after EOI, the IO-APIC can
 send another interrupt and real hardware is slow, so CPU manages some
 cycles before receiving the next one, but KVM works instantaneously.)
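
To make the "useless poking" point concrete, here is a toy model in C
that only counts how many IO-APICs see an EOI under each scheme.  It is
a sketch, not KVM or hardware code; struct ioapic_model and the three
helpers are hypothetical names used purely for illustration.

```c
#include <stddef.h>

/* Hypothetical IO-APIC model that only counts the EOIs it receives. */
struct ioapic_model {
	unsigned int eoi_seen;
};

/* Broadcast: the EOI is delivered to every IO-APIC in the system. */
static void broadcast_eoi(struct ioapic_model *io, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		io[i].eoi_seen++;
}

/* Directed: the OS pokes only the IO-APIC that raised the interrupt. */
static void directed_eoi(struct ioapic_model *io, size_t n, size_t target)
{
	if (target < n)
		io[target].eoi_seen++;
}

/* Total number of EOI deliveries across all IO-APICs. */
static unsigned int total_eoi(const struct ioapic_model *io, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += io[i].eoi_seen;

	return sum;
}
```

With four modelled IO-APICs, one broadcast costs four deliveries while
one directed EOI costs a single delivery to the IO-APIC that needs it.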


[PATCH v2 6/6] vfio-pci: Move idle devices to D3hot power state

2015-03-18 Thread Alex Williamson
We can save some power by putting devices that are bound to vfio-pci
but not in use by the user in the D3hot power state.  Devices get
woken into D0 when opened by the user.  Resets return the device to
D0, so we need to re-apply the low power state after a bus reset.
It's tempting to try to use D3cold, but we have no reason to inhibit
hotplug of idle devices and we might get into a loop of having the
device disappear before we have a chance to try to use it.

A new module parameter allows this feature to be disabled if there are
devices that misbehave as a result of this change.
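
The probe-time ordering (D0 first, then D3hot) follows from the
constraint noted in the vfio_pci_probe() comment in this patch: from the
unknown power state, only a transition to D0 is allowed.  A small
userspace model of that rule might look like the sketch below; enum
pm_state, set_power_state() and idle_device() are hypothetical names and
do not mirror pci_set_power_state() beyond this single constraint.

```c
#include <stdbool.h>

/* Simplified power states; PM_UNKNOWN models the state after boot/unbind. */
enum pm_state { PM_UNKNOWN, PM_D0, PM_D3HOT };

/*
 * Model of the constraint: the only transition allowed out of the
 * unknown state is to D0.  Returns 0 on success, -1 if refused.
 */
static int set_power_state(enum pm_state *cur, enum pm_state target)
{
	if (target == PM_UNKNOWN)
		return -1;
	if (*cur == PM_UNKNOWN && target != PM_D0)
		return -1;

	*cur = target;
	return 0;
}

/* Park an unused device in D3hot unless the feature is disabled. */
static void idle_device(enum pm_state *cur, bool disable_idle_d3)
{
	if (disable_idle_d3)
		return;

	set_power_state(cur, PM_D0);	/* leave the unknown state first */
	set_power_state(cur, PM_D3HOT);
}
```

A direct request for D3hot from the unknown state is refused in this
model, which is why the patch issues the D0 transition before D3hot.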

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
---
 drivers/vfio/pci/vfio_pci.c |   36 +---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index d0f1e70..049b9e9 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -50,6 +50,11 @@ module_param(disable_vga, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci");
 #endif
 
+static bool disable_idle_d3;
+module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_idle_d3,
+		 "Disable using the PCI D3 low power state for idle, unused devices");
+
 static DEFINE_MUTEX(driver_lock);
 
 static inline bool vfio_vga_disabled(void)
@@ -114,6 +119,8 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
u16 cmd;
u8 msix_pos;
 
+   pci_set_power_state(pdev, PCI_D0);
+
/* Don't allow our initial saved state to include busmaster */
pci_clear_master(pdev);
 
@@ -225,6 +232,9 @@ out:
pci_disable_device(pdev);
 
vfio_pci_try_bus_reset(vdev);
+
+   if (!disable_idle_d3)
+   pci_set_power_state(pdev, PCI_D3hot);
 }
 
 static void vfio_pci_release(void *device_data)
@@ -951,6 +961,20 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
vfio_pci_set_vga_decode(vdev, false));
}
 
+   if (!disable_idle_d3) {
+   /*
+* pci-core sets the device power state to an unknown value at
+* bootup and after being removed from a driver.  The only
+* transition it allows from this unknown state is to D0, which
+* typically happens when a driver calls pci_enable_device().
+* We're not ready to enable the device yet, but we do want to
+* be able to get to D3.  Therefore first do a D0 transition
+* before going to D3.
+*/
+   pci_set_power_state(pdev, PCI_D0);
+   pci_set_power_state(pdev, PCI_D3hot);
+   }
+
return ret;
 }
 
@@ -971,6 +995,9 @@ static void vfio_pci_remove(struct pci_dev *pdev)
VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
}
+
+   if (!disable_idle_d3)
+   pci_set_power_state(pdev, PCI_D0);
 }
 
 static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
@@ -1089,10 +1116,13 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 
 put_devs:
 	for (i = 0; i < devs.cur_index; i++) {
-		if (!ret) {
-			tmp = vfio_device_data(devs.devices[i]);
+		tmp = vfio_device_data(devs.devices[i]);
+		if (!ret)
 			tmp->needs_reset = false;
-		}
+
+		if (!tmp->refcnt && !disable_idle_d3)
+			pci_set_power_state(tmp->pdev, PCI_D3hot);
+
 		vfio_device_put(devs.devices[i]);
 	}
 



[PATCH v2 1/6] vgaarb: Stub vga_set_legacy_decoding()

2015-03-18 Thread Alex Williamson
vga_set_legacy_decoding() is defined in drivers/gpu/vga/vgaarb.c,
which is only compiled with CONFIG_VGA_ARB.  A caller would
therefore get an undefined symbol if the VGA arbiter is not
enabled.
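
The stub pattern used here can be tried in isolation with the sketch
below.  HAVE_VGA_ARB, struct dev_model, set_legacy_decoding() and
probe_model() are made-up names standing in for CONFIG_VGA_ARB and the
real kernel symbols; the point is only that the caller needs no #ifdef
and builds in both configurations.

```c
/* Hypothetical device model; only tracks the decode mask. */
struct dev_model {
	unsigned int decodes;
};

#if defined(HAVE_VGA_ARB)
/* "Real" implementation, built only when the feature is enabled. */
static inline void set_legacy_decoding(struct dev_model *dev,
				       unsigned int decodes)
{
	dev->decodes = decodes;
}
#else
/* Stub: same signature, does nothing, so callers need no #ifdef. */
static inline void set_legacy_decoding(struct dev_model *dev,
				       unsigned int decodes)
{
	(void)dev;
	(void)decodes;
}
#endif

/* The caller compiles and links the same way in both configurations. */
static unsigned int probe_model(struct dev_model *dev)
{
	set_legacy_decoding(dev, 0x0f);
	return dev->decodes;
}
```

Built without -DHAVE_VGA_ARB, the call compiles to nothing and the
device's decode mask is left untouched.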

Signed-off-by: Alex Williamson <alex.william...@redhat.com>
Acked-by: Dave Airlie <airl...@redhat.com>
---
 include/linux/vgaarb.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h
index c37bd4d..8c3b412 100644
--- a/include/linux/vgaarb.h
+++ b/include/linux/vgaarb.h
@@ -65,8 +65,13 @@ struct pci_dev;
  * out of the arbitration process (and can be safe to take
  * interrupts at any time.
  */
+#if defined(CONFIG_VGA_ARB)
 extern void vga_set_legacy_decoding(struct pci_dev *pdev,
unsigned int decodes);
+#else
+static inline void vga_set_legacy_decoding(struct pci_dev *pdev,
+  unsigned int decodes) { };
+#endif
 
 /**
  * vga_get - acquire & locks VGA resources



Re: [GSoC] project proposal

2015-03-18 Thread Paolo Bonzini


On 18/03/2015 18:05, Catalin Vasile wrote:
> cryptodev is not merged into upstream from what I know.

Yes, but QEMU runs on non-Linux platforms too.  Of course doing
vhost+driver or gnutls+driver would be already more than enough for the
summer.

In any case, just put all the justification in your application.  Thanks
for participating in QEMU's GSoC!

Paolo

> gnutls can use cryptodev and AF_ALG as crypto engines.
> From some benchmarks (that can also be found on cryptodev's webpage)
> you can see AF_ALG has a lot of overhead over a standalone misc/char
> device.
>
> On Wed, Mar 18, 2015 at 6:42 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
>>
>>
>> On 18/03/2015 17:01, Catalin Vasile wrote:
>>> To be more exact, I want to make a virtio-crypto device to emulate a
>>> virtual cryptographic offloading device that will send jobs from the
>>> guest to a vhost that will process the jobs. This mechanism will link
>>> CryptoAPI from the guest to the CryptoAPI from the host. This way,
>>> whatever is beneath CryptoAPI from the host will be used as
>>> offloading for the guest.
>>> Is there a mentor interested in getting involved in this kind of project?
>>
>> I think it's very likely that you'll find a mentor.  Please submit a
>> proposal, also detailing the advantage of vhost over a userspace
>> solution (using any of gnutls, AF_ALG, cryptodev).
>>
>> Paolo