Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-10-17 Thread Xiao Guangrong
On 09/14/2012 01:57 PM, Xiao Guangrong wrote:
> On 09/12/2012 04:15 PM, Avi Kivity wrote:
>> On 09/12/2012 07:40 AM, Fengguang Wu wrote:
>>> Hi,
>>>
>>> 3 of my test boxes running the v3.5 kernel became inaccessible and I find
>>> two of them kept emitting this dmesg:
>>>
>>> vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e) and exit
>>> reason is 0x31
>>>
>>> The other one has frozen and the above lines are the last dmesg.
>>> Any ideas?
>>
>> First, that printk should be rate-limited.
>>
>> Second, we should add EXIT_REASON_EPT_MISCONFIG (0x31) to
>>
>>     if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
>>         (exit_reason != EXIT_REASON_EXCEPTION_NMI &&
>>          exit_reason != EXIT_REASON_EPT_VIOLATION &&
>>          exit_reason != EXIT_REASON_TASK_SWITCH))
>>             printk(KERN_WARNING "%s: unexpected, valid vectoring info "
>>                    "(0x%x) and exit reason is 0x%x\n",
>>                    __func__, vectoring_info, exit_reason);
>>
>> since it's easily caused by the guest.
>
> Yes, I will do these.
>
>> Third, it's really unexpected.  It seems the guest was attempting to deliver
>> a page fault exception (0x0e) but encountered an mmio page during delivery
>> (in the IDT, TSS, stack, or page tables).  Is this reproducible?  If so it's
>> easy to patch kvm to halt in that case and allow examining the guest via
>> qemu.
>
> I have no idea yet why the box froze in this case; I will try to write a
> test case and hope it can help me find the reason.

I still do not know why the Linux kernel triggered it. I have posted
a patchset to report an internal error for this case, hoping
Fengguang can reproduce it with the patchset applied so that QEMU's
dump can help us find the reason.

I will keep working on it.
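
For readers following the thread, here is a minimal sketch of how the value in the subject line (0x80000b0e) decodes, and of the check with EXIT_REASON_EPT_MISCONFIG added as suggested above. The mask and exit-reason values follow the kernel's VMX definitions but are redefined locally; the struct and the two function names are made up for this illustration and are not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the VMX IDT-vectoring information field.  Names and values
 * follow the kernel's VMX headers, redefined here so the sketch stands alone. */
#define VECTORING_INFO_VECTOR_MASK        0x000000ffu
#define VECTORING_INFO_TYPE_MASK          0x00000700u
#define VECTORING_INFO_DELIVER_CODE_MASK  0x00000800u
#define VECTORING_INFO_VALID_MASK         0x80000000u

#define EXIT_REASON_EXCEPTION_NMI   0u
#define EXIT_REASON_TASK_SWITCH     9u
#define EXIT_REASON_EPT_VIOLATION   48u
#define EXIT_REASON_EPT_MISCONFIG   49u   /* 0x31 */

/* Hypothetical helper type: the decoded fields of a vectoring info word. */
struct vectoring_fields {
    bool valid;               /* bit 31 */
    unsigned type;            /* bits 10:8, 3 = hardware exception */
    bool deliver_error_code;  /* bit 11 */
    unsigned vector;          /* bits 7:0, 0x0e = page fault */
};

struct vectoring_fields decode_vectoring_info(uint32_t info)
{
    struct vectoring_fields f;

    f.valid = (info & VECTORING_INFO_VALID_MASK) != 0;
    f.type = (info & VECTORING_INFO_TYPE_MASK) >> 8;
    f.deliver_error_code = (info & VECTORING_INFO_DELIVER_CODE_MASK) != 0;
    f.vector = info & VECTORING_INFO_VECTOR_MASK;
    return f;
}

/* The warning condition from the quoted code, with EPT_MISCONFIG added to
 * the list of exit reasons for which valid vectoring info is tolerated. */
bool vectoring_is_unexpected(uint32_t vectoring_info, uint32_t exit_reason)
{
    return (vectoring_info & VECTORING_INFO_VALID_MASK) &&
           exit_reason != EXIT_REASON_EXCEPTION_NMI &&
           exit_reason != EXIT_REASON_EPT_VIOLATION &&
           exit_reason != EXIT_REASON_EPT_MISCONFIG &&
           exit_reason != EXIT_REASON_TASK_SWITCH;
}
```

Decoding 0x80000b0e this way gives a valid hardware exception, vector 0x0e, with an error code, which matches the page-fault reading above.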


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-10-17 Thread Fengguang Wu
On Wed, Oct 17, 2012 at 02:26:22PM +0800, Xiao Guangrong wrote:
> I still do not know why the Linux kernel triggered it. I have posted
> a patchset to report an internal error for this case, hoping
> Fengguang can reproduce it with the patchset applied so that QEMU's
> dump can help us find the reason.
>
> I will keep working on it.

Thanks! Shall I run some patched kernel, or just 3.6.0?

Another problem I sometimes run into is, dmesg no longer works in the
test boxes that run lots of KVMs. It aborts with an error message:

dmesg: klogctl failed: Bad address

Thanks,
Fengguang


Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-10-17 Thread Xiao Guangrong
On 10/17/2012 02:43 PM, Fengguang Wu wrote:
> Thanks! Shall I run some patched kernel, or just 3.6.0?

The patchset is under review. It can be found at:
https://lkml.org/lkml/2012/10/17/31

 
> Another problem I sometimes run into is, dmesg no longer works in the
> test boxes that run lots of KVMs. It aborts with an error message:
>
> dmesg: klogctl failed: Bad address

Interesting, will fight for it. :)




Re: KVM_MAX_VCPUS

2012-10-17 Thread Gleb Natapov
On Wed, Oct 17, 2012 at 02:57:15AM +0000, Wei, Bing (WeiBing, MCXS-SH) wrote:
> For pCPU/core and VCPUS/logical cpu mapping, It should be 8 multiple. 254 is
> reasonable. Or something I miss?
>
I am not sure what you mean. Can you clarify?

> -----Original Message-----
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of Vinod, Chegu
> Sent: Sunday, October 14, 2012 9:43 PM
> To: Gleb Natapov
> Cc: Sasha Levin; KVM
> Subject: Re: KVM_MAX_VCPUS
>
> On 10/14/2012 2:08 AM, Gleb Natapov wrote:
>> On Sat, Oct 13, 2012 at 10:32:13PM -0400, Sasha Levin wrote:
>>> On 10/13/2012 06:29 PM, Chegu Vinod wrote:
>>>> Hello,
>>>>
>>>> Wanted to get a clarification about KVM_MAX_VCPUS (currently set to 254)
>>>> in kvm_host.h file. The kvm_vcpu *vcpus array is sized based on KVM_MAX_VCPUS
>>>> (i.e. a max of 254 elements in the array).
>>>>
>>>> An 8bit APIC id should allow for 256 ID's. Reserving one for Broadcast should
>>>> leave 255 ID's.  Is there one more ID reserved for some other purpose? (hence
>>>> leading to KVM_MAX_VCPUS being set to 254 and not 255).
>>> Another ID goes to the IO-APIC.
>>
>> This is not really needed on KVM. We can enlarge KVM_MAX_VCPUS to 255.
>
> Thanks for clarification!  (We did suspect the IO-APIC...but weren't
> quite sure).
>
> Vinod
>
>> --
>> Gleb.
>> .
 
 

--
Gleb.


RE: [Patch]KVM: enabling per domain PLE

2012-10-17 Thread Hu, Xuekun
 
> The problem with this is that it requires an administrator to understand the
> workload, not only of the guest, but also of other guests on the machine.
> With low overcommit, a high PLE window reduces unneeded exits, but with
> high overcommit we need those exits to reduce spinning.
>
> In addition, most kvm hosts don't have an administrator.  They are controlled
> by a management system, which means we'll need some algorithm in
> userspace to control the PLE window.  Taking the two together, we need a
> dynamic (for changing workloads) algorithm.
>
> There are threads discussing this dynamic algorithm, we are making slow
> progress because it's such a difficult problem, but I think this is much more
> useful than anything requiring user intervention.

Avi, agreed that dynamic adaptive PLE should be the best solution. However,
currently it is a difficult problem, like you said. Our solution just gives a
choice to users who know how to set the two PLE values. So it is a compromise,
which should be better than nothing for now? :-)

Your comments?



Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-17 Thread Glauber Costa
On 10/17/2012 06:23 AM, Michael Wolf wrote:
> In the case where you have a system that is running in a
> capped or overcommitted environment the user may see steal time
> being reported in accounting tools such as top or vmstat.  This can
> cause confusion for the end user.  To ease the confusion this patch set
> adds the idea of consigned (expected steal) time.  The host will separate
> the consigned time from the steal time.  The consignment limit passed to the
> host will be the amount of steal time expected within a fixed period of
> time.  Any other steal time accruing during that period will show as the
> traditional steal time.
>
> TODO:
> * Change native_clock to take params and not return a value
> * Change update_rq_clock_task
>
> Changes from V1:
> * Removed the steal time allowed percentage from the guest
> * Moved the separation of consigned (expected steal) and steal time to the
>   host.
> * No longer include a sysctl interface.
 

You are showing this in the guest somewhere, but tools like top will
still not show it. So for quite a while, it achieves nothing.

Of course this is a barrier that any new statistic has to go through. So
while annoying, this is per-se ultimately not a blocker.

What I still fail to see, is how this is useful information to be shown
in the guest. Honestly, if I'm in a guest VM or container, any time
during which I am not running is time I lost. It doesn't matter if this
was expected or not. This still seems to me as a host-side problem, to
be solved entirely by tooling.
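
For reference, the split described in the quoted cover letter can be sketched as follows. This is only an illustration of the accounting idea, assuming a per-period consignment limit expressed in nanoseconds; the struct and function names are hypothetical and do not come from the patch set.

```c
#include <stdint.h>

/* Hypothetical result type: how one period's stolen time is reported. */
struct steal_split {
    uint64_t consigned_ns;  /* expected steal, up to the consignment limit */
    uint64_t steal_ns;      /* anything beyond the limit */
};

/* Split the steal time observed in one fixed period: the first
 * consign_limit_ns count as consigned, the remainder as ordinary steal. */
struct steal_split split_steal(uint64_t observed_ns, uint64_t consign_limit_ns)
{
    struct steal_split s;

    s.consigned_ns = observed_ns < consign_limit_ns ? observed_ns
                                                    : consign_limit_ns;
    s.steal_ns = observed_ns - s.consigned_ns;
    return s;
}
```

With a limit of 200 and 300 observed, 200 would be reported as consigned and 100 as steal; below the limit, nothing shows up as steal at all.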



Re: [3.5.0 BUG] vmx_handle_exit: unexpected, valid vectoring info (0x80000b0e)

2012-10-17 Thread Fengguang Wu
On Wed, Oct 17, 2012 at 03:04:49PM +0800, Xiao Guangrong wrote:
> The patchset is under review. Can be found at:
> https://lkml.org/lkml/2012/10/17/31

Thanks, I'll try it.

Thanks,
Fengguang


Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-17 Thread Vasilis Liaskovitis
On Sat, Oct 13, 2012 at 08:57:19AM +0000, Blue Swirl wrote:
> On Tue, Oct 9, 2012 at 5:04 PM, Vasilis Liaskovitis
> <vasilis.liaskovi...@profitbricks.com> wrote:
>
<snip>
>>> Maybe even the dimmbus device shouldn't exist by itself after all, or
>>> it should be pretty much invisible to users. On real HW, the memory
>>> controller or south bridge handles the memory. For i440fx, it's part
>>> of the same chipset. So I think we should just add qdev properties to
>>> i440fx to specify the sizes, nodes etc. Then i440fx should create the
>>> dimmbus device unconditionally using the properties. The default
>>> properties should create a sane configuration, otherwise -global
>>> i440fx.dimm_size=512M etc. could be used. Then the bus would be
>>> populated as before or with device_add.
>>
>> hmm the problem with using only i440fx properties, is that size/nodes look
>> dimm specific to me, not chipset-memcontroller specific. Unless we only allow
>> uniform size dimms. Is it possible to have a dynamic list of sizes/nodes pairs as
>> properties of a qdev device?
>
> I don't think so, but probably there's a limit of DIMMs that real
> controllers have, something like 8 max.

In the case of i440fx specifically, do you mean that we should model the DRB
(Dram row boundary registers in section 3.2.19 of the i440fx spec) ?

The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row
maps 1-1 to a DimmDevice for this discussion) and only support up to 2GB of
memory afaict (bit 31 and above is ignored).

I'd rather not model this part of the i440fx - having only 8 DIMMs seems too
restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a
waste imho to model an old pc memory controller that only supports 8 DIMMs.

There was also an old discussion about i440fx modeling here:
https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
the general direction was that i440fx is too old and we don't want to precisely
emulate the DRB registers, since they lack flexibility.

Possible solutions:

1) is there a newer and more flexible chipset that we could model?

2) model and document a generic (non-existent) i440fx that would support more
and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
similar to the i440fx DRB registers, the registers would take up a lot of space.
In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how
many 8MB chunks are contained in DIMMs 0...i. So, the register values are
cumulative (and total described memory cannot exceed 256x8MB = 2GB)

We could for example model:
- an 8-bit non-cumulative register for each DIMM, denoting how many
128MB chunks it contains. This allows 32GB for each DIMM, and with 255 DIMMs we
describe a bit less than 8TB. These registers require 255 bytes.
- a 16-bit cumulative register for each DIMM again for 128MB chunks. This allows
us to describe 8TB of memory (but the registers take up double the space, because
they describe cumulative memory amounts)

3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling
is not done (at least for i440fx, other machines could). This is the least precise
in terms of emulation. On the other hand, if we are not really trying to emulate
the real (too restrictive) hardware, does it matter?
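
The capacity figures in option 2 can be checked with a small sketch. It assumes the cumulative DRB semantics described above (DRB[i] counts 8MB chunks in DIMMs 0...i); the function names are invented for this example.

```c
#include <stdint.h>

#define MIB (1024ull * 1024ull)

/* Size of row/DIMM i under the cumulative i440fx-style scheme, where
 * drb[i] holds the number of 8MB chunks contained in DIMMs 0..i. */
uint64_t drb_row_bytes(const uint8_t *drb, unsigned i)
{
    unsigned prev = (i == 0) ? 0u : drb[i - 1];

    return (uint64_t)(drb[i] - prev) * 8u * MIB;
}

/* Largest amount of memory a chunk counter of the given width can
 * describe: (2^bits - 1) chunks of chunk_bytes each. */
uint64_t max_describable_bytes(unsigned bits, uint64_t chunk_bytes)
{
    return ((1ull << bits) - 1u) * chunk_bytes;
}
```

An 8-bit counter of 128MB chunks tops out at 255 x 128MB (just under 32GB) per DIMM, so 255 such DIMMs stay a bit below 8TB; a 16-bit cumulative counter reaches 65535 x 128MB, also just under 8TB.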

thanks,

- Vasilis


KVM on NFS

2012-10-17 Thread Andrew Holway
Hello,

I am testing KVM on an Oracle NFS box that I have.

Does the list have any advice on best practice? I remember reading that there 
is stuff you can do with I/O schedulers and stuff to make it more efficient.

My VMs will primarily be running mysql databases. I am currently using o_direct.

Thanks,

Andrew





Re: KVM ept flush

2012-10-17 Thread Avi Kivity
On 10/16/2012 08:50 PM, Rohan Sharma wrote:
> Thanks for the reply.
> I have one more question.
> If I do munmap of the RAM allocated in qemu,
> will the changes be reflected in KVM Ept.

Yes.  Those changes will be reflected.  See
kvm_mmu_notifier_invalidate_page(), and related.

> I guess there is some mmu notifier which ensures that entries of EPT
> are synced with the host entries.
>
> On Tue, Oct 16, 2012 at 8:27 PM, Avi Kivity <a...@redhat.com> wrote:
>> On 10/16/2012 01:57 PM, Rohan Sharma wrote:
>>> Is there a way to flush ept entries in qemu-kvm.
>>
>> No.
>>
>> --
>> error compiling committee.c: too many arguments to function
 


-- 
error compiling committee.c: too many arguments to function


Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-17 Thread Avi Kivity
On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote:
 
>> I don't think so, but probably there's a limit of DIMMs that real
>> controllers have, something like 8 max.
>
> In the case of i440fx specifically, do you mean that we should model the DRB
> (Dram row boundary registers in section 3.2.19 of the i440fx spec) ?
>
> The i440fx DRB registers only support up to 8 DRAM rows (let's say 1 row
> maps 1-1 to a DimmDevice for this discussion) and only support up to 2GB of
> memory afaict (bit 31 and above is ignored).
>
> I'd rather not model this part of the i440fx - having only 8 DIMMs seems too
> restrictive. The rest of the patchset supports up to 255 DIMMs so it would be a
> waste imho to model an old pc memory controller that only supports 8 DIMMs.
>
> There was also an old discussion about i440fx modeling here:
> https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
> the general direction was that i440fx is too old and we don't want to precisely
> emulate the DRB registers, since they lack flexibility.
>
> Possible solutions:
>
> 1) is there a newer and more flexible chipset that we could model?

Look for q35 on this list.

 
> 2) model and document
                 ^--- the critical bit

> a generic (non-existent) i440fx that would support more
> and larger DIMMs. E.g. support 255 DIMMs. If we want to use a description
> similar to the i440fx DRB registers, the registers would take up a lot of space.
> In i440fx there is one 8-bit DRB register per DIMM, and DRB[i] describes how
> many 8MB chunks are contained in DIMMs 0...i. So, the register values are
> cumulative (and total described memory cannot exceed 256x8MB = 2GB)

Our i440fx has already been extended by support for pci and cpu hotplug,
and I see no reason not to extend it for memory.  We can allocate extra
mmio space for registers if needed.  Usually I'm against this sort of
thing, but in this case we don't have much choice.

 
> We could for example model:
> - an 8-bit non-cumulative register for each DIMM, denoting how many
> 128MB chunks it contains. This allows 32GB for each DIMM, and with 255 DIMMs we
> describe a bit less than 8TB. These registers require 255 bytes.
> - a 16-bit cumulative register for each DIMM again for 128MB chunks. This allows
> us to describe 8TB of memory (but the registers take up double the space, because
> they describe cumulative memory amounts)

There is no reason to save space.  Why not have two 64-bit registers per
DIMM, one describing the size and the other the base address, both in
bytes?  Use a few low order bits for control.
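
A minimal sketch of that register pair, for illustration only. It assumes base and size are at least 4KB-aligned so that the low 12 bits are free for control; the bit assignment, the struct, and the function names are all hypothetical and not taken from any patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: base and size are 4KB-aligned, leaving 12 low-order bits. */
#define DIMM_CTRL_BITS    12u
#define DIMM_CTRL_MASK    ((1ull << DIMM_CTRL_BITS) - 1u)
#define DIMM_CTRL_ENABLE  (1ull << 0)   /* hypothetical control bit */

/* Two 64-bit registers per DIMM, both expressed in bytes. */
struct dimm_regs {
    uint64_t base;  /* base address, control bits in the low-order bits */
    uint64_t size;  /* size in bytes */
};

struct dimm_regs dimm_regs_make(uint64_t base, uint64_t size, bool enable)
{
    struct dimm_regs r;

    r.base = (base & ~DIMM_CTRL_MASK) | (enable ? DIMM_CTRL_ENABLE : 0);
    r.size = size & ~DIMM_CTRL_MASK;
    return r;
}

uint64_t dimm_base(struct dimm_regs r)   { return r.base & ~DIMM_CTRL_MASK; }
uint64_t dimm_size(struct dimm_regs r)   { return r.size & ~DIMM_CTRL_MASK; }
bool dimm_enabled(struct dimm_regs r)    { return (r.base & DIMM_CTRL_ENABLE) != 0; }
```

Byte-granular 64-bit values avoid the chunk-size and cumulative-count arithmetic entirely, at a cost of 16 bytes of register space per DIMM.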

 
> 3) let everything be handled/abstracted by dimmbus - the chipset DRB modelling
> is not done (at least for i440fx, other machines could). This is the least precise
> in terms of emulation. On the other hand, if we are not really trying to emulate
> the real (too restrictive) hardware, does it matter?

We could emulate base memory using the chipset, and extra memory using
the scheme above.  This allows guests that are tied to the chipset to
work, and guests that have more awareness (seabios) to use the extra
features.

-- 
error compiling committee.c: too many arguments to function


Re: [Patch]KVM: enabling per domain PLE

2012-10-17 Thread Avi Kivity
On 10/17/2012 10:02 AM, Hu, Xuekun wrote:
 
>> The problem with this is that it requires an administrator to understand the
>> workload, not only of the guest, but also of other guests on the machine.
>> With low overcommit, a high PLE window reduces unneeded exits, but with
>> high overcommit we need those exits to reduce spinning.
>>
>> In addition, most kvm hosts don't have an administrator.  They are controlled
>> by a management system, which means we'll need some algorithm in
>> userspace to control the PLE window.  Taking the two together, we need a
>> dynamic (for changing workloads) algorithm.
>>
>> There are threads discussing this dynamic algorithm, we are making slow
>> progress because it's such a difficult problem, but I think this is much more
>> useful than anything requiring user intervention.
>
> Avi, agreed that dynamic adaptive PLE should be the best solution. However,
> currently it is a difficult problem, like you said. Our solution just gives a
> choice to users who know how to set the two PLE values. So it is a compromise,
> which should be better than nothing for now? :-)

Let's see how the PLE thread works out.  Yes the patches give the user
control, but we need to make sure the user knows how to control it (in
fact your patch doesn't even update the documentation).  Just throwing
out a new ioctl, even if it is documented, doesn't mean that userspace
will begin to use it, or that users will exploit it.

Do you have a specific use case in mind?

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

2012-10-17 Thread Avi Kivity
On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
> On 2012-10-15 23:43, Avi Kivity wrote:
>> On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
>>> Currently, kdump just makes all the logical processors leave VMX operation by
>>> executing VMXOFF instruction, so any VMCSs active on the logical processors may
>>> be corrupted. But, sometimes, we need the VMCSs to debug guest images contained
>>> in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs before
>>> executing the VMXOFF instruction.
>>
>> How have you verified that VMXOFF doesn't flush cached VMCSs already?
>
> I tried some tests. For example, I made copies of every vmcs, and in the kdump
> path, I backed up all the loaded vmcss into the copies before vmxoff.
> After generating the vmcore, I retrieved the vmcss and their copies and compared
> them: no differences.
>
> Another test is using VMCLEAR to clear all the loaded vmcss before VMXOFF,
> and comparing the vmcss and their copies: there are indeed differences between
> each vmcs and its copy.
>
> I know the tests may not be so convincing; for example, I used memcpy to back up
> the vmcss and it is an ordinary memory operation. But to ensure the non-corruption
> of the vmcss in the vmcore, I think we should VMCLEAR the vmcss before VMXOFF just
> as the Intel spec says.

Sorry, I was unclear -- I was referring to the spec, I wasn't sure
whether VMXOFF is defined to flush VMCSes or whether it just invalidates
on-chip caches so that it won't flush them out in the future, corrupting
memory.  We don't want to depend on actual behaviour as it may change
with future versions.

Copying some Intel folk, maybe they can clarify it.

 

>>> The patch set provides an alternative way to clear VMCSs related to guests
>>> on all cpus when host is doing kdump.
>>
>> I'm not sure the sysctl is really necessary.  The only reason to turn it
>> off is if the corruption is so severe that the loaded vmcs list itself
>> causes a crash.  I think it should be rare enough that we can do it
>> unconditionally.
>
> You mean not using sysctl and just let VMCLEAR-VMCSS be a default behaviour? If so,
> I agree with you.

Yes, that's what I meant.


-- 
error compiling committee.c: too many arguments to function


Re: Secure migration of LVM based guests over WAN

2012-10-17 Thread Lukas Laukamp

On 16.10.2012 12:10, Avi Kivity wrote:
> On 10/16/2012 11:48 AM, Lukas Laukamp wrote:
>> On 16.10.2012 11:40, Avi Kivity wrote:
>>> On 10/16/2012 11:12 AM, Lukas Laukamp wrote:
>>>> Hey all,
>>>>
>>>> I have a question about a solution for migrating LVM based guests directly
>>>> over the network.
>>>>
>>>> So the situation: Two KVM hosts with libvirt, multiple LVM based guests
>>>> Want to do: Migrate an LVM based guest directly to the other host over a
>>>> secure connection
>>>>
>>>> I know that migration is possible when the VM disks are stored on an
>>>> NFS, GFS2 filer/cluster etc.
>>>>
>>>> So would it be possible to do an offline migration directly with netcat
>>>> or something like that?
>>>
>>> If all you need is offline, you can use scp to copy each volume to the
>>> destination volume.  Make sure the guests are shut down when you do that.
>>>
>>> It is also possible to do a live migration, but unless the destination
>>> and source are in the same IP subnet, the guests are going to lose
>>> connectivity.
>>
>> Hello Avi,
>>
>> so can I simply copy a logical volume to the path of the volume group
>> with scp?
>
> Yes.  Best to enable compression to avoid sending zero blocks.
>
>> As for live migration, it would be no problem if the guests
>> lose connectivity. How could a live migration be done?
>
> See the -b option to the migrate command.



I will read a little bit about the live migration theme.

Best Regards


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-17 Thread Avi Kivity
On 10/16/2012 10:03 PM, Anthony Liguori wrote:

>> This forces userspace to dedicate a thread for the HPT.
>
> If no changes are available, does read return a size > 0?  I don't think
> it's necessary to support polling.  The kernel should always be able to
> respond to userspace here.  The only catch is whether to return !0 read
> sizes when there are no changes.
>
> At any case, I can't see why a dedicated thread is needed.  QEMU is
> going to poll HPT based on how fast we can send data over the wire.

That means spinning if we can send the data faster than we dirty it.
But we do that anyway for memory.



-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-17 Thread Avi Kivity
On 10/16/2012 11:52 PM, Paul Mackerras wrote:
> On Tue, Oct 16, 2012 at 03:06:33PM +0200, Avi Kivity wrote:
>> On 10/16/2012 01:58 PM, Paul Mackerras wrote:
>>> On Tue, Oct 16, 2012 at 12:06:58PM +0200, Avi Kivity wrote:
>>>> Does/should the fd support O_NONBLOCK and poll? (=waiting for an entry
>>>> to change).
>>>
>>> No.
>>
>> This forces userspace to dedicate a thread for the HPT.
>
> Why? Reads never block in any case.

Ok.  This parallels KVM_GET_DIRTY_LOG.

 
>>>> I meant the internal data structure that holds HPT entries.
>>>
>>> Oh, that's just an array, and userspace already knows how big it is.
>>
>> I guess I don't understand the index.  Do we expect changes to be in
>> contiguous ranges?  And invalid entries to be contiguous as well?  That
>> doesn't fit with how hash tables work.  Does the index represent the
>> position of the entry within the table, or something else?
>
> The index is just the position in the array.  Typically, in each group
> of 8 it will tend to be the low-numbered ones that are valid, since
> creating an entry usually uses the first empty slot.  So I expect that
> on the first pass, most of the records will represent 8 HPTEs.  On
> subsequent passes, probably most records will represent a single HPTE.

So it's a form of RLE compression.  Ok.

 
>> 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE.  Does
>> it warrant a live migration protocol?
>
> The qemu people I talked to seemed to think so.
>
>>> Because it is a hash table, updates tend to be scattered throughout
>>> the whole table, which is another reason why per-page dirty tracking
>>> and updates would be pretty inefficient.
>>
>> This suggests a stream format that includes the index in every entry.
>
> That would amount to dropping the n_valid and n_invalid fields from
> the current header format.  That would be less efficient for the
> initial pass (assuming we achieve an average n_valid of at least 2 on
> the initial pass), and probably less efficient for the incremental
> updates, since a newly-invalidated entry would have to be represented
> as 16 zero bytes rather than just an 8-byte header with n_valid=0 and
> n_invalid=1.  I'm assuming here that the initial pass would omit
> invalid entries.

I agree.  But let's have some measurements to make sure.
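
Until such measurements exist, the byte counts behind the argument can be sketched as below. The 8-byte header and 16-byte HPTE come from the discussion above; the header field widths and the 4-byte index in the per-entry alternative are assumptions made for this example, and the function names are invented.

```c
#include <stdint.h>

#define HPTE_BYTES    16u  /* one HPT entry, per the discussion above */
#define HEADER_BYTES   8u  /* index + n_valid + n_invalid */
#define INDEX_BYTES    4u  /* assumption: per-entry index width */

/* Illustrative header layout; the field widths are an assumption chosen
 * to add up to the 8 bytes mentioned in the thread. */
struct hpt_record_header {
    uint32_t index;      /* position of the first HPTE in the array */
    uint16_t n_valid;    /* valid HPTEs that follow the header */
    uint16_t n_invalid;  /* invalid HPTEs after those, no data sent */
};

/* Run-length style record: one header plus n_valid entries; the
 * n_invalid entries cost nothing beyond the header. */
uint64_t rle_record_bytes(uint32_t n_valid)
{
    return HEADER_BYTES + (uint64_t)n_valid * HPTE_BYTES;
}

/* Alternative: every entry carries its own index, and an invalidated
 * entry is sent as 16 zero bytes. */
uint64_t per_entry_bytes(uint32_t n_valid, uint32_t n_invalid)
{
    return ((uint64_t)n_valid + n_invalid) * (INDEX_BYTES + HPTE_BYTES);
}
```

Under these assumptions the two formats break even at n_valid = 2 (40 bytes each), a full group of 8 costs 136 versus 160 bytes, and a single invalidation costs 8 versus 20 bytes.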

 
  
>>> As for the change rate, it depends on the application of course, but
>>> basically every time the guest changes a PTE in its Linux page tables
>>> we do the corresponding change to the corresponding HPT entry, so the
>>> rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
>>> exec, etc. have a high rate of HPT updates.
>>
>> If the rate is high enough, then there's no point in a live update.
>
> True, but doesn't that argument apply to memory pages as well?

In some cases it does.  The question is what happens in practice.  If
you migrate a kernel build, how many entries are sent in the guest
stopped phase?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Avi Kivity
On 10/17/2012 04:10 AM, Will Auld wrote:
 Signed-off-by: Will Auld will.a...@intel.com
 ---
 
 Resending to full list
 
 Marcelo,
 
 This patch is what I believe you ask for as foundational for later
 patches to address IA32_TSC_ADJUST. 
 

Please write a changelog to reflect the motivation.

All those bool parameters scattered all over the place aren't very
pretty.  Usually we solve this with helpers that embed the parameter
name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
functions for this to work here.

Marcelo, any ideas?

 Thanks,
 
 Will
 
  arch/x86/include/asm/kvm_host.h |  8 
  arch/x86/kvm/svm.c  | 18 ++
  arch/x86/kvm/vmx.c  | 18 ++
  arch/x86/kvm/x86.c  | 18 ++
  arch/x86/kvm/x86.h  |  2 +-
  5 files changed, 35 insertions(+), 29 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 09155d6..c06f0d1 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -621,7 +621,7 @@ struct kvm_x86_ops {
   void (*set_guest_debug)(struct kvm_vcpu *vcpu,
   struct kvm_guest_debug *dbg);
   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
 - int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 + int (*set_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 data, bool 
 guest_initiated);
   u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
   void (*get_segment)(struct kvm_vcpu *vcpu,
   struct kvm_segment *var, int seg);
 @@ -684,7 +684,7 @@ struct kvm_x86_ops {
   bool (*has_wbinvd_exit)(void);
  
   void (*set_tsc_khz)(struct kvm_vcpu *vcpu, u32 user_tsc_khz, bool 
 scale);
 - void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset);
 + void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset, bool 
 guest_initiated);
  
   u64 (*compute_tsc_offset)(struct kvm_vcpu *vcpu, u64 target_tsc);
   u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu);
 @@ -772,7 +772,7 @@ static inline int emulate_instruction(struct kvm_vcpu 
 *vcpu,
  
  void kvm_enable_efer_bits(u64);
  int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
 -int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data);
 +int kvm_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data, bool 
 guest_initiated);
  
  struct x86_emulate_ctxt;
  
 @@ -799,7 +799,7 @@ void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, 
 int *l);
  int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);
  
  int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
 -int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data);
 +int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data, bool 
 guest_initiated);
  
  unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
  void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags);
 diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
 index baead95..424be27 100644
 --- a/arch/x86/kvm/svm.c
 +++ b/arch/x86/kvm/svm.c
 @@ -1012,7 +1012,8 @@ static void svm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 
 user_tsc_khz, bool scale)
   svm->tsc_ratio = ratio;
  }
  
 -static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 +static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset, 
 + bool guest_initiated)
  {
   struct vcpu_svm *svm = to_svm(vcpu);
   u64 g_tsc_offset = 0;
 @@ -1255,7 +1256,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm 
 *kvm, unsigned int id)
   svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
   svm->asid_generation = 0;
   init_vmcb(svm);
 - kvm_write_tsc(&svm->vcpu, 0);
 + kvm_write_tsc(&svm->vcpu, 0, false /*Not Guest Initiated*/);
  
   err = fx_init(&svm->vcpu);
   if (err)
 @@ -3147,13 +3148,14 @@ static int svm_set_vm_cr(struct kvm_vcpu *vcpu, u64 
 data)
   return 0;
  }
  
 -static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data)
 +static int svm_set_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 data, 
 + bool guest_initiated)
  {
   struct vcpu_svm *svm = to_svm(vcpu);
  
   switch (ecx) {
   case MSR_IA32_TSC:
 - kvm_write_tsc(vcpu, data);
 + kvm_write_tsc(vcpu, data, guest_initiated);
   break;
   case MSR_STAR:
   svm->vmcb->save.star = data;
 @@ -3208,12 +3210,12 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, 
 unsigned ecx, u64 data)
   vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", 
 ecx, data);
   break;
   default:
 - return kvm_set_msr_common(vcpu, ecx, data);
 + return kvm_set_msr_common(vcpu, ecx, data, guest_initiated);
   }
   return 0;
  }
  
 -static int wrmsr_interception(struct vcpu_svm *svm)
 +static int wrmsr_interception(struct vcpu_svm *svm, 

Re: KVM on NFS

2012-10-17 Thread Avi Kivity
On 10/17/2012 11:20 AM, Andrew Holway wrote:
 Hello,
 
 I am testing KVM on an Oracle NFS box that I have.
 
 Does the list have any advice on best practice? I remember reading that
 I/O schedulers and similar settings can be tuned to make it more efficient.
 
 My VMs will primarily be running MySQL databases. I am currently using
 O_DIRECT.
 

O_DIRECT is good.  I/O schedulers don't affect NFS so no need to tune
anything on the host.  You might experiment with switching to the
deadline scheduler in the guest.


-- 
error compiling committee.c: too many arguments to function


Re: KVM on NFS

2012-10-17 Thread Andrew Holway


 O_DIRECT is good.  I/O schedulers don't affect NFS so no need to tune
 anything on the host.  You might experiment with switching to the
 deadline scheduler in the guest.

I'll give it a go. Any ideas how I should be tuning my NFS?

 
 


Re: KVM on NFS

2012-10-17 Thread Avi Kivity
On 10/17/2012 01:04 PM, Andrew Holway wrote:
 
 
 O_DIRECT is good.  I/O schedulers don't affect NFS so no need to tune
 anything on the host.  You might experiment with switching to the
 deadline scheduler in the guest.
 
 I'll give it a go. Any ideas how I should be tuning my NFS?

Not really.  The defaults should work well enough.


-- 
error compiling committee.c: too many arguments to function


Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Guido Winkelmann
Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson:
 On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
[...]
  The commandline, as generated by libvirtd, looks like this:
  
  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
  QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024
  -smp 1,sockets=1,cores=1,threads=1 -name migratetest2
  -uuid ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
  -no-reboot -no-shutdown
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive file=/data/migratetest2_system,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -drive file=/data/migratetest2_data-1,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
  -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3
  -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming tcp:0.0.0.0:49153
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
 
 I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. Have
 you tried other formats or different qemu/kvm versions?

Are you sure about that? Because I'm fairly certain I have been using live 
migration since at least 0.14, if not 0.13, and I have always been using qcow2 
as the image format for the disks...

I can still try with other image formats, though.

Guido


Re: [PATCH v5 1/6] KVM: MMU: fix release noslot pfn

2012-10-17 Thread Avi Kivity
On 10/16/2012 02:07 PM, Xiao Guangrong wrote:
 We can not directly call kvm_release_pfn_clean to release the pfn
 since we can meet noslot pfn which is used to cache mmio info into
 spte

Applied to master for 3.7, 3.6, thanks.


-- 
error compiling committee.c: too many arguments to function


[PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches

2012-10-17 Thread David Howells
Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop
the patch program from deleting it when it creates it.

Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild
files to use the generic instead.

Should this perhaps instead be a #warning or #error that the facility is
unsupported on this arch?

Signed-off-by: David Howells dhowe...@redhat.com
cc: Arnd Bergmann a...@arndb.de
cc: Avi Kivity a...@redhat.com
cc: Marcelo Tosatti mtosa...@redhat.com
cc: kvm@vger.kernel.org
---

 arch/ia64/include/uapi/asm/Kbuild |2 ++
 arch/ia64/include/uapi/asm/kvm_para.h |0 
 arch/s390/include/uapi/asm/Kbuild |2 ++
 arch/s390/include/uapi/asm/kvm_para.h |0 
 include/uapi/asm-generic/kvm_para.h   |4 
 5 files changed, 8 insertions(+)
 delete mode 100644 arch/ia64/include/uapi/asm/kvm_para.h
 delete mode 100644 arch/s390/include/uapi/asm/kvm_para.h

diff --git a/arch/ia64/include/uapi/asm/Kbuild 
b/arch/ia64/include/uapi/asm/Kbuild
index 30cafac..1b3f5eb 100644
--- a/arch/ia64/include/uapi/asm/Kbuild
+++ b/arch/ia64/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += break.h
diff --git a/arch/ia64/include/uapi/asm/kvm_para.h 
b/arch/ia64/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..000
diff --git a/arch/s390/include/uapi/asm/Kbuild 
b/arch/s390/include/uapi/asm/Kbuild
index 7bf68ff..59b67ed 100644
--- a/arch/s390/include/uapi/asm/Kbuild
+++ b/arch/s390/include/uapi/asm/Kbuild
@@ -1,6 +1,8 @@
 # UAPI Header export list
 include include/uapi/asm-generic/Kbuild.asm
 
+generic-y += kvm_para.h
+
 header-y += auxvec.h
 header-y += bitsperlong.h
 header-y += byteorder.h
diff --git a/arch/s390/include/uapi/asm/kvm_para.h 
b/arch/s390/include/uapi/asm/kvm_para.h
deleted file mode 100644
index e69de29..000
diff --git a/include/uapi/asm-generic/kvm_para.h 
b/include/uapi/asm-generic/kvm_para.h
index e69de29..486f0af 100644
--- a/include/uapi/asm-generic/kvm_para.h
+++ b/include/uapi/asm-generic/kvm_para.h
@@ -0,0 +1,4 @@
+/*
+ * There isn't anything here, but the file must not be empty or patch
+ * will delete it.
+ */



Re: [PATCH 07/10] UAPI: Put a comment into uapi/asm-generic/kvm_para.h and use it from arches

2012-10-17 Thread Arnd Bergmann
On Wednesday 17 October 2012, David Howells wrote:
 Make uapi/asm-generic/kvm_para.h non-empty by addition of a comment to stop
 the patch program from deleting it when it creates it.
 
 Then delete empty arch-specific uapi/asm/kvm_para.h files and tell the Kbuild
 files to use the generic instead.
 
 Should this perhaps instead be a #warning or #error that the facility is
 unsupported on this arch?

Just an empty file is fine by me, but an #error also sounds reasonable if
we want users to be able to write autoconf tests for it.

 Signed-off-by: David Howells dhowe...@redhat.com
 cc: Arnd Bergmann a...@arndb.de
 cc: Avi Kivity a...@redhat.com
 cc: Marcelo Tosatti mtosa...@redhat.com
 cc: kvm@vger.kernel.org

Acked-by: Arnd Bergmann a...@arndb.de


Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Marcelo Tosatti
On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
 On 10/17/2012 04:10 AM, Will Auld wrote:
  Signed-off-by: Will Auld will.a...@intel.com
  ---
  
  Resending to full list
  
  Marcelo,
  
  This patch is what I believe you ask for as foundational for later
  patches to address IA32_TSC_ADJUST. 
  
 
 Please write a changelog to reflect the motivation.
 
 All those bool parameters scattered all over the place aren't very
 pretty.  Usually we solve this with helpers that embed the parameter
 name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
 functions for this to work here.
 
 Marcelo, any ideas?

It's easier to read

kvm_x86_ops->kvm_set_msr()
kvm_x86_ops->kvm_set_msr_host()

than

kvm_x86_ops->kvm_set_msr(,false)
kvm_x86_ops->kvm_set_msr(,true)

So you're right.




Re: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Avi Kivity
On 10/17/2012 04:09 PM, Marcelo Tosatti wrote:
 On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
 On 10/17/2012 04:10 AM, Will Auld wrote:
  Signed-off-by: Will Auld will.a...@intel.com
  ---
  
  Resending to full list
  
  Marcelo,
  
  This patch is what I believe you ask for as foundational for later
  patches to address IA32_TSC_ADJUST. 
  
 
 Please write a changelog to reflect the motivation.
 
 All those bool parameters scattered all over the place aren't very
 pretty.  Usually we solve this with helpers that embed the parameter
 name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
 functions for this to work here.
 
 Marcelo, any ideas?
 
 It's easier to read
 
 kvm_x86_ops->kvm_set_msr()
 kvm_x86_ops->kvm_set_msr_host()
 
 than
 
 kvm_x86_ops->kvm_set_msr(,false)
 kvm_x86_ops->kvm_set_msr(,true)
 
 So you're right.

Yes, but we have a million functions for setting MSRs.

Maybe

struct msr {
bool host_requested;
u32 index;
u64 data;
};

and change all the APIs to use that.
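
A minimal user-space sketch of how such a struct might be threaded through a setter is below. Only struct msr and its three fields come from the proposal above; toy_vcpu, toy_set_msr and the write counter are hypothetical stand-ins, not KVM code, and the kernel's u32/u64 are spelled with stdint types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The struct proposed above, spelled with stdint types. */
struct msr {
    bool host_requested;
    uint32_t index;
    uint64_t data;
};

#define MSR_IA32_TSC 0x10 /* architectural index of the time-stamp counter */

/* Hypothetical stand-in for per-vcpu state. */
struct toy_vcpu {
    uint64_t tsc;
    unsigned guest_tsc_writes;
};

/* Hypothetical setter: one signature serves guest and host callers alike. */
static int toy_set_msr(struct toy_vcpu *vcpu, const struct msr *msr)
{
    switch (msr->index) {
    case MSR_IA32_TSC:
        vcpu->tsc = msr->data;
        if (!msr->host_requested)
            vcpu->guest_tsc_writes++; /* only guest-initiated writes count */
        return 0;
    default:
        return 1; /* unhandled MSR */
    }
}
```

The point of the design is that a new attribute of the write becomes a new struct field, so the function signatures along the call chain need not change again.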


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH v5 2/6] KVM: MMU: remove mmu_is_invalid

2012-10-17 Thread Avi Kivity
On 10/16/2012 02:08 PM, Xiao Guangrong wrote:
 Remove mmu_is_invalid and use is_invalid_pfn instead


Applied 2-5 to next; 6 depends on 1, so will wait until it is merged
upstream.



-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-17 Thread Avi Kivity
On 10/16/2012 04:49 PM, Alexander Graf wrote:

 If there is a lot of prioritization and/or queuing logic, then yes.  But
 what about MSI?  Doesn't that have a direct path?
 
 Nope. Well, yes, in a certain special case where the MPIC pushes the
 interrupt vector on interrupt delivery into a special register. But not
 for the normal case.

Ok.  The patches are fine then, but would be good to add the PIO check.

-- 
error compiling committee.c: too many arguments to function


[PATCHv4 0/2] kvm: direct msix injection

2012-10-17 Thread Michael S. Tsirkin
We can deliver certain interrupts, notably MSIX,
from atomic context.
Here's an untested patch to do this (compiled only).

Changes from v2:
Don't inject broadcast interrupts directly
Changes from v1:
Tried to address comments from v1, except unifying
with kvm_set_irq: passing flags to it looks too ugly.
Added a comment.

Jan, you said you can test this?


Michael S. Tsirkin (2):
  kvm: add kvm_set_irq_inatomic
  kvm: deliver msi interrupts from irq handler

 include/linux/kvm_host.h |  1 +
 virt/kvm/assigned-dev.c  | 36 +++--
 virt/kvm/irq_comm.c  | 83 +---
 3 files changed, 98 insertions(+), 22 deletions(-)

-- 
MST


[PATCHv4 1/2] kvm: add kvm_set_irq_inatomic

2012-10-17 Thread Michael S. Tsirkin
Add an API to inject an IRQ from atomic context.
Return -EWOULDBLOCK if that is impossible (e.g. for multicast).
Only MSI is supported at the moment.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 include/linux/kvm_host.h |  1 +
 virt/kvm/irq_comm.c  | 83 +---
 2 files changed, 72 insertions(+), 12 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 93bfc9f..e165c09 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -677,6 +677,7 @@ void kvm_get_intr_delivery_bitmask(struct kvm_ioapic 
*ioapic,
   unsigned long *deliver_bitmask);
 #endif
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level);
+int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int 
level);
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *irq_entry, struct kvm 
*kvm,
int irq_source_id, int level);
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 2eb58af..656fa45 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -102,6 +102,23 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
return r;
 }
 
+static inline void kvm_set_msi_irq(struct kvm_kernel_irq_routing_entry *e,
+  struct kvm_lapic_irq *irq)
+{
+   trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+
+   irq->dest_id = (e->msi.address_lo &
+   MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
+   irq->vector = (e->msi.data &
+   MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
+   irq->dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
+   irq->trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
+   irq->delivery_mode = e->msi.data & 0x700;
+   irq->level = 1;
+   irq->shorthand = 0;
+   /* TODO Deal with RH bit of MSI message address */
+}
+
 int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level)
 {
@@ -110,22 +127,26 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
if (!level)
return -1;
 
-   trace_kvm_msi_set_irq(e->msi.address_lo, e->msi.data);
+   kvm_set_msi_irq(e, &irq);
 
-   irq.dest_id = (e->msi.address_lo &
-   MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
-   irq.vector = (e->msi.data &
-   MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
-   irq.dest_mode = (1 << MSI_ADDR_DEST_MODE_SHIFT) & e->msi.address_lo;
-   irq.trig_mode = (1 << MSI_DATA_TRIGGER_SHIFT) & e->msi.data;
-   irq.delivery_mode = e->msi.data & 0x700;
-   irq.level = 1;
-   irq.shorthand = 0;
-
-   /* TODO Deal with RH bit of MSI message address */
return kvm_irq_delivery_to_apic(kvm, NULL, &irq);
 }
 
+
+static int kvm_set_msi_inatomic(struct kvm_kernel_irq_routing_entry *e,
+struct kvm *kvm)
+{
+   struct kvm_lapic_irq irq;
+   int r;
+
+   kvm_set_msi_irq(e, &irq);
+
+   if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r))
+   return r;
+   else
+   return -EWOULDBLOCK;
+}
+
 int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
 {
struct kvm_kernel_irq_routing_entry route;
@@ -178,6 +199,44 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
irq, int level)
return ret;
 }
 
+/*
+ * Deliver an IRQ in an atomic context if we can, or return a failure,
+ * user can retry in a process context.
+ * Return value:
+ *  -EWOULDBLOCK - Can't deliver in atomic context: retry in a process context.
+ *  Other values - No need to retry.
+ */
+int kvm_set_irq_inatomic(struct kvm *kvm, int irq_source_id, u32 irq, int 
level)
+{
+   struct kvm_kernel_irq_routing_entry *e;
+   int ret = -EINVAL;
+   struct kvm_irq_routing_table *irq_rt;
+   struct hlist_node *n;
+
+   trace_kvm_set_irq(irq, level, irq_source_id);
+
+   /*
+* Injection into either PIC or IOAPIC might need to scan all CPUs,
+* which would need to be retried from thread context;  when same GSI
+* is connected to both PIC and IOAPIC, we'd have to report a
+* partial failure here.
+* Since there's no easy way to do this, we only support injecting MSI
+* which is limited to 1:1 GSI mapping.
+*/
+   rcu_read_lock();
+   irq_rt = rcu_dereference(kvm->irq_routing);
+   if (irq < irq_rt->nr_rt_entries)
+   hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
+   if (likely(e->type == KVM_IRQ_ROUTING_MSI))
+   ret = kvm_set_msi_inatomic(e, kvm);
+   else
+   ret = -EWOULDBLOCK;
+   break;
+   }
+   rcu_read_unlock();
+   return ret;
+}
+
 void 
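
For reference, the field extraction that kvm_set_msi_irq performs can be tried out in user space. The sketch below mirrors the expressions in the patch; the constant values are written out here from the x86 MSI message layout (destination ID in address bits 19:12, destination mode in address bit 2, vector in data bits 7:0, delivery mode in data bits 10:8, trigger mode in data bit 15) and may be spelled differently in the kernel's own header. toy_lapic_irq and toy_decode_msi are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed from the x86 MSI layout, not copied from a kernel header. */
#define MSI_ADDR_DEST_ID_SHIFT   12
#define MSI_ADDR_DEST_ID_MASK    0x000ff000u
#define MSI_ADDR_DEST_MODE_SHIFT 2
#define MSI_DATA_VECTOR_SHIFT    0
#define MSI_DATA_VECTOR_MASK     0x000000ffu
#define MSI_DATA_TRIGGER_SHIFT   15

/* Hypothetical cut-down counterpart of struct kvm_lapic_irq. */
struct toy_lapic_irq {
    uint32_t dest_id;
    uint32_t vector;
    uint32_t dest_mode;     /* non-zero means logical destination mode */
    uint32_t trig_mode;     /* non-zero means level-triggered */
    uint32_t delivery_mode; /* data bits 10:8, left in place */
};

/* Decode an MSI address/data pair the same way the patch does. */
static void toy_decode_msi(uint32_t address_lo, uint32_t data,
                           struct toy_lapic_irq *irq)
{
    irq->dest_id = (address_lo & MSI_ADDR_DEST_ID_MASK) >> MSI_ADDR_DEST_ID_SHIFT;
    irq->vector = (data & MSI_DATA_VECTOR_MASK) >> MSI_DATA_VECTOR_SHIFT;
    irq->dest_mode = (1u << MSI_ADDR_DEST_MODE_SHIFT) & address_lo;
    irq->trig_mode = (1u << MSI_DATA_TRIGGER_SHIFT) & data;
    irq->delivery_mode = data & 0x700;
}
```

Note that dest_mode and trig_mode keep the masked bit rather than a 0/1 value, just as in the patch, so callers should test them for non-zero.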

[PATCHv4 2/2] kvm: deliver msi interrupts from irq handler

2012-10-17 Thread Michael S. Tsirkin
We can deliver certain interrupts, notably MSI,
from atomic context.  Use kvm_set_irq_inatomic
to implement an irq handler for MSI.

This reduces the pressure on the scheduler in the case
where the host and guest irqs share a host cpu.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 virt/kvm/assigned-dev.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 23a41a9..3642239 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -105,6 +105,15 @@ static irqreturn_t kvm_assigned_dev_thread_intx(int irq, 
void *dev_id)
 }
 
 #ifdef __KVM_HAVE_MSI
+static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   int ret = kvm_set_irq_inatomic(assigned_dev->kvm,
+  assigned_dev->irq_source_id,
+  assigned_dev->guest_irq, 1);
+   return unlikely(ret == -EWOULDBLOCK) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
 static irqreturn_t kvm_assigned_dev_thread_msi(int irq, void *dev_id)
 {
struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
@@ -117,6 +126,23 @@ static irqreturn_t kvm_assigned_dev_thread_msi(int irq, 
void *dev_id)
 #endif
 
 #ifdef __KVM_HAVE_MSIX
+static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
+{
+   struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
+   int index = find_index_from_host_irq(assigned_dev, irq);
+   u32 vector;
+   int ret = 0;
+
+   if (index >= 0) {
+   vector = assigned_dev->guest_msix_entries[index].vector;
+   ret = kvm_set_irq_inatomic(assigned_dev->kvm,
+  assigned_dev->irq_source_id,
+  vector, 1);
+   }
+
+   return unlikely(ret == -EWOULDBLOCK) ? IRQ_WAKE_THREAD : IRQ_HANDLED;
+}
+
 static irqreturn_t kvm_assigned_dev_thread_msix(int irq, void *dev_id)
 {
struct kvm_assigned_dev_kernel *assigned_dev = dev_id;
@@ -334,11 +360,6 @@ static int assigned_device_enable_host_intx(struct kvm 
*kvm,
 }
 
 #ifdef __KVM_HAVE_MSI
-static irqreturn_t kvm_assigned_dev_msi(int irq, void *dev_id)
-{
-   return IRQ_WAKE_THREAD;
-}
-
 static int assigned_device_enable_host_msi(struct kvm *kvm,
   struct kvm_assigned_dev_kernel *dev)
 {
@@ -363,11 +384,6 @@ static int assigned_device_enable_host_msi(struct kvm *kvm,
 #endif
 
 #ifdef __KVM_HAVE_MSIX
-static irqreturn_t kvm_assigned_dev_msix(int irq, void *dev_id)
-{
-   return IRQ_WAKE_THREAD;
-}
-
 static int assigned_device_enable_host_msix(struct kvm *kvm,
struct kvm_assigned_dev_kernel *dev)
 {
-- 
MST
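
The control flow of the new handlers (try the atomic path first, wake the thread only when it reports -EWOULDBLOCK) can be modelled outside the kernel. The sketch below is a toy: toy_inject_inatomic, its broadcast flag and the toy_irqreturn values are hypothetical stand-ins for kvm_set_irq_inatomic and irqreturn_t, and it assumes a POSIX errno.h that defines EWOULDBLOCK.

```c
#include <assert.h>
#include <errno.h>

enum toy_irqreturn { TOY_IRQ_HANDLED, TOY_IRQ_WAKE_THREAD };

/* Hypothetical injector: pretend only non-broadcast delivery is atomic-safe. */
static int toy_inject_inatomic(int is_broadcast)
{
    return is_broadcast ? -EWOULDBLOCK : 1;
}

/* Hard-irq handler: finish here if possible, otherwise defer to the thread. */
static enum toy_irqreturn toy_hard_handler(int is_broadcast)
{
    int ret = toy_inject_inatomic(is_broadcast);

    return ret == -EWOULDBLOCK ? TOY_IRQ_WAKE_THREAD : TOY_IRQ_HANDLED;
}
```

Any result other than -EWOULDBLOCK, including other errors, ends in the hard handler, which matches the "no need to retry" wording in the comment added by patch 1/2.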


Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-17 Thread Michael Wolf
On Wed, 2012-10-17 at 21:14 +0400, Glauber Costa wrote:
 On 10/17/2012 06:23 AM, Michael Wolf wrote:
  When a system is running in a capped or overcommitted environment,
  the user may see steal time being reported in accounting tools such
  as top or vmstat.  This can cause confusion for the end user.  To
  ease the confusion this patch set
  adds the idea of consigned (expected steal) time.  The host will separate
  the consigned time from the steal time.  The consignment limit passed to the
  host will be the amount of steal time expected within a fixed period of
  time.  Any other steal time accruing during that period will show as the
  traditional steal time.
  
  TODO:
  * Change native_clock to take params and not return a value
  * Change update_rq_clock_task
  
  Changes from V1:
  * Removed the steal time allowed percentage from the guest
  * Moved the separation of consigned (expected steal) and steal time to the
host.
  * No longer include a sysctl interface.
  
 
 You are showing this in the guest somewhere, but tools like top will
 still not show it. So for quite a while, it achieves nothing.
 
 Of course this is a barrier that any new statistic has to go through. So
 while annoying, this is per-se ultimately not a blocker.
 
 What I still fail to see, is how this is useful information to be shown
 in the guest. Honestly, if I'm in a guest VM or container, any time
 during which I am not running is time I lost. It doesn't matter if this
 was expected or not. This still seems to me as a host-side problem, to
 be solved entirely by tooling.
 

What tools like top and vmstat will show is altered.  When I put time in
the consign bucket it does not show up in steal.  So now as long as the
system is performing as expected the user will see 100% and 0% steal.  I
added the consign field to /proc/stat so that all time accrued in the
period is accounted for and also for debugging purposes.  The user won't
care about consign and will not see it.
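
To make the bucket split concrete, here is a small user-space model of the accounting described above. It is a sketch of the idea only, not the patch set's code: the struct, the function names and the nanosecond units are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-period accounting state. */
struct toy_steal_acct {
    uint64_t consign_limit; /* expected steal per period, in ns */
    uint64_t consigned;     /* consigned time booked so far this period */
    uint64_t steal;         /* steal beyond the limit this period */
};

/* Book `delta` ns of not-running time: fill the consign bucket first. */
static void toy_account_steal(struct toy_steal_acct *a, uint64_t delta)
{
    uint64_t room = a->consign_limit > a->consigned ?
                    a->consign_limit - a->consigned : 0;
    uint64_t to_consign = delta < room ? delta : room;

    a->consigned += to_consign;
    a->steal += delta - to_consign;
}

/* Start a new accounting period with both buckets empty. */
static void toy_new_period(struct toy_steal_acct *a)
{
    a->consigned = 0;
    a->steal = 0;
}
```

With a limit of 100 ns per period, 60 ns followed by 70 ns of stolen time ends up as 100 ns consigned and 30 ns of traditional steal.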



Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Guido Winkelmann
Am Dienstag, 16. Oktober 2012, 12:44:27 schrieb Brian Jackson:
 On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
  The commandline, as generated by libvirtd, looks like this:
  
  LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
  QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.15 -enable-kvm -m 1024
  -smp 1,sockets=1,cores=1,threads=1 -name migratetest2
  -uuid ddbf11e9-387e-902b-4849-8c3067dc42a2 -nodefconfig -nodefaults
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/migratetest2.monitor,server,nowait
  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
  -no-reboot -no-shutdown
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
  -drive file=/data/migratetest2_system,if=none,id=drive-virtio-disk0,format=qcow2,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
  -drive file=/data/migratetest2_data-1,if=none,id=drive-virtio-disk1,format=qcow2,cache=none
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk1,id=virtio-disk1
  -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:00:00:00:0c,bus=pci.0,addr=0x3
  -vnc 127.0.0.1:2,password -k de -vga cirrus -incoming tcp:0.0.0.0:49153
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
 
 I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0. Have
 you tried other formats or different qemu/kvm versions?

I tried the same thing with a raw image file instead of qcow2, and the problem 
still happens. From the /var/log/messages of the guest:

Oct 17 17:10:34 localhost sshd[2368]: nss_ldap: could not search LDAP server - Server is unavailable
Oct 17 17:10:39 localhost kernel: [  126.800075] eth0: no IPv6 routers present
Oct 17 17:10:52 localhost kernel: [  140.335783] Clocksource tsc unstable (delta = -70265501 ns)
Oct 17 17:12:04 localhost /O error on device vda1, logical block 1858765
Oct 17 17:12:04 localhost kernel: [  212.070584] Buffer I/O error on device vda1, logical block 1858766
Oct 17 17:12:04 localhost kernel: [  212.070587] Buffer I/O error on device vda1, logical block 1858767
Oct 17 17:12:04 localhost kernel: [  212.070589] Buffer I/O error on device vda1, logical block 1858768
Oct 17 17:12:04 localhost kernel: [  212.070592] Buffer I/O error on device vda1, logical block 1858769
Oct 17 17:12:04 localhost kernel: [  212.070595] Buffer I/O error on device vda1, logical block 1858770
Oct 17 17:12:04 localhost kernel: [  212.070597] Buffer I/O error on device vda1, logical block 1858771
Oct 17 17:12:04 localhost kernel: [  212.070600] Buffer I/O error on device vda1, logical block 1858772
Oct 17 17:12:04 localhost kernel: [  212.070602] Buffer I/O error on device vda1, logical block 1858773
Oct 17 17:12:04 localhost kernel: [  212.070605] Buffer I/O error on device vda1, logical block 1858774
Oct 17 17:12:04 localhost kernel: [  212.070607] Buffer I/O error on device vda1, logical block 1858775
Oct 17 17:12:04 localhost kernel: [  212.070610] Buffer I/O error on device vda1, logical block 1858776
Oct 17 17:12:04 localhost kernel: [  212.070612] Buffer I/O error on device vda1, logical block 1858777
Oct 17 17:12:04 localhost kernel: [  212.070615] Buffer I/O error on device vda1, logical block 1858778
Oct 17 17:12:04 localhost kernel: [  212.070617] Buffer I/O error on device vda1, logical block 1858779

(I was writing a large file at the time, to make sure I actually catch I/O 
errors as they happen)

Guido


kvm-unit test behavior

2012-10-17 Thread Conny Seidel
Hi,


we are seeing something strange when running the KVM unit-tests on
recent KVM and older CPUs (K8 Family).

------------[ cut here ]------------
WARNING: at arch/x86/kvm/../../../virt/kvm/kvm_main.c:1325 kvm_release_pfn_clean+0x5b/0x60 [kvm]()
Hardware name: WARTHOG
Modules linked in: tun nfsv4 auth_rpcgss nfsv3 nfs_acl nfs fscache lockd sunrpc 
bridge stp llc ipv6 amd8111e mii powernow_k8 freq_table kvm_amd kvm serio_raw 
pcspkr k8temp amd64_edac_mod edac_core edac_mce_amd i2c_amd756 amd_rng 
i2c_amd8111 sg shpchp ext3 jbd mbcache sd_mod crc_t10dif sr_mod cdrom sata_sil 
ata_generic pata_acpi pata_amd radeon ttm drm_kms_helper drm i2c_algo_bit 
i2c_core dm_mirror dm_region_hash dm_log dm_mod
Pid: 2084, comm: qemu-kvm Not tainted 3.6.0.20121010_ecefbd9-1.el6.osrc.x86_64 #1
Call Trace:
 [8105510f] warn_slowpath_common+0x7f/0xc0
 [8105516a] warn_slowpath_null+0x1a/0x20
 [a029439b] kvm_release_pfn_clean+0x5b/0x60 [kvm]
 [a02b7cdb] paging64_fetch+0x1eb/0x370 [kvm]
 [a0294a1f] ? __gfn_to_pfn+0x6f/0x80 [kvm]
 [a0294b0a] ? gfn_to_pfn_async+0x1a/0x20 [kvm]
 [a02b335b] ? try_async_pf+0x4b/0x1f0 [kvm]
 [a02b80f3] paging64_page_fault+0x293/0x2d0 [kvm]
 [8116a72c] ? kfree+0x2c/0x120
 [a02b37d7] kvm_mmu_page_fault+0x27/0xd0 [kvm]
 [a03150b4] pf_interception+0xa4/0x170 [kvm_amd]
 [a031be56] handle_exit+0x146/0x2d0 [kvm_amd]
 [a02a458d] ? kvm_get_cr8+0x1d/0x30 [kvm]
 [a03155a5] ? svm_vcpu_run+0x425/0x530 [kvm_amd]
 [a02a8f0c] vcpu_enter_guest+0x39c/0x6b0 [kvm]
 [a02a9408] __vcpu_run+0x1e8/0x320 [kvm]
 [a02a95da] kvm_arch_vcpu_ioctl_run+0x9a/0x1f0 [kvm]
 [a02960f8] kvm_vcpu_ioctl+0x4a8/0x590 [kvm]
 [81189e0c] do_vfs_ioctl+0x8c/0x340
 [8118a161] sys_ioctl+0xa1/0xb0
 [810d4fc6] ? __audit_syscall_exit+0x3d6/0x430
 [81548669] system_call_fastpath+0x16/0x1b
---[ end trace bc3b9055849b3814 ]---

The failing tests are svm and svm-disable, which seem to loop forever
once started.

Begin logfile:
 enabling apic
 enabling apic
 paging enabled
 cr0 = 80010011
 cr3 = 7fff000
 cr4 = 20
 null: PASS
 vmrun: PASS
 vmrun intercept check: PASS
 cr3 read intercept: PASS
 enabling apic
 enabling apic
 paging enabled
 cr0 = 80010011
 cr3 = 7fff000
 cr4 = 20
 null: PASS
 vmrun: PASS
 vmrun intercept check: PASS
 cr3 read intercept: PASS
 /snip # goes on until the test is killed.

Anyone seen this behavior?

--
Kind regards.

Conny Seidel

##
# Email : conny.sei...@amd.com   GnuPG-Key : 0xA6AB055D #
# Fingerprint: 17C4 5DB2 7C4C C1C7 1452 8148 F139 7C09 A6AB 055D #
##
# Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach  #
# General Managers: Alberto Bozzo#
# Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen #
#   HRB Nr. 43632#
##




Re: kvm-unit test behavior

2012-10-17 Thread Avi Kivity
On 10/17/2012 06:08 PM, Conny Seidel wrote:
 Hi,
 
 
 we are seeing something strange when running the KVM unit-tests on
 recent KVM and older CPUs (K8 Family).
 


A patch was just applied fixing this; it will be merged upstream in a
few days.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-17 Thread Alexander Graf

On 10/17/2012 04:50 PM, Avi Kivity wrote:
 On 10/16/2012 04:49 PM, Alexander Graf wrote:
   If there is a lot of prioritization and/or queuing logic, then yes.  But
   what about MSI?  Doesn't that have a direct path?
  Nope. Well, yes, in a certain special case where the MPIC pushes the
  interrupt vector on interrupt delivery into a special register. But not
  for the normal case.
 Ok.  The patches are fine then, but would be good to add the PIO check.

Yup, will do as a separate patch.


Alex



Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Brian Jackson
On Wednesday, October 17, 2012 06:54:00 AM Guido Winkelmann wrote:
 On Tuesday, 16 October 2012, 12:44:27, Brian Jackson wrote:
  On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
 [...]
 
  
  I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0.
  Have you tried other formats or different qemu/kvm versions?
 
 Are you sure about that? Because I'm fairly certain I have been using live
 migration since at least 0.14, if not 0.13, and I have always been using
 qcow2 as the image format for the disks...
 
 I can still try with other image formats, though.


Yes, see the release notes for 1.0. It may have worked by chance before that, 
but it wasn't guaranteed to work. There was no blacklisting feature then like 
there is now to stop it.


 
   Guido


Re: I/O errors in guest OS after repeated migration

2012-10-17 Thread Brian Jackson
On Wednesday, October 17, 2012 10:45:14 AM Guido Winkelmann wrote:
 On Tuesday, 16 October 2012, 12:44:27, Brian Jackson wrote:
  On Tuesday, October 16, 2012 11:33:44 AM Guido Winkelmann wrote:
   [...]
  
  I see qcow2 in there. Live migration of qcow2 was a new feature in 1.0.
  Have you tried other formats or different qemu/kvm versions?
 
 I tried the same thing with a raw image file instead of qcow2, and the
 problem still happens. From the /var/log/messages of the guest:
 
 Oct 17 17:10:34 localhost sshd[2368]: nss_ldap: could not search LDAP server - Server is unavailable
 Oct 17 17:10:39 localhost kernel: [  126.800075] eth0: no IPv6 routers present
 Oct 17 17:10:52 localhost kernel: [  140.335783] Clocksource tsc unstable (delta = -70265501 ns)
 Oct 17 17:12:04 localhost /O error on device vda1, logical block 1858765
 Oct 17 17:12:04 localhost kernel: [  212.070584] Buffer I/O error on device vda1, logical block 1858766
 Oct 17 17:12:04 localhost kernel: [  212.070587] Buffer I/O error on device vda1, logical block 1858767
 Oct 17 17:12:04 localhost kernel: [  212.070589] Buffer I/O error on device vda1, logical block 1858768
 Oct 17 17:12:04 localhost kernel: [  212.070592] Buffer I/O error on device vda1, logical block 1858769
 Oct 17 17:12:04 localhost kernel: [  212.070595] Buffer I/O error on device vda1, logical block 1858770
 Oct 17 17:12:04 localhost kernel: [  212.070597] Buffer I/O error on device vda1, logical block 1858771
 Oct 17 17:12:04 localhost kernel: [  212.070600] Buffer I/O error on device vda1, logical block 1858772
 Oct 17 17:12:04 localhost kernel: [  212.070602] Buffer I/O error on device vda1, logical block 1858773
 Oct 17 17:12:04 localhost kernel: [  212.070605] Buffer I/O error on device vda1, logical block 1858774
 Oct 17 17:12:04 localhost kernel: [  212.070607] Buffer I/O error on device vda1, logical block 1858775
 Oct 17 17:12:04 localhost kernel: [  212.070610] Buffer I/O error on device vda1, logical block 1858776
 Oct 17 17:12:04 localhost kernel: [  212.070612] Buffer I/O error on device vda1, logical block 1858777
 Oct 17 17:12:04 localhost kernel: [  212.070615] Buffer I/O error on device vda1, logical block 1858778
 Oct 17 17:12:04 localhost kernel: [  212.070617] Buffer I/O error on device vda1, logical block 1858779
 
 (I was writing a large file at the time, to make sure I actually catch I/O
 errors as they happen)


What about newer versions of qemu/kvm? But of course if those work, your next 
task is going to be git bisect it or file a bug with your distro that is using 
an ancient version of qemu/kvm.


 
   Guido


How to do fast accesses to LAPIC TPR under kvm?

2012-10-17 Thread Stefan Fritsch
Hi,

OpenBSD/i386 seems to be one of the few operating systems that still 
uses the LAPIC task priority register for interrupt handling. On AMD 
CPUs and on older Intel CPUs without the flexpriority feature, this 
causes a huge performance impact on kvm. I have seen slowdowns by a 
factor of 10.

Is there a way to use the TPR under kvm without the slowdown? There 
are some MSRs inherited from Hyper-V, but using these does not make 
that much difference. I think this is because they still cause a 
vmexit for every TPR access. I expect the same is true for x2apic 
emulation, isn't it?

There is also the kvmvapic, but kvm does not expose a sane interface 
to it and only uses it for Windows XP specific binary patching.

Another possibility is TPR access via CR8 on AMD, but the missing 
cr8_legacy CPUID bit and this discussion [1] make me believe that this 
is not supported under kvm, at least in 32bit mode. Could this be 
easily fixed? If yes, would it solve the performance problems, i.e. 
offer performance comparable to Intel's flexpriority feature?
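
For readers less familiar with the CR8 route: as far as I understand the Intel and AMD manuals, CR8 bits 3:0 mirror bits 7:4 of the LAPIC TPR (the priority class), so a CR8 write can stand in for an MMIO write to the TPR. Below is a minimal sketch of just that bit mapping in plain C. The helper names are made up for illustration, and nothing here touches real hardware or kvm.

```c
#include <stdint.h>

/* Hypothetical helper: the 4-bit priority class CR8 would hold for a
 * given 8-bit LAPIC TPR value (CR8[3:0] = TPR[7:4]). */
static inline uint8_t tpr_to_cr8(uint8_t tpr)
{
    return (uint8_t)(tpr >> 4);
}

/* Hypothetical helper: the TPR value that results from a CR8 write.
 * The sub-class bits TPR[3:0] come back as zero. */
static inline uint8_t cr8_to_tpr(uint8_t cr8)
{
    return (uint8_t)((cr8 & 0x0f) << 4);
}
```

Note that the round trip drops the low four TPR bits, so a guest relying on the priority sub-class could not use CR8 as a drop-in replacement.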

OpenBSD seems to be reluctant to stop using the TPR. In fact, in a 
recent discussion, there has been a suggestion that OpenBSD should 
switch to using TPR also on OpenBSD/amd64 to solve some problems with 
boot interrupts. How do you expect this would affect performance under 
kvm (if using CR8)?

Or do you have any other suggestions? One could also modify kvm to 
expose a real interface to kvmvapic, e.g. allow the guest OS to 
provide the virtual address of the option rom and the offset of the 
CPU number in the %fs segment, instead of using hard coded values for 
Windows XP.

Cheers,
Stefan

[1] http://www.mail-archive.com/kvm@vger.kernel.org/msg30627.html


Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Peter Maydell
On 14 October 2012 01:04, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 Used to initialize the in-kernel interrupt controller. On ARM we need to
 map the virtual generic interrupt controller (vGIC) into Hyp the guest's
 physicall address space so the guest can access the virtual cpu
 interface. This must be done after the IRQ chips is create and after a
 base address has been provided for the emulated platform (patch is
 following), but before the CPU is initally run.

I've now written the code for that patch but don't have access to a machine
with the ARM cross compile setup to build it until tomorrow.


 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  Documentation/virtual/kvm/api.txt |   16 
  arch/arm/kvm/arm.c                |    1 +
  include/linux/kvm.h               |    3 +++
  3 files changed, 20 insertions(+)

 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 25eacc6..26e953d 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are 
 supported for the
  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.


 +4.79 KVM_INIT_IRQCHIP
 +
 +Capability: KVM_CAP_INIT_IRQCHIP
 +Architectures: arm
 +Type: vm ioctl
 +Parameters: none
 +Returns: 0 on success, -1 on error
 +
 +Initialize the in-kernel interrupt controller. On ARM we need to map the
 +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall

Should that "Hyp" be deleted?

physical

 +address space so the guest can access the virtual cpu interface. This must be
 +done after the IRQ chips is create and after a base address has been provided

chip. created.

 +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU is
 +initally run.

initially.

(all these typos are also in your commit message)

 +
 +
  5. The kvm_run structure
  

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index f8c377b..85c76e4 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -195,6 +195,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 switch (ext) {
  #ifdef CONFIG_KVM_ARM_VGIC
 case KVM_CAP_IRQCHIP:
 +   case KVM_CAP_INIT_IRQCHIP:
 r = vgic_present;
 break;
  #endif
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 8091b1d..90ee023 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -626,6 +626,7 @@ struct kvm_ppc_smmu_info {
  #ifdef __KVM_HAVE_READONLY_MEM
  #define KVM_CAP_READONLY_MEM 81
  #endif
 +#define KVM_CAP_INIT_IRQCHIP 82

  #ifdef KVM_CAP_IRQ_ROUTING

 @@ -839,6 +840,8 @@ struct kvm_s390_ucas_mapping {
  #define KVM_PPC_GET_SMMU_INFO     _IOR(KVMIO,  0xa6, struct kvm_ppc_smmu_info)
  /* Available with KVM_CAP_PPC_ALLOC_HTAB */
  #define KVM_PPC_ALLOCATE_HTAB     _IOWR(KVMIO, 0xa7, __u32)
 +/* Available with KVM_CAP_INIT_IRQCHIP */
 +#define KVM_INIT_IRQCHIP          _IO(KVMIO,   0xa8)

  /*
   * ioctls for vcpu fds
 --
 1.7.9.5
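
The ordering rule in the quoted documentation (create the irqchip, provide a base address, initialize the irqchip, only then run the CPU) can be modelled as a tiny state machine. This is purely a userspace illustration under my own assumptions: every name below is hypothetical, no real ioctl is issued, and the model is deliberately strict (each step exactly once), which the RFC text itself does not promise.

```c
/* Hypothetical model of the documented call order; not kernel code. */
enum vm_step {
    VM_NEW,              /* VM exists, no in-kernel irqchip yet */
    VM_IRQCHIP_CREATED,  /* after the irqchip has been created */
    VM_ADDR_SET,         /* after a device base address was provided */
    VM_IRQCHIP_READY,    /* after the irqchip has been initialized */
    VM_RUNNING           /* the CPU has been run for the first time */
};

struct vm_model {
    enum vm_step step;
};

/* Move from 'from' to 'to'; any other order is rejected with -1. */
static int vm_advance(struct vm_model *vm, enum vm_step from, enum vm_step to)
{
    if (vm->step != from)
        return -1;
    vm->step = to;
    return 0;
}

static int model_create_irqchip(struct vm_model *vm)
{
    return vm_advance(vm, VM_NEW, VM_IRQCHIP_CREATED);
}

static int model_set_device_address(struct vm_model *vm)
{
    return vm_advance(vm, VM_IRQCHIP_CREATED, VM_ADDR_SET);
}

static int model_init_irqchip(struct vm_model *vm)
{
    return vm_advance(vm, VM_ADDR_SET, VM_IRQCHIP_READY);
}

static int model_run_vcpu(struct vm_model *vm)
{
    return vm_advance(vm, VM_IRQCHIP_READY, VM_RUNNING);
}
```

A real implementation would also have to decide what a repeated or late call does; the model simply fails those.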



-- PMM


Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Christoffer Dall
On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 14 October 2012 01:04, Christoffer Dall
 c.d...@virtualopensystems.com wrote:
 Used to initialize the in-kernel interrupt controller. On ARM we need to
 map the virtual generic interrupt controller (vGIC) into Hyp the guest's
 physicall address space so the guest can access the virtual cpu
 interface. This must be done after the IRQ chips is create and after a
 base address has been provided for the emulated platform (patch is
 following), but before the CPU is initally run.

 I've now written the code for that patch but don't have access to a machine
 with the ARM cross compile setup to build it until tomorrow.


 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  Documentation/virtual/kvm/api.txt |   16 
  arch/arm/kvm/arm.c                |    1 +
  include/linux/kvm.h               |    3 +++
  3 files changed, 20 insertions(+)

 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 25eacc6..26e953d 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2102,6 +2102,22 @@ This ioctl returns the guest registers that are 
 supported for the
  KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.


 +4.79 KVM_INIT_IRQCHIP
 +
 +Capability: KVM_CAP_INIT_IRQCHIP
 +Architectures: arm
 +Type: vm ioctl
 +Parameters: none
 +Returns: 0 on success, -1 on error
 +
 +Initialize the in-kernel interrupt controller. On ARM we need to map the
 +virtual generic interrupt controller (vGIC) into Hyp the guest's physicall

 Should that Hyp be deleted?

yup


 physical

 +address space so the guest can access the virtual cpu interface. This must 
 be
 +done after the IRQ chips is create and after a base address has been 
 provided

 chip. created.

 +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU 
 is
 +initally run.

 initially.

thanks a bunch for those, and sorry about the sloppiness.


 (all these typos are also in your commit message)


yeah, you caught my -ECUTANDPASTE there ;)


Re: [RFC PATCH 2/3] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2012-10-17 Thread Peter Maydell
On 14 October 2012 01:04, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 On ARM (and possibly other architectures) some bits are specific to the
 model being emulated for the guest and user space needs a way to tell
 the kernel about those bits.  An example is mmio device base addresses,
 where KVM must know the base address for a given device to properly
 emulate mmio accesses within a certain address range or directly map a
 device with virtualiation extensions into the guest address space.

 We try to make this API slightly more generic than for our specific use,
 but so far only the VGIC uses this feature.

 Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
 ---
  Documentation/virtual/kvm/api.txt |   30 ++
  arch/arm/include/asm/kvm.h        |   13 +
  arch/arm/include/asm/kvm_mmu.h    |    1 +
  arch/arm/include/asm/kvm_vgic.h   |    6 ++
  arch/arm/kvm/arm.c                |   31 ++-
  arch/arm/kvm/vgic.c               |   34 +++---
  include/linux/kvm.h               |    8 
  7 files changed, 119 insertions(+), 4 deletions(-)

 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 26e953d..30ddcac 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -2118,6 +2118,36 @@ for the emulated platofrm (see 
 KVM_SET_DEVICE_ADDRESS), but before the CPU is
  initally run.


 +4.80 KVM_SET_DEVICE_ADDRESS
 +
 +Capability: KVM_CAP_SET_DEVICE_ADDRESS
 +Architectures: arm
 +Type: vm ioctl
 +Parameters: struct kvm_device_address (in)
 +Returns: 0 on success, -1 on error
 +Errors:
 +  ENODEV: The device id is unknwown

unknown

 +  ENXIO:  Device not supported in configuration

"in this configuration"? (I'm guessing this is for "you tried to
map a GIC when this CPU doesn't have a GIC" and similar errors?)

 +  E2BIG:  Address outside of guest physical address space

I would say "outside" rather than "outside of" here.

 +
 +struct kvm_device_address {
 +   __u32 id;
 +   __u64 addr;
 +};
 +
 +Specify a device address in the guest's physical address space where guests
 +can access emulated or directly exposed devices, which the host kernel needs
 +to know about. The id field is an architecture specific identifier for a
 +specific device.
 +
 +ARM divides the id field into two parts, a device ID and an address type id

We should be consistent about whether "ID" is capitalised or not.

 +specific to the individual device.
 +
 +  bits:  | 31    ...    16 | 15    ...    0 |
 +  field: |    device id    |  addr type id  |

This doesn't say whether userspace is allowed to make this ioctl
multiple times for the same device. This could be any of:
 * undefined behaviour
 * second call fails with some errno
 * second call overrides first one

It also doesn't say that you're supposed to call this after CREATE
and before INIT of the irqchip. (Nor does it say what happens if
you call it at some other time.)
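
To make the quoted id layout concrete (device id in bits 31..16, address type id in bits 15..0), here is a small sketch of how userspace might pack and unpack the field. The macro and function names are my own invention; the patch only documents the bit positions.

```c
#include <stdint.h>

/* Hypothetical names; only the bit layout comes from the quoted text. */
#define DEV_ADDR_ID_SHIFT   16
#define DEV_ADDR_TYPE_MASK  0xffffu

/* Pack a device id (bits 31..16) and an address type id (bits 15..0). */
static inline uint32_t dev_addr_pack(uint16_t dev_id, uint16_t type_id)
{
    return ((uint32_t)dev_id << DEV_ADDR_ID_SHIFT) | type_id;
}

/* Extract the device id from a packed id field. */
static inline uint16_t dev_addr_dev_id(uint32_t id)
{
    return (uint16_t)(id >> DEV_ADDR_ID_SHIFT);
}

/* Extract the address type id from a packed id field. */
static inline uint16_t dev_addr_type_id(uint32_t id)
{
    return (uint16_t)(id & DEV_ADDR_TYPE_MASK);
}
```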

-- PMM


Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Peter Maydell
On 17 October 2012 21:23, Christoffer Dall
c.d...@virtualopensystems.com wrote:
 On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the CPU 
 is
 +initally run.

 initially.

 thanks a bunch for those, and sorry about the sloppiness.

No problem. Also just noticed "platform" there :-)

-- PMM


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Alexander Graf

On 10/14/2012 02:04 AM, Christoffer Dall wrote:

*** warning: this RFC patch series is only compile-tested ***

We need a way to specify the address at which we expect VMs to access
the interrupt controller (both the emulated distributor and the hardware
interface supporting virtualization).  User space should decide on this
address as user space decides on an emulated board and loads a device
tree describing these details directly to the guest.

Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
ioctl with a highly device-specific set of parameters, we try
something slightly more generic, that should fit well with how user
space (read QEMU) first builds the individual devices and later sets up
the emulated platform.


Have you talked to Ben about this one? He wanted to design a new, more 
flexible irqchip API that would work for XICS & MPIC. Maybe there's some 
room for cooperation here?



Alex



Re: [RFC PATCH 1/3] KVM: ARM: Introduce KVM_INIT_IRQCHIP ioctl

2012-10-17 Thread Christoffer Dall
On Wed, Oct 17, 2012 at 4:31 PM, Peter Maydell peter.mayd...@linaro.org wrote:
 On 17 October 2012 21:23, Christoffer Dall
 c.d...@virtualopensystems.com wrote:
 On Wed, Oct 17, 2012 at 4:21 PM, Peter Maydell peter.mayd...@linaro.org 
 wrote:
 +for the emulated platofrm (see KVM_SET_DEVICE_ADDRESS), but before the 
 CPU is
 +initally run.

 initially.

 thanks a bunch for those, and sorry about the sloppiness.

 No problem. Also just noticed platform there :-)

I'll spell check the diff just to be sure. :)


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Christoffer Dall
On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf ag...@suse.de wrote:
 On 10/14/2012 02:04 AM, Christoffer Dall wrote:

 *** warning: this RFC patch series is only compile-tested ***

 We need a way to specify the address at which we expect VMs to access
 the interrupt controller (both the emulated distributor and the hardware
 interface supporting virtualization).  User space should decide on this
 address as user space decides on an emulated board and loads a device
 tree describing these details directly to the guest.

 Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
 ioctl with a highly device-specific set of parameters, we try
 something slightly more generic, that should fit well with how user
 space (read QEMU) first builds the individual devices and later sets up
 the emulated platform.


 Have you talked to Ben about this one? He wanted to design a new, more
 flexible irqchip API that would work for XICS & MPIC. Maybe there's some
 room for cooperation here?

I have not - Ben, what do you have in mind?

-Christoffer


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Benjamin Herrenschmidt
On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote:

  Have you talked to Ben about this one? He wanted to design a new, more
  flexible irqchip API that would work for XICS & MPIC. Maybe there's some
  room for cooperation here?
 
 I have not - Ben, what do you have in mind?

I've been sidetracked to some other stuff so for now Paul (CC) is taking
over my interrupt patches.

We initially changed KVM_CREATE_IRQCHIP to take an argument but that was
causing an x86 ABI breakage (ioctl number changing). So we'll probably
be creating a new one.

From there, nothing fancy really, just an ioctl with an IRQ chip type at
the beginning followed by a union of type-specific parameters.

The main problem we haven't sorted out yet is how to replace some of the
horrors related to mapping interrupts that have tendrils all the way
into virtio-pci etc... in qemu that don't apply to us (well mostly) and
the interaction with in-kernel generated interrupts to avoid going
through qemu for vhost etc...

Cheers,
Ben.




RE: [PATCH] Added call parameter to track whether invocation originated with guest or elsewhere

2012-10-17 Thread Auld, Will
OK, agreed it is not pretty. 

Thanks,

Will

 -Original Message-
 From: Marcelo Tosatti [mailto:mtosa...@redhat.com]
 Sent: Wednesday, October 17, 2012 7:09 AM
 To: Avi Kivity
 Cc: Auld, Will; Will Auld; kvm@vger.kernel.org; Zhang, Xiantao; Liu,
 Jinsong
 Subject: Re: [PATCH] Added call parameter to track whether invocation
 originated with guest or elsewhere
 
 On Wed, Oct 17, 2012 at 12:35:33PM +0200, Avi Kivity wrote:
  On 10/17/2012 04:10 AM, Will Auld wrote:
   Signed-off-by: Will Auld will.a...@intel.com
   ---
  
   Resending to full list
  
   Marcelo,
  
   This patch is what I believe you ask for as foundational for later
   patches to address IA32_TSC_ADJUST.
  
 
  Please write a changelog to reflect the motivation.
 
  All those bool parameters scattered all over the place aren't very
  pretty.  Usually we solve this with helpers that embed the parameter
  name (kvm_set_msr() vs. kvm_set_msr_host()) but there are too many
  functions for this to work here.
 
  Marcelo, any ideas?
 
 It's easier to read
 
 kvm_x86_ops->kvm_set_msr()
 kvm_x86_ops->kvm_set_msr_host()
 
 than
 
 kvm_x86_ops->kvm_set_msr(,false)
 kvm_x86_ops->kvm_set_msr(,true)
 
 So you're right.
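
The readability point above (a helper whose name carries the flag versus a bare bool at every call site) can be sketched as follows. This is only an illustration under assumed names: `struct msr_write`, `do_set_msr`, `set_msr_guest` and `set_msr_host` are hypothetical and are not the functions in the patch under discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the last write, standing in for real vcpu state. */
struct msr_write {
    uint32_t index;
    uint64_t data;
    bool host_initiated;
};

/* Common worker that still takes the flag explicitly. */
static int do_set_msr(struct msr_write *w, uint32_t index, uint64_t data,
                      bool host_initiated)
{
    w->index = index;
    w->data = data;
    w->host_initiated = host_initiated;
    return 0;
}

/* Write that originates from the guest. */
static int set_msr_guest(struct msr_write *w, uint32_t index, uint64_t data)
{
    return do_set_msr(w, index, data, false);
}

/* Write that originates from the host side. */
static int set_msr_host(struct msr_write *w, uint32_t index, uint64_t data)
{
    return do_set_msr(w, index, data, true);
}
```

Call sites then read `set_msr_host(&w, idx, val)` rather than `do_set_msr(&w, idx, val, true)`. As noted in the thread, this stops scaling once many functions need the same split.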



Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Paul Mackerras
On Wed, Oct 17, 2012 at 04:39:57PM -0400, Christoffer Dall wrote:
 On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf ag...@suse.de wrote:
  On 10/14/2012 02:04 AM, Christoffer Dall wrote:
 
  *** warning: this RFC patch series is only compile-tested ***
 
  We need a way to specify the address at which we expect VMs to access
  the interrupt controller (both the emulated distributor and the hardware
  interface supporting virtualization).  User space should decide on this
  address as user space decides on an emulated board and loads a device
  tree describing these details directly to the guest.
 
  Instead of modifying the copying KVM_CREATE_IRQCHIP to an ARM specific
  ioctl with a highly device-specific set of parameters, we try
  something slightly more generic, that should fit well with how user
  space (read QEMU) first builds the individual devices and later sets up
  the emulated platform.
 
 
  Have you talked to Ben about this one? He wanted to design a new, more
  flexible irqchip API that would work for XICS & MPIC. Maybe there's some
  room for cooperation here?
 
 I have not - Ben, what do you have in mind?

I've taken over Ben's patches in this area and I'm currently working
on getting them ready for submission.  So far we only have XICS
emulation, and it is accessed through hypercalls, so there are no
addresses in the create-irqchip ioctl argument yet.

What we have so far is a new ioctl:

#define KVM_CREATE_IRQCHIP_ARGS   _IOW(KVMIO,  0xac, struct kvm_irqchip_args)

where kvm_irqchip_args is defined in an arch header and currently
looks like this:

/* for KVM_CAP_SPAPR_XICS */
#define __KVM_HAVE_IRQCHIP_ARGS
struct kvm_irqchip_args {
#define KVM_IRQCHIP_TYPE_ICP	0	/* XICS: ICP (presentation controller) */
#define KVM_IRQCHIP_TYPE_ICS	1	/* XICS: ICS (source controller) */
	__u32 type;
	union {
		/* XICS ICP arguments. This needs to be called once before
		 * creating any VCPU to initialize the main kernel XICS data
		 * structures.
		 */
		struct {
#define KVM_ICP_FLAG_NOREALMODE	0x0001	/* Disable real mode ICP */
			__u32 flags;
		} icp;

		/* XICS ICS arguments. You can call this for every BUID you
		 * want to make available.
		 *
		 * The BUID is 12 bits, the interrupt number within a BUID
		 * is up to 12 bits as well. The resulting interrupt numbers
		 * exposed to the guest are BUID || IRQ which is 24 bit
		 *
		 * BUID cannot be 0.
		 */
		struct {
			__u32 flags;
			__u16 buid;
			__u16 nr_irqs;
		} ics;
	};
};

With the XICS, there are two types of irqchip: a source controller and
a presentation controller.  There is one presentation controller per
vcpu and typically one source controller per PCI host bridge (a source
controller can manage multiple sources).  The buid above is
basically an identifier for a source controller.
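
As a rough illustration of the "BUID || IRQ" numbering described in the struct comment, the sketch below packs a 12-bit BUID and a 12-bit per-BUID interrupt number into a 24-bit value. It assumes the BUID occupies the upper 12 bits, which is one reading of the concatenation; the helper names are hypothetical and not part of the proposed patches.

```c
#include <stdint.h>

#define XICS_IRQ_BITS	12u
#define XICS_BUID_BITS	12u
#define XICS_IRQ_MASK	((1u << XICS_IRQ_BITS) - 1u)
#define XICS_BUID_MASK	((1u << XICS_BUID_BITS) - 1u)

/* Returns the 24-bit guest-visible interrupt number, or -1 if the
 * BUID is 0 (not allowed) or either field exceeds 12 bits.
 * Assumption: BUID is the high half of the concatenation. */
static int32_t xics_global_irq(uint16_t buid, uint16_t irq)
{
	if (buid == 0 || buid > XICS_BUID_MASK || irq > XICS_IRQ_MASK)
		return -1;
	return (int32_t)(((uint32_t)buid << XICS_IRQ_BITS) | irq);
}

/* Inverse helpers: split a 24-bit number back into its two fields. */
static uint16_t xics_buid_of(uint32_t global_irq)
{
	return (uint16_t)((global_irq >> XICS_IRQ_BITS) & XICS_BUID_MASK);
}

static uint16_t xics_irq_of(uint32_t global_irq)
{
	return (uint16_t)(global_irq & XICS_IRQ_MASK);
}
```

With this layout, BUID 0x001 and interrupt 0x005 give 0x001005.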

So with the above, it would be quite easy to add new types and
arguments for them.

Thoughts?

Paul.


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Benjamin Herrenschmidt
On Thu, 2012-10-18 at 09:10 +1100, Paul Mackerras wrote:
 
 With the XICS, there are two types of irqchip: a source controller and
 a presentation controller.  There is one presentation controller per
 vcpu and typically one source controller per PCI host bridge (a source
 controller can manage multiple sources).  The buid above is
 basically an identifier for a source controller.
 
 So with the above, it would be quite easy to add new types and
 arguments for them. 

The only possible issue is that, afaik, the ioctl number depends on the
structure size, no? If it does, then we should add some padding to the
union to leave room for new types.
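
The concern can be checked with a small userspace sketch. _IOW() encodes sizeof() of its argument type into the request number, so growing the union changes the number unless the union was padded up front. The structs below are simplified stand-ins for the proposed kvm_irqchip_args; DEMO_IOC_MAGIC, future_args and the 64-byte pad are assumptions made only for this illustration.

```c
#include <stdint.h>
#include <sys/ioctl.h>

#define DEMO_IOC_MAGIC	0xAE	/* illustrative magic byte */
#define DEMO_IOC_NR	0xac

struct icp_args { uint32_t flags; };
struct ics_args { uint32_t flags; uint16_t buid; uint16_t nr_irqs; };
/* Hypothetical later addition that is larger than the others. */
struct future_args { uint32_t flags; uint32_t a; uint32_t b; uint32_t c; };

/* Unpadded: the struct grows when a bigger union member is added. */
struct args_v1 {
	uint32_t type;
	union { struct icp_args icp; struct ics_args ics; } u;
};
struct args_v2 {
	uint32_t type;
	union { struct icp_args icp; struct ics_args ics;
		struct future_args future; } u;
};

/* Padded: the pad member fixes the union size from the start. */
struct args_padded_v1 {
	uint32_t type;
	union { struct icp_args icp; struct ics_args ics;
		uint32_t pad[16]; } u;
};
struct args_padded_v2 {
	uint32_t type;
	union { struct icp_args icp; struct ics_args ics;
		struct future_args future; uint32_t pad[16]; } u;
};

static unsigned long nr_v1(void)
{ return _IOW(DEMO_IOC_MAGIC, DEMO_IOC_NR, struct args_v1); }
static unsigned long nr_v2(void)
{ return _IOW(DEMO_IOC_MAGIC, DEMO_IOC_NR, struct args_v2); }
static unsigned long nr_padded_v1(void)
{ return _IOW(DEMO_IOC_MAGIC, DEMO_IOC_NR, struct args_padded_v1); }
static unsigned long nr_padded_v2(void)
{ return _IOW(DEMO_IOC_MAGIC, DEMO_IOC_NR, struct args_padded_v2); }
```

In this sketch nr_v1() and nr_v2() differ while the padded pair stays equal. The pad uses 32-bit elements on purpose: a new member with stricter alignment (for example a 64-bit field) could still change the struct size despite the padding.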

Cheers,
Ben.




RE: [PATCH] KVM: ia64: remove unused variable in kvm_release_vm_pages()

2012-10-17 Thread Zhang, Xiantao
Acked-by: Xiantao Zhang <xiantao.zh...@intel.com>

 -----Original Message-----
 From: Wei Yongjun [mailto:weiyj...@gmail.com]
 Sent: Wednesday, October 17, 2012 11:04 PM
 To: a...@redhat.com; mtosa...@redhat.com; Zhang, Xiantao; Luck, Tony; Yu,
 Fenghua
 Cc: yongjun_...@trendmicro.com.cn; kvm@vger.kernel.org; kvm-
 i...@vger.kernel.org; linux-i...@vger.kernel.org
 Subject: [PATCH] KVM: ia64: remove unused variable in
 kvm_release_vm_pages()
 
 From: Wei Yongjun <yongjun_...@trendmicro.com.cn>
 
 The variable base_gfn is initialized but never used otherwise, so remove the
 unused variable.
 
 dpatch engine is used to auto generate this patch.
 (https://github.com/weiyj/dpatch)
 
 Signed-off-by: Wei Yongjun <yongjun_...@trendmicro.com.cn>
 ---
  arch/ia64/kvm/kvm-ia64.c | 2 --
  1 file changed, 2 deletions(-)
 
 diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
 index 8b3a9c0..c71acd7 100644
 --- a/arch/ia64/kvm/kvm-ia64.c
 +++ b/arch/ia64/kvm/kvm-ia64.c
 @@ -1362,11 +1362,9 @@ static void kvm_release_vm_pages(struct kvm *kvm)
  	struct kvm_memslots *slots;
  	struct kvm_memory_slot *memslot;
  	int j;
 -	unsigned long base_gfn;
 
  	slots = kvm_memslots(kvm);
  	kvm_for_each_memslot(memslot, slots) {
 -		base_gfn = memslot->base_gfn;
  		for (j = 0; j < memslot->npages; j++) {
  			if (memslot->rmap[j])
  				put_page((struct page *)memslot->rmap[j]);



Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

2012-10-17 Thread Zhang Yanfei
On 2012-10-17 18:16, Avi Kivity wrote:
 On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
 On 2012-10-15 23:43, Avi Kivity wrote:
 On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
 Currently, kdump just makes all the logical processors leave VMX
 operation by executing VMXOFF instruction, so any VMCSs active on the
 logical processors may be corrupted. But, sometimes, we need the VMCSs
 to debug guest images contained in the host vmcore. To prevent the
 corruption, we should VMCLEAR the VMCSs before executing the VMXOFF
 instruction.

 How have you verified that VMXOFF doesn't flush cached VMCSs already?


 I tried some tests. For example, I made copies of every VMCS, and in
 the kdump path I backed up all the loaded VMCSs into the copies before
 VMXOFF. After generating the vmcore, I retrieved the VMCSs and their
 copies and compared them: no differences.

 Another test used VMCLEAR to clear all the loaded VMCSs before VMXOFF.
 Comparing the VMCSs with their copies, there are indeed differences
 between each VMCS and its copy.

 I know the tests may not be very convincing; for example, I used memcpy
 to back up the VMCSs, and that is an ordinary memory operation. But to
 ensure the VMCSs in the vmcore are not corrupted, I think we should
 VMCLEAR the VMCSs before VMXOFF, just as the Intel spec says.
 
 Sorry, I was unclear -- I was referring to the spec, I wasn't sure
 whether VMXOFF is defined to flush VMCSes or whether it just invalidates
 on-chip caches so that it won't flush them out in the future, corrupting
 memory.  We don't want to depend on actual behaviour as it may change
 with future versions.
 
 Copying some Intel folk, maybe they can clarify it.
 

Yes, the Intel spec only says "may be" about the VMCS-corruption thing.
From section 24.10.1 of the Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volume 3C: System Programming Guide, Part 3:

"If a logical processor leaves VMX operation, any VMCSs active on that
logical processor may be corrupted (see below). To prevent such
corruption of a VMCS that may be used either after a return to VMX
operation or on another logical processor, software should VMCLEAR that
VMCS before executing the VMXOFF instruction or removing power from the
processor (e.g., as part of a transition to the S3 and S4 power states)."

Our purpose is to make sure the VMCSs in the vmcore are up to date and
not corrupted. So according to the description above, no matter whether
VMXOFF is defined to flush VMCSs or whether it just invalidates on-chip
caches, we'd better VMCLEAR the VMCSs before executing VMXOFF.
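
The ordering argued for above can be sketched in plain C. VMCLEAR and VMXOFF are privileged instructions, so the stubs below only log the order in which they would be issued; crash_leave_vmx(), the stubs and the event log are hypothetical names for this illustration and are not taken from the patchset.

```c
#include <stddef.h>

#define MAX_EVENTS 16

enum event_kind { EV_VMCLEAR, EV_VMXOFF };

struct event {
	enum event_kind kind;
	const void *vmcs;	/* NULL for EV_VMXOFF */
};

static struct event event_log[MAX_EVENTS];
static size_t event_count;

static void record(enum event_kind kind, const void *vmcs)
{
	if (event_count < MAX_EVENTS) {
		event_log[event_count].kind = kind;
		event_log[event_count].vmcs = vmcs;
		event_count++;
	}
}

/* Stubs standing in for the real instructions. */
static void stub_vmclear(const void *vmcs) { record(EV_VMCLEAR, vmcs); }
static void stub_vmxoff(void) { record(EV_VMXOFF, NULL); }

/* Crash path for one logical processor: write every loaded VMCS back
 * to memory first, and only then leave VMX operation. */
static void crash_leave_vmx(const void *const *loaded, size_t n)
{
	for (size_t i = 0; i < n; i++)
		stub_vmclear(loaded[i]);
	stub_vmxoff();
}
```

The point is only the sequence: every VMCS loaded on the CPU is cleared before VMXOFF, so the copy in memory (and hence in the vmcore) is the one the processor last wrote back.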

Thanks
Zhang Yanfei


Re: [PATCH] vhost-blk: Add vhost-blk support v2

2012-10-17 Thread Rusty Russell
Asias He <as...@redhat.com> writes:
 +#define BLK_HDR 0
 
 What's this for, exactly? Please add a comment.

 The block header is in the first and separate buffer.

Please don't assume this!  We're trying to fix all the assumptions in
qemu at the moment.

vhost_net handles this correctly, taking bytes off the descriptor chain
as required.
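
A layout-independent way to read the header is to pull the needed number of bytes off the front of the buffer list, wherever the segment boundaries fall. The helper below is a minimal userspace sketch of that idea using struct iovec; iov_pull() is a hypothetical name and this is not the vhost_net code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Copy up to `len` bytes from the front of an iovec array into `dst`,
 * advancing the array past the consumed bytes. Returns the number of
 * bytes copied, which is less than `len` if the chain is too short. */
static size_t iov_pull(struct iovec **iov, size_t *iov_cnt,
		       void *dst, size_t len)
{
	unsigned char *out = dst;
	size_t copied = 0;

	while (copied < len && *iov_cnt > 0) {
		struct iovec *cur = *iov;
		size_t want = len - copied;
		size_t chunk = cur->iov_len < want ? cur->iov_len : want;

		if (chunk > 0) {
			memcpy(out + copied, cur->iov_base, chunk);
			cur->iov_base = (unsigned char *)cur->iov_base + chunk;
			cur->iov_len -= chunk;
			copied += chunk;
		}
		/* Step over segments that are now fully consumed. */
		if (cur->iov_len == 0) {
			(*iov)++;
			(*iov_cnt)--;
		}
	}
	return copied;
}
```

With this, a header that straddles two segments is read the same way as one that sits alone in the first buffer, and whatever remains in the array after the call is the payload.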

Thanks,
Rusty.


Re: [PATCH] kvm/powerpc: Handle errors in secondary thread grabbing

2012-10-17 Thread Alexander Graf

On 16.10.2012, at 21:33, Benjamin Herrenschmidt wrote:

 On Tue, 2012-10-16 at 17:00 +1100, Michael Ellerman wrote:
 Thanks for looking at this - but in fact this is fixed by my patch
 entitled "KVM: PPC: Book3S HV: Fix some races in starting secondary
 threads" submitted back on August 28.
 
 OK thanks. It seems that patch didn't make 3.7 ?
 
 I don't see it in kvm-ppc-next either.
 
 Alex, WTF ?

Hrm. Not sure what happened there. I think I waited for your ack, but
never actually applied things when it came. My bad :)


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-17 Thread Avi Kivity
On 10/16/2012 10:03 PM, Anthony Liguori wrote:

 This forces userspace to dedicate a thread for the HPT.
 
 If no changes are available, does read return a size > 0?  I don't think
 it's necessary to support polling.  The kernel should always be able to
 respond to userspace here.  The only catch is whether to return !0 read
 sizes when there are no changes.
 
 At any case, I can't see why a dedicated thread is needed.  QEMU is
 going to poll HPT based on how fast we can send data over the wire.

That means spinning if we can send the data faster than we dirty it.
But we do that anyway for memory.



-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 5/5] KVM: PPC: Book3S HV: Provide a method for userspace to read and write the HPT

2012-10-17 Thread Avi Kivity
On 10/16/2012 11:52 PM, Paul Mackerras wrote:
 On Tue, Oct 16, 2012 at 03:06:33PM +0200, Avi Kivity wrote:
 On 10/16/2012 01:58 PM, Paul Mackerras wrote:
  On Tue, Oct 16, 2012 at 12:06:58PM +0200, Avi Kivity wrote:
  Does/should the fd support O_NONBLOCK and poll? (=waiting for an entry
  to change).
  
  No.
 
 This forces userspace to dedicate a thread for the HPT.
 
 Why? Reads never block in any case.

Ok.  This parallels KVM_GET_DIRTY_LOG.

 
 I meant the internal data structure that holds HPT entries.
 
 Oh, that's just an array, and userspace already knows how big it is.
 
 I guess I don't understand the index.  Do we expect changes to be in
 contiguous ranges?  And invalid entries to be contiguous as well?  That
 doesn't fit with how hash tables work.  Does the index represent the
 position of the entry within the table, or something else?
 
 The index is just the position in the array.  Typically, in each group
 of 8 it will tend to be the low-numbered ones that are valid, since
 creating an entry usually uses the first empty slot.  So I expect that
 on the first pass, most of the records will represent 8 HPTEs.  On
 subsequent passes, probably most records will represent a single HPTE.

So it's a form of RLE compression.  Ok.

 
 16MiB is transferred in ~0.15 sec on GbE, much faster with 10GbE.  Does
 it warrant a live migration protocol?
 
 The qemu people I talked to seemed to think so.
 
  Because it is a hash table, updates tend to be scattered throughout
  the whole table, which is another reason why per-page dirty tracking
  and updates would be pretty inefficient.
 
 This suggests a stream format that includes the index in every entry.
 
 That would amount to dropping the n_valid and n_invalid fields from
 the current header format.  That would be less efficient for the
 initial pass (assuming we achieve an average n_valid of at least 2 on
 the initial pass), and probably less efficient for the incremental
 updates, since a newly-invalidated entry would have to be represented
 as 16 zero bytes rather than just an 8-byte header with n_valid=0 and
 n_invalid=1.  I'm assuming here that the initial pass would omit
 invalid entries.

I agree.  But let's have some measurements to make sure.
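
Until there are measurements, the trade-off can at least be put in numbers. The sketch below compares the two encodings discussed above. Only the 8-byte header, the n_valid/n_invalid fields and the 16-byte HPTE come from the thread; the exact header layout and the 4-byte per-entry index are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define HPTE_BYTES 16u

/* Assumed layout of the 8-byte run header. */
struct hpt_run_header {
	uint32_t index;		/* position of the first HPTE in the run */
	uint16_t n_valid;	/* valid HPTEs that follow, 16 bytes each */
	uint16_t n_invalid;	/* invalid HPTEs after those, no payload */
};

/* Run format: one header, payload only for the valid entries. */
static size_t run_format_bytes(uint16_t n_valid)
{
	return sizeof(struct hpt_run_header) + (size_t)n_valid * HPTE_BYTES;
}

/* Per-entry format: an index plus a full HPTE for every entry, with a
 * newly-invalidated entry sent as 16 zero bytes. */
static size_t indexed_format_bytes(size_t n_entries)
{
	return n_entries * (sizeof(uint32_t) + HPTE_BYTES);
}
```

Under these assumptions the two formats break even at n_valid = 2 (40 bytes each), a full group of 8 valid entries costs 136 bytes against 160, and a single invalidation costs 8 bytes against 20.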

 
  
  As for the change rate, it depends on the application of course, but
  basically every time the guest changes a PTE in its Linux page tables
  we do the corresponding change to the corresponding HPT entry, so the
  rate can be quite high.  Workloads that do a lot of fork, exit, mmap,
  exec, etc. have a high rate of HPT updates.
 
 If the rate is high enough, then there's no point in a live update.
 
 True, but doesn't that argument apply to memory pages as well?

In some cases it does.  The question is what happens in practice.  If
you migrate a kernel build, how many entries are sent in the guest
stopped phase?


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-17 Thread Avi Kivity
On 10/16/2012 04:49 PM, Alexander Graf wrote:

 If there is a lot of prioritization and/or queuing logic, then yes.  But
 what about MSI?  Doesn't that have a direct path?
 
 Nope. Well, yes, in a certain special case where the MPIC pushes the
 interrupt vector on interrupt delivery into a special register. But not
 for the normal case.

Ok.  The patches are fine then, but would be good to add the PIO check.

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 0/2] KVM: PPC: Support ioeventfd

2012-10-17 Thread Alexander Graf

On 10/17/2012 04:50 PM, Avi Kivity wrote:

On 10/16/2012 04:49 PM, Alexander Graf wrote:


If there is a lot of prioritization and/or queuing logic, then yes.  But
what about MSI?  Doesn't that have a direct path?

Nope. Well, yes, in a certain special case where the MPIC pushes the
interrupt vector on interrupt delivery into a special register. But not
for the normal case.

Ok.  The patches are fine then, but would be good to add the PIO check.


Yup, will do as a separate patch.


Alex



Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Alexander Graf

On 10/14/2012 02:04 AM, Christoffer Dall wrote:

*** warning: this RFC patch series is only compile-tested ***

We need a way to specify the address at which we expect VMs to access
the interrupt controller (both the emulated distributor and the hardware
interface supporting virtualization).  User space should decide on this
address as user space decides on an emulated board and loads a device
tree describing these details directly to the guest.

Instead of modifying or copying KVM_CREATE_IRQCHIP to an ARM specific
ioctl with a highly device specific set of parameters, we try
something slightly more generic, that should fit well with how user
space (read QEMU) first builds the individual devices and later sets up
the emulated platform.


Have you talked to Ben about this one? He wanted to design a new, more
flexible irqchip API that would work for XICS & MPIC. Maybe there's some
room for cooperation here?



Alex



Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Christoffer Dall
On Wed, Oct 17, 2012 at 4:38 PM, Alexander Graf <ag...@suse.de> wrote:
 On 10/14/2012 02:04 AM, Christoffer Dall wrote:

 *** warning: this RFC patch series is only compile-tested ***

 We need a way to specify the address at which we expect VMs to access
 the interrupt controller (both the emulated distributor and the hardware
 interface supporting virtualization).  User space should decide on this
 address as user space decides on an emulated board and loads a device
 tree describing these details directly to the guest.

 Instead of modifying or copying KVM_CREATE_IRQCHIP to an ARM specific
 ioctl with a highly device specific set of parameters, we try
 something slightly more generic, that should fit well with how user
 space (read QEMU) first builds the individual devices and later sets up
 the emulated platform.


 Have you talked to Ben about this one? He wanted to design a new, more
 flexible irqchip API that would work for XICS & MPIC. Maybe there's some
 room for cooperation here?

I have not - Ben, what do you have in mind?

-Christoffer


Re: [kvmarm] [RFC PATCH 0/3] KVM: ARM: Get rid of hardcoded VGIC addresses

2012-10-17 Thread Benjamin Herrenschmidt
On Wed, 2012-10-17 at 16:39 -0400, Christoffer Dall wrote:

  Have you talked to Ben about this one? He wanted to design a new, more
  flexible irqchip API that would work for XICS & MPIC. Maybe there's some
  room for cooperation here?
 
 I have not - Ben, what do you have in mind?

I've been sidetracked to some other stuff so for now Paul (CC) is taking
over my interrupt patches.

We initially changed KVM_CREATE_IRQCHIP to take an argument but that was
causing an x86 ABI breakage (ioctl number changing). So we'll probably
be creating a new one.

From there, nothing fancy really, just an ioctl with an IRQ chip type at
the beginning followed by a union of type-specific parameters.

The main problem we haven't sorted out yet is how to replace some of the
horrors related to mapping interrupts that have tendrils all the way
into virtio-pci etc... in qemu that don't apply to us (well mostly) and
the interaction with in-kernel generated interrupts to avoid going
through qemu for vhost etc...

Cheers,
Ben.



