Re: [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool

2010-10-06 Thread pradeep
On Mon, 27 Sep 2010 18:44:04 -0400
Lucas Meneghel Rodrigues l...@redhat.com wrote:


 +
 +    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
 +    session = kvm_test_utils.wait_for_login(vm,
 +                  timeout=int(params.get("login_timeout", 360)))
 +    # Let's just error the test if we identify that there's no ethtool installed
 +    if session.get_command_status("ethtool -h"):
 +        raise error.TestError("Command ethtool not installed on guest")
 +    session2 = kvm_test_utils.wait_for_login(vm,
 +                  timeout=int(params.get("login_timeout", 360)))
 +    mtu = 1514
 +    feature_status = {}
 +    filename = "/tmp/ethtool.dd"
 +    guest_ip = vm.get_address()
 +    ethname = kvm_test_utils.get_linux_ifname(session,
 +                                              vm.get_mac_address(0))
 +    supported_features = params.get("supported_features").split()

I guess this split() expects input.

23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no attribute 'split'

22.12', '00:1a:4a:65:09:09': '192.168.122.66', '9a:52:2f:62:12:63':
'192.168.122.151', '9a:52:2f:62:6b:28': '192.168.122.35'}, 'version':
0, 'tcpdump': <kvm_subprocess.kvm_tail instance at 0x27cb200>}
23:48:05 INFO | ['iteration.1']
23:48:05 ERROR| Exception escaping from test:
Traceback (most recent call last):
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 412, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 605, in _call_test_function
    raise error.UnhandledTestFail(e)
UnhandledTestFail: Unhandled AttributeError: 'NoneType' object has no attribute 'split'
Traceback (most recent call last):
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 598, in _call_test_function
    return func(*args, **dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 284, in execute
    postprocess_profiled_run, args, dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 202, in _call_run_once
    self.run_once_profiling(postprocess_profiled_run, *args, **dargs)
  File "/home/pradeep/vhost_net/autotest/client/common_lib/test.py", line 308, in run_once_profiling
    self.run_once(*args, **dargs)
  File "/home/pradeep/vhost_net/autotest/client/tests/kvm/kvm.py", line 73, in run_once
    run_func(self, params, env)
  File "/home/pradeep/vhost_net/autotest/client/tests/kvm/tests/ethtool.py", line 185, in run_ethtool
    supported_features = params.get("supported_features").split()
AttributeError: 'NoneType' object has no attribute 'split'
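
The AttributeError in this log comes from calling .split() on the None that
params.get() returns for an unset key. A minimal sketch of a more defensive
lookup follows. It assumes params behaves like a plain dict; the helper name
get_feature_list is hypothetical and is not part of the patch.

```python
def get_feature_list(params, key="supported_features"):
    """Return the whitespace-separated feature list stored under `key`.

    Hypothetical helper, not part of the patch. `params` is assumed to
    behave like a dict whose .get() returns None for a missing key.
    """
    value = params.get(key)
    if not value:
        # Fail with a readable message instead of AttributeError on None.
        raise ValueError("parameter %r is not set for this configuration" % key)
    return value.split()
```

Inside the test itself one would probably raise the framework's own error
type rather than ValueError; that choice is left open here.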



--Pradeep
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm: add oom notifier for virtio balloon

2010-10-06 Thread Rusty Russell
On Tue, 5 Oct 2010 11:15:21 pm Dave Young wrote:
 Balloon could cause guest memory oom killing and panic.
 
 Add oom notify to leak some memory and retry fill balloon after 5 minutes.

Have you tried registering a shrinker?  See mm.h.

Thanks,
Rusty.


Re: [PATCH 0/9] msix/kvm integration cleanups

2010-10-06 Thread Michael S. Tsirkin
On Tue, Sep 21, 2010 at 06:05:10PM +0200, Avi Kivity wrote:
  On 09/20/2010 07:02 PM, Michael S. Tsirkin wrote:
 On Mon, Sep 20, 2010 at 05:06:41PM +0200, Avi Kivity wrote:
   This cleans up msix/kvm integration a bit.  The really important patch is
   the last one, which allows msix.o to be part of non-target-specific build.
 
 I actually thought this later move should be done in a different way:
 - add all functions msix uses to kvm-stub.c
 
 Isn't that what I did?
 
 - kvm_irq_routing_entry should also have a stub
 
 I sent some minor comments in case you have a reason
 to prefer this way.
 
 My motivation is really the last patch.  If you explain what you'd
 like to see I'll try to do it.

Still looking at this?

 -- 
 error compiling committee.c: too many arguments to function


Re: [PATCH 0/9] msix/kvm integration cleanups

2010-10-06 Thread Avi Kivity

 On 10/06/2010 11:39 AM, Michael S. Tsirkin wrote:

On Tue, Sep 21, 2010 at 06:05:10PM +0200, Avi Kivity wrote:
   On 09/20/2010 07:02 PM, Michael S. Tsirkin wrote:
  On Mon, Sep 20, 2010 at 05:06:41PM +0200, Avi Kivity wrote:
 This cleans up msix/kvm integration a bit.  The really important patch is
 the last one, which allows msix.o to be part of non-target-specific build.
  
  I actually thought this later move should be done in a different way:
  - add all functions msix uses to kvm-stub.c

  Isn't that what I did?

  - kvm_irq_routing_entry should also have a stub
  
  I sent some minor comments in case you have a reason
  to prefer this way.

  My motivation is really the last patch.  If you explain what you'd
  like to see I'll try to do it.

Still looking at this?



Yes, I plan to do this when I get a bit of time.

--
error compiling committee.c: too many arguments to function



Re: [Autotest] [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool

2010-10-06 Thread pradeep
On Wed, 6 Oct 2010 14:26:46 +0530
pradeep psuri...@linux.vnet.ibm.com wrote:

 On Mon, 27 Sep 2010 18:44:04 -0400
 Lucas Meneghel Rodrigues l...@redhat.com wrote:
 
 
 ion,
  vm.get_mac_address(0))
  +supported_features = params.get("supported_features").split()
 
 I guess this split() expects input.
 
 23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no
 attribute 'split'
 
Please disregard my earlier mail. I was using rtl8139, and
rtl8139 doesn't support this.




 --Pradeep
 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest



NIC limit

2010-10-06 Thread linux_kvm
Hi again everybody,
 
One of the admins at the ProxmoxVE project was gracious enough to
quickly release a package including the previously discussed change to
allow up to 32 NICs in qemu.
 
For future reference the .deb is here:
ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/pve-qemu-kvm_0.12.5-2_amd64.deb
 
Upon creating and running the VM with the newly patched qemu-kvm app
installed, I found a NIC limitation remained in place, presumably
imposed by some other aspect of the environment.
 
The machine would start when it had 33 PCI devices, as long as no more
than 28 of them were NICs.
 
This is still a vast improvement compared to the previous limit of 8
NICs, and is very good news for my project. I post here in hopes that
maybe someone will come across the link in a search and have a solution.
 
More likely however the new API will be in place and widely in use by
then, but whatever.
 
Either way, thanks for your help yesterday.


Re: [PATCH v6 10/12] Handle async PF in non preemptable context

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 04:51:50PM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:32PM +0200, Gleb Natapov wrote:
  If async page fault is received by idle task or when preempt_count is
  not zero guest cannot reschedule, so do sti; hlt and wait for page to be
  ready. vcpu can still process interrupts while it waits for the page to
  be ready.
  
  Acked-by: Rik van Riel r...@redhat.com
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   arch/x86/kernel/kvm.c |   40 ++--
   1 files changed, 34 insertions(+), 6 deletions(-)
  
  diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
  index 36fb3e4..f73946f 100644
  --- a/arch/x86/kernel/kvm.c
  +++ b/arch/x86/kernel/kvm.c
  @@ -37,6 +37,7 @@
   #include <asm/cpu.h>
   #include <asm/traps.h>
   #include <asm/desc.h>
  +#include <asm/tlbflush.h>
   
   #define MMU_QUEUE_SIZE 1024
   
  @@ -78,6 +79,8 @@ struct kvm_task_sleep_node {
  wait_queue_head_t wq;
  u32 token;
  int cpu;
  +   bool halted;
  +   struct mm_struct *mm;
   };
   
   static struct kvm_task_sleep_head {
  @@ -106,6 +109,11 @@ void kvm_async_pf_task_wait(u32 token)
  struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
  struct kvm_task_sleep_node n, *e;
  DEFINE_WAIT(wait);
  +   int cpu, idle;
  +
  +   cpu = get_cpu();
  +   idle = idle_cpu(cpu);
  +   put_cpu();
   
  spin_lock(&b->lock);
  e = _find_apf_task(b, token);
  @@ -119,19 +127,33 @@ void kvm_async_pf_task_wait(u32 token)
   
  n.token = token;
  n.cpu = smp_processor_id();
  +   n.mm = current->active_mm;
  +   n.halted = idle || preempt_count() > 1;
  +   atomic_inc(&n.mm->mm_count);
 
 Can't see why this reference is needed.

I thought that if a kernel thread faults on behalf of some process, the
mm could go away while the kernel thread is sleeping. But it looks like
a kernel thread takes its own reference to the mm it runs with, so
maybe this is redundant (but not harmful).

--
Gleb.


Re: [PATCH v6 09/12] Inject asynchronous page fault into a PV guest if page is swapped out.

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 04:00:51PM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:31PM +0200, Gleb Natapov wrote:
  Send async page fault to a PV guest if it accesses swapped out memory.
  Guest will choose another task to run upon receiving the fault.
  
  Allow async page fault injection only when guest is in user mode since
  otherwise guest may be in non-sleepable context and will not be able
  to reschedule.
  
  Vcpu will be halted if guest will fault on the same page again or if
  vcpu executes kernel code.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   arch/x86/include/asm/kvm_host.h |3 ++
   arch/x86/kvm/mmu.c  |1 +
   arch/x86/kvm/x86.c  |   49 
  --
   include/trace/events/kvm.h  |   17 
   virt/kvm/async_pf.c |3 +-
   5 files changed, 58 insertions(+), 15 deletions(-)
  
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index de31551..2f6fc87 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -419,6 +419,7 @@ struct kvm_vcpu_arch {
  gfn_t gfns[roundup_pow_of_two(ASYNC_PF_PER_VCPU)];
  struct gfn_to_hva_cache data;
  u64 msr_val;
  +   u32 id;
  } apf;
   };
   
  @@ -594,6 +595,7 @@ struct kvm_x86_ops {
   };
   
   struct kvm_arch_async_pf {
  +   u32 token;
  gfn_t gfn;
   };
   
  @@ -842,6 +844,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
   struct kvm_async_pf *work);
   void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
 struct kvm_async_pf *work);
  +bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
   extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
   
   #endif /* _ASM_X86_KVM_HOST_H */
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index d85fda8..de53cab 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -2580,6 +2580,7 @@ static int nonpaging_page_fault(struct kvm_vcpu 
  *vcpu, gva_t gva,
   int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
   {
  struct kvm_arch_async_pf arch;
  +   arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id;
  arch.gfn = gfn;
   
  return kvm_setup_async_pf(vcpu, gva, gfn, &arch);
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 3e123ab..0e69d37 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -6225,25 +6225,58 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu 
  *vcpu, gfn_t gfn)
  }
   }
   
  +static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
  +{
  +
  +   return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
  + sizeof(val));
  +}
  +
   void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
   struct kvm_async_pf *work)
   {
  -   vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
  -
  -   if (work == kvm_double_apf)
  +   if (work == kvm_double_apf) {
          trace_kvm_async_pf_doublefault(kvm_rip_read(vcpu));
  -   else {
  -       trace_kvm_async_pf_not_present(work->gva);
  -
  +       vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
  +   } else {
  +       trace_kvm_async_pf_not_present(work->arch.token, work->gva);
          kvm_add_async_pf_gfn(vcpu, work->arch.gfn);
  +
  +       if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) ||
  +           kvm_x86_ops->get_cpl(vcpu) == 0)
  +           vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
  +       else if (!apf_put_user(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT)) {
  +           vcpu->arch.fault.error_code = 0;
  +           vcpu->arch.fault.address = work->arch.token;
  +           kvm_inject_page_fault(vcpu);
  +       }
 
 Missed !kvm_event_needs_reinjection(vcpu) ? 

This check is done in can_do_async_pf(). We will not get here if an event
is pending.
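
For readers following the token passed to the guest as the fault address:
assuming the operator the archive dropped from the quoted arch.token line is
a left shift by 12, the token packs a per-vcpu sequence number above a
12-bit vcpu id. A toy Python model of that packing is below. The function
names and the range check on vcpu_id are illustrative assumptions, not
kernel code.

```python
VCPU_ID_BITS = 12       # shift amount seen in the quoted patch
U32_MASK = 0xFFFFFFFF   # arch.token is declared as u32


def make_token(seq, vcpu_id):
    """Pack a sequence number and a vcpu id into one 32-bit token."""
    # Assumption for this model: the vcpu id fits in the low 12 bits.
    assert 0 <= vcpu_id < (1 << VCPU_ID_BITS)
    return ((seq << VCPU_ID_BITS) | vcpu_id) & U32_MASK


def split_token(token):
    """Recover (sequence number, vcpu id) from a token."""
    return token >> VCPU_ID_BITS, token & ((1 << VCPU_ID_BITS) - 1)
```

With 32 bits in total, the sequence part has 20 bits in this model, so it
wraps after 2**20 faults on one vcpu.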

--
Gleb.


Re: [Autotest] [PATCH 14/18] KVM test: Add a netperf subtest

2010-10-06 Thread pradeep

 
 
 This case can pass with rhel5.5 & rhel6.0, not test with fedora.
 it would not be the problem of testcase.
 
 I did not touch this problem, can you provide more debug info ? eg,
 tcpdump, ...

It seems to be a RHEL 5.5 issue;
it fails only with TCP_CRR.




Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-06 Thread Avi Kivity

 On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:

On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
  If a guest accesses swapped out memory do not swap it in from vcpu thread
  context. Schedule work to do swapping and put vcpu into halted state
  instead.

  Interrupts will still be delivered to the guest and if interrupt will
  cause reschedule guest will continue to run another task.

  Signed-off-by: Gleb Natapovg...@redhat.com
  ---
   arch/x86/include/asm/kvm_host.h |   17 +++
   arch/x86/kvm/Kconfig|1 +
   arch/x86/kvm/Makefile   |1 +
   arch/x86/kvm/mmu.c  |   51 +-
   arch/x86/kvm/paging_tmpl.h  |4 +-
   arch/x86/kvm/x86.c  |  109 +++-
   include/linux/kvm_host.h|   31 ++
   include/trace/events/kvm.h  |   88 
   virt/kvm/Kconfig|3 +
   virt/kvm/async_pf.c |  220 
+++
   virt/kvm/async_pf.h |   36 +++
   virt/kvm/kvm_main.c |   57 --
   12 files changed, 603 insertions(+), 15 deletions(-)
   create mode 100644 virt/kvm/async_pf.c
   create mode 100644 virt/kvm/async_pf.h


  + async_pf_cache = NULL;
  +}
  +
  +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
  +{
  + INIT_LIST_HEAD(&vcpu->async_pf.done);
  + INIT_LIST_HEAD(&vcpu->async_pf.queue);
  + spin_lock_init(&vcpu->async_pf.lock);
  +}
  +
  +static void async_pf_execute(struct work_struct *work)
  +{
  + struct page *page;
  + struct kvm_async_pf *apf =
  +     container_of(work, struct kvm_async_pf, work);
  + struct mm_struct *mm = apf->mm;
  + struct kvm_vcpu *vcpu = apf->vcpu;
  + unsigned long addr = apf->addr;
  + gva_t gva = apf->gva;
  +
  + might_sleep();
  +
  + use_mm(mm);
  + down_read(&mm->mmap_sem);
  + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
  + up_read(&mm->mmap_sem);
  + unuse_mm(mm);
  +
  + spin_lock(&vcpu->async_pf.lock);
  + list_add_tail(&apf->link, &vcpu->async_pf.done);
  + apf->page = page;
  + spin_unlock(&vcpu->async_pf.lock);

This can fail, and apf->page become NULL.


Does it even become NULL?  On error, get_user_pages() won't update the 
pages argument, so page becomes garbage here.


--
error compiling committee.c: too many arguments to function



Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-06 Thread Gleb Natapov
On Wed, Oct 06, 2010 at 12:50:01PM +0200, Avi Kivity wrote:
  On 10/05/2010 04:59 PM, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
   If a guest accesses swapped out memory do not swap it in from vcpu thread
   context. Schedule work to do swapping and put vcpu into halted state
   instead.
 
   Interrupts will still be delivered to the guest and if interrupt will
   cause reschedule guest will continue to run another task.
 
   Signed-off-by: Gleb Natapovg...@redhat.com
   ---
arch/x86/include/asm/kvm_host.h |   17 +++
arch/x86/kvm/Kconfig|1 +
arch/x86/kvm/Makefile   |1 +
arch/x86/kvm/mmu.c  |   51 +-
arch/x86/kvm/paging_tmpl.h  |4 +-
arch/x86/kvm/x86.c  |  109 +++-
include/linux/kvm_host.h|   31 ++
include/trace/events/kvm.h  |   88 
virt/kvm/Kconfig|3 +
virt/kvm/async_pf.c |  220 
  +++
virt/kvm/async_pf.h |   36 +++
virt/kvm/kvm_main.c |   57 --
12 files changed, 603 insertions(+), 15 deletions(-)
create mode 100644 virt/kvm/async_pf.c
create mode 100644 virt/kvm/async_pf.h
 
 
   + async_pf_cache = NULL;
   +}
   +
   +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
   +{
   + INIT_LIST_HEAD(&vcpu->async_pf.done);
   + INIT_LIST_HEAD(&vcpu->async_pf.queue);
   + spin_lock_init(&vcpu->async_pf.lock);
   +}
   +
   +static void async_pf_execute(struct work_struct *work)
   +{
   + struct page *page;
   + struct kvm_async_pf *apf =
   +     container_of(work, struct kvm_async_pf, work);
   + struct mm_struct *mm = apf->mm;
   + struct kvm_vcpu *vcpu = apf->vcpu;
   + unsigned long addr = apf->addr;
   + gva_t gva = apf->gva;
   +
   + might_sleep();
   +
   + use_mm(mm);
   + down_read(&mm->mmap_sem);
   + get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
   + up_read(&mm->mmap_sem);
   + unuse_mm(mm);
   +
   + spin_lock(&vcpu->async_pf.lock);
   + list_add_tail(&apf->link, &vcpu->async_pf.done);
   + apf->page = page;
   + spin_unlock(&vcpu->async_pf.lock);
 
 This can fail, and apf->page become NULL.
 
 Does it even become NULL?  On error, get_user_pages() won't update
 the pages argument, so page becomes garbage here.
 
apf is allocated with kmem_cache_zalloc() and ->page is set to NULL in
kvm_setup_async_pf() to be extra sure.

--
Gleb.


Re: [PATCH v6 07/12] Add async PF initialization to PV guest.

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 03:25:54PM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:29PM +0200, Gleb Natapov wrote:
  Enable async PF in a guest if async PF capability is discovered.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   Documentation/kernel-parameters.txt |3 +
   arch/x86/include/asm/kvm_para.h |5 ++
   arch/x86/kernel/kvm.c   |   92 
  +++
   3 files changed, 100 insertions(+), 0 deletions(-)
  
 
  +static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
  +   unsigned long action, void *hcpu)
  +{
  +   int cpu = (unsigned long)hcpu;
  +   switch (action) {
  +   case CPU_ONLINE:
  +   case CPU_DOWN_FAILED:
  +   case CPU_ONLINE_FROZEN:
  +   smp_call_function_single(cpu, kvm_guest_cpu_notify, NULL, 0);
 
 wait parameter should probably be 1.

Why should we wait for it? FWIW I copied this from somewhere (maybe
arch/x86/pci/amd_bus.c).

--
Gleb.


Re: [PATCH v6 03/12] Retry fault before vmentry

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 12:54:09PM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:25PM +0200, Gleb Natapov wrote:
  When page is swapped in it is mapped into guest memory only after guest
  tries to access it again and generate another fault. To save this fault
  we can map it immediately since we know that guest is going to access
  the page. Do it only when tdp is enabled for now. Shadow paging case is
  more complicated. CR[034] and EFER registers should be switched before
  doing mapping and then switched back.
  
  Acked-by: Rik van Riel r...@redhat.com
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   arch/x86/include/asm/kvm_host.h |4 +++-
   arch/x86/kvm/mmu.c  |   16 
   arch/x86/kvm/paging_tmpl.h  |6 +++---
   arch/x86/kvm/x86.c  |7 +++
   virt/kvm/async_pf.c |2 ++
   5 files changed, 23 insertions(+), 12 deletions(-)
  
  diff --git a/arch/x86/include/asm/kvm_host.h 
  b/arch/x86/include/asm/kvm_host.h
  index 5f154d3..b9f263e 100644
  --- a/arch/x86/include/asm/kvm_host.h
  +++ b/arch/x86/include/asm/kvm_host.h
  @@ -240,7 +240,7 @@ struct kvm_mmu {
  void (*new_cr3)(struct kvm_vcpu *vcpu);
  void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root);
  unsigned long (*get_cr3)(struct kvm_vcpu *vcpu);
  -   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err);
  +   int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool 
  no_apf);
  void (*inject_page_fault)(struct kvm_vcpu *vcpu);
  void (*free)(struct kvm_vcpu *vcpu);
  gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access,
  @@ -838,6 +838,8 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu 
  *vcpu,
   struct kvm_async_pf *work);
   void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
   struct kvm_async_pf *work);
  +void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
  +  struct kvm_async_pf *work);
   extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
   
   #endif /* _ASM_X86_KVM_HOST_H */
  diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
  index 4d49b5e..d85fda8 100644
  --- a/arch/x86/kvm/mmu.c
  +++ b/arch/x86/kvm/mmu.c
  @@ -2558,7 +2558,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct 
  kvm_vcpu *vcpu, gva_t vaddr,
   }
   
   static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
  -   u32 error_code)
  +   u32 error_code, bool no_apf)
   {
  gfn_t gfn;
  int r;
  @@ -2594,8 +2594,8 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
  return kvm_x86_ops->interrupt_allowed(vcpu);
   }
   
  -static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
  -pfn_t *pfn)
  +static bool try_async_pf(struct kvm_vcpu *vcpu, bool no_apf, gfn_t gfn,
  +gva_t gva, pfn_t *pfn)
   {
  bool async;
   
  @@ -2606,7 +2606,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t 
  gfn, gva_t gva,
   
  put_page(pfn_to_page(*pfn));
   
  -   if (can_do_async_pf(vcpu)) {
  +   if (!no_apf && can_do_async_pf(vcpu)) {
  trace_kvm_try_async_get_page(async, *pfn);
  if (kvm_find_async_pf_gfn(vcpu, gfn)) {
  vcpu->async_pf.work = kvm_double_apf;
  @@ -2620,8 +2620,8 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, gfn_t 
  gfn, gva_t gva,
  return false;
   }
   
  -static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa,
  -   u32 error_code)
  +static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
  + bool no_apf)
   {
  pfn_t pfn;
  int r;
  @@ -2643,7 +2643,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, 
  gva_t gpa,
  mmu_seq = vcpu->kvm->mmu_notifier_seq;
  smp_rmb();
   
  -   if (try_async_pf(vcpu, gfn, gpa, &pfn))
  +   if (try_async_pf(vcpu, no_apf, gfn, gpa, &pfn))
  return 0;
   
  /* mmio */
  @@ -3306,7 +3306,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t 
  cr2, u32 error_code)
  int r;
  enum emulation_result er;
   
  -   r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code);
  +   r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
  if (r < 0)
  goto out;
   
  diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
  index 8154353..9ad90f8 100644
  --- a/arch/x86/kvm/paging_tmpl.h
  +++ b/arch/x86/kvm/paging_tmpl.h
  @@ -530,8 +530,8 @@ out_gpte_changed:
*  Returns: 1 if we need to emulate the instruction, 0 otherwise, or
*   a negative value on error.
*/
  -static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
  -  u32 error_code)
  +static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 
  error_code,
  +bool no_apf)
   {
  int write_fault = 

Re: [PATCH v6 04/12] Add memory slot versioning and use it to provide fast guest write interface

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 01:57:38PM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:26PM +0200, Gleb Natapov wrote:
  Keep track of memslots changes by keeping generation number in memslots
  structure. Provide kvm_write_guest_cached() function that skips
  gfn_to_hva() translation if memslots was not changed since previous
  invocation.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   include/linux/kvm_host.h  |7 +
   include/linux/kvm_types.h |7 +
   virt/kvm/kvm_main.c   |   57 
  +---
   3 files changed, 67 insertions(+), 4 deletions(-)
  
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index a08614e..4dff9a1 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -199,6 +199,7 @@ struct kvm_irq_routing_table {};
   
   struct kvm_memslots {
  int nmemslots;
  +   u32 generation;
  struct kvm_memory_slot memslots[KVM_MEMORY_SLOTS +
  KVM_PRIVATE_MEM_SLOTS];
   };
  @@ -352,12 +353,18 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, 
  const void *data,
   int offset, int len);
   int kvm_write_guest(struct kvm *kvm, gpa_t gpa, const void *data,
  unsigned long len);
  +int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
  +  void *data, unsigned long len);
  +int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache 
  *ghc,
  + gpa_t gpa);
   int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
   int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
   struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
   int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
   unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
   void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
  +void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot 
  *memslot,
  +gfn_t gfn);
   
   void kvm_vcpu_block(struct kvm_vcpu *vcpu);
   void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu);
  diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
  index 7ac0d4e..ee6eb71 100644
  --- a/include/linux/kvm_types.h
  +++ b/include/linux/kvm_types.h
  @@ -67,4 +67,11 @@ struct kvm_lapic_irq {
  u32 dest_id;
   };
   
  +struct gfn_to_hva_cache {
  +   u32 generation;
  +   gpa_t gpa;
  +   unsigned long hva;
  +   struct kvm_memory_slot *memslot;
  +};
  +
   #endif /* __KVM_TYPES_H__ */
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index db58a1b..45ef50c 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -687,6 +687,7 @@ skip_lpage:
  memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
  if (mem->slot >= slots->nmemslots)
      slots->nmemslots = mem->slot + 1;
  +   slots->generation++;
  slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
   
  old_memslots = kvm->memslots;
  @@ -723,6 +724,7 @@ skip_lpage:
  memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
  if (mem->slot >= slots->nmemslots)
      slots->nmemslots = mem->slot + 1;
  +   slots->generation++;
   
  /* actual memory is freed via old in kvm_free_physmem_slot below */
  if (!npages) {
  @@ -1247,6 +1249,47 @@ int kvm_write_guest(struct kvm *kvm, gpa_t gpa, 
  const void *data,
  return 0;
   }
   
  +int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache 
  *ghc,
  + gpa_t gpa)
  +{
  +   struct kvm_memslots *slots = kvm_memslots(kvm);
  +   int offset = offset_in_page(gpa);
  +   gfn_t gfn = gpa >> PAGE_SHIFT;
  +
  +   ghc->gpa = gpa;
  +   ghc->generation = slots->generation;
  +   ghc->memslot = gfn_to_memslot(kvm, gfn);
  +   ghc->hva = gfn_to_hva(kvm, gfn);
  +   if (!kvm_is_error_hva(ghc->hva))
  +       ghc->hva += offset;
  +   else
  +       return -EFAULT;
  +
  +   return 0;
  +}
 
 Should use a unique kvm_memslots structure for the cache entry, since it
 can change in between (use gfn_to_hva_memslot, etc on slots pointer).
 
I do not understand what you mean here. The kvm_memslots structure itself
is not cached; only the various translations that use it are cached. A
cached translation result is never used if kvm_memslots has changed.

 Also should zap any cached entries on overflow, otherwise malicious
 userspace could make use of stale slots:
 
There is only one cached entry at any given time. A user who wants to
write into guest memory often defines a gfn_to_hva_cache variable
somewhere, initializes it with kvm_gfn_to_hva_cache_init() and then calls
kvm_write_guest_cached() on it. If there were no slot changes in between,
the cached translation is used. Otherwise the cache is recalculated.
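
The usage pattern described here can be modelled outside the kernel. The
sketch below is a toy Python model of a generation-checked translation
cache, not the kernel implementation. The Memslots class, its dict-based
translation table and the lookups counter are all assumptions made for
illustration.

```python
PAGE_SHIFT = 12


class Memslots(object):
    """Toy stand-in for kvm_memslots: a gfn -> hva table plus a generation."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.generation = 0

    def update(self, mapping):
        # Any slot change bumps the generation, as in the quoted patch.
        self.mapping = dict(mapping)
        self.generation += 1


class GfnToHvaCache(object):
    """Caches one gpa -> hva translation together with its generation."""

    def __init__(self, slots, gpa):
        self.lookups = 0
        self.init(slots, gpa)

    def init(self, slots, gpa):
        offset = gpa & ((1 << PAGE_SHIFT) - 1)
        self.gpa = gpa
        self.generation = slots.generation
        self.hva = slots.mapping[gpa >> PAGE_SHIFT] + offset
        self.lookups += 1   # counts full translations, for the demo only


def write_guest_cached(slots, cache, memory, data):
    """Write via the cached hva, re-translating only if the slots changed."""
    if cache.generation != slots.generation:
        cache.init(slots, cache.gpa)
    memory[cache.hva] = data
```

A stale translation is never dereferenced in this model because the
generation comparison happens before every write; a missing gfn raises
KeyError here, where the quoted kernel code returns -EFAULT.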

  +void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
  +{
  +   struct kvm_memory_slot *memslot;
  +
  +   memslot = gfn_to_memslot(kvm, gfn);
  

Re: [PATCH v6 02/12] Halt vcpu if page it tries to access is swapped out.

2010-10-06 Thread Gleb Natapov
On Tue, Oct 05, 2010 at 11:59:16AM -0300, Marcelo Tosatti wrote:
 On Mon, Oct 04, 2010 at 05:56:24PM +0200, Gleb Natapov wrote:
  If a guest accesses swapped out memory do not swap it in from vcpu thread
  context. Schedule work to do swapping and put vcpu into halted state
  instead.
  
  Interrupts will still be delivered to the guest and if interrupt will
  cause reschedule guest will continue to run another task.
  
  Signed-off-by: Gleb Natapov g...@redhat.com
  ---
   arch/x86/include/asm/kvm_host.h |   17 +++
   arch/x86/kvm/Kconfig|1 +
   arch/x86/kvm/Makefile   |1 +
   arch/x86/kvm/mmu.c  |   51 +-
   arch/x86/kvm/paging_tmpl.h  |4 +-
   arch/x86/kvm/x86.c  |  109 +++-
   include/linux/kvm_host.h|   31 ++
   include/trace/events/kvm.h  |   88 
   virt/kvm/Kconfig|3 +
   virt/kvm/async_pf.c |  220 
  +++
   virt/kvm/async_pf.h |   36 +++
   virt/kvm/kvm_main.c |   57 --
   12 files changed, 603 insertions(+), 15 deletions(-)
   create mode 100644 virt/kvm/async_pf.c
   create mode 100644 virt/kvm/async_pf.h
  
 
  +   async_pf_cache = NULL;
  +}
  +
  +void kvm_async_pf_vcpu_init(struct kvm_vcpu *vcpu)
  +{
  +   INIT_LIST_HEAD(&vcpu->async_pf.done);
  +   INIT_LIST_HEAD(&vcpu->async_pf.queue);
  +   spin_lock_init(&vcpu->async_pf.lock);
  +}
  +
  +static void async_pf_execute(struct work_struct *work)
  +{
  +   struct page *page;
  +   struct kvm_async_pf *apf =
  +       container_of(work, struct kvm_async_pf, work);
  +   struct mm_struct *mm = apf->mm;
  +   struct kvm_vcpu *vcpu = apf->vcpu;
  +   unsigned long addr = apf->addr;
  +   gva_t gva = apf->gva;
  +
  +   might_sleep();
  +
  +   use_mm(mm);
  +   down_read(&mm->mmap_sem);
  +   get_user_pages(current, mm, addr, 1, 1, 0, &page, NULL);
  +   up_read(&mm->mmap_sem);
  +   unuse_mm(mm);
  +
  +   spin_lock(&vcpu->async_pf.lock);
  +   list_add_tail(&apf->link, &vcpu->async_pf.done);
  +   apf->page = page;
  +   spin_unlock(&vcpu->async_pf.lock);
 
 This can fail, and apf->page become NULL.
 
  +   if (list_empty_careful(&vcpu->async_pf.done))
  +       return;
  +
  +   spin_lock(&vcpu->async_pf.lock);
  +   work = list_first_entry(&vcpu->async_pf.done, typeof(*work), link);
  +   list_del(&work->link);
  +   spin_unlock(&vcpu->async_pf.lock);
  +
  +   kvm_arch_async_page_present(vcpu, work);
  +
  +free:
  +   list_del(&work->queue);
  +   vcpu->async_pf.queued--;
  +   put_page(work->page);
  +   kmem_cache_free(async_pf_cache, work);
  +}
 
 Better handle it here (and other sites).
Yeah. We should just reenter the guest and let the usual code path handle
the error on the next guest access.
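
For illustration only, here is a minimal user-space sketch of that idea; it
is not the actual follow-up patch. It assumes a mock struct page with a
reference count and a hypothetical helper named complete_async_pf(). The only
part taken from the discussion is that put_page() must be skipped when
get_user_pages() failed and the page pointer is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of a reference-counted page; the real struct page is far richer. */
struct page {
    int refcount;
};

/* Mock of the async PF work item; only the field relevant here is kept. */
struct async_pf_work {
    struct page *page;
};

/* Mock put_page(): drop one reference. */
static void put_page(struct page *page)
{
    assert(page->refcount > 0);
    page->refcount--;
}

/*
 * Hypothetical completion helper. Returns 1 if a page reference was
 * dropped, 0 if the fault-in had failed and there is nothing to release.
 * In the failure case the guest is simply re-entered and faults again.
 */
static int complete_async_pf(struct async_pf_work *work)
{
    if (work->page == NULL)
        return 0;
    put_page(work->page);
    work->page = NULL;
    return 1;
}
```

A caller would run complete_async_pf() for every item on the done list; items
whose fault-in failed fall through without touching a page.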

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Arnd Bergmann
On Tuesday 05 October 2010, Krishna Kumar2 wrote:
 After testing various combinations of #txqs, #vhosts, #netperf
 sessions, I think the drop for 1 stream is due to TX and RX for
 a flow being processed on different cpus.  I did two more tests:
 1. Pin vhosts to same CPU:
 - BW drop is much lower for 1 stream case (- 5 to -8% range)
 - But performance is not so high for more sessions.
 2. Changed vhost to be single threaded:
   - No degradation for 1 session, and improvement for upto
   8, sometimes 16 streams (5-12%).
   - BW degrades after that, all the way till 128 netperf sessions.
   - But overall CPU utilization improves.
 Summary of the entire run (for 1-128 sessions):
 txq=4:  BW: (-2.3)  CPU: (-16.5)RCPU: (-5.3)
 txq=16: BW: (-1.9)  CPU: (-24.9)RCPU: (-9.6)
 
 I don't see any reasons mentioned above.  However, for higher
 number of netperf sessions, I see a big increase in retransmissions:
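 ______________________________________________
 #netperf    ORG              NEW
             BW (#retr)       BW (#retr)
 ______________________________________________
 1           70244 (0)        64102 (0)
 4           21421 (0)        36570 (416)
 8           21746 (0)        38604 (148)
 16          21783 (0)        40632 (464)
 32          22677 (0)        37163 (1053)
 64          23648 (4)        36449 (2197)
 128         23251 (2)        31676 (3185)
 ______________________________________________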


This smells like it could be related to a problem that Ben Greear found
recently (see "macvlan: Enable qdisc backoff logic"). When the hardware
is busy, we used to just drop the packet. With Ben's patch, we return -EAGAIN
to qemu (or vhost-net) to trigger a resend.

I suppose what we really should do is feed that condition back to the
guest network stack and implement the backoff in there.

Arnd


[PATCH] kvm: fix typo in copyright notice

2010-10-06 Thread Nicolas Kaiser
Fix typo in copyright notice.

Signed-off-by: Nicolas Kaiser ni...@nikai.net
---
 arch/x86/kvm/emulate.c |2 +-
 arch/x86/kvm/i8254.c   |2 +-
 arch/x86/kvm/i8259.c   |2 +-
 arch/x86/kvm/irq.c |2 +-
 arch/x86/kvm/lapic.c   |2 +-
 arch/x86/kvm/mmu.c |2 +-
 arch/x86/kvm/mmu_audit.c   |2 +-
 arch/x86/kvm/paging_tmpl.h |2 +-
 arch/x86/kvm/svm.c |2 +-
 arch/x86/kvm/timer.c   |2 +-
 arch/x86/kvm/vmx.c |2 +-
 arch/x86/kvm/x86.c |2 +-
 virt/kvm/irq_comm.c|2 +-
 virt/kvm/kvm_main.c|2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index aead72e..cb8bd2e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -9,7 +9,7 @@
  * privileged instructions:
  *
  * Copyright (C) 2006 Qumranet
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  *   Avi Kivity a...@qumranet.com
  *   Yaniv Kamay ya...@qumranet.com
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 2ad40a4..efad723 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -5,7 +5,7 @@
  * Copyright (c) 2006 Intel Corporation
  * Copyright (c) 2007 Keir Fraser, XenSource Inc
  * Copyright (c) 2008 Intel Corporation
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 6e77471..cf585f7 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -3,7 +3,7 @@
  *
  * Copyright (c) 2003-2004 Fabrice Bellard
  * Copyright (c) 2007 Intel Corporation
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * Permission is hereby granted, free of charge, to any person obtaining a copy
  * of this software and associated documentation files (the "Software"), to deal
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index f994da4..7e06ba1 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -1,7 +1,7 @@
 /*
  * irq.c: API for in kernel interrupt controller
  * Copyright (c) 2007, Intel Corporation.
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 22b06f7..ed1a533 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -5,7 +5,7 @@
  * Copyright (C) 2006 Qumranet, Inc.
  * Copyright (C) 2007 Novell
  * Copyright (C) 2007 Intel
- * Copyright 2009 Red Hat, Inc. and/or its affilates.
+ * Copyright 2009 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Dor Laor dor.l...@qumranet.com
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e248d8..3c7d024 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -7,7 +7,7 @@
  * MMU support
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay  ya...@qumranet.com
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index bd2b1be..ee0feef 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -4,7 +4,7 @@
  * Audit code for KVM MMU
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay  ya...@qumranet.com
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 2bdd843..30cde53 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -7,7 +7,7 @@
  * MMU support
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay  ya...@qumranet.com
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index eeb08d6..a7fdd78 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4,7 +4,7 @@
  * AMD SVM support
  *
  * Copyright (C) 2006 Qumranet, Inc.
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Yaniv Kamay  ya...@qumranet.com
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index e16a0db..fc7a101 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -6,7 +6,7 @@
  *
  * timer support
  *
- * Copyright 2010 Red Hat, Inc. and/or its affilates.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
  *
  * This work is licensed under the 

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Michael S. Tsirkin
On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote:
 For 1 TCP netperf, I ran 7 iterations and summed it. Explanation
 for degradation for 1 stream case:

I thought about possible RX/TX contention reasons, and I realized that
we get/put the mm counter all the time.  So I wrote the following: I
haven't seen any performance gain from this in a single queue case, but
maybe this will help multiqueue?

Thanks,

Michael S. Tsirkin (2):
  vhost: put mm after thread stop
  vhost-net: batch use/unuse mm

 drivers/vhost/net.c   |7 ---
 drivers/vhost/vhost.c |   16 ++--
 2 files changed, 10 insertions(+), 13 deletions(-)

-- 
1.7.3-rc1


[PATCH 1/2] vhost: put mm after thread stop

2010-10-06 Thread Michael S. Tsirkin
makes it possible to batch use/unuse mm

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/vhost.c |9 -
 1 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 677d112..8b9d474 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -207,7 +207,7 @@ static int vhost_worker(void *data)
if (work) {
__set_current_state(TASK_RUNNING);
work->fn(work);
-   if (n++) {
+   if (dev->nvqs <= ++n) {
__set_current_state(TASK_RUNNING);
schedule();
n = 0;
@@ -409,15 +409,14 @@ void vhost_dev_cleanup(struct vhost_dev *dev)
/* No one will access memory at this point */
kfree(dev->memory);
dev->memory = NULL;
-   if (dev->mm)
-   mmput(dev->mm);
-   dev->mm = NULL;
-
WARN_ON(!list_empty(&dev->work_list));
if (dev->worker) {
kthread_stop(dev->worker);
dev->worker = NULL;
}
+   if (dev->mm)
+   mmput(dev->mm);
+   dev->mm = NULL;
 }
 
 static int log_access_ok(void __user *log_base, u64 addr, unsigned long sz)
-- 
1.7.3-rc1



[PATCH 2/2] vhost-net: batch use/unuse mm

2010-10-06 Thread Michael S. Tsirkin
Move use/unuse mm to vhost.c which makes it possible to batch these
operations.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 drivers/vhost/net.c   |7 ---
 drivers/vhost/vhost.c |7 ++-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 271678e..ff02ea4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -10,7 +10,6 @@
 #include <linux/eventfd.h>
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
-#include <linux/mmu_context.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
@@ -136,7 +135,6 @@ static void handle_tx(struct vhost_net *net)
return;
}
 
-   use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
 
@@ -197,7 +195,6 @@ static void handle_tx(struct vhost_net *net)
}
 
mutex_unlock(&vq->mutex);
-   unuse_mm(net->dev.mm);
 }
 
 static int peek_head_len(struct sock *sk)
@@ -302,7 +299,6 @@ static void handle_rx_big(struct vhost_net *net)
if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
return;
 
-   use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
hdr_size = vq->vhost_hlen;
@@ -381,7 +377,6 @@ static void handle_rx_big(struct vhost_net *net)
}
 
mutex_unlock(&vq->mutex);
-   unuse_mm(net->dev.mm);
 }
 
 /* Expects to be always run from workqueue - which acts as
@@ -413,7 +408,6 @@ static void handle_rx_mergeable(struct vhost_net *net)
if (!sock || skb_queue_empty(&sock->sk->sk_receive_queue))
return;
 
-   use_mm(net->dev.mm);
mutex_lock(&vq->mutex);
vhost_disable_notify(vq);
vhost_hlen = vq->vhost_hlen;
@@ -490,7 +484,6 @@ static void handle_rx_mergeable(struct vhost_net *net)
}
 
mutex_unlock(&vq->mutex);
-   unuse_mm(net->dev.mm);
 }
 
 static void handle_rx(struct vhost_net *net)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 8b9d474..c83d1c2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -15,6 +15,7 @@
 #include <linux/vhost.h>
 #include <linux/virtio_net.h>
 #include <linux/mm.h>
+#include <linux/mmu_context.h>
 #include <linux/miscdevice.h>
 #include <linux/mutex.h>
 #include <linux/rcupdate.h>
@@ -179,6 +180,8 @@ static int vhost_worker(void *data)
unsigned uninitialized_var(seq);
int n = 0;
 
+   use_mm(dev->mm);
+
for (;;) {
/* mb paired w/ kthread_stop */
set_current_state(TASK_INTERRUPTIBLE);
@@ -193,7 +196,7 @@ static int vhost_worker(void *data)
if (kthread_should_stop()) {
spin_unlock_irq(&dev->work_lock);
__set_current_state(TASK_RUNNING);
-   return 0;
+   break;
}
if (!list_empty(&dev->work_list)) {
work = list_first_entry(&dev->work_list,
@@ -218,6 +221,8 @@ static int vhost_worker(void *data)
}
 
}
+   unuse_mm(dev->mm);
+   return 0;
 }
 
 /* Helper to allocate iovec buffers for all vqs. */
-- 
1.7.3-rc1


Re: [PATCH] kvm: add oom notifier for virtio balloon

2010-10-06 Thread Dave Young
On Wed, Oct 6, 2010 at 5:05 PM, Rusty Russell ru...@rustcorp.com.au wrote:
 On Tue, 5 Oct 2010 11:15:21 pm Dave Young wrote:
 Balloon could cause guest memory oom killing and panic.

 Add oom notify to leak some memory and retry fill balloon after 5 minutes.

 Have you tried registering a shrinker?  See mm.h.

Hi, thanks. I didn't know a shrinker can shrink memory beyond slab. Will try.


 Thanks,
 Rusty.




-- 
Regards
dave


[PATCH] Fix calculation of number of entries based on number of mce_banks

2010-10-06 Thread Dean Nelson
The number of mce_banks needs to be multiplied by 4 in order to actually
reference all of the entries.

Signed-off-by: Dean Nelson dnel...@redhat.com

---
 qemu-kvm-x86.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index fd974b3..7fd82fb 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -975,7 +975,7 @@ void kvm_arch_load_regs(CPUState *env, int level)
 } else if (level == KVM_PUT_FULL_STATE) {
 kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
 kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
-for (i = 0; i < (env->mcg_cap & 0xff); i++) {
+for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++) {
 kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
 }
 }
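
The factor of 4 can be sketched in isolation. Each MCE bank exposes four
consecutive MSRs (CTL, STATUS, ADDR, MISC) starting at MSR_MC0_CTL (0x400),
and the bank count sits in the low 8 bits of MCG_CAP, so a flat array of bank
registers holds 4 entries per bank. The snippet below is a stand-alone
illustration under those assumptions; mce_bank_count(), mce_entry_count() and
mce_msr_index() are hypothetical names, not qemu-kvm functions.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MC0_CTL        0x400 /* first MSR of bank 0 */
#define MCE_REGS_PER_BANK  4     /* CTL, STATUS, ADDR, MISC */

/* The bank count lives in the low 8 bits of MCG_CAP. */
static unsigned int mce_bank_count(uint64_t mcg_cap)
{
    return (unsigned int)(mcg_cap & 0xff);
}

/* Number of flat mce_banks[] entries that must be transferred. */
static unsigned int mce_entry_count(uint64_t mcg_cap)
{
    return mce_bank_count(mcg_cap) * MCE_REGS_PER_BANK;
}

/* MSR index for flat entry i; i must stay below the entry count. */
static uint32_t mce_msr_index(uint64_t mcg_cap, unsigned int i)
{
    assert(i < mce_entry_count(mcg_cap));
    return MSR_MC0_CTL + i;
}
```

With 10 banks this yields 40 entries; looping only up to the bank count would
stop after the first 10 of them, which covers only banks 0 and 1 plus half of
bank 2.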


Re: [PATCH v6 03/12] Retry fault before vmentry

2010-10-06 Thread Marcelo Tosatti
On Wed, Oct 06, 2010 at 01:07:04PM +0200, Gleb Natapov wrote:
  Can't you set a bit in vcpu->requests instead, and handle it in out:
  at the end of vcpu_enter_guest? 
  
  To have a single entry point for pagefaults, after vmexit handling.
 Jumping to out: will skip vmexit handling anyway, so we will not reuse
 same call site anyway. I don't see yet why the way you propose will have
 an advantage.

What I meant was to call the pagefault handler after vmexit handling.

Because the way it is in your patch now, with pre pagefault on entry,
one has to make an effort to verify ordering wrt other events on entry
processing.

With pre pagefault after vmexit, it's more natural.

Does that make sense?



Re: [PATCH v6 07/12] Add async PF initialization to PV guest.

2010-10-06 Thread Marcelo Tosatti
On Wed, Oct 06, 2010 at 12:55:04PM +0200, Gleb Natapov wrote:
 On Tue, Oct 05, 2010 at 03:25:54PM -0300, Marcelo Tosatti wrote:
  On Mon, Oct 04, 2010 at 05:56:29PM +0200, Gleb Natapov wrote:
   Enable async PF in a guest if async PF capability is discovered.
   
   Signed-off-by: Gleb Natapov g...@redhat.com
   ---
Documentation/kernel-parameters.txt |3 +
arch/x86/include/asm/kvm_para.h |5 ++
arch/x86/kernel/kvm.c   |   92 +++
3 files changed, 100 insertions(+), 0 deletions(-)
   
  
   +static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
   + unsigned long action, void *hcpu)
   +{
   + int cpu = (unsigned long)hcpu;
   + switch (action) {
   + case CPU_ONLINE:
   + case CPU_DOWN_FAILED:
   + case CPU_ONLINE_FROZEN:
   + smp_call_function_single(cpu, kvm_guest_cpu_notify, NULL, 0);
  
  wait parameter should probably be 1.
 Why should we wait for it? FWIW I copied this from somewhere (May be
 arch/x86/pci/amd_bus.c).

So that you know it's executed at a defined point in cpu bringup.



Re: [PATCH v6 04/12] Add memory slot versioning and use it to provide fast guest write interface

2010-10-06 Thread Marcelo Tosatti
On Wed, Oct 06, 2010 at 01:14:17PM +0200, Gleb Natapov wrote:
   +int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
   +   gpa_t gpa)
   +{
   + struct kvm_memslots *slots = kvm_memslots(kvm);
   + int offset = offset_in_page(gpa);
   + gfn_t gfn = gpa >> PAGE_SHIFT;
   +
   + ghc->gpa = gpa;
   + ghc->generation = slots->generation;

kvm->memslots can change here.

   + ghc->memslot = gfn_to_memslot(kvm, gfn);
   + ghc->hva = gfn_to_hva(kvm, gfn);

And if so, gfn_to_memslot / gfn_to_hva will use the new memslots pointer.

All values should be dereferenced from one copy of the kvm->memslots pointer.
 
   + if (!kvm_is_error_hva(ghc->hva))
   + ghc->hva += offset;
   + else
   + return -EFAULT;
   +
   + return 0;
   +}
  
  Should use a unique kvm_memslots structure for the cache entry, since it
  can change in between (use gfn_to_hva_memslot, etc on slots pointer).
  
 I do not understand what do you mean here. kvm_memslots structure itself
 is not cached only various translation that use it are cached. Translation
 result are never used if kvm_memslots was changed.

  Also should zap any cached entries on overflow, otherwise malicious
  userspace could make use of stale slots:
  
 There is only one cached entry at each given time. User who wants to
 write into guest memory often defines gfn_to_hva_cache variable
 somewhere. Init it with kvm_gfn_to_hva_cache_init() and then calls
 kvm_write_guest_cached() on it. If there was no slot changes in between
 cached translation are used. Otherwise cache is recalculated.

Malicious userspace can cause an entry to be cached, then issue the
SET_USER_MEMORY_REGION ioctl 2^32 times; the generation number will match
again, and mark_page_dirty_in_slot will be called with a pointer to freed
memory.
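
The wraparound can be shown with a small user-space model. This is only a
sketch: it assumes the generation counter is 32 bits wide (as the 2^32 figure
implies), and the struct and function names are hypothetical stand-ins for
the kernel ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the memslots generation counter as a 32-bit value. */
struct memslots {
    uint32_t generation;
};

/* Mock cache entry that remembers the generation it was filled under. */
struct hva_cache {
    uint32_t generation;
};

static void cache_init(struct hva_cache *c, const struct memslots *s)
{
    assert(c != NULL && s != NULL);
    c->generation = s->generation;
}

/* The cached translation is trusted if and only if the generations match. */
static int cache_is_current(const struct hva_cache *c,
                            const struct memslots *s)
{
    return c->generation == s->generation;
}

/* Apply `updates` slot changes; each one bumps the counter by one,
 * so the net effect is an addition modulo 2^32. */
static void apply_slot_updates(struct memslots *s, uint64_t updates)
{
    s->generation = (uint32_t)(s->generation + updates);
}
```

After exactly 2^32 updates the counter is back at its old value, so
cache_is_current() reports a match even though every slot the cache referred
to may have been replaced in the meantime.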



[PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Michael S. Tsirkin
vhost irqfd support: case where many vqs are
mapped to a single msix vector is currently broken.
Fix it up.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

This is on top of the qemu patchset, which is unchanged.
Fixes from v1:
correct error handling 

 hw/msix.c   |   68 ++-
 hw/msix.h   |4 +-
 hw/pci.h|3 +-
 hw/virtio-pci.c |   56 ++---
 4 files changed, 97 insertions(+), 34 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 3dd0456..3d4dd61 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -300,10 +300,8 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
 if (kvm_enabled() && kvm_irqchip_in_kernel()) {
 kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
 }
-if (was_masked != msix_is_masked(dev, vector) &&
-dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+if (was_masked != msix_is_masked(dev, vector) && dev->msix_mask_notifier) {
 int r = dev->msix_mask_notifier(dev, vector,
-   dev->msix_mask_notifier_opaque[vector],
 msix_is_masked(dev, vector));
 assert(r >= 0);
 }
@@ -351,9 +349,8 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
 int was_masked = msix_is_masked(dev, vector);
 dev->msix_table_page[offset] |= MSIX_VECTOR_MASK;
 if (was_masked != msix_is_masked(dev, vector) &&
-dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
+dev->msix_mask_notifier) {
 r = dev->msix_mask_notifier(dev, vector,
-dev->msix_mask_notifier_opaque[vector],
 msix_is_masked(dev, vector));
 assert(r >= 0);
 }
@@ -379,8 +376,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
 sizeof *dev->msix_irq_entries);
 }
 #endif
-dev->msix_mask_notifier_opaque =
-qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque);
 dev->msix_mask_notifier = NULL;
 dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES *
 sizeof *dev->msix_entry_used);
@@ -444,8 +439,6 @@ int msix_uninit(PCIDevice *dev)
 dev->msix_entry_used = NULL;
 qemu_free(dev->msix_irq_entries);
 dev->msix_irq_entries = NULL;
-qemu_free(dev->msix_mask_notifier_opaque);
-dev->msix_mask_notifier_opaque = NULL;
 dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
 return 0;
 }
@@ -590,46 +583,79 @@ void msix_unuse_all_vectors(PCIDevice *dev)
 msix_free_irq_entries(dev);
 }
 
-int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
+static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
 {
 int r = 0;
 if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
 return 0;
 
 assert(dev->msix_mask_notifier);
-assert(opaque);
-assert(!dev->msix_mask_notifier_opaque[vector]);
 
 /* Unmask the new notifier unless vector is masked. */
 if (!msix_is_masked(dev, vector)) {
-r = dev->msix_mask_notifier(dev, vector, opaque, false);
+r = dev->msix_mask_notifier(dev, vector, false);
 if (r < 0) {
 return r;
 }
 }
-dev->msix_mask_notifier_opaque[vector] = opaque;
 return r;
 }
 
-int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
+static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
 {
 int r = 0;
-void *opaque;
 if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
 return 0;
 
-opaque = dev->msix_mask_notifier_opaque[vector];
-
 assert(dev->msix_mask_notifier);
-assert(opaque);
 
 /* Mask the old notifier unless it is already masked. */
 if (!msix_is_masked(dev, vector)) {
-r = dev->msix_mask_notifier(dev, vector, opaque, true);
+r = dev->msix_mask_notifier(dev, vector, true);
 if (r < 0) {
 return r;
 }
 }
-dev->msix_mask_notifier_opaque[vector] = NULL;
+return r;
+}
+
+int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
+{
+int r, n;
+assert(!dev->msix_mask_notifier);
+dev->msix_mask_notifier = f;
+for (n = 0; n < dev->msix_entries_nr; ++n) {
+r = msix_set_mask_notifier_for_vector(dev, n);
+if (r < 0) {
+goto undo;
+}
+}
+return 0;
+
+undo:
+while (--n >= 0) {
+msix_unset_mask_notifier_for_vector(dev, n);
+}
+dev->msix_mask_notifier = NULL;
+return r;
+}
+
+int msix_unset_mask_notifier(PCIDevice *dev)
+{
+int r, n;
+assert(dev->msix_mask_notifier);
+for (n = 0; n < dev->msix_entries_nr; ++n) {
+r = msix_unset_mask_notifier_for_vector(dev, n);
+if (r < 0) {
+goto undo;
+ 

Re: [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool

2010-10-06 Thread Ryan Harper
* pradeep psuri...@linux.vnet.ibm.com [2010-10-06 03:57]:
 On Mon, 27 Sep 2010 18:44:04 -0400
 Lucas Meneghel Rodrigues l...@redhat.com wrote:
 
 
  +
  +vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
  +session = kvm_test_utils.wait_for_login(vm,
  +  timeout=int(params.get("login_timeout", 360)))
  +# Let's just error the test if we identify that there's no ethtool installed
  +if session.get_command_status("ethtool -h"):
  +    raise error.TestError("Command ethtool not installed on guest")
  +session2 = kvm_test_utils.wait_for_login(vm,
  +  timeout=int(params.get("login_timeout", 360)))
  +mtu = 1514
  +feature_status = {}
  +filename = "/tmp/ethtool.dd"
  +guest_ip = vm.get_address()
  +ethname = kvm_test_utils.get_linux_ifname(session, vm.get_mac_address(0))
  +supported_features = params.get("supported_features").split()
 
 I guess split this expects input.
 
 23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no
 attribute 'split'

That'll need an update to the tests_base.cfg file to ensure the test
type has that config value set.

Did the patchset miss updating tests_base.cfg.sample with this one ?

 
 22.12', '00:1a:4a:65:09:09': '192.168.122.66', '9a:52:2f:62:12:63':
 '192.168.122.151', '9a:52:2f:62:6b:28': '192.168.122.35'}, 'version':
 0, 'tcpdump': kvm_subprocess.kvm_tail instance at 0x27cb200} 23:48:05
 INFO | ['iteration.1'] 23:48:05 ERROR| Exception escaping from test:
 Traceback (most recent call last): File
 /home/pradeep/vhost_net/autotest/client/common_lib/test.py, line 412,
 in _exec _call_test_function(self.execute, *p_args, **p_dargs) File
 /home/pradeep/vhost_net/autotest/client/common_lib/test.py, line 605,
 in _call_test_function raise error.UnhandledTestFail(e)
 UnhandledTestFail: Unhandled AttributeError: 'NoneType' object has no
 attribute 'split' Traceback (most recent call last): File
 /home/pradeep/vhost_net/autotest/client/common_lib/test.py, line 598,
 in _call_test_function return func(*args, **dargs) File
 /home/pradeep/vhost_net/autotest/client/common_lib/test.py, line 284,
 in execute postprocess_profiled_run, args, dargs) File
 /home/pradeep/vhost_net/autotest/client/common_lib/test.py, line 202,
 in _call_run_once self.run_once_profiling(postprocess_profiled_run,
 *args, **dargs) File
 /home/pradeep/vhost_net/autotest/client/common_lib/test.py, line 308,
 in run_once_profiling self.run_once(*args, **dargs) File
 /home/pradeep/vhost_net/autotest/client/tests/kvm/kvm.py, line 73, in
 run_once run_func(self, params, env) File
 /home/pradeep/vhost_net/autotest/client/tests/kvm/tests/ethtool.py,
 line 185, in run_ethtool supported_features =
 params.get(supported_features).split() AttributeError: 'NoneType'
 object has no attribute 'split'
 
 
 
 --Pradeep

-- 
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ry...@us.ibm.com


Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest

2010-10-06 Thread Marcelo Tosatti
On Wed, Oct 06, 2010 at 10:10:51AM +0900, Hidetoshi Seto wrote:
 
 (snip)
 
  Index: qemu/kvm.h
  ===
  --- qemu.orig/kvm.h
  +++ qemu/kvm.h
  @@ -110,6 +110,9 @@ int kvm_arch_init_vcpu(CPUState *env);
   
   void kvm_arch_reset_vcpu(CPUState *env);
   
  +int kvm_on_sigbus(CPUState *env, int code, void *addr);
  +int kvm_on_sigbus_vcpu(int code, void *addr);
  +
   struct kvm_guest_debug;
   struct kvm_debug_exit_arch;
   
 
 So kvm_on_sigbus() is called from qemu_kvm_eat_signal(), which is
 called on the vcpu thread, while kvm_on_sigbus_vcpu() is called via
 sigbus_handler, which is invoked on the iothread using signalfd.
 
 ... Inverse naming?

Yes, fixed.



Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest

2010-10-06 Thread Marcelo Tosatti
On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
 I have some more questions:
 
 (2010/10/05 3:54), Marcelo Tosatti wrote:
  Index: qemu/target-i386/cpu.h
  ===
  --- qemu.orig/target-i386/cpu.h
  +++ qemu/target-i386/cpu.h
  @@ -250,16 +250,32 @@
   #define PG_ERROR_RSVD_MASK 0x08
   #define PG_ERROR_I_D_MASK  0x10
   
  -#define MCG_CTL_P  (1UL<<8)   /* MCG_CAP register available */
  +#define MCG_CTL_P  (1ULL<<8)   /* MCG_CAP register available */
  +#define MCG_SER_P  (1ULL<<24) /* MCA recovery/new status bits */
   
  -#define MCE_CAP_DEF MCG_CTL_P
  +#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
   #define MCE_BANKS_DEF  10
   
 
 It seems that current kvm doesn't support SER_P, so injecting SRAO
 into the guest will mean that the guest receives a VAL|UC|!PCC and RIPV
 event from a virtual processor that doesn't have SER_P.

Dean also noted this. I don't think it was a deliberate choice to not
expose SER_P. Huang?

 I think most OSes don't expect to receive an MCE with !PCC
 on a traditional x86 processor without SER_P.
 
 Q1: Is it safe to expect that guests can handle such a !PCC event?
 Q2: What is the expected behavior on the guest?
 Q3: What happens if the guest reboots itself in response to the MCE?
 
 
 Thanks,
 H.Seto


Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Alex Williamson
On Wed, 2010-10-06 at 16:56 +0200, Michael S. Tsirkin wrote:
 vhost irqfd support: case where many vqs are
 mapped to a single msix vector is currently broken.
 Fix it up.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 
 This is on top of the qemu patchset, which is unchanged.
 Fixes from v1:
   correct error handling 
 
  hw/msix.c   |   68 ++-
  hw/msix.h   |4 +-
  hw/pci.h|3 +-
  hw/virtio-pci.c |   56 ++---
  4 files changed, 97 insertions(+), 34 deletions(-)
 
 diff --git a/hw/msix.c b/hw/msix.c
 index 3dd0456..3d4dd61 100644
 --- a/hw/msix.c
 +++ b/hw/msix.c
 @@ -300,10 +300,8 @@ static void msix_mmio_writel(void *opaque, target_phys_addr_t addr,
  if (kvm_enabled() && kvm_irqchip_in_kernel()) {
  kvm_msix_update(dev, vector, was_masked, msix_is_masked(dev, vector));
  }
 -if (was_masked != msix_is_masked(dev, vector) &&
 -dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
 +if (was_masked != msix_is_masked(dev, vector) && dev->msix_mask_notifier) {
  int r = dev->msix_mask_notifier(dev, vector,
 - dev->msix_mask_notifier_opaque[vector],
   msix_is_masked(dev, vector));
  assert(r >= 0);
  }
 @@ -351,9 +349,8 @@ static void msix_mask_all(struct PCIDevice *dev, unsigned nentries)
  int was_masked = msix_is_masked(dev, vector);
  dev->msix_table_page[offset] |= MSIX_VECTOR_MASK;
  if (was_masked != msix_is_masked(dev, vector) &&
 -dev->msix_mask_notifier && dev->msix_mask_notifier_opaque[vector]) {
 +dev->msix_mask_notifier) {
  r = dev->msix_mask_notifier(dev, vector,
 -dev->msix_mask_notifier_opaque[vector],
  msix_is_masked(dev, vector));
  assert(r >= 0);
  }
 @@ -379,8 +376,6 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
  sizeof *dev->msix_irq_entries);
  }
  #endif
 -dev->msix_mask_notifier_opaque =
 -qemu_mallocz(nentries * sizeof *dev->msix_mask_notifier_opaque);
  dev->msix_mask_notifier = NULL;
  dev->msix_entry_used = qemu_mallocz(MSIX_MAX_ENTRIES *
  sizeof *dev->msix_entry_used);
 @@ -444,8 +439,6 @@ int msix_uninit(PCIDevice *dev)
  dev->msix_entry_used = NULL;
  qemu_free(dev->msix_irq_entries);
  dev->msix_irq_entries = NULL;
 -qemu_free(dev->msix_mask_notifier_opaque);
 -dev->msix_mask_notifier_opaque = NULL;
  dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
  return 0;
  }
 @@ -590,46 +583,79 @@ void msix_unuse_all_vectors(PCIDevice *dev)
  msix_free_irq_entries(dev);
  }
  
 -int msix_set_mask_notifier(PCIDevice *dev, unsigned vector, void *opaque)
 +static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
  {
  int r = 0;
  if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
  return 0;
  
  assert(dev->msix_mask_notifier);
 -assert(opaque);
 -assert(!dev->msix_mask_notifier_opaque[vector]);
  
  /* Unmask the new notifier unless vector is masked. */
  if (!msix_is_masked(dev, vector)) {
 -r = dev->msix_mask_notifier(dev, vector, opaque, false);
 +r = dev->msix_mask_notifier(dev, vector, false);
  if (r < 0) {
  return r;
  }
  }
 -dev->msix_mask_notifier_opaque[vector] = opaque;
  return r;
  }
  
 -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
 +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
  {
  int r = 0;
 -void *opaque;
  if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
  return 0;
  
 -opaque = dev->msix_mask_notifier_opaque[vector];
 -
  assert(dev->msix_mask_notifier);
 -assert(opaque);
  
  /* Mask the old notifier unless it is already masked. */
  if (!msix_is_masked(dev, vector)) {
 -r = dev->msix_mask_notifier(dev, vector, opaque, true);
 +r = dev->msix_mask_notifier(dev, vector, true);
  if (r < 0) {
  return r;
  }
  }
 -dev->msix_mask_notifier_opaque[vector] = NULL;
 +return r;
 +}

The above need to be combined to a single function now since the only
difference is s/true/false.

Alex

 +
 +int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
 +{
 +int r, n;
 +assert(!dev->msix_mask_notifier);
 +dev->msix_mask_notifier = f;
 +for (n = 0; n < dev->msix_entries_nr; ++n) {
 +r = msix_set_mask_notifier_for_vector(dev, n);
 +if (r < 0) {
 +goto undo;
 +}
 +}
 +return 0;
 +
 +undo:
 +while (--n >= 0) {
 +msix_unset_mask_notifier_for_vector(dev, n);
 +}
 +

Re: [PATCH 18/18] KVM test: Add subtest of testing offload by ethtool

2010-10-06 Thread Lucas Meneghel Rodrigues
On Wed, 2010-10-06 at 10:55 -0500, Ryan Harper wrote:
 * pradeep psuri...@linux.vnet.ibm.com [2010-10-06 03:57]:
  On Mon, 27 Sep 2010 18:44:04 -0400
  Lucas Meneghel Rodrigues l...@redhat.com wrote:
  
  
   +
   +vm = kvm_test_utils.get_living_vm(env, params.get(main_vm))
   +session = kvm_test_utils.wait_for_login(vm,
   +  timeout=int(params.get(login_timeout, 360)))
   +# Let's just error the test if we identify that there's no
   ethtool installed
   +if session.get_command_status(ethtool -h):
   +raise error.TestError(Command ethtool not installed on
   guest)
   +session2 = kvm_test_utils.wait_for_login(vm,
   +  timeout=int(params.get(login_timeout, 360)))
   +mtu = 1514
   +feature_status = {}
   +filename = /tmp/ethtool.dd
   +guest_ip = vm.get_address()
   +ethname = kvm_test_utils.get_linux_ifname(session,
   vm.get_mac_address(0))
   +supported_features = params.get(supported_features).split()
  
  I guess split this expects input.
  
  23:48:03 ERROR| Test failed: AttributeError: 'NoneType' object has no
  attribute 'split'
 
 That'll need an update to the tests_base.cfg file to ensure the test
 type has that config value set.
 
 Did the patchset miss updating tests_base.cfg.sample with this one ?

I think pradeep forgot to update tests_base.cfg indeed. It's working
fine for me.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Krishna Kumar2
Michael S. Tsirkin m...@redhat.com wrote on 10/06/2010 07:04:31 PM:

 Michael S. Tsirkin m...@redhat.com
 10/06/2010 07:04 PM

 To

 Krishna Kumar2/India/i...@ibmin

 cc

 ru...@rustcorp.com.au, da...@davemloft.net, kvm@vger.kernel.org,
 a...@arndb.de, net...@vger.kernel.org, a...@redhat.com,
anth...@codemonkey.ws

 Subject

 Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

 On Fri, Sep 17, 2010 at 03:33:07PM +0530, Krishna Kumar wrote:
  For 1 TCP netperf, I ran 7 iterations and summed it. Explanation
  for degradation for 1 stream case:

 I thought about possible RX/TX contention reasons, and I realized that
 we get/put the mm counter all the time.  So I wrote the following: I
 haven't seen any performance gain from this in a single queue case, but
 maybe this will help multiqueue?

Great! I am on vacation tomorrow, but will test with this patch
tomorrow night.

Thanks,

- KK



Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Michael S. Tsirkin
On Wed, Oct 06, 2010 at 10:48:44AM -0600, Alex Williamson wrote:
  -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
  +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned 
  vector)
   {
   int r = 0;
  -void *opaque;
   if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector])
   return 0;
   
  -opaque = dev-msix_mask_notifier_opaque[vector];
  -
   assert(dev-msix_mask_notifier);
  -assert(opaque);
   
   /* Mask the old notifier unless it is already masked. */
   if (!msix_is_masked(dev, vector)) {
  -r = dev-msix_mask_notifier(dev, vector, opaque, true);
  +r = dev-msix_mask_notifier(dev, vector, true);
   if (r  0) {
   return r;
   }
   }
  -dev-msix_mask_notifier_opaque[vector] = NULL;
  +return r;
  +}
 
 The above need to be combined to a single function now since the only
 difference is s/true/false.
 
 Alex

This is the way it was in the past, and it turned out to be very
confusing to read since both variables: mask and assign are bool but
polarity is reversed.

Unrolled it seems easier to grok.

-- 
MST


Re: 8 NIC limit - patch - places limit at 32

2010-10-06 Thread linux_kvm
It's 8 otherwise, and after the patch is applied it still only goes to
28 for some reason.
28 is acceptable for my needs, so I'll step aside from here and leave it
to the experts.

As for the new -device method, that's all fine and good, but AFAIK it's
not implemented on my platform, so this was the answer.

On Wed, 06 Oct 2010 07:54 -0500, Anthony Liguori
anth...@codemonkey.ws wrote:
 On 10/06/2010 12:46 AM, linux_...@proinbox.com wrote:
  Attached is a patch that allows qemu to have up to 32 NICs, without
  using the qdev -device method.
 
 
 I'd rather there be no fixed limit and we validate that when add fails 
 because there isn't a TCP slot available, we do the right thing.
 
 BTW, using -device, it should be possible to add a very high number of 
 nics because you can specify the PCI address including a function.  If 
 this doesn't Just Work today, we should make it work.
 
 Regards,
 
 Anthony Liguori
 


Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Krishna Kumar2
Arnd Bergmann a...@arndb.de wrote on 10/06/2010 05:49:00 PM:

  I don't see any reasons mentioned above.  However, for higher
  number of netperf sessions, I see a big increase in retransmissions:
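  ________________________________________
  #netperf     ORG             NEW
               BW (#retr)      BW (#retr)
  ________________________________________
  1            70244 (0)       64102 (0)
  4            21421 (0)       36570 (416)
  8            21746 (0)       38604 (148)
  16           21783 (0)       40632 (464)
  32           22677 (0)       37163 (1053)
  64           23648 (4)       36449 (2197)
  128          23251 (2)       31676 (3185)
  ________________________________________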


 This smells like it could be related to a problem that Ben Greear found
 recently (see "macvlan: Enable qdisc backoff logic"). When the hardware
 is busy, we used to just drop the packet. With Ben's patch, we return
 -EAGAIN to qemu (or vhost-net) to trigger a resend.

 I suppose what we really should do is feed that condition back to the
 guest network stack and implement the backoff in there.

Thanks for the pointer. I will take a look at this as I hadn't seen
this patch earlier. Is there any way to figure out if this is the
issue?

Thanks,

- KK



virtio network performance [was: Re: BCM5708 performance issues]

2010-10-06 Thread Chris Wright
* Chris Wright (chr...@sous-sol.org) wrote:
 * Pete Ashdown (pashd...@xmission.com) wrote:
  ProxMox guest:
  /usr/bin/kvm -monitor unix:/var/run/qemu-server/104.mon,server,nowait -vnc 
  unix:/var/run/qemu-server/104.vnc,password -pidfile 
  /var/run/qemu-server/104.pid -daemonize -usbdevice tablet -name 
  UbuntuServer -smp sockets=2,cores=2 -nodefaults -boot menu=on -vga cirrus 
  -tdf -k en-us -drive 
  file=/var/lib/vz/images/104/vm-104-disk-2.raw,if=ide,index=3 -drive 
  file=/var/lib/vz/images/104/vm-104-disk-1.raw,if=virtio,index=0,boot=on -m 
  1024 -net 
  tap,vlan=0,ifname=vmtab104i0,script=/var/lib/qemu-server/bridge-vlan -net 
  nic,vlan=0,model=virtio,macaddr=76:3F:1A:03:6D:6F
  
  Ubuntu guest:
  /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 1 -name ubutest -uuid 
  c0537369-fffa-9680-2f29-2e0cc0406561 -chardev 
  socket,id=monitor,path=/var/lib/libvirt/qemu/ubutest.monitor,server,nowait 
  -monitor chardev:monitor -boot c -drive 
  file=/dev/vg/ubutest,if=virtio,index=0,boot=on -net 
  nic,macaddr=52:54:00:35:11:f1,vlan=0,model=virtio,name=virtio.0 -net 
  tap,fd=51,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 
  -parallel none -usb -vnc 0.0.0.0
 
 Not sure what userspace you are using, but you are probably not getting
 any of the useful offload features set.  Checking ethtool -k $ETH
 in the guest will verify that.
 
 Try changing this:
 
 -net nic,macaddr=52:54:00:35:11:f1,vlan=0,model=virtio,name=virtio.0 \
 -net tap,fd=51,vlan=0,name=tap.0
 
 to use newer syntax:
 
 -netdev type=tap,id=netdev0
 -device virtio-net-pci,mac=52:54:00:35:11:f1,netdev=netdev0
 
 With just a 1Gb link, you should see line rate from guest via virtio.

Just to follow-up for the archives.  Pete replied offlist that using
the above cmdline eliminates the performance issue.

thanks,
-chris


Re: NIC limit

2010-10-06 Thread Chris Wright
* linux_...@proinbox.com (linux_...@proinbox.com) wrote:
 Hi again everybody,
  
 One of the admins at the ProxmoxVE project was gracious enough to
 quickly release a package including the previously discussed change to
 allow up to 32 NICs in qemu.

You mean they patched qemu to increase the MAX_NICS constant?  Nice to
get the quick turn around.

The better choice is to use a newer command line.  Not only does it avoid
the MAX_NICS limitation, but it also enables standard virtio-net offload
accelerations.

 For future reference the .deb is here:
 ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/pve-qemu-kvm_0.12.5-2_amd64.deb
  
 Upon creating and running the VM with the newly patched qemu-kvm app
 installed, I found a NIC limitation remained in place, presumably
 imposed by some other aspect of the environment.
  
 The machine would start when it had 33 PCI devices, as long as no more
 than 28 of them were NICs.

The PCI bus has only 32 slots (devices), 3 taken by chipset + vga, and
a 4th if you have, for example, a virtio disk.  Are you sure these are
33 PCI devices and not 33 PCI functions?

thanks,
-chris
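
The slot arithmetic in the reply above can be written out as a small worked example. This is only a sketch under the numbers given in the message (32 slots on the bus, 3 taken by chipset and VGA, 1 by a virtio disk); the constant and function names are made up for illustration and are not qemu identifiers. The figure of 8 functions per slot is the PCI limit, not something stated in the thread.

```c
#include <assert.h>

/* Illustrative constants, not qemu identifiers. */
enum {
    PCI_SLOTS_PER_BUS  = 32, /* devices (slots) on one PCI bus */
    PCI_FUNCS_PER_SLOT = 8   /* functions per device allowed by PCI */
};

/* Slots left for NICs once the fixed devices are accounted for. */
static int free_nic_slots(int chipset_and_vga, int other_devices)
{
    int used = chipset_and_vga + other_devices;

    assert(used >= 0 && used <= PCI_SLOTS_PER_BUS);
    return PCI_SLOTS_PER_BUS - used;
}

/* Upper bound if every free slot carried a full multifunction device. */
static int free_nic_functions(int chipset_and_vga, int other_devices)
{
    return free_nic_slots(chipset_and_vga, other_devices) * PCI_FUNCS_PER_SLOT;
}
```

With 3 slots for chipset and VGA plus 1 virtio disk, free_nic_slots(3, 1) gives 28, which matches the NIC limit reported in the quoted message.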


Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Alex Williamson
On Wed, 2010-10-06 at 19:02 +0200, Michael S. Tsirkin wrote:
 On Wed, Oct 06, 2010 at 10:48:44AM -0600, Alex Williamson wrote:
   -int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
   +static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned 
   vector)
{
int r = 0;
   -void *opaque;
if (vector = dev-msix_entries_nr || !dev-msix_entry_used[vector])
return 0;

   -opaque = dev-msix_mask_notifier_opaque[vector];
   -
assert(dev-msix_mask_notifier);
   -assert(opaque);

/* Mask the old notifier unless it is already masked. */
if (!msix_is_masked(dev, vector)) {
   -r = dev-msix_mask_notifier(dev, vector, opaque, true);
   +r = dev-msix_mask_notifier(dev, vector, true);
if (r  0) {
return r;
}
}
   -dev-msix_mask_notifier_opaque[vector] = NULL;
   +return r;
   +}
  
  The above need to be combined to a single function now since the only
  difference is s/true/false.
  
  Alex
 
 This is the way it was in the past, and it turned out to be very
 confusing to read since both variables: mask and assign are bool but
 polarity is reversed.
 
 Unrolled it seems easier to grok.

You could always keep the functions as separate wrapper callers of the
common function so you only need to keep true = unset, false = set
straight in one place.  Thanks,

Alex
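
The wrapper idea suggested above could look roughly like the following. This is a minimal sketch, not the qemu code: struct dev, toy_notifier and the sketch_* names are hypothetical stand-ins, and the msix_entry_used check from the patch is left out for brevity. Only the polarity (set passes false to unmask, unset passes true to mask) follows the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

struct dev;
typedef int (*mask_notifier_fn)(struct dev *d, unsigned vector, bool masked);

/* Hypothetical stand-in for the PCIDevice fields the patch touches. */
struct dev {
    mask_notifier_fn notifier;
    unsigned nr_vectors;
    bool guest_masked[4];   /* what msix_is_masked() would report */
    bool last_arg[4];       /* last value handed to the notifier */
    int calls;              /* how often the notifier ran */
};

/* Common body: the only place that passes the bool to the notifier. */
static int sketch_update_vector(struct dev *d, unsigned vector, bool mask)
{
    if (vector >= d->nr_vectors) {
        return 0;
    }
    assert(d->notifier);
    /* A vector the guest already masked needs no notification. */
    if (d->guest_masked[vector]) {
        return 0;
    }
    return d->notifier(d, vector, mask);
}

/* Named wrappers keep call sites readable: set unmasks, unset masks. */
static int sketch_set_for_vector(struct dev *d, unsigned vector)
{
    return sketch_update_vector(d, vector, false);
}

static int sketch_unset_for_vector(struct dev *d, unsigned vector)
{
    return sketch_update_vector(d, vector, true);
}

/* Toy notifier used only to observe which polarity was passed. */
static int toy_notifier(struct dev *d, unsigned vector, bool masked)
{
    d->last_arg[vector] = masked;
    d->calls++;
    return 0;
}
```

Whether this reads better than the two unrolled functions is exactly the point under discussion; the sketch only shows that the true/false mapping can live in a single helper.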



Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Michael S. Tsirkin
On Wed, Oct 06, 2010 at 11:24:24AM -0600, Alex Williamson wrote:
 On Wed, 2010-10-06 at 19:02 +0200, Michael S. Tsirkin wrote:
  On Wed, Oct 06, 2010 at 10:48:44AM -0600, Alex Williamson wrote:
-int msix_unset_mask_notifier(PCIDevice *dev, unsigned vector)
+static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, 
unsigned vector)
 {
 int r = 0;
-void *opaque;
 if (vector = dev-msix_entries_nr || 
!dev-msix_entry_used[vector])
 return 0;
 
-opaque = dev-msix_mask_notifier_opaque[vector];
-
 assert(dev-msix_mask_notifier);
-assert(opaque);
 
 /* Mask the old notifier unless it is already masked. */
 if (!msix_is_masked(dev, vector)) {
-r = dev-msix_mask_notifier(dev, vector, opaque, true);
+r = dev-msix_mask_notifier(dev, vector, true);
 if (r  0) {
 return r;
 }
 }
-dev-msix_mask_notifier_opaque[vector] = NULL;
+return r;
+}
   
   The above need to be combined to a single function now since the only
   difference is s/true/false.
   
   Alex
  
  This is the way it was in the past, and it turned out to be very
  confusing to read since both variables: mask and assign are bool but
  polarity is reversed.
  
  Unrolled it seems easier to grok.
 
 You could always keep the functions as separate wrapper callers of the
 common function so you only need to keep true = unset, false = set
 straight in one place.  Thanks,
 
 Alex

wrappers still make this confusing.
we had so many bugs here, I feel minor duplication
is worth it.

-- 
MST


[patch uq/master 3/8] Expose thread_id in info cpus

2010-10-06 Thread Marcelo Tosatti
commit ce6325ff1af34dbaee91c8d28e792277e43f1227
Author: Glauber Costa gco...@redhat.com
Date:   Wed Mar 5 17:01:10 2008 -0300

Augment info cpus

This patch exposes the thread id associated with each
cpu through the already well known 'info cpus' interface.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpu-defs.h
===
--- qemu.orig/cpu-defs.h
+++ qemu/cpu-defs.h
@@ -197,6 +197,7 @@ typedef struct CPUWatchpoint {
 int nr_cores;  /* number of cores within this CPU package */\
 int nr_threads;/* number of threads within this CPU */  \
 int running; /* Nonzero if cpu is currently running(usermode).  */  \
+int thread_id;  \
 /* user data */ \
 void *opaque;   \
 \
Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -539,6 +539,7 @@ static void *kvm_cpu_thread_fn(void *arg
 
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_self(env->thread);
+    env->thread_id = get_thread_id();
     if (kvm_enabled())
         kvm_init_vcpu(env);
 
@@ -578,6 +579,10 @@ static void *tcg_cpu_thread_fn(void *arg
     while (!qemu_system_ready)
         qemu_cond_timedwait(&qemu_system_cond, &qemu_global_mutex, 100);
 
+    for (env = first_cpu; env != NULL; env = env->next_cpu) {
+        env->thread_id = get_thread_id();
+    }
+
     while (1) {
         cpu_exec_all();
         qemu_tcg_wait_io_event();
Index: qemu/exec.c
===
--- qemu.orig/exec.c
+++ qemu/exec.c
@@ -637,6 +637,7 @@ void cpu_exec_init(CPUState *env)
     env->numa_node = 0;
     QTAILQ_INIT(&env->breakpoints);
     QTAILQ_INIT(&env->watchpoints);
+    env->thread_id = get_thread_id();
     *penv = env;
 #if defined(CONFIG_USER_ONLY)
     cpu_list_unlock();
Index: qemu/osdep.c
===
--- qemu.orig/osdep.c
+++ qemu/osdep.c
@@ -44,6 +44,10 @@
 extern int madvise(caddr_t, size_t, int);
 #endif
 
+#ifdef CONFIG_LINUX
+#include <sys/syscall.h>
+#endif
+
 #ifdef CONFIG_EVENTFD
 #include <sys/eventfd.h>
 #endif
@@ -200,6 +204,17 @@ int qemu_create_pidfile(const char *file
     return 0;
 }
 
+int get_thread_id(void)
+{
+#if defined (_WIN32)
+    return GetCurrentThreadId();
+#elif defined (__linux__)
+    return syscall(SYS_gettid);
+#else
+    return getpid();
+#endif
+}
+
 #ifdef _WIN32
 
 /* mingw32 needs ffs for compilations without optimization. */
Index: qemu/osdep.h
===
--- qemu.orig/osdep.h
+++ qemu/osdep.h
@@ -126,6 +126,7 @@ void qemu_vfree(void *ptr);
 int qemu_madvise(void *addr, size_t len, int advice);
 
 int qemu_create_pidfile(const char *filename);
+int get_thread_id(void);
 
 #ifdef _WIN32
 int ffs(int i);
Index: qemu/monitor.c
===
--- qemu.orig/monitor.c
+++ qemu/monitor.c
@@ -878,6 +878,9 @@ static void print_cpu_iter(QObject *obj,
         monitor_printf(mon, " (halted)");
     }
 
+    monitor_printf(mon, " thread_id=%" PRId64 " ",
+                   qdict_get_int(cpu, "thread_id"));
+
     monitor_printf(mon, "\n");
 }
 
@@ -922,6 +925,7 @@ static void do_info_cpus(Monitor *mon, Q
 #elif defined(TARGET_MIPS)
         qdict_put(cpu, "PC", qint_from_int(env->active_tc.PC));
 #endif
+        qdict_put(cpu, "thread_id", qint_from_int(env->thread_id));
 
 qlist_append(cpu_list, cpu);
 }
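
For readers who want to try the thread-id lookup outside qemu, here is a minimal standalone sketch of the same fallback order as get_thread_id() in the patch, minus the Windows branch. The name sketch_get_thread_id is made up for this example; syscall(SYS_gettid) and getpid() are the calls the patch itself uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/syscall.h>
#endif

/* Returns an id for the calling thread, or the pid where none is available. */
static int sketch_get_thread_id(void)
{
    int tid;

#if defined(__linux__)
    /* gettid(2): kernel thread id; equals getpid() in the main thread. */
    tid = (int)syscall(SYS_gettid);
#else
    /* No portable thread id here: fall back to the process id. */
    tid = (int)getpid();
#endif
    assert(tid > 0);
    return tid;
}
```

On Linux the value returned for a vcpu thread is the one that shows up under /proc/PID/task/, which is what makes it useful for pinning or inspecting individual vcpu threads from outside.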




[patch uq/master 6/8] Add RAM -> physical addr mapping in MCE simulation

2010-10-06 Thread Marcelo Tosatti
From: Huang Ying ying.hu...@intel.com

In QEMU-KVM, a physical address is not the same as a RAM address, and
MCE simulation needs the physical address rather than the RAM address.
So kvm_physical_memory_addr_from_ram() is implemented to do the
conversion, and it is invoked before the address is filled into the
IA32_MCi_ADDR MSR.

Reported-by: Dean Nelson dnel...@redhat.com
Signed-off-by: Huang Ying ying.hu...@intel.com
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/kvm-all.c
===
--- qemu.orig/kvm-all.c
+++ qemu/kvm-all.c
@@ -137,6 +137,24 @@ static KVMSlot *kvm_lookup_overlapping_s
 return found;
 }
 
+int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+                                      target_phys_addr_t *phys_addr)
+{
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(s->slots); i++) {
+        KVMSlot *mem = &s->slots[i];
+
+        if (ram_addr >= mem->phys_offset &&
+            ram_addr < mem->phys_offset + mem->memory_size) {
+            *phys_addr = mem->start_addr + (ram_addr - mem->phys_offset);
+            return 1;
+        }
+    }
+
+    return 0;
+}
+
 static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
 {
 struct kvm_userspace_memory_region mem;
Index: qemu/kvm.h
===
--- qemu.orig/kvm.h
+++ qemu/kvm.h
@@ -174,6 +174,9 @@ static inline void cpu_synchronize_post_
 }
 }
 
+int kvm_physical_memory_addr_from_ram(KVMState *s, ram_addr_t ram_addr,
+  target_phys_addr_t *phys_addr);
+
 #endif
 int kvm_set_ioeventfd_mmio_long(int fd, uint32_t adr, uint32_t val, bool 
assign);
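
The RAM-to-physical conversion in this patch is a range lookup over the memory slots. The sketch below restates it standalone so the arithmetic can be checked in isolation; struct slot is a hypothetical, simplified stand-in for KVMSlot, and sketch_phys_from_ram is not a qemu function. The return convention (1 on a hit, 0 on a miss) follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for KVMSlot. */
struct slot {
    uint64_t start_addr;   /* guest physical base of the slot */
    uint64_t phys_offset;  /* RAM address of the slot's first byte */
    uint64_t memory_size;  /* length in bytes; 0 means unused */
};

/* Returns 1 and fills *phys_addr when ram_addr falls inside a slot, else 0. */
static int sketch_phys_from_ram(const struct slot *slots, size_t n,
                                uint64_t ram_addr, uint64_t *phys_addr)
{
    size_t i;

    assert(phys_addr != NULL);
    for (i = 0; i < n; i++) {
        const struct slot *mem = &slots[i];

        if (ram_addr >= mem->phys_offset &&
            ram_addr < mem->phys_offset + mem->memory_size) {
            /* Keep the offset within the slot, swap the base. */
            *phys_addr = mem->start_addr + (ram_addr - mem->phys_offset);
            return 1;
        }
    }
    return 0;
}
```

For a slot with phys_offset 0xc0000, start_addr 0x100000 and size 0x1000, RAM address 0xc0010 maps to physical address 0x100010, while an address between slots reports a miss and leaves the output untouched.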
 




[patch uq/master 8/8] Add savevm/loadvm support for MCE

2010-10-06 Thread Marcelo Tosatti
Port qemu-kvm's

commit 1bab5d11545d8de5facf46c28630085a2f9651ae
Author: Huang Ying ying.hu...@intel.com
Date:   Wed Mar 3 16:52:46 2010 +0800

Add savevm/loadvm support for MCE

MCE registers are saved/load into/from CPUState in
kvm_arch_save/load_regs. To simulate the MCG_STATUS clearing upon
reset, MSR_MCG_STATUS is set to 0 for KVM_PUT_RESET_STATE.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -774,7 +774,7 @@ static int kvm_put_msrs(CPUState *env, i
         struct kvm_msr_entry entries[100];
     } msr_data;
     struct kvm_msr_entry *msrs = msr_data.entries;
-    int n = 0;
+    int i, n = 0;
 
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_CS, env->sysenter_cs);
     kvm_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
@@ -794,6 +794,18 @@ static int kvm_put_msrs(CPUState *env, i
                           env->system_time_msr);
         kvm_msr_entry_set(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }
+#ifdef KVM_CAP_MCE
+    if (env->mcg_cap) {
+        if (level == KVM_PUT_RESET_STATE)
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+        else if (level == KVM_PUT_FULL_STATE) {
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_STATUS, env->mcg_status);
+            kvm_msr_entry_set(&msrs[n++], MSR_MCG_CTL, env->mcg_ctl);
+            for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++)
+                kvm_msr_entry_set(&msrs[n++], MSR_MC0_CTL + i, env->mce_banks[i]);
+        }
+    }
+#endif
 
     msr_data.info.nmsrs = n;
 
@@ -1001,6 +1013,15 @@ static int kvm_get_msrs(CPUState *env)
     msrs[n++].index = MSR_KVM_SYSTEM_TIME;
     msrs[n++].index = MSR_KVM_WALL_CLOCK;
 
+#ifdef KVM_CAP_MCE
+    if (env->mcg_cap) {
+        msrs[n++].index = MSR_MCG_STATUS;
+        msrs[n++].index = MSR_MCG_CTL;
+        for (i = 0; i < (env->mcg_cap & 0xff) * 4; i++)
+            msrs[n++].index = MSR_MC0_CTL + i;
+    }
+#endif
+
     msr_data.info.nmsrs = n;
     ret = kvm_vcpu_ioctl(env, KVM_GET_MSRS, &msr_data);
     if (ret < 0)
@@ -1043,6 +1064,22 @@ static int kvm_get_msrs(CPUState *env)
         case MSR_KVM_WALL_CLOCK:
             env->wall_clock_msr = msrs[i].data;
             break;
+#ifdef KVM_CAP_MCE
+        case MSR_MCG_STATUS:
+            env->mcg_status = msrs[i].data;
+            break;
+        case MSR_MCG_CTL:
+            env->mcg_ctl = msrs[i].data;
+            break;
+#endif
+        default:
+#ifdef KVM_CAP_MCE
+            if (msrs[i].index >= MSR_MC0_CTL &&
+                msrs[i].index < MSR_MC0_CTL + (env->mcg_cap & 0xff) * 4) {
+                env->mce_banks[msrs[i].index - MSR_MC0_CTL] = msrs[i].data;
+                break;
+            }
+#endif
         }
     }
 




[patch uq/master 5/8] Export qemu_ram_addr_from_host

2010-10-06 Thread Marcelo Tosatti
To be used by next patches.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpu-common.h
===
--- qemu.orig/cpu-common.h
+++ qemu/cpu-common.h
@@ -47,7 +47,8 @@ void qemu_ram_free(ram_addr_t addr);
 /* This should only be used for ram local to a device.  */
 void *qemu_get_ram_ptr(ram_addr_t addr);
 /* This should not be used by devices.  */
-ram_addr_t qemu_ram_addr_from_host(void *ptr);
+int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr);
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
 
 int cpu_register_io_memory(CPUReadMemoryFunc * const *mem_read,
CPUWriteMemoryFunc * const *mem_write,
Index: qemu/exec.c
===
--- qemu.orig/exec.c
+++ qemu/exec.c
@@ -2086,7 +2086,7 @@ static inline void tlb_update_dirty(CPUT
     if ((tlb_entry->addr_write & ~TARGET_PAGE_MASK) == IO_MEM_RAM) {
         p = (void *)(unsigned long)((tlb_entry->addr_write & TARGET_PAGE_MASK)
             + tlb_entry->addend);
-        ram_addr = qemu_ram_addr_from_host(p);
+        ram_addr = qemu_ram_addr_from_host_nofail(p);
         if (!cpu_physical_memory_is_dirty(ram_addr)) {
             tlb_entry->addr_write |= TLB_NOTDIRTY;
         }
@@ -2938,23 +2938,31 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
     return NULL;
 }
 
-/* Some of the softmmu routines need to translate from a host pointer
-   (typically a TLB entry) back to a ram offset.  */
-ram_addr_t qemu_ram_addr_from_host(void *ptr)
+int qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 {
     RAMBlock *block;
     uint8_t *host = ptr;
 
     QLIST_FOREACH(block, &ram_list.blocks, next) {
         if (host - block->host < block->length) {
-            return block->offset + (host - block->host);
+            *ram_addr = block->offset + (host - block->host);
+            return 0;
         }
     }
+    return -1;
+}
 
-    fprintf(stderr, "Bad ram pointer %p\n", ptr);
-    abort();
+/* Some of the softmmu routines need to translate from a host pointer
+   (typically a TLB entry) back to a ram offset.  */
+ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr)
+{
+    ram_addr_t ram_addr;
 
-    return 0;
+    if (qemu_ram_addr_from_host(ptr, &ram_addr)) {
+        fprintf(stderr, "Bad ram pointer %p\n", ptr);
+        abort();
+    }
+    return ram_addr;
 }
 
 static uint32_t unassigned_mem_readb(void *opaque, target_phys_addr_t addr)
@@ -3703,7 +3711,7 @@ void cpu_physical_memory_unmap(void *buf
 {
     if (buffer != bounce.buffer) {
         if (is_write) {
-            ram_addr_t addr1 = qemu_ram_addr_from_host(buffer);
+            ram_addr_t addr1 = qemu_ram_addr_from_host_nofail(buffer);
             while (access_len) {
                 unsigned l;
                 l = TARGET_PAGE_SIZE;
Index: qemu/exec-all.h
===
--- qemu.orig/exec-all.h
+++ qemu/exec-all.h
@@ -334,7 +334,7 @@ static inline tb_page_addr_t get_page_ad
     }
     p = (void *)(unsigned long)addr
         + env1->tlb_table[mmu_idx][page_index].addend;
-    return qemu_ram_addr_from_host(p);
+    return qemu_ram_addr_from_host_nofail(p);
 }
 #endif
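
The split made by this patch, a fallible lookup plus a _nofail wrapper that aborts, can be illustrated outside qemu as follows. This is a sketch under assumptions: struct block is a simplified stand-in for RAMBlock, the blocks are passed in as an array instead of qemu's ram_list, and the sketch_* names are invented. The return convention (0 on success, -1 on a miss) follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for RAMBlock. */
struct block {
    uint8_t *host;     /* start of the host mapping */
    uint64_t offset;   /* ram address of the first byte */
    uint64_t length;   /* size in bytes */
};

/* Fallible form: 0 with *ram_addr set, or -1 if ptr is in no block. */
static int sketch_addr_from_host(const struct block *blocks, size_t n,
                                 const void *ptr, uint64_t *ram_addr)
{
    size_t i;

    assert(ram_addr != NULL);
    for (i = 0; i < n; i++) {
        /* Unsigned subtraction wraps for ptr below the block start,
           so a single comparison covers both bounds. */
        uintptr_t delta = (uintptr_t)ptr - (uintptr_t)blocks[i].host;

        if (delta < blocks[i].length) {
            *ram_addr = blocks[i].offset + delta;
            return 0;
        }
    }
    return -1;
}

/* Wrapper for callers that treat a miss as a fatal internal error. */
static uint64_t sketch_addr_from_host_nofail(const struct block *blocks,
                                             size_t n, const void *ptr)
{
    uint64_t ram_addr;

    if (sketch_addr_from_host(blocks, n, ptr, &ram_addr)) {
        fprintf(stderr, "Bad ram pointer %p\n", ptr);
        abort();
    }
    return ram_addr;
}
```

The point of the split is that a caller handling a pointer it did not choose itself, such as the address reported with a SIGBUS, can use the fallible form and recover, while the existing softmmu callers keep the old abort-on-miss behaviour through the wrapper.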
 




Re: [PATCH] Fix calculation of number of entries based on number of mce_banks

2010-10-06 Thread Marcelo Tosatti
On Wed, Oct 06, 2010 at 10:08:19AM -0400, Dean Nelson wrote:
 The number of mce_banks needs to be multiplied by 4 in order to actually
 reference all of the entries.
 
 Signed-off-by: Dean Nelson dnel...@redhat.com

Applied, thanks.
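
The factor of 4 comes from the layout of the machine-check bank registers: each bank has four MSRs (CTL, STATUS, ADDR, MISC) laid out consecutively from MSR 0x400, and the low byte of MCG_CAP holds the bank count. A small worked sketch, with macro and function names invented for this example (only the 0x400 base and the 4-register layout are architectural):

```c
#include <assert.h>
#include <stdint.h>

#define SK_MSR_MC0_CTL   0x400u  /* first bank register (IA32_MC0_CTL) */
#define SK_MCG_BANK_MASK 0xffu   /* MCG_CAP[7:0] = number of banks */
#define SK_REGS_PER_BANK 4u      /* CTL, STATUS, ADDR, MISC */

/* Number of per-bank MSRs, and thus mce_banks[] entries, for an MCG_CAP. */
static unsigned sketch_mce_bank_regs(uint64_t mcg_cap)
{
    return (unsigned)(mcg_cap & SK_MCG_BANK_MASK) * SK_REGS_PER_BANK;
}

/* MSR index of register reg (0..3) in the given bank. */
static uint32_t sketch_mce_msr_index(unsigned bank, unsigned reg)
{
    assert(reg < SK_REGS_PER_BANK);
    return SK_MSR_MC0_CTL + bank * SK_REGS_PER_BANK + reg;
}
```

With 10 banks, iterating only up to the bank count would cover 10 of the 40 entries, which is the under-count the patch fixes.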



[patch uq/master 2/8] iothread: use signalfd

2010-10-06 Thread Marcelo Tosatti
Block SIGALRM, SIGIO and consume them via signalfd.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -33,6 +33,7 @@
 #include "exec-all.h"
 
 #include "cpus.h"
+#include "compatfd.h"
 
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
@@ -329,14 +330,75 @@ static QemuCond qemu_work_cond;
 
 static void tcg_init_ipi(void);
 static void kvm_init_ipi(CPUState *env);
-static void unblock_io_signals(void);
+static sigset_t block_io_signals(void);
+
+/* If we have signalfd, we mask out the signals we want to handle and then
+ * use signalfd to listen for them.  We rely on whatever the current signal
+ * handler is to dispatch the signals when we receive them.
+ */
+static void sigfd_handler(void *opaque)
+{
+    int fd = (unsigned long) opaque;
+    struct qemu_signalfd_siginfo info;
+    struct sigaction action;
+    ssize_t len;
+
+    while (1) {
+        do {
+            len = read(fd, &info, sizeof(info));
+        } while (len == -1 && errno == EINTR);
+
+        if (len == -1 && errno == EAGAIN) {
+            break;
+        }
+
+        if (len != sizeof(info)) {
+            printf("read from sigfd returned %zd: %m\n", len);
+            return;
+        }
+
+        sigaction(info.ssi_signo, NULL, &action);
+        if ((action.sa_flags & SA_SIGINFO) && action.sa_sigaction) {
+            action.sa_sigaction(info.ssi_signo,
+                                (siginfo_t *)&info, NULL);
+        } else if (action.sa_handler) {
+            action.sa_handler(info.ssi_signo);
+        }
+    }
+}
+
+static int qemu_signalfd_init(sigset_t mask)
+{
+    int sigfd;
+
+    sigfd = qemu_signalfd(&mask);
+    if (sigfd == -1) {
+        fprintf(stderr, "failed to create signalfd\n");
+        return -errno;
+    }
+
+    fcntl_setfl(sigfd, O_NONBLOCK);
+
+    qemu_set_fd_handler2(sigfd, NULL, sigfd_handler, NULL,
+                         (void *)(unsigned long) sigfd);
+
+    return 0;
+}
 
 int qemu_init_main_loop(void)
 {
     int ret;
+    sigset_t blocked_signals;
 
     cpu_set_debug_excp_handler(cpu_debug_handler);
 
+    blocked_signals = block_io_signals();
+
+    ret = qemu_signalfd_init(blocked_signals);
+    if (ret)
+        return ret;
+
+    /* Note eventfd must be drained before signalfd handlers run */
     ret = qemu_event_init();
     if (ret)
         return ret;
@@ -347,7 +409,6 @@ int qemu_init_main_loop(void)
     qemu_mutex_init(&qemu_global_mutex);
     qemu_mutex_lock(&qemu_global_mutex);
 
-    unblock_io_signals();
     qemu_thread_self(&io_thread);
 
     return 0;
@@ -586,19 +647,22 @@ static void kvm_init_ipi(CPUState *env)
     }
 }
 
-static void unblock_io_signals(void)
+static sigset_t block_io_signals(void)
 {
     sigset_t set;
 
+    /* SIGUSR2 used by posix-aio-compat.c */
     sigemptyset(&set);
     sigaddset(&set, SIGUSR2);
-    sigaddset(&set, SIGIO);
-    sigaddset(&set, SIGALRM);
     pthread_sigmask(SIG_UNBLOCK, &set, NULL);
 
     sigemptyset(&set);
+    sigaddset(&set, SIGIO);
+    sigaddset(&set, SIGALRM);
     sigaddset(&set, SIG_IPI);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
+
+    return set;
 }
 
 void qemu_mutex_lock_iothread(void)
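
The block-then-read pattern this patch introduces can be tried in isolation with the glibc signalfd() wrapper. The following is a minimal Linux-only sketch, not the qemu code: sketch_consume_via_signalfd is a made-up name, it uses sigprocmask() because the example is single-threaded (the patch uses pthread_sigmask()), and it raises the signal itself instead of waiting for a timer or I/O event.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/*
 * Block signo, raise it, and read it back from a signalfd.
 * Returns the signal number that was read, or -1 on error.
 */
static int sketch_consume_via_signalfd(int signo)
{
    sigset_t mask, old;
    struct signalfd_siginfo info;
    ssize_t len;
    int fd, result = -1;

    assert(signo > 0);
    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* The signal must be blocked, otherwise its normal disposition runs. */
    if (sigprocmask(SIG_BLOCK, &mask, &old) == -1) {
        return -1;
    }

    fd = signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
    if (fd == -1) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    raise(signo);   /* stays pending because it is blocked */

    do {
        len = read(fd, &info, sizeof(info));
    } while (len == -1 && errno == EINTR);

    if (len == (ssize_t)sizeof(info)) {
        result = (int)info.ssi_signo;
    }

    /* The read consumed the signal, so restoring the mask delivers nothing. */
    close(fd);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return result;
}
```

Calling sketch_consume_via_signalfd(SIGALRM) mirrors what the I/O thread does for SIGALRM and SIGIO: the signal arrives as readable data on a file descriptor, so it can be handled from the main select loop rather than asynchronously.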




[patch uq/master 1/8] signalfd compatibility

2010-10-06 Thread Marcelo Tosatti
Port qemu-kvm's signalfd compat code.

commit 5a7fdd0abd7cd24dac205317a4195446ab8748b5
Author: Anthony Liguori aligu...@us.ibm.com
Date:   Wed May 7 11:55:47 2008 -0500

Use signalfd() in io-thread

This patch reworks the IO thread to use signalfd() instead of sigtimedwait()
This will eliminate the need to use SIGIO everywhere.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: qemu/compatfd.c
===
--- /dev/null
+++ qemu/compatfd.c
@@ -0,0 +1,117 @@
+/*
+ * signalfd/eventfd compatibility
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "compatfd.h"
+
+#include <sys/syscall.h>
+#include <pthread.h>
+
+struct sigfd_compat_info
+{
+    sigset_t mask;
+    int fd;
+};
+
+static void *sigwait_compat(void *opaque)
+{
+    struct sigfd_compat_info *info = opaque;
+    int err;
+    sigset_t all;
+
+    sigfillset(&all);
+    sigprocmask(SIG_BLOCK, &all, NULL);
+
+    do {
+        siginfo_t siginfo;
+
+        err = sigwaitinfo(&info->mask, &siginfo);
+        if (err == -1 && errno == EINTR) {
+            err = 0;
+            continue;
+        }
+
+        if (err > 0) {
+            char buffer[128];
+            size_t offset = 0;
+
+            memcpy(buffer, &err, sizeof(err));
+            while (offset < sizeof(buffer)) {
+                ssize_t len;
+
+                len = write(info->fd, buffer + offset,
+                            sizeof(buffer) - offset);
+                if (len == -1 && errno == EINTR)
+                    continue;
+
+                if (len <= 0) {
+                    err = -1;
+                    break;
+                }
+
+                offset += len;
+            }
+        }
+    } while (err >= 0);
+
+    return NULL;
+}
+
+static int qemu_signalfd_compat(const sigset_t *mask)
+{
+    pthread_attr_t attr;
+    pthread_t tid;
+    struct sigfd_compat_info *info;
+    int fds[2];
+
+    info = malloc(sizeof(*info));
+    if (info == NULL) {
+        errno = ENOMEM;
+        return -1;
+    }
+
+    if (pipe(fds) == -1) {
+        free(info);
+        return -1;
+    }
+
+    qemu_set_cloexec(fds[0]);
+    qemu_set_cloexec(fds[1]);
+
+    memcpy(&info->mask, mask, sizeof(*mask));
+    info->fd = fds[1];
+
+    pthread_attr_init(&attr);
+    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
+
+    pthread_create(&tid, &attr, sigwait_compat, info);
+
+    pthread_attr_destroy(&attr);
+
+    return fds[0];
+}
+
+int qemu_signalfd(const sigset_t *mask)
+{
+#if defined(CONFIG_SIGNALFD)
+    int ret;
+
+    ret = syscall(SYS_signalfd, -1, mask, _NSIG / 8);
+    if (ret != -1) {
+        qemu_set_cloexec(ret);
+        return ret;
+    }
+#endif
+
+    return qemu_signalfd_compat(mask);
+}
Index: qemu/compatfd.h
===
--- /dev/null
+++ qemu/compatfd.h
@@ -0,0 +1,43 @@
+/*
+ * signalfd/eventfd compatibility
+ *
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_COMPATFD_H
+#define QEMU_COMPATFD_H
+
+#include <signal.h>
+
+struct qemu_signalfd_siginfo {
+uint32_t ssi_signo;   /* Signal number */
+int32_t  ssi_errno;   /* Error number (unused) */
+int32_t  ssi_code;/* Signal code */
+uint32_t ssi_pid; /* PID of sender */
+uint32_t ssi_uid; /* Real UID of sender */
+int32_t  ssi_fd;  /* File descriptor (SIGIO) */
+uint32_t ssi_tid; /* Kernel timer ID (POSIX timers) */
+uint32_t ssi_band;/* Band event (SIGIO) */
+uint32_t ssi_overrun; /* POSIX timer overrun count */
+uint32_t ssi_trapno;  /* Trap number that caused signal */
+int32_t  ssi_status;  /* Exit status or signal (SIGCHLD) */
+int32_t  ssi_int; /* Integer sent by sigqueue(2) */
+uint64_t ssi_ptr; /* Pointer sent by sigqueue(2) */
+uint64_t ssi_utime;   /* User CPU time consumed (SIGCHLD) */
+uint64_t ssi_stime;   /* System CPU time consumed (SIGCHLD) */
+uint64_t ssi_addr;/* Address that generated signal
+ (for hardware-generated signals) */
+uint8_t  pad[48]; /* Pad size to 128 bytes (allow for
+ additional fields in the future) */
+};
+
+int qemu_signalfd(const sigset_t *mask);
+
+#endif
Index: qemu/Makefile.objs
===
--- qemu.orig/Makefile.objs
+++ qemu/Makefile.objs
@@ -121,6 +121,7 @@ common-obj-y += $(addprefix ui/, $(ui-ob
 
 common-obj-y += iov.o acl.o
 common-obj-$(CONFIG_THREAD) += qemu-thread.o
+common-obj-$(CONFIG_IOTHREAD) += compatfd.o
 common-obj-y += 

[patch uq/master 0/8] port qemu-kvm's MCE support (v2)

2010-10-06 Thread Marcelo Tosatti
Port qemu-kvm's KVM MCE (Machine Check Exception) handling to qemu. It
allows qemu to propagate MCEs to the guest.

v2:
- rename do_qemu_ram_addr_from_host.
- fix kvm_on_sigbus/kvm_on_sigbus_vcpu naming.
- fix bank register restoration (Dean Nelson).



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch uq/master 4/8] kvm: x86: add mce support

2010-10-06 Thread Marcelo Tosatti
Port qemu-kvm's MCE support

commit c68b2374c9048812f488e00ffb95db66c0bc07a7
Author: Huang Ying <ying.hu...@intel.com>
Date:   Mon Jul 20 10:00:53 2009 +0800

    Add MCE simulation support to qemu/kvm

    KVM ioctls are used to initialize MCE simulation and inject MCE. The
    real MCE simulation is implemented in Linux kernel. The Kernel part
    has been merged.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu/target-i386/helper.c
===
--- qemu.orig/target-i386/helper.c
+++ qemu/target-i386/helper.c
@@ -27,6 +27,7 @@
 #include "exec-all.h"
 #include "qemu-common.h"
 #include "kvm.h"
+#include "kvm_x86.h"
 
 //#define DEBUG_MMU
 
@@ -1030,6 +1031,11 @@ void cpu_inject_x86_mce(CPUState *cenv, 
     if (bank >= bank_num || !(status & MCI_STATUS_VAL))
         return;
 
+    if (kvm_enabled()) {
+        kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+        return;
+    }
+
     /*
      * if MSR_MCG_CTL is not all 1s, the uncorrected error
      * reporting is disabled
Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -27,6 +27,7 @@
 #include "hw/pc.h"
 #include "hw/apic.h"
 #include "ioport.h"
+#include "kvm_x86.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -167,6 +168,67 @@ static int get_para_features(CPUState *e
 }
 #endif
 
+#ifdef KVM_CAP_MCE
+static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
+                                     int *max_banks)
+{
+    int r;
+
+    r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
+    if (r > 0) {
+        *max_banks = r;
+        return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
+    }
+    return -ENOSYS;
+}
+
+static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
+{
+    return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
+}
+
+static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+{
+    return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
+}
+
+struct kvm_x86_mce_data
+{
+    CPUState *env;
+    struct kvm_x86_mce *mce;
+};
+
+static void kvm_do_inject_x86_mce(void *_data)
+{
+    struct kvm_x86_mce_data *data = _data;
+    int r;
+
+    r = kvm_set_mce(data->env, data->mce);
+    if (r < 0)
+        perror("kvm_set_mce FAILED");
+}
+#endif
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+                        uint64_t mcg_status, uint64_t addr, uint64_t misc)
+{
+#ifdef KVM_CAP_MCE
+    struct kvm_x86_mce mce = {
+        .bank = bank,
+        .status = status,
+        .mcg_status = mcg_status,
+        .addr = addr,
+        .misc = misc,
+    };
+    struct kvm_x86_mce_data data = {
+        .env = cenv,
+        .mce = &mce,
+    };
+
+    run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
+#endif
+}
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
 struct {
@@ -274,6 +336,28 @@ int kvm_arch_init_vcpu(CPUState *env)
 
     cpuid_data.cpuid.nent = cpuid_i;
 
+#ifdef KVM_CAP_MCE
+    if (((env->cpuid_version >> 8)&0xF) >= 6
+        && (env->cpuid_features&(CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
+        && kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
+        uint64_t mcg_cap;
+        int banks;
+
+        if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks))
+            perror("kvm_get_mce_cap_supported FAILED");
+        else {
+            if (banks > MCE_BANKS_DEF)
+                banks = MCE_BANKS_DEF;
+            mcg_cap &= MCE_CAP_DEF;
+            mcg_cap |= banks;
+            if (kvm_setup_mce(env, &mcg_cap))
+                perror("kvm_setup_mce FAILED");
+            else
+                env->mcg_cap = mcg_cap;
+        }
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
Index: qemu/target-i386/kvm_x86.h
===
--- /dev/null
+++ qemu/target-i386/kvm_x86.h
@@ -0,0 +1,21 @@
+/*
+ * QEMU KVM support
+ *
+ * Copyright (C) 2009 Red Hat Inc.
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguori   aligu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef __KVM_X86_H__
+#define __KVM_X86_H__
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+uint64_t mcg_status, uint64_t addr, uint64_t misc);
+
+#endif




[patch uq/master 7/8] MCE: Relay UCR MCE to guest

2010-10-06 Thread Marcelo Tosatti
Port qemu-kvm's

commit 4b62fff1101a7ad77553147717a8bd3bf79df7ef
Author: Huang Ying <ying.hu...@intel.com>
Date:   Mon Sep 21 10:43:25 2009 +0800

    MCE: Relay UCR MCE to guest

    UCR (uncorrected recovery) MCE is supported in recent Intel CPUs,
    where some hardware error such as some memory error can be reported
    without PCC (processor context corrupted). To recover from such MCE,
    the corresponding memory will be unmapped, and all processes accessing
    the memory will be killed via SIGBUS.

    For KVM, if QEMU/KVM is killed, all guest processes will be killed
    too. So we relay SIGBUS from host OS to guest system via a UCR MCE
    injection. Then guest OS can isolate corresponding memory and kill
    necessary guest processes only. SIGBUS sent to main thread (not VCPU
    threads) will be broadcast to all VCPU threads as UCR MCE.

Signed-off-by: Marcelo Tosatti <mtosa...@redhat.com>

Index: qemu/cpus.c
===
--- qemu.orig/cpus.c
+++ qemu/cpus.c
@@ -34,6 +34,10 @@
 
 #include "cpus.h"
 #include "compatfd.h"
+#ifdef CONFIG_LINUX
+#include <sys/prctl.h>
+#include <sys/signalfd.h>
+#endif
 
 #ifdef SIGRTMIN
 #define SIG_IPI (SIGRTMIN+4)
@@ -41,6 +45,10 @@
 #define SIG_IPI SIGUSR1
 #endif
 
+#ifndef PR_MCE_KILL
+#define PR_MCE_KILL 33
+#endif
+
 static CPUState *next_cpu;
 
 /***/
@@ -498,28 +506,77 @@ static void qemu_tcg_wait_io_event(void)
 }
 }
 
+static void sigbus_reraise(void)
+{
+    sigset_t set;
+    struct sigaction action;
+
+    memset(&action, 0, sizeof(action));
+    action.sa_handler = SIG_DFL;
+    if (!sigaction(SIGBUS, &action, NULL)) {
+        raise(SIGBUS);
+        sigemptyset(&set);
+        sigaddset(&set, SIGBUS);
+        sigprocmask(SIG_UNBLOCK, &set, NULL);
+    }
+    perror("Failed to re-raise SIGBUS!\n");
+    abort();
+}
+
+static void sigbus_handler(int n, struct qemu_signalfd_siginfo *siginfo,
+                           void *ctx)
+{
+#if defined(TARGET_I386)
+    if (kvm_on_sigbus(siginfo->ssi_code, (void *)(intptr_t)siginfo->ssi_addr))
+#endif
+        sigbus_reraise();
+}
+
 static void qemu_kvm_eat_signal(CPUState *env, int timeout)
 {
     struct timespec ts;
     int r, e;
     siginfo_t siginfo;
     sigset_t waitset;
+    sigset_t chkset;
 
     ts.tv_sec = timeout / 1000;
     ts.tv_nsec = (timeout % 1000) * 1000000;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
+    sigaddset(&waitset, SIGBUS);
 
-    qemu_mutex_unlock(&qemu_global_mutex);
-    r = sigtimedwait(&waitset, &siginfo, &ts);
-    e = errno;
-    qemu_mutex_lock(&qemu_global_mutex);
+    do {
+        qemu_mutex_unlock(&qemu_global_mutex);
 
-    if (r == -1 && !(e == EAGAIN || e == EINTR)) {
-        fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
-        exit(1);
-    }
+        r = sigtimedwait(&waitset, &siginfo, &ts);
+        e = errno;
+
+        qemu_mutex_lock(&qemu_global_mutex);
+
+        if (r == -1 && !(e == EAGAIN || e == EINTR)) {
+            fprintf(stderr, "sigtimedwait: %s\n", strerror(e));
+            exit(1);
+        }
+
+        switch (r) {
+        case SIGBUS:
+#ifdef TARGET_I386
+            if (kvm_on_sigbus_vcpu(env, siginfo.si_code, siginfo.si_addr))
+#endif
+                sigbus_reraise();
+            break;
+        default:
+            break;
+        }
+
+        r = sigpending(&chkset);
+        if (r == -1) {
+            fprintf(stderr, "sigpending: %s\n", strerror(e));
+            exit(1);
+        }
+    } while (sigismember(&chkset, SIG_IPI) || sigismember(&chkset, SIGBUS));
 }
 
 static void qemu_kvm_wait_io_event(CPUState *env)
@@ -645,6 +702,7 @@ static void kvm_init_ipi(CPUState *env)
 
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
+    sigdelset(&set, SIGBUS);
     r = kvm_set_signal_mask(env, &set);
     if (r) {
         fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(r));
@@ -655,6 +713,7 @@ static void kvm_init_ipi(CPUState *env)
 static sigset_t block_io_signals(void)
 {
     sigset_t set;
+    struct sigaction action;
 
     /* SIGUSR2 used by posix-aio-compat.c */
     sigemptyset(&set);
@@ -665,8 +724,15 @@ static sigset_t block_io_signals(void)
     sigaddset(&set, SIGIO);
     sigaddset(&set, SIGALRM);
     sigaddset(&set, SIG_IPI);
+    sigaddset(&set, SIGBUS);
     pthread_sigmask(SIG_BLOCK, &set, NULL);
 
+    memset(&action, 0, sizeof(action));
+    action.sa_flags = SA_SIGINFO;
+    action.sa_sigaction = (void (*)(int, siginfo_t*, void*))sigbus_handler;
+    sigaction(SIGBUS, &action, NULL);
+    prctl(PR_MCE_KILL, 1, 1, 0, 0);
+
     return set;
 }
 
Index: qemu/kvm.h
===
--- qemu.orig/kvm.h
+++ qemu/kvm.h
@@ -110,6 +110,9 @@ int kvm_arch_init_vcpu(CPUState *env);
 
 void kvm_arch_reset_vcpu(CPUState *env);
 
+int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr);
+int kvm_on_sigbus(int code, void *addr);
+
 struct 

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Krishna Kumar2
"Michael S. Tsirkin" <m...@redhat.com> wrote on 10/05/2010 11:53:23 PM:

> > > Any idea where does this come from?
> > > Do you see more TX interrupts? RX interrupts? Exits?
> > > Do interrupts bounce more between guest CPUs?
> > > 4. Identify reasons for single netperf BW regression.
> >
> > After testing various combinations of #txqs, #vhosts, #netperf
> > sessions, I think the drop for 1 stream is due to TX and RX for
> > a flow being processed on different cpus.
>
> Right. Can we fix it?

I am not sure how to. My initial patch had one thread but gave
small gains and ran into limitations once number of sessions
became large.

> > I did two more tests:
> >     1. Pin vhosts to same CPU:
> >         - BW drop is much lower for 1 stream case (-5 to -8% range)
> >         - But performance is not so high for more sessions.
> >     2. Changed vhost to be single threaded:
> >         - No degradation for 1 session, and improvement for up to
> >           8, sometimes 16 streams (5-12%).
> >         - BW degrades after that, all the way till 128 netperf
> >           sessions.
> >         - But overall CPU utilization improves.
> >     Summary of the entire run (for 1-128 sessions):
> >         txq=4:  BW: (-2.3)  CPU: (-16.5)  RCPU: (-5.3)
> >         txq=16: BW: (-1.9)  CPU: (-24.9)  RCPU: (-9.6)
> >
> > I don't see any reasons mentioned above.  However, for higher
> > number of netperf sessions, I see a big increase in retransmissions:
>
> Hmm, ok, and do you see any errors?

I haven't seen any in any statistics, messages, etc. Also no
retransmissions for txq=1.

> > Single netperf case didn't have any retransmissions so that is not
> > the cause for drop.  I tested ixgbe (MQ):
> > ______________________________________________________________
> > #netperf      ixgbe             ixgbe (pin intrs to cpu#0 on
> >                                        both server/client)
> >               BW (#retr)        BW (#retr)
> > ______________________________________________________________
> > 1             3567 (117)        6000 (251)
> > 2             4406 (477)        6298 (725)
> > 4             6119 (1085)       7208 (3387)
> > 8             6595 (4276)       7381 (15296)
> > 16            6651 (11651)      6856 (30394)
>
> Interesting.
> You are saying we get much more retransmissions with physical nic as
> well?

Yes, with ixgbe. I re-ran with 16 netperfs running for 15 secs on
both ixgbe and cxgb3 just now to reconfirm:

ixgbe: BW: 6186.85  SD/Remote: 135.711, 339.376
       CPU/Remote: 79.99, 200.00, Retrans: 545
cxgb3: BW: 8051.07  SD/Remote: 144.416, 260.487
       CPU/Remote: 110.88, 200.00, Retrans: 0

However 64 netperfs for 30 secs gave:

ixgbe: BW: 6691.12  SD/Remote: 8046.617, 5259.992
       CPU/Remote: 1223.86, 799.97, Retrans: 1424
cxgb3: BW: 7799.16  SD/Remote: 2589.875, 4317.013
       CPU/Remote: 480.39, 800.64, Retrans: 649

# ethtool -i eth4
driver: ixgbe
version: 2.0.84-k2
firmware-version: 0.9-3
bus-info: :1f:00.1

# ifconfig output:
   RX packets:783241 errors:0 dropped:0 overruns:0 frame:0
   TX packets:689533 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000

# lspci output:
1f:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit Network Connection (rev 01)
	Subsystem: Intel Corporation Ethernet Server Adapter X520-2
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at 9890 (64-bit, prefetchable) [size=512K]
	I/O ports at 2020 [size=32]
	Memory at 98a0 (64-bit, prefetchable) [size=16K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-1b-21-ff-ff-40-4a-b4
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe

> > I haven't done this right now since I don't have a setup.  I guess
> > it would be limited by wire speed and gains may not be there.  I
> > will try to do this later when I get the setup.
>
> OK but at least need to check that it does not hurt things.

Yes, sure.

> > Summary:
> >
> > 1. Average BW increase for regular I/O is best for #txq=16 with the
> >    least CPU utilization increase.
> > 2. The average BW for 512 byte I/O is best for lower #txq=2. For higher
> >    #txqs, BW increased only after a particular #netperf sessions - in
> >    my testing that limit was 32 netperf sessions.
> > 3. Multiple txq for guest by itself doesn't seem to have any issues.
> >    Guest CPU% increase is slightly higher than BW improvement.  I
> >    think it is true for all mq drivers since more paths run in parallel
> >    up to the device instead of sleeping and allowing one thread to send
> >    all packets via qdisc_restart.
> > 4. Having high number of txqs gives better gains 

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Arnd Bergmann
On Wednesday 06 October 2010 19:14:42 Krishna Kumar2 wrote:
> Arnd Bergmann <a...@arndb.de> wrote on 10/06/2010 05:49:00 PM:
>
> > > I don't see any reasons mentioned above.  However, for higher
> > > number of netperf sessions, I see a big increase in retransmissions:
> > > _______________________________________________
> > > #netperf    ORG             NEW
> > >             BW (#retr)      BW (#retr)
> > > _______________________________________________
> > > 1           70244 (0)       64102 (0)
> > > 4           21421 (0)       36570 (416)
> > > 8           21746 (0)       38604 (148)
> > > 16          21783 (0)       40632 (464)
> > > 32          22677 (0)       37163 (1053)
> > > 64          23648 (4)       36449 (2197)
> > > 128         23251 (2)       31676 (3185)
> > > _______________________________________________
> >
> > This smells like it could be related to a problem that Ben Greear found
> > recently (see "macvlan:  Enable qdisc backoff logic"). When the hardware
> > is busy, used to just drop the packet. With Ben's patch, we return
> > -EAGAIN to qemu (or vhost-net) to trigger a resend.
> >
> > I suppose what we really should do is feed that condition back to the
> > guest network stack and implement the backoff in there.
>
> Thanks for the pointer. I will take a look at this as I hadn't seen
> this patch earlier. Is there any way to figure out if this is the
> issue?

I think a good indication would be if this changes with/without the
patch, and if you see -EAGAIN in qemu with the patch applied.
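
To make the -EAGAIN check above concrete, here is a minimal sketch of a send path that counts how often the backend reports busy and retries. It is not code from qemu, vhost-net or macvlan: `send_fn`, `send_with_backoff`, `struct send_stats` and the bounded-retry policy are hypothetical names and choices made only for illustration, and a real caller would sleep or wait for a writable event between attempts.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical send callback: returns bytes sent, or a negative errno. */
typedef int (*send_fn)(void *ctx, const void *buf, size_t len);

/* Counters a caller could log to see whether -EAGAIN is being hit. */
struct send_stats {
    unsigned int eagain_count; /* times the backend reported "busy" */
};

/*
 * Call fn until it stops returning -EAGAIN, at most max_retries extra
 * times. Waiting between attempts is deliberately left out of this sketch.
 */
static int send_with_backoff(send_fn fn, void *ctx, const void *buf,
                             size_t len, unsigned int max_retries,
                             struct send_stats *stats)
{
    unsigned int attempt;
    int r = -EAGAIN;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        r = fn(ctx, buf, len);
        if (r != -EAGAIN) {
            return r;
        }
        stats->eagain_count++;
    }
    return r; /* still busy after all retries */
}

/* Stand-in backend: reports busy *ctx times, then accepts the packet. */
static int busy_then_ok(void *ctx, const void *buf, size_t len)
{
    unsigned int *busy_left = ctx;

    (void)buf;
    if (*busy_left > 0) {
        (*busy_left)--;
        return -EAGAIN;
    }
    return (int)len;
}
```

A non-zero `eagain_count` that grows with the number of netperf sessions would point in the direction described above; a count that stays at zero would suggest looking elsewhere.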

Arnd


Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest

2010-10-06 Thread Dean Nelson

On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
> > I got some more question:
> >
> > (2010/10/05 3:54), Marcelo Tosatti wrote:
> > > Index: qemu/target-i386/cpu.h
> > > ===
> > > --- qemu.orig/target-i386/cpu.h
> > > +++ qemu/target-i386/cpu.h
> > > @@ -250,16 +250,32 @@
> > >  #define PG_ERROR_RSVD_MASK 0x08
> > >  #define PG_ERROR_I_D_MASK  0x10
> > >
> > > -#define MCG_CTL_P  (1UL<<8)   /* MCG_CAP register available */
> > > +#define MCG_CTL_P  (1ULL<<8)   /* MCG_CAP register available */
> > > +#define MCG_SER_P  (1ULL<<24) /* MCA recovery/new status bits */
> > >
> > > -#define MCE_CAP_DEF MCG_CTL_P
> > > +#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
> > >  #define MCE_BANKS_DEF 10
> >
> > It seems that current kvm doesn't support SER_P, so injecting SRAO
> > to guest will mean that guest receives VAL|UC|!PCC and RIPV event
> > from virtual processor that doesn't have SER_P.
>
> Dean also noted this. I don't think it was deliberate choice to not
> expose SER_P. Huang?

In my testing, I found that MCG_SER_P was not being set (and I was
running on a Nehalem-EX system). Injecting a MCE resulted in the
guest entering into panic() from mce_panic(). If crash_kexec()
finds a kexec_crash_image the system ends up rebooting, otherwise,
what happens next requires operator intervention.

When I applied a patch to the guest's kernel which forces mce_ser to be
set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
that when the memory page was 'owned' by a guest process, the process
would be killed (if the page was dirty), and the guest would stay
running. The HWPoisoned page would be sidelined and not cause any more
issues.


> > I think most OSes don't expect that it can receive MCE with !PCC
> > on traditional x86 processor without SER_P.
> >
> > Q1: Is it safe to expect that guests can handle such !PCC event?

This might be best answered by Huang, but as I mentioned above, without
MCG_SER_P being set, the result was an orderly system panic on the
guest.

> > Q2: What is the expected behavior on the guest?

I think I answered this above.

> > Q3: What happens if guest reboots itself in response to the MCE?

That depends...

And the following issue also holds for a guest that is rebooted at
some point having successfully sidelined the bad page.

After the guest has panic'd, a system_reset of the guest or a restart
initiated by crash_kexec() (called by panic() on the guest), usually
results in the guest hanging because the bad page still belongs
to qemu-kvm and is now being referenced by the new guest in some way.
(It actually may not hang, but successfully reboot and be runnable,
with the bad page lurking in the background. It all seems to depend on
where the bad page ends up, and whether it's ever referenced.)

I believe there was an attempt to deal with this in kvm on the host.
See kvm_handle_bad_page(). This function was supposed to result in the
sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
which in theory would result in the right thing happening. But commit
96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
sent. So this mechanism needs to be re-worked, and the issue remains.
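
For readers unfamiliar with the two SIGBUS flavors involved, the sketch below shows how a handler could tell them apart from `si_code`. It is only an illustration of the Linux `BUS_MCEERR_AR` ("action required") and `BUS_MCEERR_AO` ("action optional") codes; the enum and the `classify_sigbus()` helper are hypothetical names, not part of qemu-kvm or the kernel, and the fallback defines assume the Linux ABI values for libc headers that lack them.

```c
#include <signal.h>

/* Fallbacks for older libc headers; the values match the Linux ABI. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif

/* Hypothetical classification of a SIGBUS by its si_code. */
enum mce_sigbus_kind {
    SIGBUS_NOT_MCE,         /* ordinary SIGBUS, not a memory error report */
    SIGBUS_ACTION_REQUIRED, /* poisoned memory was consumed by this thread */
    SIGBUS_ACTION_OPTIONAL  /* poisoned memory was detected asynchronously */
};

static enum mce_sigbus_kind classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AR:
        return SIGBUS_ACTION_REQUIRED;
    case BUS_MCEERR_AO:
        return SIGBUS_ACTION_OPTIONAL;
    default:
        return SIGBUS_NOT_MCE;
    }
}
```

In a relay scheme like the one in this series, only the two MCE kinds would be candidates for injection into the guest; anything else would be re-raised with the default action.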

I would think that if the bad page can't be sidelined, such that
the newly booting guest can't use it, then the new guest shouldn't be
allowed to boot. But perhaps there is some merit in letting it try to
boot and see if one gets 'lucky'.

I understand that Huang is looking into what should be done. He can
give you better information than I in answer to your questions.

Dean


Re: [PATCH 3/3] Add svm cpuid features

2010-10-06 Thread Marcelo Tosatti
On Tue, Sep 28, 2010 at 12:05:20PM +0200, Roedel, Joerg wrote:
> On Tue, Sep 28, 2010 at 05:37:58AM -0400, Avi Kivity wrote:
> >   On 09/28/2010 11:28 AM, Roedel, Joerg wrote:
> > >
> > > Weird, it worked here as I tested it. I had it on qemu/master and with
> > > all three patches. But patch 1 should not make the difference. I take a
> > > look, have you pushed the failing uq/master?
> >
> > Yes, 8fe6a21c76.
> >
> > > What was your command line?
> >
> > qemu-system-x86_64 -m 2G -cpu kvm64,+svm,+npt -enable-kvm ...
> >
> > Note this is qemu.git, so -enable-kvm is needed.
>
> Ok, I apparently forgot to force the CPUID xlevel to be 0x8000000A when
> SVM is enabled, probably because I only tested CPUID models where xlevel
> already defaults to 0x8000000A. Attached is a fix, thanks for catching
> this.
>
> 	Joerg

Applied, thanks.
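
The attached fix itself is not reproduced in this archive. As a rough sketch of the idea only, assuming the extended leaf in question is 0x8000000A (the SVM feature leaf) and that SVM is advertised in bit 2 of CPUID 0x80000001 ECX, raising xlevel could look like the following. The macro and function names are made up for this example and are not taken from the patch.

```c
#include <stdint.h>

/* SVM feature flag: CPUID leaf 0x80000001, ECX bit 2. */
#define EXAMPLE_CPUID_EXT3_SVM  (1u << 2)
/* Highest extended leaf a guest must be able to query for SVM details. */
#define EXAMPLE_CPUID_SVM_LEAF  0x8000000Au

/*
 * Hypothetical helper: return the xlevel to expose to the guest.
 * If SVM is enabled and the model's xlevel is too low to reach the
 * SVM leaf, raise it; otherwise leave it unchanged.
 */
static uint32_t example_adjust_xlevel(uint32_t xlevel, uint32_t ext3_features)
{
    if ((ext3_features & EXAMPLE_CPUID_EXT3_SVM) &&
        xlevel < EXAMPLE_CPUID_SVM_LEAF) {
        return EXAMPLE_CPUID_SVM_LEAF;
    }
    return xlevel;
}
```

With a CPU model whose xlevel already defaults to 0x8000000A or higher, the helper is a no-op, which matches the explanation of why the problem did not show up in the original testing.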



Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-10-06 Thread Marcelo Tosatti
On Mon, Sep 27, 2010 at 03:16:15PM +0200, Joerg Roedel wrote:
 As requested by Alex this patch makes kvm64 the default CPU
 model when qemu is started with -enable-kvm. This takes only
 effect for qemu-versions newer or equal to 0.14.0.
 
 Signed-off-by: Joerg Roedel joerg.roe...@amd.com
 ---
  hw/boards.h|1 +
  hw/pc.c|   21 -
  hw/pc_piix.c   |6 ++
  qemu-version.h |   35 +++
  vl.c   |4 
  5 files changed, 62 insertions(+), 5 deletions(-)
  create mode 100644 qemu-version.h
 
 diff --git a/hw/boards.h b/hw/boards.h
 index 6f0f0d7..2d41b2d 100644
 --- a/hw/boards.h
 +++ b/hw/boards.h
 @@ -19,6 +19,7 @@ typedef struct QEMUMachine {
  QEMUMachineInitFunc *init;
  int use_scsi;
  int max_cpus;
 +unsigned int compat_version;
  unsigned int no_serial:1,
  no_parallel:1,
  use_virtcon:1,
 diff --git a/hw/pc.c b/hw/pc.c
 index 69b13bf..372ec4c 100644
 --- a/hw/pc.c
 +++ b/hw/pc.c
 @@ -40,6 +40,16 @@
  #include "sysbus.h"
  #include "sysemu.h"
  #include "blockdev.h"
 +#include "kvm.h"
 +#include "qemu-version.h"
 +
 +#ifdef TARGET_X86_64
 +#define DEFAULT_KVM_CPU_MODEL "kvm64"
 +#define DEFAULT_QEMU_CPU_MODEL "qemu64"
 +#else
 +#define DEFAULT_KVM_CPU_MODEL "kvm32"
 +#define DEFAULT_QEMU_CPU_MODEL "qemu32"
 +#endif
  
  /* output Bochs bios info messages */
  //#define DEBUG_BIOS
 @@ -867,11 +877,12 @@ void pc_cpus_init(const char *cpu_model)
  
      /* init CPUs */
      if (cpu_model == NULL) {
 -#ifdef TARGET_X86_64
 -        cpu_model = "qemu64";
 -#else
 -        cpu_model = "qemu32";
 -#endif
 +        if (kvm_enabled() &&
 +            qemu_compat_version >= QEMU_COMPAT_VERSION(0, 14, 0)) {
 +            cpu_model = DEFAULT_KVM_CPU_MODEL;
 +        } else {
 +            cpu_model = DEFAULT_QEMU_CPU_MODEL;
 +        }
      }
  
      for(i = 0; i < smp_cpus; i++) {
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 12359a7..9e46b71 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -35,6 +35,7 @@
  #include sysemu.h
  #include sysbus.h
  #include blockdev.h
 +#include qemu-version.h
  
  #define MAX_IDE_BUS 2
  
 @@ -217,6 +218,7 @@ static QEMUMachine pc_machine = {
  .desc = Standard PC,
  .init = pc_init_pci,
  .max_cpus = 255,
 +.compat_version = QEMU_COMPAT_VERSION(0, 13, 0),
  .is_default = 1,
  };
  
 @@ -225,6 +227,7 @@ static QEMUMachine pc_machine_v0_12 = {
  .desc = Standard PC,
  .init = pc_init_pci,
  .max_cpus = 255,
 +.compat_version = QEMU_COMPAT_VERSION(0, 12, 0),
  .compat_props = (GlobalProperty[]) {
  {
  .driver   = virtio-serial-pci,
 @@ -244,6 +247,7 @@ static QEMUMachine pc_machine_v0_11 = {
  .desc = Standard PC, qemu 0.11,
  .init = pc_init_pci,
  .max_cpus = 255,
 +.compat_version = QEMU_COMPAT_VERSION(0, 11, 0),
  .compat_props = (GlobalProperty[]) {
  {
  .driver   = virtio-blk-pci,
 @@ -279,6 +283,7 @@ static QEMUMachine pc_machine_v0_10 = {
  .desc = Standard PC, qemu 0.10,
  .init = pc_init_pci,
  .max_cpus = 255,
 +.compat_version = QEMU_COMPAT_VERSION(0, 10, 0),
  .compat_props = (GlobalProperty[]) {
  {
  .driver   = virtio-blk-pci,
 @@ -325,6 +330,7 @@ static QEMUMachine isapc_machine = {
  .name = isapc,
  .desc = ISA-only PC,
  .init = pc_init_isa,
 +.compat_version = QEMU_COMPAT_VERSION(0, 10, 0),
  .max_cpus = 1,
  };
  
 diff --git a/qemu-version.h b/qemu-version.h
 new file mode 100644
 index 000..b4bfe48
 --- /dev/null
 +++ b/qemu-version.h
 @@ -0,0 +1,35 @@
 +/*
 + * qemu-version.h
 + *
 + * Defines needed for handling QEMU version compatibility
 + *
 + * Copyright (c) 2010 Joerg Roedel joerg.roe...@amd.com
 + *
 + * Permission is hereby granted, free of charge, to any person obtaining a 
 copy
 + * of this software and associated documentation files (the Software), to 
 deal
 + * in the Software without restriction, including without limitation the 
 rights
 + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 + * copies of the Software, and to permit persons to whom the Software is
 + * furnished to do so, subject to the following conditions:
 + *
 + * The above copyright notice and this permission notice shall be included in
 + * all copies or substantial portions of the Software.
 + *
 + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
 FROM,
 + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 + * THE SOFTWARE.
 + */
 +
 +#ifndef _QEMU_VERSION_H_
 +#define _QEMU_VERSION_H_
 +
 +extern unsigned int 

Re: [v2 RFC PATCH 0/4] Implement multiqueue virtio-net

2010-10-06 Thread Michael S. Tsirkin
On Wed, Oct 06, 2010 at 11:13:31PM +0530, Krishna Kumar2 wrote:
> "Michael S. Tsirkin" <m...@redhat.com> wrote on 10/05/2010 11:53:23 PM:
>
> > > > Any idea where does this come from?
> > > > Do you see more TX interrupts? RX interrupts? Exits?
> > > > Do interrupts bounce more between guest CPUs?
> > > > 4. Identify reasons for single netperf BW regression.
> > >
> > > After testing various combinations of #txqs, #vhosts, #netperf
> > > sessions, I think the drop for 1 stream is due to TX and RX for
> > > a flow being processed on different cpus.
> >
> > Right. Can we fix it?
>
> I am not sure how to. My initial patch had one thread but gave
> small gains and ran into limitations once number of sessions
> became large.

Sure. We will need multiple RX queues, and have a single
thread handle a TX and RX pair. Then we need to make sure packets
from a given flow on TX land on the same thread on RX.
As flows can be hashed differently, for this to work we'll have to
expose this info in host/guest interface.
But since multiqueue implies host/guest ABI changes anyway,
this point is moot.
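
To illustrate the requirement that both directions of a flow end up on the same thread, here is a minimal sketch of a direction-independent flow hash. It is an assumption-laden example, not what virtio-net, vhost or any NIC implements: `struct flow`, `mix32()`, `flow_hash_symmetric()` and `flow_to_queue()` are hypothetical, and the host and guest would still have to agree on such a function through the host/guest interface mentioned above.

```c
#include <stdint.h>

/* Hypothetical 4-tuple identifying a TCP/UDP flow (host byte order). */
struct flow {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
};

/* Integer bit mixer used only to spread the bits; any good mixer works. */
static uint32_t mix32(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x7feb352dU;
    x ^= x >> 15;
    x *= 0x846ca68bU;
    x ^= x >> 16;
    return x;
}

/*
 * XOR is commutative, so swapping source and destination (that is,
 * looking at the same flow from the RX side) yields the same hash.
 */
static uint32_t flow_hash_symmetric(const struct flow *f)
{
    uint32_t ips = f->src_ip ^ f->dst_ip;
    uint32_t ports = (uint32_t)(f->src_port ^ f->dst_port);

    return mix32(ips ^ ((ports << 16) | ports));
}

/* Map a flow to one of nqueues TX/RX queue pairs; nqueues must be > 0. */
static unsigned int flow_to_queue(const struct flow *f, unsigned int nqueues)
{
    return flow_hash_symmetric(f) % nqueues;
}
```

The trade-off of a symmetric hash is that it cannot distinguish a flow from its mirror image, which is exactly the property wanted here since one thread is meant to handle a TX and RX pair.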

BTW, an interesting approach could be using bonding
and multiple virtio-net interfaces.
What are the disadvantages of such a setup?  One advantage
is it can be made to work in existing guests.

> > > I did two more tests:
> > >     1. Pin vhosts to same CPU:
> > >         - BW drop is much lower for 1 stream case (-5 to -8% range)
> > >         - But performance is not so high for more sessions.
> > >     2. Changed vhost to be single threaded:
> > >         - No degradation for 1 session, and improvement for up to
> > >           8, sometimes 16 streams (5-12%).
> > >         - BW degrades after that, all the way till 128 netperf
> > >           sessions.
> > >         - But overall CPU utilization improves.
> > >     Summary of the entire run (for 1-128 sessions):
> > >         txq=4:  BW: (-2.3)  CPU: (-16.5)  RCPU: (-5.3)
> > >         txq=16: BW: (-1.9)  CPU: (-24.9)  RCPU: (-9.6)
> > >
> > > I don't see any reasons mentioned above.  However, for higher
> > > number of netperf sessions, I see a big increase in retransmissions:
> >
> > Hmm, ok, and do you see any errors?
>
> I haven't seen any in any statistics, messages, etc.

Herbert, could you help out debugging this increase in retransmissions
please?  Older mail on netdev in this thread has some numbers that seem
to imply that we start hitting retransmissions much more as # of flows
goes up.

> Also no
> retransmissions for txq=1.

While it's nice that we have this parameter, the need to choose between
single stream and multi stream performance when you start the vm makes
this patch much less interesting IMHO.


-- 
MST


Re: [PATCH 1/3] Make kvm64 the default cpu model when kvm_enabled()

2010-10-06 Thread Anthony Liguori

On 10/06/2010 01:53 PM, Marcelo Tosatti wrote:

On Mon, Sep 27, 2010 at 03:16:15PM +0200, Joerg Roedel wrote:
   

As requested by Alex this patch makes kvm64 the default CPU
model when qemu is started with -enable-kvm. This takes only
effect for qemu-versions newer or equal to 0.14.0.

Signed-off-by: Joerg Roedeljoerg.roe...@amd.com
---
  hw/boards.h|1 +
  hw/pc.c|   21 -
  hw/pc_piix.c   |6 ++
  qemu-version.h |   35 +++
  vl.c   |4 
  5 files changed, 62 insertions(+), 5 deletions(-)
  create mode 100644 qemu-version.h

diff --git a/hw/boards.h b/hw/boards.h
index 6f0f0d7..2d41b2d 100644
--- a/hw/boards.h
+++ b/hw/boards.h
@@ -19,6 +19,7 @@ typedef struct QEMUMachine {
  QEMUMachineInitFunc *init;
  int use_scsi;
  int max_cpus;
+unsigned int compat_version;
  unsigned int no_serial:1,
  no_parallel:1,
  use_virtcon:1,
diff --git a/hw/pc.c b/hw/pc.c
index 69b13bf..372ec4c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -40,6 +40,16 @@
  #include "sysbus.h"
  #include "sysemu.h"
  #include "blockdev.h"
+#include "kvm.h"
+#include "qemu-version.h"
+
+#ifdef TARGET_X86_64
+#define DEFAULT_KVM_CPU_MODEL "kvm64"
+#define DEFAULT_QEMU_CPU_MODEL "qemu64"
+#else
+#define DEFAULT_KVM_CPU_MODEL "kvm32"
+#define DEFAULT_QEMU_CPU_MODEL "qemu32"
+#endif

  /* output Bochs bios info messages */
  //#define DEBUG_BIOS
@@ -867,11 +877,12 @@ void pc_cpus_init(const char *cpu_model)

      /* init CPUs */
      if (cpu_model == NULL) {
-#ifdef TARGET_X86_64
-        cpu_model = "qemu64";
-#else
-        cpu_model = "qemu32";
-#endif
+        if (kvm_enabled() &&
+            qemu_compat_version >= QEMU_COMPAT_VERSION(0, 14, 0)) {
+            cpu_model = DEFAULT_KVM_CPU_MODEL;
+        } else {
+            cpu_model = DEFAULT_QEMU_CPU_MODEL;
+        }
      }

      for(i = 0; i < smp_cpus; i++) {
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 12359a7..9e46b71 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -35,6 +35,7 @@
  #include sysemu.h
  #include sysbus.h
  #include blockdev.h
+#include qemu-version.h

  #define MAX_IDE_BUS 2

@@ -217,6 +218,7 @@ static QEMUMachine pc_machine = {
  .desc = Standard PC,
  .init = pc_init_pci,
  .max_cpus = 255,
+.compat_version = QEMU_COMPAT_VERSION(0, 13, 0),
  .is_default = 1,
  };

@@ -225,6 +227,7 @@ static QEMUMachine pc_machine_v0_12 = {
  .desc = Standard PC,
  .init = pc_init_pci,
  .max_cpus = 255,
+.compat_version = QEMU_COMPAT_VERSION(0, 12, 0),
  .compat_props = (GlobalProperty[]) {
  {
  .driver   = virtio-serial-pci,
@@ -244,6 +247,7 @@ static QEMUMachine pc_machine_v0_11 = {
  .desc = Standard PC, qemu 0.11,
  .init = pc_init_pci,
  .max_cpus = 255,
+.compat_version = QEMU_COMPAT_VERSION(0, 11, 0),
  .compat_props = (GlobalProperty[]) {
  {
  .driver   = virtio-blk-pci,
@@ -279,6 +283,7 @@ static QEMUMachine pc_machine_v0_10 = {
  .desc = Standard PC, qemu 0.10,
  .init = pc_init_pci,
  .max_cpus = 255,
+.compat_version = QEMU_COMPAT_VERSION(0, 10, 0),
  .compat_props = (GlobalProperty[]) {
  {
  .driver   = virtio-blk-pci,
@@ -325,6 +330,7 @@ static QEMUMachine isapc_machine = {
  .name = isapc,
  .desc = ISA-only PC,
  .init = pc_init_isa,
+.compat_version = QEMU_COMPAT_VERSION(0, 10, 0),
  .max_cpus = 1,
  };

diff --git a/qemu-version.h b/qemu-version.h
new file mode 100644
index 000..b4bfe48
--- /dev/null
+++ b/qemu-version.h
@@ -0,0 +1,35 @@
+/*
+ * qemu-version.h
+ *
+ * Defines needed for handling QEMU version compatibility
+ *
+ * Copyright (c) 2010 Joerg Roedeljoerg.roe...@amd.com
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the Software), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef _QEMU_VERSION_H_
+#define _QEMU_VERSION_H_
+
+extern unsigned int qemu_compat_version;
+
+#define QEMU_COMPAT_VERSION(major, minor, 

Re: [patch uq/master 4/8] kvm: x86: add mce support

2010-10-06 Thread Anthony Liguori

On 10/06/2010 12:34 PM, Marcelo Tosatti wrote:

Port qemu-kvm's MCE support

commit c68b2374c9048812f488e00ffb95db66c0bc07a7
Author: Huang Yingying.hu...@intel.com
Date:   Mon Jul 20 10:00:53 2009 +0800

 Add MCE simulation support to qemu/kvm

 KVM ioctls are used to initialize MCE simulation and inject MCE. The
 real MCE simulation is implemented in Linux kernel. The Kernel part
 has been merged.

Signed-off-by: Marcelo Tosattimtosa...@redhat.com

Index: qemu/target-i386/helper.c
===
--- qemu.orig/target-i386/helper.c
+++ qemu/target-i386/helper.c
@@ -27,6 +27,7 @@
  #include "exec-all.h"
  #include "qemu-common.h"
  #include "kvm.h"
+#include "kvm_x86.h"

  //#define DEBUG_MMU

@@ -1030,6 +1031,11 @@ void cpu_inject_x86_mce(CPUState *cenv,
  if (bank >= bank_num || !(status & MCI_STATUS_VAL))
  return;

+if (kvm_enabled()) {
+kvm_inject_x86_mce(cenv, bank, status, mcg_status, addr, misc);
+return;
+}
+
  /*
   * if MSR_MCG_CTL is not all 1s, the uncorrected error
   * reporting is disabled
Index: qemu/target-i386/kvm.c
===
--- qemu.orig/target-i386/kvm.c
+++ qemu/target-i386/kvm.c
@@ -27,6 +27,7 @@
  #include "hw/pc.h"
  #include "hw/apic.h"
  #include "ioport.h"
+#include "kvm_x86.h"

  #ifdef CONFIG_KVM_PARA
  #include <linux/kvm_para.h>
@@ -167,6 +168,67 @@ static int get_para_features(CPUState *e
  }
  #endif

+#ifdef KVM_CAP_MCE
+static int kvm_get_mce_cap_supported(KVMState *s, uint64_t *mce_cap,
+ int *max_banks)
+{
+int r;
+
+r = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_MCE);
+if (r > 0) {
+*max_banks = r;
+return kvm_ioctl(s, KVM_X86_GET_MCE_CAP_SUPPORTED, mce_cap);
+}
+return -ENOSYS;
+}
+
+static int kvm_setup_mce(CPUState *env, uint64_t *mcg_cap)
+{
+return kvm_vcpu_ioctl(env, KVM_X86_SETUP_MCE, mcg_cap);
+}
+
+static int kvm_set_mce(CPUState *env, struct kvm_x86_mce *m)
+{
+return kvm_vcpu_ioctl(env, KVM_X86_SET_MCE, m);
+}
+
+struct kvm_x86_mce_data
+{
+CPUState *env;
+struct kvm_x86_mce *mce;
+};
+
+static void kvm_do_inject_x86_mce(void *_data)
+{
+struct kvm_x86_mce_data *data = _data;
+int r;
+
+r = kvm_set_mce(data->env, data->mce);
+if (r < 0)
+perror("kvm_set_mce FAILED");
+}
+#endif
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+uint64_t mcg_status, uint64_t addr, uint64_t misc)
+{
+#ifdef KVM_CAP_MCE
+struct kvm_x86_mce mce = {
+.bank = bank,
+.status = status,
+.mcg_status = mcg_status,
+.addr = addr,
+.misc = misc,
+};
+struct kvm_x86_mce_data data = {
+.env = cenv,
+.mce = &mce,
+};
+
+run_on_cpu(cenv, kvm_do_inject_x86_mce, &data);
+#endif
+}
+
  int kvm_arch_init_vcpu(CPUState *env)
  {
  struct {
@@ -274,6 +336,28 @@ int kvm_arch_init_vcpu(CPUState *env)

  cpuid_data.cpuid.nent = cpuid_i;

+#ifdef KVM_CAP_MCE
+if (((env->cpuid_version >> 8) & 0xF) >= 6
+&& (env->cpuid_features & (CPUID_MCE|CPUID_MCA)) == (CPUID_MCE|CPUID_MCA)
+&& kvm_check_extension(env->kvm_state, KVM_CAP_MCE) > 0) {
+uint64_t mcg_cap;
+int banks;
+
+if (kvm_get_mce_cap_supported(env->kvm_state, &mcg_cap, &banks))
+perror("kvm_get_mce_cap_supported FAILED");
+else {
+if (banks > MCE_BANKS_DEF)
+banks = MCE_BANKS_DEF;
+mcg_cap &= MCE_CAP_DEF;
+mcg_cap |= banks;
+if (kvm_setup_mce(env, &mcg_cap))
+perror("kvm_setup_mce FAILED");
+else
+env->mcg_cap = mcg_cap;
+}
+}
+#endif
+
  return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
  }
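
For readers following the capability setup in the hunk above, here is a minimal stand-alone sketch of how the guest's MCG_CAP value is derived: the bank count reported by the host is clamped, the supported capability bits are masked, and the bank count is OR-ed into the low bits. The function name and the two SKETCH_* constants are assumptions made for illustration; the real values of MCE_BANKS_DEF and MCE_CAP_DEF are defined elsewhere in the tree and are not shown in this patch.

```c
#include <stdint.h>

/* Assumed placeholder values, standing in for MCE_BANKS_DEF and
 * MCE_CAP_DEF, whose definitions are not part of this patch. */
#define SKETCH_BANKS_DEF 10
#define SKETCH_CAP_DEF   ((1ULL << 8) | (1ULL << 24))

/* Hypothetical helper mirroring the arithmetic in the hunk:
 * clamp the bank count, mask the capability bits, merge the two. */
static uint64_t sketch_mcg_cap(uint64_t supported_cap, int host_banks)
{
    int banks = host_banks;
    uint64_t cap;

    if (banks > SKETCH_BANKS_DEF) {
        banks = SKETCH_BANKS_DEF;
    }
    cap = supported_cap & SKETCH_CAP_DEF;  /* keep only the bits we expose */
    cap |= (uint64_t)banks;                /* bank count lives in the low bits */
    return cap;
}
```

With these placeholder values, a host that supports every capability bit and 32 banks would yield the two capability bits plus a bank count of 10.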

Index: qemu/target-i386/kvm_x86.h
===
--- /dev/null
+++ qemu/target-i386/kvm_x86.h
@@ -0,0 +1,21 @@
+/*
+ * QEMU KVM support
+ *
+ * Copyright (C) 2009 Red Hat Inc.
+ * Copyright IBM, Corp. 2008
+ *
+ * Authors:
+ *  Anthony Liguorialigu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
   


BTW, I'm fairly sure I didn't write any of this code so this copyright 
statement is probably bogus.


Regards,

Anthony Liguori


+
+#ifndef __KVM_X86_H__
+#define __KVM_X86_H__
+
+void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
+uint64_t mcg_status, uint64_t addr, uint64_t misc);
+
+#endif


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
   



Re: [PATCH v6 07/12] Add async PF initialization to PV guest.

2010-10-06 Thread Gleb Natapov
On Wed, Oct 06, 2010 at 11:45:12AM -0300, Marcelo Tosatti wrote:
 On Wed, Oct 06, 2010 at 12:55:04PM +0200, Gleb Natapov wrote:
  On Tue, Oct 05, 2010 at 03:25:54PM -0300, Marcelo Tosatti wrote:
   On Mon, Oct 04, 2010 at 05:56:29PM +0200, Gleb Natapov wrote:
Enable async PF in a guest if async PF capability is discovered.

Signed-off-by: Gleb Natapov g...@redhat.com
---
 Documentation/kernel-parameters.txt |3 +
 arch/x86/include/asm/kvm_para.h |5 ++
 arch/x86/kernel/kvm.c   |   92 
+++
 3 files changed, 100 insertions(+), 0 deletions(-)

   
+static int __cpuinit kvm_cpu_notify(struct notifier_block *self,
+   unsigned long action, void *hcpu)
+{
+   int cpu = (unsigned long)hcpu;
+   switch (action) {
+   case CPU_ONLINE:
+   case CPU_DOWN_FAILED:
+   case CPU_ONLINE_FROZEN:
+   smp_call_function_single(cpu, kvm_guest_cpu_notify, NULL, 0);
   
   wait parameter should probably be 1.
  Why should we wait for it? FWIW I copied this from somewhere (May be
  arch/x86/pci/amd_bus.c).
 
 So that you know it's executed at a defined point in CPU bringup.
 
If I read the code correctly, the CPU we are notified about is already running
when the callback is called, so I do not see what waiting for the IPI to be
processed will accomplish here. With many CPUs we will make boot a little bit
slower. I don't care too much though, so if you still think that 1 is required
here I'll make it so.

--
Gleb.


Re: [PATCH v6 04/12] Add memory slot versioning and use it to provide fast guest write interface

2010-10-06 Thread Gleb Natapov
On Wed, Oct 06, 2010 at 11:38:47AM -0300, Marcelo Tosatti wrote:
 On Wed, Oct 06, 2010 at 01:14:17PM +0200, Gleb Natapov wrote:
+int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+ gpa_t gpa)
+{
+   struct kvm_memslots *slots = kvm_memslots(kvm);
+   int offset = offset_in_page(gpa);
+   gfn_t gfn = gpa >> PAGE_SHIFT;
+
+   ghc->gpa = gpa;
+   ghc->generation = slots->generation;
 
 kvm->memslots can change here.
 
+   ghc->memslot = gfn_to_memslot(kvm, gfn);
+   ghc->hva = gfn_to_hva(kvm, gfn);
 
 And if so, gfn_to_memslot / gfn_to_hva will use new memslots pointer.
 
 Should dereference all values from one copy of kvm->memslots pointer.
  
Ah, I see now. Thanks! Will fix.

+   if (!kvm_is_error_hva(ghc->hva))
+   ghc->hva += offset;
+   else
+   return -EFAULT;
+
+   return 0;
+}
   
   Should use a unique kvm_memslots structure for the cache entry, since it
   can change in between (use gfn_to_hva_memslot, etc on slots pointer).
   
  I do not understand what you mean here. The kvm_memslots structure itself
  is not cached; only the various translations that use it are cached. Translation
  results are never used if kvm_memslots was changed.
 
   Also should zap any cached entries on overflow, otherwise malicious
   userspace could make use of stale slots:
   
  There is only one cached entry at any given time. A user who wants to
  write into guest memory often defines a gfn_to_hva_cache variable
  somewhere, inits it with kvm_gfn_to_hva_cache_init() and then calls
  kvm_write_guest_cached() on it. If there were no slot changes in between,
  the cached translation is used. Otherwise the cache is recalculated.
 
 Malicious userspace can cause an entry to be cached, then call ioctl
 SET_USER_MEMORY_REGION 2^32 times; the generation number will match again and
 mark_page_dirty_in_slot will be called with a pointer to freed memory.
 
Hmm. To zap all cached entries on overflow we need to track them. If we
track them, then we can zap them on each slot update and drop the generation
entirely.
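
To make the overflow concern concrete, here is a small user-space sketch of a generation-stamped cache. It is only an illustration under stated assumptions: the struct and function names are invented, and the counter is deliberately an 8-bit value so that wraparound happens after 256 updates instead of 2^32. It is not the kernel code under review.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-ins, not the kernel's real structures. */
struct demo_slots {
    uint8_t generation;   /* tiny on purpose so wraparound is easy to reach */
    uintptr_t base;       /* pretend host address backing the slot */
};

struct demo_cache {
    uint8_t generation;   /* generation seen when the cache was filled */
    uintptr_t hva;        /* cached translation */
};

/* Fill the cache from the current slots and remember their generation. */
static void demo_cache_init(struct demo_cache *c, const struct demo_slots *s,
                            uintptr_t offset)
{
    c->generation = s->generation;
    c->hva = s->base + offset;
}

/* A slot update moves the backing memory and bumps the generation. */
static void demo_slots_update(struct demo_slots *s, uintptr_t new_base)
{
    s->base = new_base;
    s->generation++;      /* wraps silently back to 0 after 255 */
}

/* The cache is trusted whenever the generations compare equal. */
static bool demo_cache_valid(const struct demo_cache *c,
                             const struct demo_slots *s)
{
    return c->generation == s->generation;
}
```

After exactly 256 calls to demo_slots_update() the counter equals the cached value again, so demo_cache_valid() returns true even though the cached address refers to the old backing memory. Tracking the cache entries and zapping them on update would avoid relying on the counter at all.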

--
Gleb.


Re: [GIT PULL net-next-2.6] vhost-net patchset for 2.6.37

2010-10-06 Thread David Miller
From: Michael S. Tsirkin m...@redhat.com
Date: Tue, 5 Oct 2010 20:27:32 +0200

 It looks like it was a quiet cycle for vhost-net:
 probably because most of energy was spent on bugfixes
 that went in for 2.6.36.
 People are working on multiqueue, tracing but I'm not
 sure it'll get done in time for 2.6.37 - so here's
 a tree with a single patch that helps windows guests
 which we definitely want in the next kernel.
 
 Please merge for 2.6.37.

Pulled, thanks Michael.


Re: [PATCH 0/2] device-assignment: Re-work PCI option ROM support

2010-10-06 Thread Marcelo Tosatti
On Mon, Oct 04, 2010 at 03:26:18PM -0600, Alex Williamson wrote:
 This cleans up device assignment option ROM support and allows
 us to use romfile and rombar default PCI options.  Thanks,
 
 Alex
 
 ---
 
 Alex Williamson (2):
   device-assignment: Allow PCI to manage the option ROM
   PCI: Export pci_map_option_rom()
 
 
  hw/device-assignment.c |  155 
 +---
  hw/device-assignment.h |4 +
  hw/pci.c   |2 -
  hw/pci.h   |3 +
  4 files changed, 75 insertions(+), 89 deletions(-)

Applied, thanks.



[PATCH 0/6] Save state error handling (kill off no_migrate)

2010-10-06 Thread Alex Williamson
Our code paths for saving or migrating a VM are full of functions that
return void, leaving no opportunity for a device to cancel a migration,
either from error or incompatibility.  The ivshmem driver attempted to
solve this with a no_migrate flag on the save state entry.  I think the
more generic and flexible way to solve this is to allow driver save
functions to fail.  This series implements that and converts ivshmem
to use a set_params function to NAK migration much earlier in the
process.  This touches a lot of files, but the bulk of those changes are
simply s/void/int/ and tacking a return 0 onto the end of functions.
Thanks,

Alex
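
As a rough illustration of the s/void/int/ conversion described above, the following sketch shows a list of save handlers where the first negative return aborts the walk and is handed back to the caller. All names here (out_stream, save_entry, demo_save_all and friends) are hypothetical and exist only for this example; they are not the QEMU types touched by the series.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of a save path; none of these names are QEMU's. */
struct out_stream {
    size_t bytes;                 /* how much state has been "written" */
};

typedef int save_fn(struct out_stream *f, void *opaque);

struct save_entry {
    save_fn *save;
    void *opaque;
};

/* A device that can always be saved. */
static int demo_save_ok(struct out_stream *f, void *opaque)
{
    (void)opaque;
    f->bytes += 4;
    return 0;
}

/* A device that cannot be saved in its current state. */
static int demo_save_refuse(struct out_stream *f, void *opaque)
{
    (void)f;
    (void)opaque;
    return -EINVAL;
}

/* Walk the handlers; the first negative return aborts the whole save. */
static int demo_save_all(struct out_stream *f,
                         const struct save_entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = e[i].save(f, e[i].opaque);
        if (ret < 0) {
            return ret;
        }
    }
    return 0;
}
```

Handlers that cannot fail simply end in return 0, which is why most of the diffstat below is mechanical.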

---

Alex Williamson (6):
  savevm: Remove register_device_unmigratable()
  savevm: Allow set_params and save_live_state to error
  virtio: Allow virtio_save() errors
  pci: Allow pci_device_save() to return error
  savevm: Allow vmsd->pre_save to return error
  savevm: Allow SaveStateHandler() to return error


 block-migration.c   |4 +-
 hw/adb.c|8 +++-
 hw/ads7846.c|4 +-
 hw/arm_gic.c|4 +-
 hw/arm_timer.c  |6 ++-
 hw/armv7m_nvic.c|4 +-
 hw/cuda.c   |4 +-
 hw/fdc.c|3 +
 hw/g364fb.c |4 +-
 hw/grackle_pci.c|4 +-
 hw/gt64xxx.c|4 +-
 hw/heathrow_pic.c   |4 +-
 hw/hpet.c   |3 +
 hw/hw.h |   12 ++
 hw/i2c.c|3 +
 hw/ide/core.c   |4 +-
 hw/ivshmem.c|   30 +++
 hw/lsi53c895a.c |4 +-
 hw/m48t59.c |4 +-
 hw/mac_dbdma.c  |4 +-
 hw/mac_nvram.c  |4 +-
 hw/max111x.c|4 +-
 hw/mipsnet.c|4 +-
 hw/mst_fpga.c   |3 +
 hw/nand.c   |3 +
 hw/openpic.c|4 +-
 hw/pci.c|9 +++-
 hw/pci.h|2 -
 hw/piix4.c  |4 +-
 hw/pl011.c  |4 +-
 hw/pl022.c  |4 +-
 hw/pl061.c  |4 +-
 hw/ppc4xx_pci.c |   11 -
 hw/ppce500_pci.c|   11 -
 hw/pxa2xx.c |   28 ++
 hw/pxa2xx_dma.c |4 +-
 hw/pxa2xx_gpio.c|4 +-
 hw/pxa2xx_keypad.c  |3 +
 hw/pxa2xx_lcd.c |4 +-
 hw/pxa2xx_mmci.c|4 +-
 hw/pxa2xx_pic.c |4 +-
 hw/pxa2xx_timer.c   |4 +-
 hw/rc4030.c |4 +-
 hw/rtl8139.c|4 +-
 hw/serial.c |3 +
 hw/spitz.c  |   14 +--
 hw/ssd0323.c|4 +-
 hw/ssi-sd.c |4 +-
 hw/stellaris.c  |   20 +++---
 hw/stellaris_enet.c |4 +-
 hw/stellaris_input.c|4 +-
 hw/syborg_fb.c  |4 +-
 hw/syborg_interrupt.c   |3 +
 hw/syborg_keyboard.c|3 +
 hw/syborg_pointer.c |3 +
 hw/syborg_rtc.c |4 +-
 hw/syborg_serial.c  |4 +-
 hw/syborg_timer.c   |4 +-
 hw/tsc2005.c|4 +-
 hw/tsc210x.c|4 +-
 hw/twl92230.c   |3 +
 hw/unin_pci.c   |4 +-
 hw/usb-uhci.c   |3 +
 hw/virtio-balloon.c |9 +++-
 hw/virtio-blk.c |   10 -
 hw/virtio-net.c |   11 -
 hw/virtio-pci.c |   10 -
 hw/virtio-serial-bus.c  |   10 -
 hw/virtio.c |   14 +--
 hw/virtio.h |4 +-
 hw/wm8750.c |3 +
 hw/zaurus.c |4 +-
 qemu-common.h   |2 -
 savevm.c|   88 +++
 slirp/slirp.c   |6 ++-
 target-arm/machine.c|3 +
 target-cris/machine.c   |3 +
 target-i386/machine.c   |7 ++-
 target-microblaze/machine.c |3 +
 target-mips/machine.c   |3 +
 target-ppc/machine.c|3 +
 target-s390x/machine.c  |3 +
 target-sparc/machine.c  |3 +
 83 files changed, 365 insertions(+), 181 deletions(-)


[PATCH 1/6] savevm: Allow SaveStateHandler() to return error

2010-10-06 Thread Alex Williamson
Some devices may not always be able to save their state; allow
the save handler to return an error.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/adb.c|8 ++--
 hw/ads7846.c|4 +++-
 hw/arm_gic.c|4 +++-
 hw/arm_timer.c  |6 --
 hw/armv7m_nvic.c|4 +++-
 hw/cuda.c   |4 +++-
 hw/g364fb.c |4 +++-
 hw/grackle_pci.c|4 +++-
 hw/gt64xxx.c|3 ++-
 hw/heathrow_pic.c   |4 +++-
 hw/hw.h |2 +-
 hw/ivshmem.c|3 ++-
 hw/m48t59.c |4 +++-
 hw/mac_dbdma.c  |4 +++-
 hw/mac_nvram.c  |4 +++-
 hw/max111x.c|4 +++-
 hw/mipsnet.c|4 +++-
 hw/mst_fpga.c   |3 ++-
 hw/nand.c   |3 ++-
 hw/openpic.c|4 +++-
 hw/piix4.c  |3 ++-
 hw/pl011.c  |4 +++-
 hw/pl022.c  |4 +++-
 hw/pl061.c  |4 +++-
 hw/ppc4xx_pci.c |4 +++-
 hw/ppce500_pci.c|4 +++-
 hw/pxa2xx.c |   28 +---
 hw/pxa2xx_dma.c |4 +++-
 hw/pxa2xx_gpio.c|4 +++-
 hw/pxa2xx_keypad.c  |3 ++-
 hw/pxa2xx_lcd.c |4 +++-
 hw/pxa2xx_mmci.c|4 +++-
 hw/pxa2xx_pic.c |4 +++-
 hw/pxa2xx_timer.c   |4 +++-
 hw/rc4030.c |4 +++-
 hw/spitz.c  |   14 ++
 hw/ssd0323.c|4 +++-
 hw/ssi-sd.c |4 +++-
 hw/stellaris.c  |   20 +++-
 hw/stellaris_enet.c |4 +++-
 hw/stellaris_input.c|4 +++-
 hw/syborg_fb.c  |4 +++-
 hw/syborg_interrupt.c   |3 ++-
 hw/syborg_keyboard.c|3 ++-
 hw/syborg_pointer.c |3 ++-
 hw/syborg_rtc.c |4 +++-
 hw/syborg_serial.c  |4 +++-
 hw/syborg_timer.c   |4 +++-
 hw/tsc2005.c|4 +++-
 hw/tsc210x.c|4 +++-
 hw/unin_pci.c   |4 +++-
 hw/virtio-balloon.c |3 ++-
 hw/virtio-blk.c |4 +++-
 hw/virtio-net.c |4 +++-
 hw/virtio-serial-bus.c  |4 +++-
 hw/zaurus.c |4 +++-
 qemu-common.h   |2 +-
 savevm.c|3 +--
 slirp/slirp.c   |6 --
 target-arm/machine.c|3 ++-
 target-cris/machine.c   |3 ++-
 target-i386/machine.c   |3 ++-
 target-microblaze/machine.c |3 ++-
 target-mips/machine.c   |3 ++-
 target-ppc/machine.c|3 ++-
 target-s390x/machine.c  |3 ++-
 target-sparc/machine.c  |3 ++-
 67 files changed, 219 insertions(+), 84 deletions(-)

diff --git a/hw/adb.c b/hw/adb.c
index 99b30f6..f400d12 100644
--- a/hw/adb.c
+++ b/hw/adb.c
@@ -261,7 +261,7 @@ static int adb_kbd_request(ADBDevice *d, uint8_t *obuf,
 return olen;
 }
 
-static void adb_kbd_save(QEMUFile *f, void *opaque)
+static int adb_kbd_save(QEMUFile *f, void *opaque)
 {
 KBDState *s = (KBDState *)opaque;
 
@@ -269,6 +269,8 @@ static void adb_kbd_save(QEMUFile *f, void *opaque)
 qemu_put_sbe32s(f, &s->rptr);
 qemu_put_sbe32s(f, &s->wptr);
 qemu_put_sbe32s(f, &s->count);
+
+return 0;
 }
 
 static int adb_kbd_load(QEMUFile *f, void *opaque, int version_id)
@@ -439,7 +441,7 @@ static int adb_mouse_reset(ADBDevice *d)
 return 0;
 }
 
-static void adb_mouse_save(QEMUFile *f, void *opaque)
+static int adb_mouse_save(QEMUFile *f, void *opaque)
 {
 MouseState *s = (MouseState *)opaque;
 
@@ -448,6 +450,8 @@ static void adb_mouse_save(QEMUFile *f, void *opaque)
 qemu_put_sbe32s(f, &s->dx);
 qemu_put_sbe32s(f, &s->dy);
 qemu_put_sbe32s(f, &s->dz);
+
+return 0;
 }
 
 static int adb_mouse_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/hw/ads7846.c b/hw/ads7846.c
index b3bbeaf..4440ed2 100644
--- a/hw/ads7846.c
+++ b/hw/ads7846.c
@@ -105,7 +105,7 @@ static void ads7846_ts_event(void *opaque,
 }
 }
 
-static void ads7846_save(QEMUFile *f, void *opaque)
+static int ads7846_save(QEMUFile *f, void *opaque)
 {
 ADS7846State *s = (ADS7846State *) opaque;
 int i;
@@ -115,6 +115,8 @@ static void ads7846_save(QEMUFile *f, void *opaque)
 qemu_put_be32(f, s->noise);
 qemu_put_be32(f, s->cycle);
 qemu_put_be32(f, s->output);
+
+return 0;
 }
 
 static int ads7846_load(QEMUFile *f, void *opaque, int version_id)
diff --git a/hw/arm_gic.c b/hw/arm_gic.c
index 8286a28..7790a10 100644
--- a/hw/arm_gic.c
+++ b/hw/arm_gic.c
@@ -653,7 +653,7 @@ static void gic_reset(gic_state *s)
 #endif
 }
 
-static void gic_save(QEMUFile *f, void *opaque)
+static int gic_save(QEMUFile *f, void *opaque)
 {
 

[PATCH 2/6] savevm: Allow vmsd->pre_save to return error

2010-10-06 Thread Alex Williamson
This allows vmsd based saves to also have a way to signal that
they can't be saved or migrated.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/fdc.c  |3 ++-
 hw/hpet.c |3 ++-
 hw/hw.h   |6 +++---
 hw/i2c.c  |3 ++-
 hw/ide/core.c |4 +++-
 hw/lsi53c895a.c   |4 +++-
 hw/rtl8139.c  |4 +++-
 hw/serial.c   |3 ++-
 hw/twl92230.c |3 ++-
 hw/usb-uhci.c |3 ++-
 hw/wm8750.c   |3 ++-
 savevm.c  |   36 +++-
 target-i386/machine.c |6 +++---
 13 files changed, 52 insertions(+), 29 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index c159dcb..ff48c70 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -643,11 +643,12 @@ static const VMStateDescription vmstate_fdrive = {
 }
 };
 
-static void fdc_pre_save(void *opaque)
+static int fdc_pre_save(void *opaque)
 {
 FDCtrl *s = opaque;
 
 s->dor_vmstate = s->dor | GET_CUR_DRV(s);
+return 0;
 }
 
 static int fdc_post_load(void *opaque, int version_id)
diff --git a/hw/hpet.c b/hw/hpet.c
index d5c406c..e586e68 100644
--- a/hw/hpet.c
+++ b/hw/hpet.c
@@ -204,12 +204,13 @@ static void update_irq(struct HPETTimer *timer, int set)
 }
 }
 
-static void hpet_pre_save(void *opaque)
+static int hpet_pre_save(void *opaque)
 {
 HPETState *s = opaque;
 
 /* save current counter value */
 s->hpet_counter = hpet_get_ticks(s);
+return 0;
 }
 
 static int hpet_pre_load(void *opaque)
diff --git a/hw/hw.h b/hw/hw.h
index b6f1236..91a60ca 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -328,7 +328,7 @@ struct VMStateDescription {
 LoadStateHandler *load_state_old;
 int (*pre_load)(void *opaque);
 int (*post_load)(void *opaque, int version_id);
-void (*pre_save)(void *opaque);
+int (*pre_save)(void *opaque);
 VMStateField *fields;
 const VMStateSubsection *subsections;
 };
@@ -773,8 +773,8 @@ extern const VMStateDescription vmstate_i2c_slave;
 
 extern int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
   void *opaque, int version_id);
-extern void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
-   void *opaque);
+extern int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
+  void *opaque);
 extern int vmstate_register(DeviceState *dev, int instance_id,
 const VMStateDescription *vmsd, void *base);
 extern int vmstate_register_with_alias_id(DeviceState *dev,
diff --git a/hw/i2c.c b/hw/i2c.c
index f80d12d..f05c2ef 100644
--- a/hw/i2c.c
+++ b/hw/i2c.c
@@ -26,11 +26,12 @@ static struct BusInfo i2c_bus_info = {
 }
 };
 
-static void i2c_bus_pre_save(void *opaque)
+static int i2c_bus_pre_save(void *opaque)
 {
 i2c_bus *bus = opaque;
 
 bus->saved_address = bus->current_dev ? bus->current_dev->address : -1;
+return 0;
 }
 
 static int i2c_bus_post_load(void *opaque, int version_id)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 06b6e14..eb5f095 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2792,7 +2792,7 @@ static int ide_drive_pio_post_load(void *opaque, int version_id)
 return 0;
 }
 
-static void ide_drive_pio_pre_save(void *opaque)
+static int ide_drive_pio_pre_save(void *opaque)
 {
 IDEState *s = opaque;
 int idx;
@@ -2808,6 +2808,8 @@ static void ide_drive_pio_pre_save(void *opaque)
 } else {
 s->end_transfer_fn_idx = idx;
 }
+
+return 0;
 }
 
 static bool ide_drive_pio_state_needed(void *opaque)
diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 5eaf69e..7315a3f 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -2045,7 +2045,7 @@ static void lsi_scsi_reset(DeviceState *dev)
 lsi_soft_reset(s);
 }
 
-static void lsi_pre_save(void *opaque)
+static int lsi_pre_save(void *opaque)
 {
 LSIState *s = opaque;
 
@@ -2054,6 +2054,8 @@ static void lsi_pre_save(void *opaque)
 assert(s->current->dma_len == 0);
 }
 assert(QTAILQ_EMPTY(&s->queue));
+
+return 0;
 }
 
 static const VMStateDescription vmstate_lsi_scsi = {
diff --git a/hw/rtl8139.c b/hw/rtl8139.c
index d92981d..56271fb 100644
--- a/hw/rtl8139.c
+++ b/hw/rtl8139.c
@@ -3173,7 +3173,7 @@ static int rtl8139_post_load(void *opaque, int version_id)
 return 0;
 }
 
-static void rtl8139_pre_save(void *opaque)
+static int rtl8139_pre_save(void *opaque)
 {
 RTL8139State* s = opaque;
 int64_t current_time = qemu_get_clock(vm_clock);
@@ -3182,6 +3182,8 @@ static void rtl8139_pre_save(void *opaque)
 rtl8139_set_next_tctr_time(s, current_time);
 s->TCTR = muldiv64(current_time - s->TCTR_base, PCI_FREQUENCY,
get_ticks_per_sec());
+
+return 0;
 }
 
 static const VMStateDescription vmstate_rtl8139 = {
diff --git a/hw/serial.c b/hw/serial.c
index 9ebc452..edfdd4d 100644
--- a/hw/serial.c
+++ b/hw/serial.c
@@ -659,10 +659,11 @@ static void 

[PATCH 3/6] pci: Allow pci_device_save() to return error

2010-10-06 Thread Alex Williamson
Carry the vmsd pre_save error reporting through pci_device_save().

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/grackle_pci.c |4 +---
 hw/gt64xxx.c |3 +--
 hw/ivshmem.c |7 ++-
 hw/openpic.c |4 +---
 hw/pci.c |9 +++--
 hw/pci.h |2 +-
 hw/piix4.c   |3 +--
 hw/ppc4xx_pci.c  |7 +--
 hw/ppce500_pci.c |7 +--
 hw/unin_pci.c|4 +---
 10 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/hw/grackle_pci.c b/hw/grackle_pci.c
index f6905fb..c7164c5 100644
--- a/hw/grackle_pci.c
+++ b/hw/grackle_pci.c
@@ -61,9 +61,7 @@ static int pci_grackle_save(QEMUFile* f, void *opaque)
 {
 PCIDevice *d = opaque;
 
-pci_device_save(d, f);
-
-return 0;
+return pci_device_save(d, f);
 }
 
 static int pci_grackle_load(QEMUFile* f, void *opaque, int version_id)
diff --git a/hw/gt64xxx.c b/hw/gt64xxx.c
index 7d8c3b3..21a0e57 100644
--- a/hw/gt64xxx.c
+++ b/hw/gt64xxx.c
@@ -1089,8 +1089,7 @@ static void gt64120_reset(void *opaque)
 static int gt64120_save(QEMUFile* f, void *opaque)
 {
 PCIDevice *d = opaque;
-pci_device_save(d, f);
-return 0;
+return pci_device_save(d, f);
 }
 
 static int gt64120_load(QEMUFile* f, void *opaque, int version_id)
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 0919c4e..3726a7f 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -619,9 +619,14 @@ static void ivshmem_setup_msi(IVShmemState * s) {
 static int ivshmem_save(QEMUFile* f, void *opaque)
 {
 IVShmemState *proxy = opaque;
+int ret;
 
 IVSHMEM_DPRINTF("ivshmem_save\n");
-pci_device_save(&proxy->dev, f);
+
+ret = pci_device_save(&proxy->dev, f);
+if (ret < 0) {
+return ret;
+}
 
 if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
 msix_save(&proxy->dev, f);
diff --git a/hw/openpic.c b/hw/openpic.c
index 4ca4ba3..4537239 100644
--- a/hw/openpic.c
+++ b/hw/openpic.c
@@ -1102,9 +1102,7 @@ static int openpic_save(QEMUFile* f, void *opaque)
 }
 #endif
 
-pci_device_save(&opp->pci_dev, f);
-
-return 0;
+return pci_device_save(&opp->pci_dev, f);
 }
 
 static void openpic_load_IRQ_queue(QEMUFile* f, IRQ_queue_t *q)
diff --git a/hw/pci.c b/hw/pci.c
index 15416dd..a30f6ec 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -434,16 +434,21 @@ static inline const VMStateDescription *pci_get_vmstate(PCIDevice *s)
 return pci_is_express(s) ? &vmstate_pcie_device : &vmstate_pci_device;
 }
 
-void pci_device_save(PCIDevice *s, QEMUFile *f)
+int pci_device_save(PCIDevice *s, QEMUFile *f)
 {
+int ret;
 /* Clear interrupt status bit: it is implicit
  * in irq_state which we are saving.
  * This makes us compatible with old devices
  * which never set or clear this bit. */
 s->config[PCI_STATUS] &= ~PCI_STATUS_INTERRUPT;
-vmstate_save_state(f, pci_get_vmstate(s), s);
+ret = vmstate_save_state(f, pci_get_vmstate(s), s);
+if (ret < 0) {
+return ret;
+}
 /* Restore the interrupt status bit. */
 pci_update_irq_status(s);
+return 0;
 }
 
 int pci_device_load(PCIDevice *s, QEMUFile *f)
diff --git a/hw/pci.h b/hw/pci.h
index 3d23f03..bb9ad79 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -198,7 +198,7 @@ uint32_t pci_default_read_config(PCIDevice *d,
  uint32_t address, int len);
 void pci_default_write_config(PCIDevice *d,
   uint32_t address, uint32_t val, int len);
-void pci_device_save(PCIDevice *s, QEMUFile *f);
+int pci_device_save(PCIDevice *s, QEMUFile *f);
 int pci_device_load(PCIDevice *s, QEMUFile *f);
 
 typedef void (*pci_set_irq_fn)(void *opaque, int irq_num, int level);
diff --git a/hw/piix4.c b/hw/piix4.c
index 5209061..9f560ac 100644
--- a/hw/piix4.c
+++ b/hw/piix4.c
@@ -71,8 +71,7 @@ static void piix4_reset(void *opaque)
 static int piix_save(QEMUFile* f, void *opaque)
 {
 PCIDevice *d = opaque;
-pci_device_save(d, f);
-return 0;
+return pci_device_save(d, f);
 }
 
 static int piix_load(QEMUFile* f, void *opaque, int version_id)
diff --git a/hw/ppc4xx_pci.c b/hw/ppc4xx_pci.c
index 7507d08..3499270 100644
--- a/hw/ppc4xx_pci.c
+++ b/hw/ppc4xx_pci.c
@@ -301,9 +301,12 @@ static void ppc4xx_pci_set_irq(void *opaque, int irq_num, int level)
 static int ppc4xx_pci_save(QEMUFile *f, void *opaque)
 {
 PPC4xxPCIState *controller = opaque;
-int i;
+int i, ret;
 
-pci_device_save(controller->pci_dev, f);
+ret = pci_device_save(controller->pci_dev, f);
+if (ret < 0) {
+return ret;
+}
 
 for (i = 0; i < PPC4xx_PCI_NR_PMMS; i++) {
 qemu_put_be32s(f, &controller->pmm[i].la);
diff --git a/hw/ppce500_pci.c b/hw/ppce500_pci.c
index 9babe05..97a7743 100644
--- a/hw/ppce500_pci.c
+++ b/hw/ppce500_pci.c
@@ -219,9 +219,12 @@ static void mpc85xx_pci_set_irq(void *opaque, int irq_num, int level)
 static int ppce500_pci_save(QEMUFile *f, void *opaque)
 {
 PPCE500PCIState *controller = opaque;
-int i;
+int i, ret;
 
-   

[PATCH 4/6] virtio: Allow virtio_save() errors

2010-10-06 Thread Alex Williamson
Carry pci_device_save() error through to virtio_save().

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/virtio-balloon.c|6 +-
 hw/virtio-blk.c|6 +-
 hw/virtio-net.c|7 ++-
 hw/virtio-pci.c|   10 --
 hw/virtio-serial-bus.c |6 +-
 hw/virtio.c|   14 ++
 hw/virtio.h|4 ++--
 7 files changed, 41 insertions(+), 12 deletions(-)

diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index 719e72c..2bf009d 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -230,8 +230,12 @@ static void virtio_balloon_to_target(void *opaque, ram_addr_t target,
 static int virtio_balloon_save(QEMUFile *f, void *opaque)
 {
 VirtIOBalloon *s = opaque;
+int ret;
 
-virtio_save(&s->vdev, f);
+ret = virtio_save(&s->vdev, f);
+if (ret < 0) {
+return ret;
+}
 
 qemu_put_be32(f, s-num_pages);
 qemu_put_be32(f, s-actual);
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 3770901..b4772bf 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -464,8 +464,12 @@ static int virtio_blk_save(QEMUFile *f, void *opaque)
 {
 VirtIOBlock *s = opaque;
 VirtIOBlockReq *req = s->rq;
+int ret;
 
-virtio_save(&s->vdev, f);
+ret = virtio_save(&s->vdev, f);
+if (ret < 0) {
+return ret;
+}
 
 while (req) {
 qemu_put_sbyte(f, 1);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index 1b683d9..6673320 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -782,6 +782,7 @@ static void virtio_net_tx_bh(void *opaque)
 static int virtio_net_save(QEMUFile *f, void *opaque)
 {
 VirtIONet *n = opaque;
+int ret;
 
 if (n->vhost_started) {
 /* TODO: should we really stop the backend?
@@ -789,7 +790,11 @@ static int virtio_net_save(QEMUFile *f, void *opaque)
 vhost_net_stop(tap_get_vhost_net(n->nic->nc.peer), &n->vdev);
 n->vhost_started = 0;
 }
-virtio_save(&n->vdev, f);
+
+ret = virtio_save(&n->vdev, f);
+if (ret < 0) {
+return ret;
+}
 
 qemu_put_buffer(f, n->mac, ETH_ALEN);
 qemu_put_be32(f, n->tx_waiting);
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 86e6b0a..a7603bb 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -121,13 +121,19 @@ static void virtio_pci_notify(void *opaque, uint16_t vector)
 qemu_set_irq(proxy->pci_dev.irq[0], proxy->vdev->isr & 1);
 }
 
-static void virtio_pci_save_config(void * opaque, QEMUFile *f)
+static int virtio_pci_save_config(void * opaque, QEMUFile *f)
 {
 VirtIOPCIProxy *proxy = opaque;
-pci_device_save(&proxy->pci_dev, f);
+int ret;
+
+ret = pci_device_save(&proxy->pci_dev, f);
+if (ret < 0) {
+return ret;
+}
 msix_save(&proxy->pci_dev, f);
 if (msix_present(&proxy->pci_dev))
 qemu_put_be16(f, proxy->vdev->config_vector);
+return 0;
 }
 
 static void virtio_pci_save_queue(void * opaque, int n, QEMUFile *f)
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 7f00fcf..ca57dda 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -459,9 +459,13 @@ static int virtio_serial_save(QEMUFile *f, void *opaque)
 VirtIOSerialPort *port;
 uint32_t nr_active_ports;
 unsigned int i;
+int ret;
 
 /* The virtio device */
-virtio_save(&s->vdev, f);
+ret = virtio_save(&s->vdev, f);
+if (ret < 0) {
+return ret;
+}
 
 /* The config space */
 qemu_put_be16s(f, &s->config.cols);
diff --git a/hw/virtio.c b/hw/virtio.c
index fbef788..27b0e84 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -640,12 +640,16 @@ void virtio_notify_config(VirtIODevice *vdev)
 virtio_notify_vector(vdev, vdev->config_vector);
 }
 
-void virtio_save(VirtIODevice *vdev, QEMUFile *f)
+int virtio_save(VirtIODevice *vdev, QEMUFile *f)
 {
-int i;
+int i, ret;
 
-if (vdev->binding->save_config)
-vdev->binding->save_config(vdev->binding_opaque, f);
+if (vdev->binding->save_config) {
+ret = vdev->binding->save_config(vdev->binding_opaque, f);
+if (ret < 0) {
+return ret;
+}
+}
 
 qemu_put_8s(f, &vdev->status);
 qemu_put_8s(f, &vdev->isr);
@@ -671,6 +675,8 @@ void virtio_save(VirtIODevice *vdev, QEMUFile *f)
 if (vdev->binding->save_queue)
 vdev->binding->save_queue(vdev->binding_opaque, i, f);
 }
+
+return 0;
 }
 
 int virtio_load(VirtIODevice *vdev, QEMUFile *f)
diff --git a/hw/virtio.h b/hw/virtio.h
index 96514e6..5c5da3a 100644
--- a/hw/virtio.h
+++ b/hw/virtio.h
@@ -88,7 +88,7 @@ typedef struct VirtQueueElement
 
 typedef struct {
 void (*notify)(void * opaque, uint16_t vector);
-void (*save_config)(void * opaque, QEMUFile *f);
+int (*save_config)(void * opaque, QEMUFile *f);
 void (*save_queue)(void * opaque, int n, QEMUFile *f);
 int (*load_config)(void * opaque, QEMUFile *f);
 int (*load_queue)(void * opaque, int n, QEMUFile *f);
@@ -150,7 +150,7 @@ int 

[PATCH 5/6] savevm: Allow set_params and save_live_state to error

2010-10-06 Thread Alex Williamson
This lets a save state handler NAK a migration or cancel if
it runs into problems.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 block-migration.c |4 +++-
 hw/hw.h   |2 +-
 savevm.c  |   18 +++---
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 0bfdb73..5fb3b72 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -628,13 +628,15 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
 return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static int block_set_params(int blk_enable, int shared_base, void *opaque)
 {
 block_mig_state.blk_enable = blk_enable;
 block_mig_state.shared_base = shared_base;
 
 /* shared base means that blk_enable = 1 */
 block_mig_state.blk_enable |= shared_base;
+
+return 0;
 }
 
 void blk_mig_init(void)
diff --git a/hw/hw.h b/hw/hw.h
index 91a60ca..95f2d52 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -239,7 +239,7 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 int64_t qemu_ftell(QEMUFile *f);
 int64_t qemu_fseek(QEMUFile *f, int64_t pos, int whence);
 
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef int SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
 typedef int SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(Monitor *mon, QEMUFile *f, int stage,
  void *opaque);
diff --git a/savevm.c b/savevm.c
index 89c5fac..ad3ab86 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1414,12 +1414,16 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
                             int shared)
 {
     SaveStateEntry *se;
+    int ret;
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {
         if(se->set_params == NULL) {
             continue;
         }
-        se->set_params(blk_enable, shared, se->opaque);
+        ret = se->set_params(blk_enable, shared, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     qemu_put_be32(f, QEMU_VM_FILE_MAGIC);
@@ -1443,7 +1447,10 @@ int qemu_savevm_state_begin(Monitor *mon, QEMUFile *f, int blk_enable,
         qemu_put_be32(f, se->instance_id);
         qemu_put_be32(f, se->version_id);
 
-        se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+        ret = se->save_live_state(mon, f, QEMU_VM_SECTION_START, se->opaque);
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     if (qemu_file_has_error(f)) {
@@ -1474,6 +1481,8 @@ int qemu_savevm_state_iterate(Monitor *mon, QEMUFile *f)
                and reduces the probability that a faster changing state is
                synchronized over and over again. */
             break;
+        } else if (ret < 0) {
+            return ret;
         }
     }
 
@@ -1503,7 +1512,10 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f)
         qemu_put_byte(f, QEMU_VM_SECTION_END);
         qemu_put_be32(f, se->section_id);
 
-        se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
+        r = se->save_live_state(mon, f, QEMU_VM_SECTION_END, se->opaque);
+        if (r < 0) {
+            return r;
+        }
     }
 
     QTAILQ_FOREACH(se, &savevm_handlers, entry) {

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
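
Editorial sketch for PATCH 5/6: the change makes each save handler's return value meaningful, so the first negative value stops the walk over the handler list and is returned to the caller. The Python model below only illustrates that control flow. The class and function names echo the patch, but the handler list, the "peer-device" entry and refusing_set_params() are hypothetical and are not QEMU code.

```python
import errno


class SaveStateEntry:
    """Toy stand-in for QEMU's SaveStateEntry (illustration only)."""

    def __init__(self, idstr, set_params=None):
        self.idstr = idstr
        self.set_params = set_params  # optional callback, may be None


def savevm_state_begin(handlers, blk_enable, shared):
    """Call every set_params handler; stop at the first negative return."""
    for se in handlers:
        if se.set_params is None:
            continue
        ret = se.set_params(blk_enable, shared)
        if ret < 0:
            return ret  # this handler refused (NAKed) the migration
    return 0


def block_set_params(blk_enable, shared):
    # Mirrors the patched block_set_params(): always succeeds.
    return 0


def refusing_set_params(blk_enable, shared):
    # Hypothetical device that cannot be migrated.
    return -errno.EINVAL


handlers = [
    SaveStateEntry("block", block_set_params),
    SaveStateEntry("ram"),  # no set_params callback, skipped
    SaveStateEntry("peer-device", refusing_set_params),
]
```

With this list, savevm_state_begin(handlers, 0, 0) returns the negative errno from the last entry, while the first two entries alone yield 0.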


[PATCH 6/6] savevm: Remove register_device_unmigratable()

2010-10-06 Thread Alex Williamson
Now that the save state handlers can return error, individual drivers
can cancel a migration if they hit an error or don't support it.  This
makes the unmigratable callback redundant.  Remove it and change the
only user to cancel the migration in a set_params callback, which
actually happens much earlier in the migration than the unmigratable
flag was checked.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 hw/hw.h  |2 --
 hw/ivshmem.c |   20 ++--
 savevm.c |   31 ---
 3 files changed, 14 insertions(+), 39 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 95f2d52..6c0aefe 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -264,8 +264,6 @@ int register_savevm_live(DeviceState *dev,
  void *opaque);
 
 void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque);
-void register_device_unmigratable(DeviceState *dev, const char *idstr,
-void *opaque);
 
 typedef void QEMUResetHandler(void *opaque);
 
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 3726a7f..4164861 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -616,6 +616,18 @@ static void ivshmem_setup_msi(IVShmemState * s) {
     s->eventfd_table = qemu_mallocz(s->vectors * sizeof(EventfdEntry));
 }
 
+static int ivshmem_set_param(int blk_enable, int shared, void *opaque)
+{
+    IVShmemState *proxy = opaque;
+
+    if (proxy->role_val == IVSHMEM_PEER) {
+        fprintf(stderr,
+                "ivshmem device in peer role, cannot be migrated or saved\n");
+        return -EINVAL;
+    }
+    return 0;
+}
+
 static int ivshmem_save(QEMUFile* f, void *opaque)
 {
     IVShmemState *proxy = opaque;
@@ -683,8 +695,8 @@ static int pci_ivshmem_init(PCIDevice *dev)
         s->ivshmem_size = ivshmem_get_size(s);
     }
 
-    register_savevm(&s->dev.qdev, "ivshmem", 0, 0, ivshmem_save, ivshmem_load,
-                    dev);
+    register_savevm_live(&s->dev.qdev, "ivshmem", 0, 0, ivshmem_set_param,
+                         NULL, ivshmem_save, ivshmem_load, dev);
 
     /* IRQFD requires MSI */
     if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) &&
@@ -707,10 +719,6 @@ static int pci_ivshmem_init(PCIDevice *dev)
         s->role_val = IVSHMEM_MASTER; /* default */
     }
 
-    if (s->role_val == IVSHMEM_PEER) {
-        register_device_unmigratable(&s->dev.qdev, "ivshmem", s);
-    }
-
     pci_conf = s->dev.config;
     pci_config_set_vendor_id(pci_conf, PCI_VENDOR_ID_REDHAT_QUMRANET);
     pci_conf[0x02] = 0x10;
diff --git a/savevm.c b/savevm.c
index ad3ab86..1b4ee08 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1018,7 +1018,6 @@ typedef struct SaveStateEntry {
     const VMStateDescription *vmsd;
     void *opaque;
     CompatEntry *compat;
-    int no_migrate;
 } SaveStateEntry;
 
 
@@ -1082,7 +1081,6 @@ int register_savevm_live(DeviceState *dev,
     se->load_state = load_state;
     se->opaque = opaque;
     se->vmsd = NULL;
-    se->no_migrate = 0;
 
     if (dev && dev->parent_bus && dev->parent_bus->info->get_dev_path) {
         char *id = dev->parent_bus->info->get_dev_path(dev);
@@ -1149,31 +1147,6 @@ void unregister_savevm(DeviceState *dev, const char *idstr, void *opaque)
     }
 }
 
-/* mark a device as not to be migrated, that is the device should be
-   unplugged before migration */
-void register_device_unmigratable(DeviceState *dev, const char *idstr,
-                                  void *opaque)
-{
-    SaveStateEntry *se;
-    char id[256] = "";
-
-    if (dev && dev->parent_bus && dev->parent_bus->info->get_dev_path) {
-        char *path = dev->parent_bus->info->get_dev_path(dev);
-        if (path) {
-            pstrcpy(id, sizeof(id), path);
-            pstrcat(id, sizeof(id), "/");
-            qemu_free(path);
-        }
-    }
-    pstrcat(id, sizeof(id), idstr);
-
-    QTAILQ_FOREACH(se, &savevm_handlers, entry) {
-        if (strcmp(se->idstr, id) == 0 && se->opaque == opaque) {
-            se->no_migrate = 1;
-        }
-    }
-}
-
 int vmstate_register_with_alias_id(DeviceState *dev, int instance_id,
                                    const VMStateDescription *vmsd,
                                    void *opaque, int alias_id,
@@ -1389,10 +1362,6 @@ static int vmstate_load(QEMUFile *f, SaveStateEntry *se, int version_id)
 
 static int vmstate_save(QEMUFile *f, SaveStateEntry *se)
 {
-    if (se->no_migrate) {
-        return -1;
-    }
-
     if (!se->vmsd) { /* Old style */
         return se->save_state(f, se->opaque);
     }



Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Michael S. Tsirkin
On Wed, Oct 06, 2010 at 11:24:24AM -0600, Alex Williamson wrote:
> You could always keep the functions as separate wrapper callers of the
> common function so you only need to keep true = unset, false = set
> straight in one place.  Thanks,


Just to show why it does not work, I did exactly this: as you see the
code is shorter but the true/false magic gets spread: it was in 2
places, (set/unset) now it is in 4 places and it is within the loop, in
code that is more complex.

So I think I'll stick to the original version and we can
patch it up later if there's a will.


diff --git a/hw/msix.c b/hw/msix.c
index 3d4dd61..4b705a0 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -583,40 +583,15 @@ void msix_unuse_all_vectors(PCIDevice *dev)
     msix_free_irq_entries(dev);
 }
 
-static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
+/* Invoke the notifier if vector entry is used and unmasked. */
+static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, bool masked)
 {
     int r = 0;
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
+    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
         return 0;
-
-    assert(dev->msix_mask_notifier);
-
-    /* Unmask the new notifier unless vector is masked. */
-    if (!msix_is_masked(dev, vector)) {
-        r = dev->msix_mask_notifier(dev, vector, false);
-        if (r < 0) {
-            return r;
-        }
     }
-    return r;
-}
-
-static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
-{
-    int r = 0;
-    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
-        return 0;
-
     assert(dev->msix_mask_notifier);
-
-    /* Mask the old notifier unless it is already masked. */
-    if (!msix_is_masked(dev, vector)) {
-        r = dev->msix_mask_notifier(dev, vector, true);
-        if (r < 0) {
-            return r;
-        }
-    }
-    return r;
+    return dev->msix_mask_notifier(dev, vector, masked);
 }
 
 int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
@@ -625,7 +600,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
     assert(!dev->msix_mask_notifier);
     dev->msix_mask_notifier = f;
     for (n = 0; n < dev->msix_entries_nr; ++n) {
-        r = msix_set_mask_notifier_for_vector(dev, n);
+        r = msix_notify_if_unmasked(dev, n, false);
         if (r < 0) {
             goto undo;
         }
@@ -634,7 +609,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
 
 undo:
     while (--n >= 0) {
-        msix_unset_mask_notifier_for_vector(dev, n);
+        msix_notify_if_unmasked(dev, n, true);
     }
     dev->msix_mask_notifier = NULL;
     return r;
@@ -645,7 +620,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
     int r, n;
     assert(dev->msix_mask_notifier);
     for (n = 0; n < dev->msix_entries_nr; ++n) {
-        r = msix_unset_mask_notifier_for_vector(dev, n);
+        r = msix_notify_if_unmasked(dev, n, true);
         if (r < 0) {
             goto undo;
         }
@@ -655,7 +630,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
 
 undo:
     while (--n >= 0) {
-        msix_set_mask_notifier_for_vector(dev, n);
+        msix_notify_if_unmasked(dev, n, false);
     }
     return r;
 }


Re: [PATCHv2] qemu-kvm/vhost: fix up irqfd support

2010-10-06 Thread Alex Williamson
On Wed, 2010-10-06 at 23:44 +0200, Michael S. Tsirkin wrote:
> On Wed, Oct 06, 2010 at 11:24:24AM -0600, Alex Williamson wrote:
> > You could always keep the functions as separate wrapper callers of the
> > common function so you only need to keep true = unset, false = set
> > straight in one place.  Thanks,
>
>
> Just to show why it does not work, I did exactly this: as you see the
> code is shorter but the true/false magic gets spread: it was in 2
> places, (set/unset) now it is in 4 places and it is within the loop, in
> code that is more complex.

You seem to have missed the wrapper function.  I'm simply suggesting
something like this:

static int __do_msix_mask_notifier_for_vector(PCIDevice *dev, unsigned vector,
                                              bool mask)
{
    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
        return 0;

    assert(dev->msix_mask_notifier);

    /* Set the new notifier unless vector is masked. */
    if (!msix_is_masked(dev, vector)) {
        return dev->msix_mask_notifier(dev, vector, mask);
    }
    return 0;
}

static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
{
    return __do_msix_mask_notifier_for_vector(dev, vector, false);
}

static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
{
    return __do_msix_mask_notifier_for_vector(dev, vector, true);
}

Which then doesn't go on to complicate the callers like the below does.
Thanks,

Alex
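
Editorial sketch of the wrapper idea above, modelled in Python rather than C so it can be run stand-alone. One common helper takes the mask flag, two thin wrappers pin it to False (set) and True (unset), and the caller's loop and rollback never mention the flag. FakeDevice, its fields and the fail_on knob are hypothetical test scaffolding, not QEMU structures.

```python
class FakeDevice:
    """Hypothetical stand-in for PCIDevice; holds only what the sketch needs."""

    def __init__(self, used, masked, fail_on=None):
        self.used = used        # per-vector "entry used" flags
        self.masked = masked    # per-vector mask bits
        self.fail_on = fail_on  # vector whose "set" call fails, or None
        self.calls = []         # (vector, mask) pairs the notifier accepted

    def notifier(self, vector, mask):
        if vector == self.fail_on and not mask:
            return -1
        self.calls.append((vector, mask))
        return 0


def _do_notify(dev, vector, mask):
    """Common helper: skip unused or masked vectors, else call the notifier."""
    if vector >= len(dev.used) or not dev.used[vector]:
        return 0
    if dev.masked[vector]:
        return 0
    return dev.notifier(vector, mask)


def set_for_vector(dev, vector):
    return _do_notify(dev, vector, False)  # the only place "set" means False


def unset_for_vector(dev, vector):
    return _do_notify(dev, vector, True)   # the only place "unset" means True


def set_mask_notifier(dev):
    """Set every vector; on failure undo the ones already done, newest first."""
    for n in range(len(dev.used)):
        r = set_for_vector(dev, n)
        if r < 0:
            for m in reversed(range(n)):
                unset_for_vector(dev, m)
            return r
    return 0
```

The callers read as set/unset throughout, which is the point being argued: the true/false convention lives only in the two wrappers.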


> So I think I'll stick to the original version and we can
> patch it up later if there's a will.
>
>
> diff --git a/hw/msix.c b/hw/msix.c
> index 3d4dd61..4b705a0 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -583,40 +583,15 @@ void msix_unuse_all_vectors(PCIDevice *dev)
>      msix_free_irq_entries(dev);
>  }
>  
> -static int msix_set_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> +/* Invoke the notifier if vector entry is used and unmasked. */
> +static int msix_notify_if_unmasked(PCIDevice *dev, unsigned vector, bool masked)
>  {
>      int r = 0;
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> +    if (!dev->msix_entry_used[vector] || msix_is_masked(dev, vector)) {
>          return 0;
> -
> -    assert(dev->msix_mask_notifier);
> -
> -    /* Unmask the new notifier unless vector is masked. */
> -    if (!msix_is_masked(dev, vector)) {
> -        r = dev->msix_mask_notifier(dev, vector, false);
> -        if (r < 0) {
> -            return r;
> -        }
>      }
> -    return r;
> -}
> -
> -static int msix_unset_mask_notifier_for_vector(PCIDevice *dev, unsigned vector)
> -{
> -    int r = 0;
> -    if (vector >= dev->msix_entries_nr || !dev->msix_entry_used[vector])
> -        return 0;
> -
>      assert(dev->msix_mask_notifier);
> -
> -    /* Mask the old notifier unless it is already masked. */
> -    if (!msix_is_masked(dev, vector)) {
> -        r = dev->msix_mask_notifier(dev, vector, true);
> -        if (r < 0) {
> -            return r;
> -        }
> -    }
> -    return r;
> +    return dev->msix_mask_notifier(dev, vector, masked);
>  }
>  
>  int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
> @@ -625,7 +600,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
>      assert(!dev->msix_mask_notifier);
>      dev->msix_mask_notifier = f;
>      for (n = 0; n < dev->msix_entries_nr; ++n) {
> -        r = msix_set_mask_notifier_for_vector(dev, n);
> +        r = msix_notify_if_unmasked(dev, n, false);
>          if (r < 0) {
>              goto undo;
>          }
> @@ -634,7 +609,7 @@ int msix_set_mask_notifier(PCIDevice *dev, msix_mask_notifier_func f)
>  
>  undo:
>      while (--n >= 0) {
> -        msix_unset_mask_notifier_for_vector(dev, n);
> +        msix_notify_if_unmasked(dev, n, true);
>      }
>      dev->msix_mask_notifier = NULL;
>      return r;
> @@ -645,7 +620,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
>      int r, n;
>      assert(dev->msix_mask_notifier);
>      for (n = 0; n < dev->msix_entries_nr; ++n) {
> -        r = msix_unset_mask_notifier_for_vector(dev, n);
> +        r = msix_notify_if_unmasked(dev, n, true);
>          if (r < 0) {
>              goto undo;
>          }
> @@ -655,7 +630,7 @@ int msix_unset_mask_notifier(PCIDevice *dev)
>  
>  undo:
>      while (--n >= 0) {
> -        msix_set_mask_notifier_for_vector(dev, n);
> +        msix_notify_if_unmasked(dev, n, false);
>      }
>      return r;
>  }





[PATCH 17/19] KVM test: vlan subtest - Replace extra_params '-snapshot' with image_snapshot

2010-10-06 Thread Lucas Meneghel Rodrigues
From: Amos Kong ak...@redhat.com

The framework cannot combine the default extra_params with extra_params_vm1
in the following situation, which is difficult to handle when parsing the
config file or calling get_sub_dict*().

extra_params += ' str1'
- case:
    extra_params_vm1 += " str2"

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/tests_base.cfg.sample |3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/client/tests/kvm/tests_base.cfg.sample b/client/tests/kvm/tests_base.cfg.sample
index ceabbf1..e9cb1b4 100644
--- a/client/tests/kvm/tests_base.cfg.sample
+++ b/client/tests/kvm/tests_base.cfg.sample
@@ -467,8 +467,7 @@ variants:
         send_cmd = "nc %s %s < %s"
         nic_mode = tap
         vms += " vm2"
-        extra_params_vm1 += " -snapshot"
-        extra_params_vm2 += " -snapshot"
+        image_snapshot = yes
         kill_vm_vm2 = yes
         kill_vm_gracefully_vm2 = no
 
-- 
1.7.1



[PATCH 09/19] KVM test: Add a subtest of load/unload nic driver

2010-10-06 Thread Lucas Meneghel Rodrigues
Repeatedly load/unload nic driver, try to transfer file between guest and host
by threads at the same time, and check the md5sum.

Changes from v4:
- Give some time for the interface to be present after
modprobe is executed.

Changes from v1:
- Use a new method to get nic driver name
- Use utils.hash_file() to get md5sum

Signed-off-by: Amos Kong ak...@redhat.com
Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/tests/nicdriver_unload.py |  115 
 client/tests/kvm/tests_base.cfg.sample |   10 ++-
 2 files changed, 124 insertions(+), 1 deletions(-)
 create mode 100644 client/tests/kvm/tests/nicdriver_unload.py

diff --git a/client/tests/kvm/tests/nicdriver_unload.py b/client/tests/kvm/tests/nicdriver_unload.py
new file mode 100644
index 000..47318ba
--- /dev/null
+++ b/client/tests/kvm/tests/nicdriver_unload.py
@@ -0,0 +1,115 @@
+import logging, threading, os
+from autotest_lib.client.common_lib import error
+from autotest_lib.client.bin import utils
+import kvm_utils, kvm_test_utils
+
+def run_nicdriver_unload(test, params, env):
+    """
+    Test nic driver.
+
+    1) Boot a VM.
+    2) Get the NIC driver name.
+    3) Repeatedly unload/load NIC driver.
+    4) Multi-session TCP transfer on test interface.
+    5) Check whether the test interface should still work.
+
+    @param test: KVM test object.
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+    timeout = int(params.get("login_timeout", 360))
+    vm = kvm_test_utils.get_living_vm(env, params.get("main_vm"))
+    session = kvm_test_utils.wait_for_login(vm, timeout=timeout)
+    logging.info("Trying to log into guest '%s' by serial", vm.name)
+    session2 = kvm_utils.wait_for(lambda: vm.serial_login(),
+                                  timeout, 0, step=2)
+    if not session2:
+        raise error.TestFail("Could not log into guest '%s'" % vm.name)
+
+    ethname = kvm_test_utils.get_linux_ifname(session, vm.get_mac_address(0))
+    sys_path = "/sys/class/net/%s/device/driver" % (ethname)
+    s, o = session.get_command_status_output('readlink -e %s' % sys_path)
+    if s:
+        raise error.TestError("Could not find driver name")
+    driver = os.path.basename(o.strip())
+    logging.info("driver is %s", driver)
+
+    class ThreadScp(threading.Thread):
+        def run(self):
+            remote_file = '/tmp/' + self.getName()
+            file_list.append(remote_file)
+            ret = vm.copy_files_to(file_name, remote_file, timeout=scp_timeout)
+            if ret:
+                logging.debug("File %s was transfered successfuly", remote_file)
+            else:
+                logging.debug("Failed to transfer file %s", remote_file)
+
+    def compare(origin_file, receive_file):
+        cmd = "md5sum %s"
+        check_sum1 = utils.hash_file(origin_file, method="md5")
+        s, output2 = session.get_command_status_output(cmd % receive_file)
+        if s != 0:
+            logging.error("Could not get md5sum of receive_file")
+            return False
+        check_sum2 = output2.strip().split()[0]
+        logging.debug("original file md5: %s, received file md5: %s",
+                      check_sum1, check_sum2)
+        if check_sum1 != check_sum2:
+            logging.error("MD5 hash of origin and received files doesn't match")
+            return False
+        return True
+
+    #produce sized file in host
+    file_size = params.get("file_size")
+    file_name = "/tmp/nicdriver_unload_file"
+    cmd = "dd if=/dev/urandom of=%s bs=%sM count=1"
+    utils.system(cmd % (file_name, file_size))
+
+    file_list = []
+    connect_time = params.get("connect_time")
+    scp_timeout = int(params.get("scp_timeout"))
+    thread_num = int(params.get("thread_num"))
+    unload_load_cmd = ("sleep %s && ifconfig %s down && modprobe -r %s && "
+                       "sleep 1 && modprobe %s && sleep 4 && ifconfig %s up" %
+                       (connect_time, ethname, driver, driver, ethname))
+    pid = os.fork()
+    if pid != 0:
+        logging.info("Unload/load NIC driver repeatedly in guest...")
+        while True:
+            logging.debug("Try to unload/load nic drive once")
+            if session2.get_command_status(unload_load_cmd, timeout=120) != 0:
+                session.get_command_output("rm -rf /tmp/Thread-*")
+                raise error.TestFail("Unload/load nic driver failed")
+            pid, s = os.waitpid(pid, os.WNOHANG)
+            status = os.WEXITSTATUS(s)
+            if (pid, status) != (0, 0):
+                logging.debug("Child process ending")
+                break
+    else:
+        logging.info("Multi-session TCP data transfer")
+        threads = []
+        for i in range(thread_num):
+            t = ThreadScp()
+            t.start()
+            threads.append(t)
+        for t in threads:
+            t.join(timeout = scp_timeout)
+        os._exit(0)
+
+    session2.close()
+
+    try:
+logging.info(Check MD5 hash for 
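
Editorial sketch of the checksum comparison that compare() in the patch performs: hash the local file, take the first whitespace-separated field of the guest's md5sum output, and compare the two. The patch uses autotest's utils.hash_file() and a guest shell session. The version below swaps in the standard hashlib module so it runs on its own, and the helper names md5_of_file(), digest_from_md5sum() and same_file() are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_from_md5sum(output):
    """Extract the digest from md5sum output of the form '<digest>  <file>'."""
    return output.strip().split()[0]


def same_file(local_path, md5sum_output):
    """True when the local file's MD5 matches the digest md5sum reported."""
    return md5_of_file(local_path) == digest_from_md5sum(md5sum_output)
```

Reading in chunks keeps memory use flat for the multi-megabyte files the test generates with dd.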

[PATCH 16/19] KVM test: Improve vlan subtest

2010-10-06 Thread Lucas Meneghel Rodrigues
From: Amos Kong ak...@redhat.com

This enhances the existing vlan test and renames vlan_tag.py to vlan.py,
which is a more fitting name.
. Setup arp from /proc/sys/net/ipv4/conf/all/arp_ignore
. Multiple vlans exist simultaneously
. Test ping between same and different vlans
. Test by TCP data transfer, flood ping between same vlan
. Maximal plumb/unplumb vlans

Changes from v4:
- Do not use hardcoded nw interfaces

Signed-off-by: Amos Kong ak...@redhat.com
---
 client/tests/kvm/tests/vlan.py |  185 
 client/tests/kvm/tests/vlan_tag.py |   68 
 client/tests/kvm/tests_base.cfg.sample |   16 ++-
 3 files changed, 195 insertions(+), 74 deletions(-)
 create mode 100644 client/tests/kvm/tests/vlan.py
 delete mode 100644 client/tests/kvm/tests/vlan_tag.py

diff --git a/client/tests/kvm/tests/vlan.py b/client/tests/kvm/tests/vlan.py
new file mode 100644
index 000..f41ea6a
--- /dev/null
+++ b/client/tests/kvm/tests/vlan.py
@@ -0,0 +1,185 @@
+import logging, time, re
+from autotest_lib.client.common_lib import error
+import kvm_test_utils, kvm_utils
+
+def run_vlan(test, params, env):
+    """
+    Test 802.1Q vlan of NIC, config it by vconfig command.
+
+    1) Create two VMs.
+    2) Setup guests in 10 different vlans by vconfig and using hard-coded
+       ip address.
+    3) Test by ping between same and different vlans of two VMs.
+    4) Test by TCP data transfer, floop ping between same vlan of two VMs.
+    5) Test maximal plumb/unplumb vlans.
+    6) Recover the vlan config.
+
+    @param test: KVM test object.
+    @param params: Dictionary with the test parameters.
+    @param env: Dictionary with test environment.
+    """
+
+    vm = []
+    session = []
+    ifname = []
+    vm_ip = []
+    digest_origin = []
+    vlan_ip = ['', '']
+    ip_unit = ['1', '2']
+    subnet = params.get("subnet")
+    vlan_num = int(params.get("vlan_num"))
+    maximal = int(params.get("maximal"))
+    file_size = params.get("file_size")
+
+    vm.append(kvm_test_utils.get_living_vm(env, params.get("main_vm")))
+    vm.append(kvm_test_utils.get_living_vm(env, "vm2"))
+
+    def add_vlan(session, id, iface="eth0"):
+        if session.get_command_status("vconfig add %s %s" % (iface, id)) != 0:
+            raise error.TestError("Fail to add %s.%s" % (iface, id))
+
+    def set_ip_vlan(session, id, ip, iface="eth0"):
+        iface = "%s.%s" % (iface, id)
+        if session.get_command_status("ifconfig %s %s" % (iface, ip)) != 0:
+            raise error.TestError("Fail to configure ip for %s" % iface)
+
+    def set_arp_ignore(session, iface="eth0"):
+        ignore_cmd = "echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore"
+        if session.get_command_status(ignore_cmd) != 0:
+            raise error.TestError("Fail to set arp_ignore of %s" % session)
+
+    def rem_vlan(session, id, iface="eth0"):
+        rem_vlan_cmd = "if [[ -e /proc/net/vlan/%s ]];then vconfig rem %s;fi"
+        iface = "%s.%s" % (iface, id)
+        s = session.get_command_status(rem_vlan_cmd % (iface, iface))
+        return s
+
+    def nc_transfer(src, dst):
+        nc_port = kvm_utils.find_free_port(1025, 5334, vm_ip[dst])
+        listen_cmd = params.get("listen_cmd")
+        send_cmd = params.get("send_cmd")
+
+        #listen in dst
+        listen_cmd = listen_cmd % (nc_port, "receive")
+        session[dst].sendline(listen_cmd)
+        time.sleep(2)
+        #send file from src to dst
+        send_cmd = send_cmd % (vlan_ip[dst], str(nc_port), "file")
+        if session[src].get_command_status(send_cmd, timeout = 60) != 0:
+            raise error.TestFail ("Fail to send file"
+                                  " from vm%s to vm%s" % (src+1, dst+1))
+        s, o = session[dst].read_up_to_prompt(timeout=60)
+        if s != True:
+            raise error.TestFail ("Fail to receive file"
+                                  " from vm%s to vm%s" % (src+1, dst+1))
+        #check MD5 message digest of receive file in dst
+        output = session[dst].get_command_output("md5sum receive").strip()
+        digest_receive = re.findall(r'(\w+)', output)[0]
+        if digest_receive == digest_origin[src]:
+            logging.info("file succeed received in vm %s" % vlan_ip[dst])
+        else:
+            logging.info("digest_origin is  %s" % digest_origin[src])
+            logging.info("digest_receive is %s" % digest_receive)
+            raise error.TestFail("File transfered differ from origin")
+        session[dst].get_command_status("rm -f receive")
+
+    for i in range(2):
+        session.append(kvm_test_utils.wait_for_login(vm[i],
+                       timeout=int(params.get("login_timeout", 360))))
+        if not session[i] :
+            raise error.TestError("Could not log into guest(vm%d)" % i)
+        logging.info("Logged in")
+
+        ifname.append(kvm_test_utils.get_linux_ifname(session[i],
+                      vm[i].get_mac_address()))
+        #get guest ip
+        vm_ip.append(vm[i].get_address())
+
+#produce sized file 

Re: 8 NIC limit - patch - places limit at 32

2010-10-06 Thread Chris Wright
* Anthony Liguori (anth...@codemonkey.ws) wrote:
> BTW, using -device, it should be possible to add a very high number of
> nics because you can specify the PCI address including a function.  If
> this doesn't Just Work today, we should make it work.

Should work...test...mostly[1], but I don't actually know of any tools that
make use of it.

thanks,
-chris

[1] 40 worked, 48 caused guest kernel stack corruption, didn't dig in to
see why yet.

Here's my simple wrapper to build up the command line:

QEMU=/home/chrisw/git/kvm/qemu-kvm/x86_64-softmmu/qemu-system-x86_64
BIOS=/home/chrisw/git/kvm/qemu-kvm/pc-bios
DISK=/home/chrisw/disk-snap1.img
SCRIPT=/home/chrisw/git/kvm/qemu-kvm/kvm/scripts/qemu-ifup

unset NETARGS
i=0
dev=4
func=0
max_dev=40
while [ $i -lt $max_dev ]
do
  unset MULTIFUNC
  if [ $(($i + 1)) -lt $max_dev -a $func -eq 0 ]; then
    MULTIFUNC=,multifunction=on
  fi

  NETARGS="${NETARGS} -netdev type=tap,id=netdev$i,script=$SCRIPT -device virtio-net-pci,mac=52:54:00:12:34:$(printf "%.2x\n" $i),netdev=netdev$i,bus=pci.0,addr=$dev.$func$MULTIFUNC"

  i=$(($i+1))
  func=$(($func+1))
  if [ $func -eq 8 ]; then
func=0
dev=$(($dev+1))
  fi
done

$QEMU -L $BIOS -m 1024 -drive file=$DISK,if=virtio,boot=on $NETARGS -vnc :0
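
Editorial sketch of the slot arithmetic in the wrapper above, written in Python so the device/function assignment can be checked without launching QEMU. It follows the script's rules: NICs fill functions 0 to 7 of one PCI device before moving to the next, starting at device 4, and multifunction=on is added only on function 0 when at least one more NIC follows. The names nic_slots() and nic_args() and the script path in the usage note are hypothetical.

```python
def nic_slots(count, first_dev=4):
    """Return (index, device, function, multifunction) for each NIC."""
    slots = []
    for i in range(count):
        dev = first_dev + i // 8   # eight functions per PCI device
        func = i % 8
        # Same condition as the shell test: function 0 and not the last NIC.
        multifunction = func == 0 and i + 1 < count
        slots.append((i, dev, func, multifunction))
    return slots


def nic_args(count, script, first_dev=4):
    """Build the -netdev/-device argument list in the script's textual form."""
    args = []
    for i, dev, func, multi in nic_slots(count, first_dev):
        args += ["-netdev", "type=tap,id=netdev%d,script=%s" % (i, script)]
        device = ("virtio-net-pci,mac=52:54:00:12:34:%02x,netdev=netdev%d,"
                  "bus=pci.0,addr=%d.%d" % (i, i, dev, func))
        if multi:
            device += ",multifunction=on"
        args += ["-device", device]
    return args
```

For example, nic_slots(40) places the last NIC at device 8, function 7, which is the 40-NIC configuration reported as working; nic_args(2, "/tmp/ifup") yields two -netdev/-device pairs at addr=4.0 and addr=4.1.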


Re: [PATCH 00/18] Network Patchset v4

2010-10-06 Thread Lucas Meneghel Rodrigues
On Mon, 2010-09-27 at 18:43 -0400, Lucas Meneghel Rodrigues wrote:
 We are close to the end of this journey. Several little problems
 were fixed and we are down to some little problems:

Ok, all patches applied. Thanks to everyone that helped on this effort!

 1 - jumbo test - tap interface name shouldn't be used to establish
 arp static entry, use bridge name instead - need validation
 from akong and/or jasonwang
 2 - vlan subtest - still has some problems with it
 3 - ethtool - find a way to install ethtool in guest using distro
 packages - this one can be easily postponed
 
 Please give me some feedback on it.
 
 Amos Kong (11):
   KVM test: Add a new macaddress pool algorithm
   KVM test: Add a new subtest ping
   KVM test: Add basic file transfer test
   KVM test: Add a subtest of nic promisc
   KVM test: Add a subtest of multicast
   KVM test: Add a subtest of pxe
   KVM test: Add a subtest of changing MAC address
   KVM test: Add a netperf subtest
   KVM test: kvm_utils - Add support of check if remote port free
   KVM test: Improve vlan subtest
   KVM test: vlan subtest - Replace extra_params '-snapshot' with
 image_snapshot
 
 Lucas Meneghel Rodrigues (7):
   KVM test: Make physical_resources_check to work with MAC management
   KVM test: Remove address_pools.cfg dependency
   KVM test: Add a get_ifname function
   KVM Test: Add nw related functions ping and get_linux_ifname
   KVM test: Add a subtest jumbo
   KVM test: Add a subtest of load/unload nic driver
   KVM test: Add subtest of testing offload by ethtool
 
  client/tests/kvm/address_pools.cfg.sample  |   65 --
  client/tests/kvm/control   |8 -
  client/tests/kvm/control.parallel  |9 -
  client/tests/kvm/get_started.py|4 +-
  client/tests/kvm/kvm_test_utils.py |  130 -
  client/tests/kvm/kvm_utils.py  |  139 -
  client/tests/kvm/kvm_vm.py |  104 +-
  client/tests/kvm/scripts/join_mcast.py |   37 
  client/tests/kvm/tests/ethtool.py  |  222 
 
  client/tests/kvm/tests/file_transfer.py|   58 +
  client/tests/kvm/tests/jumbo.py|  136 
  client/tests/kvm/tests/mac_change.py   |   68 ++
  client/tests/kvm/tests/multicast.py|   91 
  client/tests/kvm/tests/netperf.py  |   56 +
  client/tests/kvm/tests/nic_promisc.py  |  103 +
  client/tests/kvm/tests/nicdriver_unload.py |  115 ++
  client/tests/kvm/tests/physical_resources_check.py |7 +-
  client/tests/kvm/tests/ping.py |   72 +++
  client/tests/kvm/tests/pxe.py  |   31 +++
  client/tests/kvm/tests/vlan.py |  186 
  client/tests/kvm/tests/vlan_tag.py |   68 --
  client/tests/kvm/tests_base.cfg.sample |   97 -
  22 files changed, 1628 insertions(+), 178 deletions(-)
  delete mode 100644 client/tests/kvm/address_pools.cfg.sample
  create mode 100755 client/tests/kvm/scripts/join_mcast.py
  create mode 100644 client/tests/kvm/tests/ethtool.py
  create mode 100644 client/tests/kvm/tests/file_transfer.py
  create mode 100644 client/tests/kvm/tests/jumbo.py
  create mode 100644 client/tests/kvm/tests/mac_change.py
  create mode 100644 client/tests/kvm/tests/multicast.py
  create mode 100644 client/tests/kvm/tests/netperf.py
  create mode 100644 client/tests/kvm/tests/nic_promisc.py
  create mode 100644 client/tests/kvm/tests/nicdriver_unload.py
  create mode 100644 client/tests/kvm/tests/ping.py
  create mode 100644 client/tests/kvm/tests/pxe.py
  create mode 100644 client/tests/kvm/tests/vlan.py
  delete mode 100644 client/tests/kvm/tests/vlan_tag.py
 




Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest

2010-10-06 Thread Hidetoshi Seto
(2010/10/07 3:10), Dean Nelson wrote:
 On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
 On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
 I got some more question:

 (2010/10/05 3:54), Marcelo Tosatti wrote:
 Index: qemu/target-i386/cpu.h
 ===
 --- qemu.orig/target-i386/cpu.h
 +++ qemu/target-i386/cpu.h
 @@ -250,16 +250,32 @@
   #define PG_ERROR_RSVD_MASK 0x08
   #define PG_ERROR_I_D_MASK  0x10

 -#define MCG_CTL_P    (1UL<<8)   /* MCG_CAP register available */
 +#define MCG_CTL_P    (1ULL<<8)   /* MCG_CAP register available */
 +#define MCG_SER_P    (1ULL<<24) /* MCA recovery/new status bits */

 -#define MCE_CAP_DEF    MCG_CTL_P
 +#define MCE_CAP_DEF    (MCG_CTL_P|MCG_SER_P)
   #define MCE_BANKS_DEF    10


 It seems that current kvm doesn't support SER_P, so injecting SRAO
 to guest will mean that guest receives VAL|UC|!PCC and RIPV event
 from virtual processor that doesn't have SER_P.

 Dean also noted this. I don't think it was deliberate choice to not
 expose SER_P. Huang?
 
 In my testing, I found that MCG_SER_P was not being set (and I was
 running on a Nehalem-EX system). Injecting a MCE resulted in the
 guest entering into panic() from mce_panic(). If crash_kexec()
 finds a kexec_crash_image the system ends up rebooting, otherwise,
 what happens next requires operator intervention.

Good to know.
What concerns me is that if a memory-scrubbing SRAO event is
injected when !SER_P, a Linux guest with a certain mce tolerant level
might grade it as UC severity and continue running without
panicking, killing, or poisoning, because of !PCC and RIPV.
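
Editorial sketch of the mismatch described above: an event with VAL and UC set, PCC clear and RIPV set, delivered to a guest whose MCG_CAP does not advertise SER_P. MCG_CTL_P and MCG_SER_P use the values from the quoted patch. The MCi_STATUS and MCG_STATUS bit positions are taken from the Intel SDM rather than from this thread, and the helper names are hypothetical. The sketch only decodes bits and does not model how any guest kernel grades severity.

```python
# Capability bits, as defined in the quoted cpu.h hunk.
MCG_CTL_P = 1 << 8
MCG_SER_P = 1 << 24

# Status bits per the Intel SDM (assumption: not part of the quoted patch).
MCI_STATUS_VAL = 1 << 63
MCI_STATUS_UC = 1 << 61
MCI_STATUS_PCC = 1 << 57
MCG_STATUS_RIPV = 1 << 0


def is_uc_without_pcc(mci_status, mcg_status):
    """True for the VAL|UC|!PCC plus RIPV shape discussed in the thread."""
    want = MCI_STATUS_VAL | MCI_STATUS_UC
    return ((mci_status & want) == want
            and not (mci_status & MCI_STATUS_PCC)
            and bool(mcg_status & MCG_STATUS_RIPV))


def guest_advertises_ser(mcg_cap):
    """True when the guest-visible MCG_CAP has the SER_P bit set."""
    return bool(mcg_cap & MCG_SER_P)


def event_cap_mismatch(mcg_cap, mci_status, mcg_status):
    """True when such an event would reach a guest that lacks SER_P."""
    return (is_uc_without_pcc(mci_status, mcg_status)
            and not guest_advertises_ser(mcg_cap))
```

With the old default capability (MCG_CTL_P plus 10 banks) the check flags the event; with MCG_SER_P added, as the patch proposes, it does not.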

Could you provide the panic message of the guest in your test?
I think it can tell me why the mce handler decided to go panic.

 When I applied a patch to the guest's kernel which forces mce_ser to be
 set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
 that when the memory page was 'owned' by a guest process, the process
 would be killed (if the page was dirty), and the guest would stay
 running. The HWPoisoned page would be sidelined and not cause any more
 issues.

Excellent.
So while guest kernel knows which page is poisoned, guest processes
are controlled not to touch the page.

... Therefore rebooting the VM and starting a fresh kernel will lose the
information about which page is poisoned.

 I think most OSes don't expect that it can receives MCE with !PCC
 on traditional x86 processor without SER_P.

 Q1: Is it safe to expect that guests can handle such !PCC event?
 
 This might be best answered by Huang, but as I mentioned above, without
 MCG_SER_P being set, the result was an orderly system panic on the
 guest.

Though I'll wait for Huang (I think he is on holiday), I believe that
a system panic is just one possible option for an AO (Action Optional)
event, regardless of SER_P.

 Q2: What is the expected behavior on the guest?
 
 I think I answered this above.

Yeah, thanks.

 
 Q3: What happens if the guest reboots itself in response to the MCE?
 
 That depends...
 
 And the following issue also holds for a guest that is rebooted at
 some point having successfully sidelined the bad page.
 
 After the guest has panic'd, a system_reset of the guest or a restart
 initiated by crash_kexec() (called by panic() on the guest), usually
 results in the guest hanging because the bad page still belongs
 to qemu-kvm and is now being referenced by the new guest in some way.

Yes. In other words, my concern about reboot is that the new guest
kernel, including a kdump kernel, might try to read the bad page.  If
there is no AR-SIGBUS etc., we need some trick to inhibit such accesses.

 (It actually may not hang, but successfully reboot and be runnable,
 with the bad page lurking in the background. It all seems to depend on
 where the bad page ends up, and whether it's ever referenced.)

I know some tough guys using their PC with buggy DIMMs :-)

 
 I believe there was an attempt to deal with this in kvm on the host.
 See kvm_handle_bad_page(). This function was supposed to result in the
 sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
 which in theory would result in the right thing happening. But commit
 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
 sent. So this mechanism needs to be re-worked, and the issue remains.

Definitely.
I guess Huang has some plan or hint for reworking this point.

 
 I would think that if the bad page can't be sidelined, such that
 the newly booting guest can't use it, then the new guest shouldn't be
 allowed to boot. But perhaps there is some merit in letting it try to
 boot and see if one gets 'lucky'.

When booting a real machine in the real world, hardware and firmware
usually (or often) run a self-test before passing control to the OS.
Some platforms can boot the OS with a degraded configuration (for
example, less memory) if one of their components has trouble.  Some BIOS may
stop booting and show messages like please reseat [component] on