[PATCH] kvm-intel: Auto-load on CPUs with VMX.
Enable x86 feature-based autoloading for the kvm-intel module on CPUs
with X86_FEATURE_VMX.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
I've tested this and it works correctly. I can also supply the obvious
equivalent patch for kvm-amd on AMD CPUs with SVM, but I don't have any
such CPUs to test that on.

 arch/x86/kvm/vmx.c | 7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 3b4c8d8..bb3bcff 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -27,6 +27,7 @@
 #include <linux/highmem.h>
 #include <linux/sched.h>
 #include <linux/moduleparam.h>
+#include <linux/mod_devicetable.h>
 #include <linux/ftrace_event.h>
 #include <linux/slab.h>
 #include <linux/tboot.h>
@@ -51,6 +52,12 @@
 MODULE_AUTHOR("Qumranet");
 MODULE_LICENSE("GPL");
 
+static const struct x86_cpu_id vmx_cpu_id[] = {
+	X86_FEATURE_MATCH(X86_FEATURE_VMX),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
+
 static bool __read_mostly enable_vpid = 1;
 module_param_named(vpid, enable_vpid, bool, 0444);
--
1.7.9.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [Qemu-devel] [PATCH v4 4/7] RTC: Set internal millisecond register to 500ms when reset divider
-----Original Message-----
From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
Sent: Wednesday, March 21, 2012 1:39 AM
To: Zhang, Yang Z
Cc: qemu-de...@nongnu.org; Paolo Bonzini; aligu...@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4 4/7] RTC: Set internal millisecond register to 500ms when reset divider

> On Mon, 19 Mar 2012, Zhang, Yang Z wrote:
> > The first update cycle begins one-half second later, when the
> > divider reset is removed.
> >
> > Signed-off-by: Yang Zhang <yang.z.zh...@intel.com>
> > ---
> >  hw/mc146818rtc.c | 38 +-
> >  1 files changed, 33 insertions(+), 5 deletions(-)
> >
> > diff --git a/hw/mc146818rtc.c b/hw/mc146818rtc.c
> > index 6ebb8f6..5e7fbb5 100644
> > --- a/hw/mc146818rtc.c
> > +++ b/hw/mc146818rtc.c
> > @@ -110,6 +110,8 @@ static void rtc_set_time(RTCState *s);
> >  static void rtc_calibrate_time(RTCState *s);
> >  static void rtc_set_cmos(RTCState *s);
> >
> > +static int32_t divider_reset;
>
> this should be part of RTCState
>
> >  static uint64_t get_guest_rtc_us(RTCState *s)
> >  {
> >      int64_t host_usec, offset_usec, guest_usec;
> > @@ -220,16 +222,24 @@ static void rtc_periodic_timer(void *opaque)
> >      }
> >  }
> >
> > -static void rtc_set_offset(RTCState *s)
> > +static void rtc_set_offset(RTCState *s, int32_t start_usec)
>
> I noticed that you are always passing a positive number or 0 as
> start_usec: it might be worth turning start_usec into a uint32_t or
> uint64_t to avoid integer signedness errors.

Agree.

> Also it is not clear to me what this start_usec is supposed to be: if
> it is a delay to be added to the guest time, then it is best to rename
> it to delay_usec and add a comment on top to explain what it is for.

Actually, start_usec is only used when the divider is changed from
reset to a valid time base. When that change happens, the first update
cycle occurs 500 ms later, so start_usec equals 500 ms. Passing 0 as
start_usec means the RTC's internal millisecond register is unchanged
and stays synchronized with the host's time.

> >      struct tm *tm = s->current_tm;
> > -    int64_t host_usec, guest_sec, guest_usec;
> > +    int64_t host_usec, guest_sec, guest_usec, offset_usec, old_guest_usec;
> >
> >      host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
> >
> > +    offset_usec = s->offset_sec * USEC_PER_SEC + s->offset_usec;
> > +    old_guest_usec = (host_usec + offset_usec) % USEC_PER_SEC;
> > +
> >      guest_sec = mktimegm(tm);
> > -    guest_usec = guest_sec * USEC_PER_SEC;
> > +    /* start_usec equal 0 means rtc internal millisecond is
> > +     * same with before */
> > +    if (start_usec == 0) {
> > +        guest_usec = guest_sec * USEC_PER_SEC + old_guest_usec;
> > +    } else {
> > +        guest_usec = guest_sec * USEC_PER_SEC + start_usec;
> > +    }
>
> This looks like a mistake to me: before, guest_usec was derived
> exclusively from mktimegm (give or take USEC_PER_SEC), while now
> guest_usec is the sum of the value returned by mktimegm and
> old_guest_usec, even when start_usec is 0. Are you sure that this is
> correct?

The logic is right. Before this patch, we assumed the RTC's internal
millisecond register was the same as the host time, so we did not need
to consider it and using mktimegm was enough. Now the internal
millisecond can be changed, so I use old_guest_usec to record the
current internal millisecond. When start_usec is 0, it means we do not
need to change the internal millisecond register, and the offset is the
same as before.

best regards
yang
RE: [Qemu-devel] [PATCH v4 2/7] RTC: Update the RTC clock only when reading it
-----Original Message-----
From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com]
Sent: Tuesday, March 20, 2012 10:16 PM
To: Zhang, Yang Z
Cc: qemu-de...@nongnu.org; Paolo Bonzini; aligu...@us.ibm.com; kvm@vger.kernel.org
Subject: Re: [Qemu-devel] [PATCH v4 2/7] RTC: Update the RTC clock only when reading it

> On Mon, 19 Mar 2012, Zhang, Yang Z wrote:
> > There is no need to use two periodic timers to update the RTC time.
> > In this patch, we only update it when the guest reads it.
>
> So the basic idea here is that we don't need the two periodic timers
> because we are going to calculate the RTC guest time from QEMU's
> host_clock. I only have a couple of observations:
>
> - shouldn't we use QEMU's rtc_clock, rather than host_clock?

Right. It should be rtc_clock, not host_clock.

> - it would be better to use shifts rather than divisions whenever
>   possible, they are much cheaper;

Agree.

> - rtc_calibrate_time seems to be the new function that updates the
>   guest RTC time based on QEMU's host_clock. Are you sure we are
>   calling it all the times we need to call it? Could we just call it
>   at the beginning of cmos_ioport_write and cmos_ioport_read?

No. If RTC_B_SET is set or the divider reset is held, we should not
call rtc_calibrate_time.

best regards
yang
Re: [Qemu-devel] [PATCH v4 5/7] RTC:Add RTC update-ended interrupt support
Il 20/03/2012 19:35, Stefano Stabellini wrote:
> This is the function that is used to figure out whether we need the
> timers or not; the condition seems to be:
>
>     (Not (REG_C_UF | REG_C_AF)) And (Not (REG_B_SET))
>
> Shouldn't it actually check for UIE being enabled?

No, you need to set UF in case the code observes it without actually
enabling interrupt delivery on the ISA bus.

Paolo
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 21, 2012 at 08:56:03AM +0800, Wen Congyang wrote:
> At 03/20/2012 11:45 PM, Gleb Natapov Wrote:
> > On Tue, Mar 20, 2012 at 05:59:16PM +0800, Wen Congyang wrote:
> > > At 03/19/2012 03:33 PM, Wen Congyang Wrote:
> > > > At 03/08/2012 03:57 PM, Wen Congyang Wrote:
> > > > > We can know the guest is panicked when the guest runs on Xen,
> > > > > but we do not have such a feature on KVM.
> > > > >
> > > > > Another purpose of this feature is: a management app (for
> > > > > example: libvirt) can do an automatic dump when the guest has
> > > > > crashed. If the management app does not do an automatic dump,
> > > > > the guest's user can dump by hand if he sees that the guest
> > > > > has panicked.
> > > > >
> > > > > I touch the hypervisor instead of using virtio-serial, because:
> > > > > 1. it is simple
> > > > > 2. virtio-serial is an optional device, and the guest may not
> > > > >    have such a device.
> > > > >
> > > > > Changes from v2 to v3:
> > > > > 1. correct spelling
> > > > >
> > > > > Changes from v1 to v2:
> > > > > 1. split up host and guest-side changes
> > > > > 2. introduce new request flag to avoid changing return values.
> > > >
> > > > Hi all:
> > > >
> > > > We need this feature, but we have not decided how to implement
> > > > it. We have two solutions: 1. use vmcall, 2. use virtio-serial.
> > > > I will not change this patch set before we decide how to do it.
> > > > Can we make a decision in the coming days?
> > >
> > > Can anybody decide which solution to use?
> >
> > To make an informed decision we need to have at least a raw idea of
> > how the virtio-serial variant will look.
>
> Hmm, I think we can do this:
> 1. reset the virtio-serial device, or reset the port we use to notify
>    the host that the guest has panicked.
> 2. write some specific messages to the port
>
> So the port should have a fixed name. If this port is opened by
> userspace before the guest panics, I am not sure whether we can use
> it (because a port can only be opened once at the same time).

Yes, IMO we should dedicate one virtio-serial port for panic
notifications, just like we dedicate one for a console.

> We cannot call any function in the module, so we may need to write a
> simple driver for virtio-serial (like diskdump's disk driver).

netconsole uses standard NIC drivers in polling mode to send OOPSes
over the network and it mostly works, so I think using the
virtio-serial driver is not out of the question, but with IRQs
disabled of course.

> I do not know how to implement it now. But I guess that it may be
> complicated.

Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a
panic event over IPMI. The code is pretty complex.

Of course, if we are going to implement something more complex than a
simple hypercall for panic notification, we had better do something
more interesting with it than just saying a panic happened, like
sending stack traces on all CPUs for instance.

--
			Gleb.
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
At 03/21/2012 05:11 PM, Gleb Natapov Wrote:
> On Wed, Mar 21, 2012 at 08:56:03AM +0800, Wen Congyang wrote:
> [earlier discussion of the vmcall vs. virtio-serial options snipped]
>
> > Hmm, I think we can do this:
> > 1. reset the virtio-serial device, or reset the port we use to
> >    notify the host that the guest has panicked.
> > 2. write some specific messages to the port
> >
> > So the port should have a fixed name. If this port is opened by
> > userspace before the guest panics, I am not sure whether we can use
> > it (because a port can only be opened once at the same time).
>
> Yes, IMO we should dedicate one virtio-serial port for panic
> notifications, just like we dedicate one for a console.
>
> > We cannot call any function in the module, so we may need to write
> > a simple driver for virtio-serial (like diskdump's disk driver).
>
> netconsole uses standard NIC drivers in polling mode to send OOPSes
> over the network and it mostly works, so I think using the
> virtio-serial driver is not out of the question, but with IRQs
> disabled of course.

The code for netconsole is in which file?

Another question: we cannot call a function in a module directly from
the kernel.

> > I do not know how to implement it now. But I guess that it may be
> > complicated.
>
> Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a
> panic event over IPMI. The code is pretty complex. Of course, if we
> are going to implement something more complex than a simple hypercall
> for panic notification, we had better do something more interesting
> with it than just saying a panic happened, like sending stack traces
> on all CPUs for instance.

If we implement it via virtio-serial, I agree with passing a more
useful message to the host.

Thanks
Wen Congyang
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 21, 2012 at 05:35:49PM +0800, Wen Congyang wrote:
> At 03/21/2012 05:11 PM, Gleb Natapov Wrote:
> [earlier discussion of the vmcall vs. virtio-serial options snipped]
>
> > netconsole uses standard NIC drivers in polling mode to send OOPSes
> > over the network and it mostly works, so I think using the
> > virtio-serial driver is not out of the question, but with IRQs
> > disabled of course.
>
> The code for netconsole is in which file?

drivers/net/netconsole.c, naturally.

> Another question: we cannot call a function in a module directly from
> the kernel.

True, I think. netconsole and the actual NIC driver have a layer of
abstraction between them. Modules can call functions from other
modules. Your module can register on panic_notifier_list (like IPMI
does) and call functions from virtio-serial. Or panic handling can be
added directly to virtio-serial, since it will have to be modified
anyway: something like the netpoll API will have to be added to it.

> > Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a
> > panic event over IPMI. The code is pretty complex. Of course, if we
> > are going to implement something more complex than a simple
> > hypercall for panic notification, we had better do something more
> > interesting with it than just saying a panic happened, like sending
> > stack traces on all CPUs for instance.
>
> If we implement it via virtio-serial, I agree with passing a more
> useful message to the host.

--
			Gleb.
[PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
From: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>

Rather than outright replacing the entire spinlock implementation in
order to paravirtualize it, keep the ticket lock implementation but add
a couple of pvops hooks on the slow path (long spin on lock, unlocking
a contended lock).

Ticket locks have a number of nice properties, but they also have some
surprising behaviours in virtual environments. They enforce a strict
FIFO ordering on cpus trying to take a lock; however, if the hypervisor
scheduler does not schedule the cpus in the correct order, the system
can waste a huge amount of time spinning until the next cpu can take
the lock.

(See Thomas Friebel's talk "Prevent Guests from Spinning Around"
http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:

 - __ticket_spin_lock, which is called after the cpu has been spinning
   on the lock for a significant number of iterations but has failed to
   take the lock (presumably because the cpu holding the lock has been
   descheduled). The lock_spinning pvop is expected to block the cpu
   until it has been kicked by the current lock holder.

 - __ticket_spin_unlock, which on releasing a contended lock (there are
   more cpus with tail tickets) looks to see if the next cpu is blocked
   and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub
functions causes all the extra code to go away.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/include/asm/paravirt.h       | 32
 arch/x86/include/asm/paravirt_types.h | 10 ++
 arch/x86/include/asm/spinlock.h       | 53 ++--
 arch/x86/include/asm/spinlock_types.h |  4 --
 arch/x86/kernel/paravirt-spinlocks.c  | 15 +
 arch/x86/xen/spinlock.c               |  8 -
 6 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index a7d2db9..0fa4553 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -750,36 +750,16 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx,
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
 
-static inline int arch_spin_is_locked(struct arch_spinlock *lock)
+static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock,
+						   __ticket_t ticket)
 {
-	return PVOP_CALL1(int, pv_lock_ops.spin_is_locked, lock);
+	PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket);
 }
 
-static inline int arch_spin_is_contended(struct arch_spinlock *lock)
+static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock,
+					       __ticket_t ticket)
 {
-	return PVOP_CALL1(int, pv_lock_ops.spin_is_contended, lock);
-}
-#define arch_spin_is_contended	arch_spin_is_contended
-
-static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
-{
-	PVOP_VCALL1(pv_lock_ops.spin_lock, lock);
-}
-
-static __always_inline void arch_spin_lock_flags(struct arch_spinlock *lock,
-						 unsigned long flags)
-{
-	PVOP_VCALL2(pv_lock_ops.spin_lock_flags, lock, flags);
-}
-
-static __always_inline int arch_spin_trylock(struct arch_spinlock *lock)
-{
-	return PVOP_CALL1(int, pv_lock_ops.spin_trylock, lock);
-}
-
-static __always_inline void arch_spin_unlock(struct arch_spinlock *lock)
-{
-	PVOP_VCALL1(pv_lock_ops.spin_unlock, lock);
+	PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket);
 }
 
 #endif

diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index 8e8b9a4..005e24d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -327,13 +327,11 @@ struct pv_mmu_ops {
 };
 
 struct arch_spinlock;
+#include <asm/spinlock_types.h>
+
 struct pv_lock_ops {
-	int (*spin_is_locked)(struct arch_spinlock *lock);
-	int (*spin_is_contended)(struct arch_spinlock *lock);
-	void (*spin_lock)(struct arch_spinlock *lock);
-	void (*spin_lock_flags)(struct arch_spinlock *lock, unsigned long flags);
-	int (*spin_trylock)(struct arch_spinlock *lock);
-	void (*spin_unlock)(struct arch_spinlock *lock);
+	void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket);
+	void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket);
 };
 
 /* This contains all the paravirt structures: we get a convenient

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index a82c2bf..7e66b85 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -37,6 +37,35 @@
 # define UNLOCK_LOCK_PREFIX
 #endif
 
+/* How
[PATCH RFC V6 2/11] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks
From: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>

The code size expands somewhat, and it's probably better to just call a
function rather than inline it.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/Kconfig     | 3 +++
 kernel/Kconfig.locks | 2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5bed94e..10c28ec 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -623,6 +623,9 @@ config PARAVIRT_SPINLOCKS
 
 	  If you are unsure how to answer this question, answer N.
 
+config ARCH_NOINLINE_SPIN_UNLOCK
+	def_bool PARAVIRT_SPINLOCKS
+
 config PARAVIRT_CLOCK
 	bool
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 5068e2a..584637b 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
 	ARCH_INLINE_SPIN_LOCK_IRQSAVE
 
 config INLINE_SPIN_UNLOCK
-	def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
+	def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && !ARCH_NOINLINE_SPIN_UNLOCK
 
 config INLINE_SPIN_UNLOCK_BH
 	def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH
[PATCH RFC V6 6/11] xen/pvticketlocks: add xen_nopvspin parameter to disable xen pv ticketlocks
From: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c | 14 ++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 0e4d057..5dce49d 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -223,12 +223,26 @@ void xen_uninit_lock_cpu(int cpu)
 	unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL);
 }
 
+static bool xen_pvspin __initdata = true;
+
 void __init xen_init_spinlocks(void)
 {
+	if (!xen_pvspin) {
+		printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+		return;
+	}
+
 	pv_lock_ops.lock_spinning = xen_lock_spinning;
 	pv_lock_ops.unlock_kick = xen_unlock_kick;
 }
 
+static __init int xen_parse_nopvspin(char *arg)
+{
+	xen_pvspin = false;
+	return 0;
+}
+early_param("xen_nopvspin", xen_parse_nopvspin);
+
 #ifdef CONFIG_XEN_DEBUG_FS
 static struct dentry *d_spin_debug;
[PATCH RFC V6 10/11] xen/pvticketlock: allow interrupts to be enabled while blocking
From: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>

If interrupts were enabled when taking the spinlock, we can leave them
enabled while blocking to get the lock.

If we can enable interrupts while waiting for the lock to become
available, and we take an interrupt before entering the poll, and the
handler takes a spinlock which ends up going into the slow state
(invalidating the per-cpu "lock" and "want" values), then when the
interrupt handler returns the event channel will remain pending so the
poll will return immediately, causing it to return out to the main
spinlock loop.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/xen/spinlock.c | 46 --
 1 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index d4a9ec2..4926974 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -142,7 +142,20 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	 * partially setup state.
 	 */
 	local_irq_save(flags);
-
+	/*
+	 * We don't really care if we're overwriting some other
+	 * (lock,want) pair, as that would mean that we're currently
+	 * in an interrupt context, and the outer context had
+	 * interrupts enabled.  That has already kicked the VCPU out
+	 * of xen_poll_irq(), so it will just return spuriously and
+	 * retry with newly setup (lock,want).
+	 *
+	 * The ordering protocol on this is that the "lock" pointer
+	 * may only be set non-NULL if the "want" ticket is correct.
+	 * If we're updating "want", we must first clear "lock".
+	 */
+	w->lock = NULL;
+	smp_wmb();
 	w->want = want;
 	smp_wmb();
 	w->lock = lock;
@@ -157,24 +170,43 @@ static void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 	/* Only check lock once pending cleared */
 	barrier();
 
-	/* Mark entry to slowpath before doing the pickup test to make
-	   sure we don't deadlock with an unlocker. */
+	/*
+	 * Mark entry to slowpath before doing the pickup test to make
+	 * sure we don't deadlock with an unlocker.
+	 */
 	__ticket_enter_slowpath(lock);
 
-	/* check again make sure it didn't become free while
-	   we weren't looking  */
+	/*
+	 * check again make sure it didn't become free while
+	 * we weren't looking
+	 */
 	if (ACCESS_ONCE(lock->tickets.head) == want) {
 		add_stats(TAKEN_SLOW_PICKUP, 1);
 		goto out;
 	}
+
+	/* Allow interrupts while blocked */
+	local_irq_restore(flags);
+
+	/*
+	 * If an interrupt happens here, it will leave the wakeup irq
+	 * pending, which will cause xen_poll_irq() to return
+	 * immediately.
+	 */
+
 	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
 	xen_poll_irq(irq);
 	add_stats(TAKEN_SLOW_SPURIOUS, !xen_test_irq_pending(irq));
+
+	local_irq_save(flags);
+
 	kstat_incr_irqs_this_cpu(irq, irq_to_desc(irq));
 out:
 	cpumask_clear_cpu(cpu, waiting_cpus);
 	w->lock = NULL;
+
+	local_irq_restore(flags);
+
 	spin_time_accum_blocked(start);
 }
 PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning);
@@ -188,7 +220,9 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
 	for_each_cpu(cpu, waiting_cpus) {
 		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 
-		if (w->lock == lock && w->want == next) {
+		/* Make sure we read lock before want */
+		if (ACCESS_ONCE(w->lock) == lock &&
+		    ACCESS_ONCE(w->want) == next) {
 			add_stats(RELEASED_SLOW_KICKED, 1);
 			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
 			break;
[PATCH RFC V6 11/11] xen: enable PV ticketlocks on HVM Xen
From: Stefano Stabellini <stefano.stabell...@eu.citrix.com>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/xen/smp.c | 1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index e85d3ee..8dc2574 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -568,4 +568,5 @@ void __init xen_hvm_smp_init(void)
 	smp_ops.cpu_die = xen_hvm_cpu_die;
 	smp_ops.send_call_func_ipi = xen_smp_send_call_function_ipi;
 	smp_ops.send_call_func_single_ipi = xen_smp_send_call_function_single_ipi;
+	xen_init_spinlocks();
 }
[PATCH RFC V6 3/11] x86/ticketlock: collapse a layer of functions
From: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>

Now that the paravirtualization layer doesn't exist at the spinlock
level any more, we can collapse the __ticket_ functions into the arch_
functions.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Signed-off-by: Raghavendra K T <raghavendra...@linux.vnet.ibm.com>
---
 arch/x86/include/asm/spinlock.h | 35 +--
 1 files changed, 5 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 7e66b85..f6442f4 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -79,7 +79,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock,
  * in the high part, because a wide xadd increment of the low part would carry
  * up and contaminate the high part.
  */
-static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
+static __always_inline void arch_spin_lock(struct arch_spinlock *lock)
 {
 	register struct __raw_tickets inc = { .tail = 1 };
 
@@ -99,7 +99,7 @@ static __always_inline void __ticket_spin_lock(struct arch_spinlock *lock)
 out:
 	barrier();	/* make sure nothing creeps before the lock is taken */
 }
 
-static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
+static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
 {
 	arch_spinlock_t old, new;
 
@@ -113,7 +113,7 @@ static __always_inline int __ticket_spin_trylock(arch_spinlock_t *lock)
 	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
 
-static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
+static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
 	__ticket_t next = lock->tickets.head + 1;
 
@@ -121,46 +121,21 @@ static __always_inline void __ticket_spin_unlock(arch_spinlock_t *lock)
 	__ticket_unlock_kick(lock, next);
 }
 
-static inline int __ticket_spin_is_locked(arch_spinlock_t *lock)
+static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
 	return !!(tmp.tail ^ tmp.head);
 }
 
-static inline int __ticket_spin_is_contended(arch_spinlock_t *lock)
+static inline int arch_spin_is_contended(arch_spinlock_t *lock)
 {
 	struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
 
 	return ((tmp.tail - tmp.head) & TICKET_MASK) > 1;
 }
-
-static inline int arch_spin_is_locked(arch_spinlock_t *lock)
-{
-	return __ticket_spin_is_locked(lock);
-}
-
-static inline int arch_spin_is_contended(arch_spinlock_t *lock)
-{
-	return __ticket_spin_is_contended(lock);
-}
 #define arch_spin_is_contended	arch_spin_is_contended
 
-static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
-{
-	__ticket_spin_lock(lock);
-}
-
-static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
-{
-	return __ticket_spin_trylock(lock);
-}
-
-static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
-{
-	__ticket_spin_unlock(lock);
-}
-
 static __always_inline void arch_spin_lock_flags(arch_spinlock_t *lock,
 						 unsigned long flags)
 {
[PATCH RFC V6 9/11] x86/ticketlock: add slowpath logic
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Maintain a flag in the LSB of the ticket lock tail which indicates whether anyone is in the lock slowpath and may need kicking when the current holder unlocks. The flag is set when the first locker enters the slowpath, and cleared when unlocking to an empty queue (ie, no contention).

In the specific implementation of lock_spinning(), make sure to set the slowpath flag on the lock just before blocking. We must do this before the last-chance pickup test to prevent a deadlock with the unlocker:

    Unlocker                 Locker
                             test for lock pickup
                               -> fail
    unlock
    test slowpath
      -> false
                             set slowpath flag
                             block

Whereas this works in any ordering:

    Unlocker                 Locker
                             set slowpath flag
                             test for lock pickup
                               -> fail
                             block
    unlock
    test slowpath
      -> true, kick

If the unlocker finds that the lock has the slowpath flag set but it is actually uncontended (ie, head == tail, so nobody is waiting), then it clears the slowpath flag.

The unlock code uses a locked add to update the head counter. This also acts as a full memory barrier, so it's safe to subsequently read back the slowpath flag, knowing that the updated lock is visible to the other CPUs. If it were an unlocked add, then the flag read may just be forwarded from the store buffer before it was visible to the other CPUs, which could result in a deadlock.

Unfortunately this means we need to do a locked instruction when unlocking with PV ticketlocks. However, if PV ticketlocks are not enabled, then the old non-locked add is the only unlocking code.

Note: this code relies on gcc making sure that unlikely() code is out of line of the fastpath, which only happens when OPTIMIZE_SIZE=n. If it doesn't, the generated code isn't too bad, but it's definitely suboptimal.

Thanks to Srivatsa Vaddagiri for providing a bugfix to the original version of this change, which has been folded in.
Thanks to Stephan Diestelhorst for commenting on some code which relied on an inaccurate reading of the x86 memory ordering rules. Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Stephan Diestelhorst stephan.diestelho...@amd.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/x86/include/asm/paravirt.h |2 +- arch/x86/include/asm/spinlock.h | 86 +++- arch/x86/include/asm/spinlock_types.h |2 + arch/x86/kernel/paravirt-spinlocks.c |3 + arch/x86/xen/spinlock.c |6 ++ 5 files changed, 74 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 4343419..f2f1700 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -756,7 +756,7 @@ static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket); } -static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, +static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket) { PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index c14263a..e98cfa8 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -1,11 +1,14 @@ #ifndef _ASM_X86_SPINLOCK_H #define _ASM_X86_SPINLOCK_H +#include linux/jump_label.h #include linux/atomic.h #include asm/page.h #include asm/processor.h #include linux/compiler.h #include asm/paravirt.h +#include asm/bitops.h + /* * Your basic SMP spinlocks, allowing only a single CPU anywhere * @@ -40,32 +43,28 @@ /* How long a lock should spin before we consider blocking */ #define SPIN_THRESHOLD (1 11) -#ifndef CONFIG_PARAVIRT_SPINLOCKS +extern struct jump_label_key paravirt_ticketlocks_enabled; +static __always_inline bool 
static_branch(struct jump_label_key *key); -static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, - __ticket_t ticket) +#ifdef CONFIG_PARAVIRT_SPINLOCKS + +static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { + set_bit(0, (volatile unsigned long *)lock-tickets.tail); } -static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, -__ticket_t ticket) +#else /* !CONFIG_PARAVIRT_SPINLOCKS */ +static __always_inline
[PATCH RFC V6 7/11] x86/pvticketlock: use callee-save for lock_spinning
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Although the lock_spinning calls in the spinlock code are on the uncommon path, their presence can cause the compiler to generate many more register save/restores in the function pre/postamble, which is in the fast path. To avoid this, convert it to using the pvops callee-save calling convention, which defers all the save/restores until the actual function is called, keeping the fastpath clean. Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/x86/include/asm/paravirt.h |2 +- arch/x86/include/asm/paravirt_types.h |2 +- arch/x86/kernel/paravirt-spinlocks.c |2 +- arch/x86/xen/spinlock.c |3 ++- 4 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 0fa4553..4343419 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -753,7 +753,7 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, __ticket_t ticket) { - PVOP_VCALL2(pv_lock_ops.lock_spinning, lock, ticket); + PVOP_VCALLEE2(pv_lock_ops.lock_spinning, lock, ticket); } static __always_inline void ticket_unlock_kick(struct arch_spinlock *lock, diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 005e24d..5e0c138 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -330,7 +330,7 @@ struct arch_spinlock; #include asm/spinlock_types.h struct pv_lock_ops { - void (*lock_spinning)(struct arch_spinlock *lock, __ticket_t ticket); + struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct arch_spinlock *lock, __ticket_t ticket); }; diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c index c2e010e..4251c1d 100644 --- a/arch/x86/kernel/paravirt-spinlocks.c +++ b/arch/x86/kernel/paravirt-spinlocks.c @@ -9,7 +9,7 @@ struct pv_lock_ops pv_lock_ops = { #ifdef CONFIG_SMP - .lock_spinning = paravirt_nop, + .lock_spinning = __PV_IS_CALLEE_SAVE(paravirt_nop), .unlock_kick = paravirt_nop, #endif }; diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 5dce49d..176a554 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -173,6 +173,7 @@ out: local_irq_restore(flags); spin_time_accum_blocked(start); } +PV_CALLEE_SAVE_REGS_THUNK(xen_lock_spinning); static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next) { @@ -232,7 +233,7 @@ void __init xen_init_spinlocks(void) return; } - pv_lock_ops.lock_spinning = xen_lock_spinning; + pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(xen_lock_spinning); pv_lock_ops.unlock_kick = xen_unlock_kick; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC V6 8/11] x86/pvticketlock: when paravirtualizing ticket locks, increment by 2
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Increment ticket head/tails by 2 rather than 1 to leave the LSB free to store a is in slowpath state bit. This halves the number of possible CPUs for a given ticket size, but this shouldn't matter in practice - kernels built for 32k+ CPU systems are probably specially built for the hardware rather than a generic distro kernel. Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/x86/include/asm/spinlock.h | 10 +- arch/x86/include/asm/spinlock_types.h | 10 +- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index f6442f4..c14263a 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -81,7 +81,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, */ static __always_inline void arch_spin_lock(struct arch_spinlock *lock) { - register struct __raw_tickets inc = { .tail = 1 }; + register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC }; inc = xadd(lock-tickets, inc); @@ -107,7 +107,7 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) if (old.tickets.head != old.tickets.tail) return 0; - new.head_tail = old.head_tail + (1 TICKET_SHIFT); + new.head_tail = old.head_tail + (TICKET_LOCK_INC TICKET_SHIFT); /* cmpxchg is a full barrier, so nothing can move before it */ return cmpxchg(lock-head_tail, old.head_tail, new.head_tail) == old.head_tail; @@ -115,9 +115,9 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { - __ticket_t next = lock-tickets.head + 1; + __ticket_t next = lock-tickets.head + TICKET_LOCK_INC; - __add(lock-tickets.head, 1, UNLOCK_LOCK_PREFIX); + __add(lock-tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX); __ticket_unlock_kick(lock, next); } @@ -132,7 +132,7 @@ 
static inline int arch_spin_is_contended(arch_spinlock_t *lock) { struct __raw_tickets tmp = ACCESS_ONCE(lock-tickets); - return ((tmp.tail - tmp.head) TICKET_MASK) 1; + return ((tmp.tail - tmp.head) TICKET_MASK) TICKET_LOCK_INC; } #define arch_spin_is_contended arch_spin_is_contended diff --git a/arch/x86/include/asm/spinlock_types.h b/arch/x86/include/asm/spinlock_types.h index dbe223d..aa9a205 100644 --- a/arch/x86/include/asm/spinlock_types.h +++ b/arch/x86/include/asm/spinlock_types.h @@ -3,7 +3,13 @@ #include linux/types.h -#if (CONFIG_NR_CPUS 256) +#ifdef CONFIG_PARAVIRT_SPINLOCKS +#define __TICKET_LOCK_INC 2 +#else +#define __TICKET_LOCK_INC 1 +#endif + +#if (CONFIG_NR_CPUS (256 / __TICKET_LOCK_INC)) typedef u8 __ticket_t; typedef u16 __ticketpair_t; #else @@ -11,6 +17,8 @@ typedef u16 __ticket_t; typedef u32 __ticketpair_t; #endif +#define TICKET_LOCK_INC((__ticket_t)__TICKET_LOCK_INC) + #define TICKET_SHIFT (sizeof(__ticket_t) * 8) #define TICKET_MASK((__ticket_t)((1 TICKET_SHIFT) - 1)) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance trouble
hello,

Le Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov wrote:
> Try to add <feature policy='disable' name='hypervisor'/> to the cpu
> definition in the XML and check the command line.

OK, I tried this, but I can't use a cpu model that maps the host cpu (my libvirt is 0.9.8), so I use:

  <cpu match='exact'>
    <model>Opteron_G3</model>
    <feature policy='disable' name='hypervisor'/>
  </cpu>

(the physical server uses Opteron CPUs). The log is here:
http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz

And now, with only 1 vcpu, the response time is 8.5s, a great improvement. We will keep this configuration for production and check the response time when other users are connected.

David.
[PATCH RFC V6 5/11] xen/pvticketlock: Xen implementation for PV ticket locks
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Replace the old Xen implementation of PV spinlocks with and implementation of xen_lock_spinning and xen_unlock_kick. xen_lock_spinning simply registers the cpu in its entry in lock_waiting, adds itself to the waiting_cpus set, and blocks on an event channel until the channel becomes pending. xen_unlock_kick searches the cpus in waiting_cpus looking for the one which next wants this lock with the next ticket, if any. If found, it kicks it by making its event channel pending, which wakes it up. We need to make sure interrupts are disabled while we're relying on the contents of the per-cpu lock_waiting values, otherwise an interrupt handler could come in, try to take some other lock, block, and overwrite our values. Raghu: use function + enum instead of macro, cmpxchg for zero status reset Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- arch/x86/xen/spinlock.c | 344 +++ 1 files changed, 77 insertions(+), 267 deletions(-) diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 1258514..0e4d057 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -16,45 +16,46 @@ #include xen-ops.h #include debugfs.h +enum xen_contention_stat { + TAKEN_SLOW, + TAKEN_SLOW_PICKUP, + TAKEN_SLOW_SPURIOUS, + RELEASED_SLOW, + RELEASED_SLOW_KICKED, + NR_CONTENTION_STATS +}; + + #ifdef CONFIG_XEN_DEBUG_FS static struct xen_spinlock_stats { - u64 taken; - u32 taken_slow; - u32 taken_slow_nested; - u32 taken_slow_pickup; - u32 taken_slow_spurious; - u32 taken_slow_irqenable; - - u64 released; - u32 released_slow; - u32 released_slow_kicked; + u32 contention_stats[NR_CONTENTION_STATS]; #define HISTO_BUCKETS 30 - u32 histo_spin_total[HISTO_BUCKETS+1]; - u32 histo_spin_spinning[HISTO_BUCKETS+1]; u32 histo_spin_blocked[HISTO_BUCKETS+1]; - u64 time_total; - u64 
time_spinning; u64 time_blocked; } spinlock_stats; static u8 zero_stats; -static unsigned lock_timeout = 1 10; -#define TIMEOUT lock_timeout - static inline void check_zero(void) { - if (unlikely(zero_stats)) { - memset(spinlock_stats, 0, sizeof(spinlock_stats)); - zero_stats = 0; + u8 ret; + u8 old = ACCESS_ONCE(zero_stats); + if (unlikely(old)) { + ret = cmpxchg(zero_stats, old, 0); + /* This ensures only one fellow resets the stat */ + if (ret == old) + memset(spinlock_stats, 0, sizeof(spinlock_stats)); } } -#define ADD_STATS(elem, val) \ - do { check_zero(); spinlock_stats.elem += (val); } while(0) +static inline void add_stats(enum xen_contention_stat var, u32 val) +{ + check_zero(); + spinlock_stats.contention_stats[var] += val; +} static inline u64 spin_time_start(void) { @@ -73,22 +74,6 @@ static void __spin_time_accum(u64 delta, u32 *array) array[HISTO_BUCKETS]++; } -static inline void spin_time_accum_spinning(u64 start) -{ - u32 delta = xen_clocksource_read() - start; - - __spin_time_accum(delta, spinlock_stats.histo_spin_spinning); - spinlock_stats.time_spinning += delta; -} - -static inline void spin_time_accum_total(u64 start) -{ - u32 delta = xen_clocksource_read() - start; - - __spin_time_accum(delta, spinlock_stats.histo_spin_total); - spinlock_stats.time_total += delta; -} - static inline void spin_time_accum_blocked(u64 start) { u32 delta = xen_clocksource_read() - start; @@ -98,19 +83,15 @@ static inline void spin_time_accum_blocked(u64 start) } #else /* !CONFIG_XEN_DEBUG_FS */ #define TIMEOUT(1 10) -#define ADD_STATS(elem, val) do { (void)(val); } while(0) +static inline void add_stats(enum xen_contention_stat var, u32 val) +{ +} static inline u64 spin_time_start(void) { return 0; } -static inline void spin_time_accum_total(u64 start) -{ -} -static inline void spin_time_accum_spinning(u64 start) -{ -} static inline void spin_time_accum_blocked(u64 start) { } @@ -133,230 +114,83 @@ typedef u16 xen_spinners_t; asm(LOCK_PREFIX decw %0 : +m 
((xl)-spinners) : : memory); #endif -struct xen_spinlock { - unsigned char lock; /* 0 - free; 1 - locked */ - xen_spinners_t spinners;/* count of waiting cpus */ +struct xen_lock_waiting { + struct arch_spinlock *lock; + __ticket_t want; }; static DEFINE_PER_CPU(int, lock_kicker_irq) = -1; +static DEFINE_PER_CPU(struct xen_lock_waiting, lock_waiting); +static cpumask_t waiting_cpus; -#if 0 -static int xen_spin_is_locked(struct arch_spinlock *lock) +static void
[PATCH RFC V6 0/11] Paravirtualized ticketlocks
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Changes since last posting: (Raghavendra K T)
[
 - Rebased to linux-3.3-rc6.
 - used function+enum in place of macro (better type checking)
 - use cmpxchg while resetting zero status for possible race [suggested by Dave Hansen for KVM patches]
]

This series replaces the existing paravirtualized spinlock mechanism with a paravirtualized ticketlock mechanism.

Ticket locks have an inherent problem in a virtualized case, because the vCPUs are scheduled rather than running concurrently (ignoring gang-scheduled vCPUs). This can result in catastrophic performance collapses when the vCPU scheduler doesn't schedule the correct next vCPU, and ends up scheduling a vCPU which burns its entire timeslice spinning. (Note that this is not the same problem as lock-holder preemption, which this series also addresses; that's also a problem, but not catastrophic).

(See Thomas Friebel's talk Prevent Guests from Spinning Around http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

Currently we deal with this by having PV spinlocks, which adds a layer of indirection in front of all the spinlock functions, and defining a completely new implementation for Xen (and for other pvops users, but there are none at present).

PV ticketlocks keep the existing ticketlock implementation (fastpath) as-is, but add a couple of pvops for the slow paths:

- If a CPU has been waiting for a spinlock for SPIN_THRESHOLD iterations, then call out to the __ticket_lock_spinning() pvop, which allows a backend to block the vCPU rather than spinning. This pvop can set the lock into slowpath state.

- When releasing a lock, if it is in slowpath state, call __ticket_unlock_kick() to kick the next vCPU in line awake. If the lock is no longer in contention, it also clears the slowpath flag.

The slowpath state is stored in the LSB of the lock tail ticket.
This has the effect of reducing the max number of CPUs by half (so, a small ticket can deal with 128 CPUs, and large ticket 32768). This series provides a Xen implementation, but it should be straightforward to add a KVM implementation as well. Overall, it results in a large reduction in code, it makes the native and virtualized cases closer, and it removes a layer of indirection around all the spinlock functions. The fast path (taking an uncontended lock which isn't in slowpath state) is optimal, identical to the non-paravirtualized case. The inner part of ticket lock code becomes: inc = xadd(lock-tickets, inc); inc.tail = ~TICKET_SLOWPATH_FLAG; if (likely(inc.head == inc.tail)) goto out; for (;;) { unsigned count = SPIN_THRESHOLD; do { if (ACCESS_ONCE(lock-tickets.head) == inc.tail) goto out; cpu_relax(); } while (--count); __ticket_lock_spinning(lock, inc.tail); } out:barrier(); which results in: push %rbp mov%rsp,%rbp mov$0x200,%eax lock xadd %ax,(%rdi) movzbl %ah,%edx cmp%al,%dl jne1f # Slowpath if lock in contention pop%rbp retq ### SLOWPATH START 1: and$-2,%edx movzbl %dl,%esi 2: mov$0x800,%eax jmp4f 3: pause sub$0x1,%eax je 5f 4: movzbl (%rdi),%ecx cmp%cl,%dl jne3b pop%rbp retq 5: callq *__ticket_lock_spinning jmp2b ### SLOWPATH END with CONFIG_PARAVIRT_SPINLOCKS=n, the code has changed slightly, where the fastpath case is straight through (taking the lock without contention), and the spin loop is out of line: push %rbp mov%rsp,%rbp mov$0x100,%eax lock xadd %ax,(%rdi) movzbl %ah,%edx cmp%al,%dl jne1f pop%rbp retq ### SLOWPATH START 1: pause movzbl (%rdi),%eax cmp%dl,%al jne1b pop%rbp retq ### SLOWPATH END The unlock code is complicated by the need to both add to the lock's head and fetch the slowpath flag from tail. This version of the patch uses a locked add to do this, followed by a test to see if the slowflag is set. 
The lock prefix acts as a full memory barrier, so we can be sure that other CPUs will have seen the unlock before we read the flag (without the barrier, the read could be fetched from the store queue before it hits memory, which could result in a deadlock).

This is all unnecessary complication if you're not using PV ticket locks, so the code also uses the jump-label machinery to fall back to the standard add-based unlock in the non-PV case. if (TICKET_SLOWPATH_FLAG
[PATCH RFC V6 4/11] xen: defer spinlock setup until boot CPU setup
From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com There's no need to do it at very early init, and doing it there makes it impossible to use the jump_label machinery. Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com --- arch/x86/xen/smp.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c index 501d4e0..e85d3ee 100644 --- a/arch/x86/xen/smp.c +++ b/arch/x86/xen/smp.c @@ -201,6 +201,7 @@ static void __init xen_smp_prepare_boot_cpu(void) xen_filter_cpu_maps(); xen_setup_vcpu_info_placement(); + xen_init_spinlocks(); } static void __init xen_smp_prepare_cpus(unsigned int max_cpus) @@ -530,7 +531,6 @@ void __init xen_smp_init(void) { smp_ops = xen_smp_ops; xen_fill_possible_map(); - xen_init_spinlocks(); } static void __init xen_hvm_smp_prepare_cpus(unsigned int max_cpus) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH uq/master] kvm: Drop unused kvm_pit_in_kernel
This is now implied by kvm_irqchip_in_kernel. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm-all.c |6 -- kvm-stub.c |6 -- kvm.h |2 -- 3 files changed, 0 insertions(+), 14 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 21c7dd2..7f8c188 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -75,7 +75,6 @@ struct KVMState #ifdef KVM_CAP_SET_GUEST_DEBUG struct kvm_sw_breakpoint_head kvm_sw_breakpoints; #endif -int pit_in_kernel; int pit_state2; int xsave, xcrs; int many_ioeventfds; @@ -195,11 +194,6 @@ static void kvm_reset_vcpu(void *opaque) kvm_arch_reset_vcpu(env); } -int kvm_pit_in_kernel(void) -{ -return kvm_state-pit_in_kernel; -} - int kvm_init_vcpu(CPUState *env) { KVMState *s = kvm_state; diff --git a/kvm-stub.c b/kvm-stub.c index 1f1c686..fab3ab1 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -16,12 +16,6 @@ #include gdbstub.h #include kvm.h -int kvm_pit_in_kernel(void) -{ -return 0; -} - - int kvm_init_vcpu(CPUState *env) { return -ENOSYS; diff --git a/kvm.h b/kvm.h index 8ef4476..128cc8f 100644 --- a/kvm.h +++ b/kvm.h @@ -83,8 +83,6 @@ int kvm_update_guest_debug(CPUState *env, unsigned long reinject_trap); int kvm_set_signal_mask(CPUState *env, const sigset_t *sigset); #endif -int kvm_pit_in_kernel(void); - int kvm_on_sigbus_vcpu(CPUState *env, int code, void *addr); int kvm_on_sigbus(int code, void *addr); -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
On 21/03/12 10:20, Raghavendra K T wrote:

From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock).

Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock.

(See Thomas Friebel's talk Prevent Guests from Spinning Around http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.)

To address this, we add two hooks:

- __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder.

- __ticket_spin_unlock, which, on releasing a contended lock (there are more cpus with tail tickets), looks to see if the next cpu is blocked and wakes it if so.

When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away.
I've made some real world benchmarks based on this serie of patches applied on top of a vanilla Linux-3.3-rc6 (commit 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions compared: * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch (you can check out the monolithic kernel configuration I used, and verify the sole difference, here): http://xenbits.xen.org/people/attilio/jeremy-spinlock/kernel-configs/ Tests, information and results are summarized below. == System used information: * Machine is a XEON x3450, 2.6GHz, 8-ways system: http://xenbits.xen.org/people/attilio/jeremy-spinlock/dmesg * System version, a Debian Squeeze 6.0.4: http://xenbits.xen.org/people/attilio/jeremy-spinlock/debian-version * gcc version, 4.4.5: http://xenbits.xen.org/people/attilio/jeremy-spinlock/gcc-version == Tests performed * pgbench based on PostgreSQL 9.2 (development version) as it has a lot of scalability improvements in it: http://www.postgresql.org/docs/devel/static/install-getsource.html I used a stock installation, with only this simple configuration change: http://xenbits.xen.org/people/attilio/jeremy-spinlock/postsgresql.conf.patch For collecting data I used this simple scripts, which runs the test 10 times every time with a different set of threads (from 1 to 64). 
Please note that the first 8 runs cache all the data in memory in order to avoid subsequent I/O, thus they are discarded in sampling and calculation: http://xenbits.xen.org/people/attilio/jeremy-spinlock/pgbench_script Here is the crude data (please remind this is tps, thus the higher the better): http://xenbits.xen.org/people/attilio/jeremy-spinlock/pgbench-crude-datas/ And here are data chartered with ministat tool, comparing all the 4 kernel configuration for every thread configuration: http://xenbits.xen.org/people/attilio/jeremy-spinlock/pgbench-9.2-total.bench As you can see, the patch doesn't really show a statistically meaningful difference for this workload, excluding the single-thread run for the patched + CONFIG_PARAVIRT_SPINLOCK=y case, which seems 5% faster. * pbzip2, which is a parallel version of bzip2, supposed to reproduce a CPU-intensive, multithreaded, application. The file choosen for compression is 1GB sized, got from /dev/urandom (this is not published but I may have it, so if you need it for more tests please just ask), and all the I/O is done on a tmpfs volume in order to avoid I/O floaty effects. For collecting data I used this simple scripts, which runs the test 10 times every time with a different set of threads (from 1 to 64): http://xenbits.xen.org/people/attilio/jeremy-spinlock/pbzip2bench_script Here is the crude data (please remind this is time(1) output, thus the lower the better): http://xenbits.xen.org/people/attilio/jeremy-spinlock/pbzip2-crude-datas/ And here are data chartered with ministat tool, comparing all the 4 kernel configuration for every thread configuration: http://xenbits.xen.org/people/attilio/jeremy-spinlock/pbzip2-1.1.1-total.bench As you can see, the patch doesn't really show a statistically meaningful difference for this workload. * kernbench-0.50 run, doing I/O on a 10GB tmpfs volume (thus no actual I/O involved), with the following invokation: ./kernbench -n10 -s -c16 -M -f
Re: [PATCH 1/1] Fixed integer constant is too large for ‘long’ type warning
On 03/20/2012 01:49 PM, Sasikantha babu wrote: --- arch/x86/kvm/pmu.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) Missing signoff. -- error compiling committee.c: too many arguments to function
Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling
On 03/19/2012 06:53 PM, Nadav Har'El wrote: Hi, in a minute I'll send a new version of the MSR_IA32_FEATURE_CONTROL patch for nested VMX; I just wanted to reply first to your comments so you'll know what to expect: On Wed, Mar 07, 2012, Avi Kivity wrote about Re: PATCH: nVMX: Better MSR_IA32_FEATURE_CONTROL handling: On 03/07/2012 05:58 PM, Nadav Har'El wrote: + u64 msr_ia32_feature_control; }; Need to add to the list of exported MSRs so it can be live migrated (msrs_to_save). Did this. The variable itself should live in vcpu-arch, even if some bits are vendor specific. But not this. I understand what you explained about vmx.c being for Intel *hosts*, not about emulating Intel *guests*, but I do think that since none of the bits in this MSR are relevant on AMD hosts (which don't do nested VMX), it isn't useful to support this MSR outside vmx.c. So I left this variable it in vmx-nested. As I noted earlier, svm.c did exactly the same thing (nested.vm_cr_msr), so at least there's symmetry here. Okay. @@ -1999,7 +2000,7 @@ static int vmx_get_vmx_msr(struct kvm_vc switch (msr_index) { case MSR_IA32_FEATURE_CONTROL: - *pdata = 0; + *pdata = to_vmx(vcpu)-nested.msr_ia32_feature_control; break; In a separate patch, please move this outside vmx_get_vmx_msr(). It's not a vmx msr. Done, but not split into two patches: The patch removes the old case in vmx_get_vmx_msr() (and also removes vmx_set_vmx_msr() entirely) and instead adds the case in vmx_get_msr() and vmx_set_msr(). +#define VMXON_NEEDED_FEATURES \ + (FEATURE_CONTROL_LOCKED | FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX) Use const u64 instead of #define please, it jars my eyes. I would, if Linux coding style allowed to declare variables in the middle of blocks. Unfortunately it doesn't, so I left this #define. I don't think it's that bad. Move it to the top of the file, or as a variable at the top of the function please. 
Re: [PATCH 0/5] qemu-kvm: Prepare kvm PIT for upstream merge
On 03/19/2012 05:29 PM, Jan Kiszka wrote: Some preparation patches to arrange qemu-kvm for merging in latest qemu with its own in-kernel PIT support. Later on, we can switch to that version without losing features on the way, even just temporarily. Thanks, applied.
Re: [PATCH] pci-assign: Fall back to host-side MSI if INTx sharing fails
On 03/19/2012 11:56 AM, Jan Kiszka wrote: If the host or the device does not support INTx sharing, retry the IRQ assignment with host-side MSI support enabled, but warn about the potential consequences. This allows preserving the previous behavior, where we defaulted to MSI and did not support INTx sharing at all. Applied, thanks.
Re: [PATCH] qemu-kvm: Drop installation of self-built optionroms
On 03/19/2012 06:58 PM, Jan Kiszka wrote: All corresponding binaries are now in pc-bios, so we can remove this diff to upstream. Thanks, applied.
[PATCH] Fixed integer constant is too large for 'long' type warning
Signed-off-by: Sasikantha babu sasikanth@gmail.com --- arch/x86/kvm/pmu.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c index 7aad544..9395230 100644 --- a/arch/x86/kvm/pmu.c +++ b/arch/x86/kvm/pmu.c @@ -365,7 +365,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data) case MSR_CORE_PERF_FIXED_CTR_CTRL: if (pmu->fixed_ctr_ctrl == data) return 0; - if (!(data & 0xfffffffffffff444)) { + if (!(data & 0xfffffffffffff444ull)) { reprogram_fixed_counters(pmu, data); return 0; } -- 1.7.3.4
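The warning itself comes from a 64-bit constant without a suffix being typed as a plain long, which is only 32 bits on 32-bit x86; the ull suffix fixes that. A small sketch of the check (helper name illustrative, mask as in the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Only the low enable/PMI nibbles of the fixed-counter control MSR are
 * writable; any other set bit makes the value invalid.  The ull suffix
 * keeps the constant 64-bit even where 'long' is 32 bits. */
static int fixed_ctr_ctrl_valid(uint64_t data)
{
	return !(data & 0xfffffffffffff444ull);
}
```

Without the suffix, gcc on a 32-bit build emits "integer constant is too large for 'long' type" and may truncate the mask, silently accepting values it should reject.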
Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote: On 21/03/12 10:20, Raghavendra K T wrote: From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com Rather than outright replacing the entire spinlock implementation in order to paravirtualize it, keep the ticket lock implementation but add a couple of pvops hooks on the slow path (long spin on lock, unlocking a contended lock). Ticket locks have a number of nice properties, but they also have some surprising behaviours in virtual environments. They enforce a strict FIFO ordering on cpus trying to take a lock; however, if the hypervisor scheduler does not schedule the cpus in the correct order, the system can waste a huge amount of time spinning until the next cpu can take the lock. (See Thomas Friebel's talk Prevent Guests from Spinning Around http://www.xen.org/files/xensummitboston08/LHP.pdf for more details.) To address this, we add two hooks: - __ticket_spin_lock, which is called after the cpu has been spinning on the lock for a significant number of iterations but has failed to take the lock (presumably because the cpu holding the lock has been descheduled). The lock_spinning pvop is expected to block the cpu until it has been kicked by the current lock holder. - __ticket_spin_unlock, which on releasing a contended lock (there are more cpus with tail tickets) looks to see if the next cpu is blocked and wakes it if so. When compiled with CONFIG_PARAVIRT_SPINLOCKS disabled, a set of stub functions causes all the extra code to go away. I've made some real world benchmarks based on this series of patches applied on top of a vanilla Linux-3.3-rc6 (commit 4704fe65e55fb088fbcb1dc0b15ff7cc8bff3685), with both CONFIG_PARAVIRT_SPINLOCK=y and n, which means essentially 4 versions compared: * vanilla - CONFIG_PARAVIRT_SPINLOCK - patch * vanilla + CONFIG_PARAVIRT_SPINLOCK - patch * vanilla - CONFIG_PARAVIRT_SPINLOCK + patch * vanilla + CONFIG_PARAVIRT_SPINLOCK + patch [...]
== Results This test points in the direction that Jeremy's rebased patches don't introduce a performance penalty at all, but also that we could likely consider CONFIG_PARAVIRT_SPINLOCK option removal, or turn it on by default and suggest disabling it just on very old CPUs (assuming a performance regression can be proven there). Very interesting results, in particular knowing that in the one guest case things do not get (significantly) slower due to the added logic and LOCKed RMW in the unlock path. AFAICR, the problem really became apparent when running multiple guests time sharing the physical CPUs, i.e., two guests with eight vCPUs each on an eight core machine. Did you look at this setup with your tests? Thanks, Stephan -- Stephan Diestelhorst, AMD Operating System Research Center stephan.diestelho...@amd.com, Tel. +49 (0)351 448 356 719 (AMD: 29-719)
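For readers following the thread, the ticket-lock-plus-slow-path-hooks idea can be sketched in plain C11 atomics. The threshold value and hook names below are illustrative, not the kernel's actual pvops interface; on bare metal the hooks stay NULL, so the fast path is unchanged:

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
	atomic_uint next;   /* next ticket to hand out */
	atomic_uint owner;  /* ticket currently allowed to hold the lock */
} ticketlock_t;

#define SPIN_THRESHOLD 1024  /* iterations before entering the slow path */

/* pvop-style hooks; NULL on bare metal */
static void (*lock_spinning)(ticketlock_t *, unsigned) = 0;
static void (*unlock_kick)(ticketlock_t *, unsigned) = 0;

static void ticket_lock(ticketlock_t *l)
{
	unsigned me = atomic_fetch_add(&l->next, 1);

	while (atomic_load(&l->owner) != me) {
		for (unsigned i = 0; i < SPIN_THRESHOLD; i++)
			if (atomic_load(&l->owner) == me)
				return;
		if (lock_spinning)  /* holder presumably descheduled: block until kicked */
			lock_spinning(l, me);
	}
}

static void ticket_unlock(ticketlock_t *l)
{
	unsigned next = atomic_fetch_add(&l->owner, 1) + 1;
	if (unlock_kick)            /* wake the next ticket holder if it blocked */
		unlock_kick(l, next);
}
```

The extra cost Attilio is measuring on native hardware is exactly the hook test plus the LOCKed read-modify-write in the unlock path.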
Re: User question: balloon memory, currentMemory and Linux/FreeBSD guests
On 03/14/2012 01:51 PM, Andy Smith wrote: Hi, I'm a KVM user, using it for the first time and so far very happy with it. I wonder if someone can help me get my head round the issue of balloon memory and overallocation. Basically I have a Debian wheezy/sid KVM server, on which I have a couple of Linux guests and several FreeBSD guests, and I see different behaviour between the two guest types. On FreeBSD I have installed the virtio and balloon drivers (disk and network working great!); there is definitely a balloon process running. The behaviour I see is that both Linux and FreeBSD guests always show the currentMemory number when queried with qemu-monitor-command --hmp ClientName --cmd 'info balloon'. But from a top in the Debian KVM host, the RES size of the Linux guest processes varies (presumably depending on the demand of the guest OS) while the FreeBSD kvm processes always sit at exactly the currentMemory size. Basically I'm interested to know what is going on, whether what I see is normal, and whether balloon currently dynamically manages memory usage or if it's still the case that it requires administrator intervention to re-release memory after the OS has previously been allocated it. The RES column indicates how much memory guests are actively using. Linux touches memory when it first uses it, so if your guests are idle, RES will remain low. It seems FreeBSD touches memory immediately (or perhaps those guests are not idle), so it's high. RES (or rather, RSS - resident set size) can decrease when the host swaps guest pages, or if the guest gives up memory via ballooning.
Re: [PATCH uq/master] kvm: Drop unused kvm_pit_in_kernel
On 03/21/2012 02:36 PM, Jan Kiszka wrote: This is now implied by kvm_irqchip_in_kernel. So we can't have -no-kvm-pit? No huge loss, but unexpected.
Re: [PATCH] Fixed integer constant is too large for 'long' type warning
On 03/21/2012 03:19 PM, Sasikantha babu wrote: Signed-off-by: Sasikantha babu sasikanth@gmail.com Applied, thanks.
Re: [PATCH uq/master] kvm: Drop unused kvm_pit_in_kernel
On 2012-03-21 14:36, Avi Kivity wrote: On 03/21/2012 02:36 PM, Jan Kiszka wrote: This is now implied by kvm_irqchip_in_kernel. So we can't have -no-kvm-pit? No huge loss, but unexpected. See e81dda195556e72f8cd294998296c1051aab30a8. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: Can VMX provide real mode support?
On 2012-03-21 13:38, GaoYi wrote: Hi Jan, Since the newest Intel-VT supports the guest OS under real mode, which was already supported in AMD-V, can the VMX in the latest KVM support that case? Yes, both with or without that unrestricted guest support (as Intel calls it), real mode will generally work. Without that CPU feature, I seem to recall that there were some limitations for big real mode. Jan
Re: [PATCH] kvm-intel: Auto-load on CPUs with VMX.
On 03/21/2012 08:33 AM, Josh Triplett wrote: Enable x86 feature-based autoloading for the kvm-intel module on CPUs with X86_FEATURE_VMX. Signed-off-by: Josh Triplett j...@joshtriplett.org --- I've tested this and it works correctly. I can also supply the obvious equivalent patch for kvm-amd on AMD CPUs with SVM, but I don't have any such CPUs to test that on. On the one hand this makes sense and is consistent with how other modules are loaded. On the other hand this will waste memory for non-virt users running on distro kernels (but distro kernels can override this new behaviour anyway). I'm inclined to apply, but will wait for more comments.
Re: [PATCH uq/master] kvm: Drop unused kvm_pit_in_kernel
On Wed, Mar 21, 2012 at 02:39:47PM +0100, Jan Kiszka wrote: On 2012-03-21 14:36, Avi Kivity wrote: On 03/21/2012 02:36 PM, Jan Kiszka wrote: This is now implied by kvm_irqchip_in_kernel. So we can't have -no-kvm-pit? No huge loss, but unexpected. See e81dda195556e72f8cd294998296c1051aab30a8. I am curious: what is the reason for upstream not to support disabling the in-kernel PIT separately? -- Gleb.
Re: Can VMX provide real mode support?
On 03/21/2012 03:40 PM, Jan Kiszka wrote: On 2012-03-21 13:38, GaoYi wrote: Hi Jan, Since the newest Intel-VT supports the guest OS under real mode, which was already supported in AMD-V, can the VMX in the latest KVM support that case? Yes, both with or without that unrestricted guest support (as Intel calls it), real mode will generally work. Without that CPU feature, I seem to recall that there were some limitations for big real mode. Yes, big real mode will not work without unrestricted guest. There was some work to emulate it (module option emulate_invalid_guest_state), but it is not complete.
Re: [PATCH uq/master] kvm: Drop unused kvm_pit_in_kernel
On 2012-03-21 14:41, Gleb Natapov wrote: On Wed, Mar 21, 2012 at 02:39:47PM +0100, Jan Kiszka wrote: See e81dda195556e72f8cd294998296c1051aab30a8. I am curious: what is the reason for upstream not to support disabling the in-kernel PIT separately? It was considered no longer relevant: http://thread.gmane.org/gmane.comp.emulators.kvm.devel/85393 Jan
Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
On 21/03/12 13:22, Stephan Diestelhorst wrote: On Wednesday 21 March 2012, 13:04:25 Attilio Rao wrote: [quoted patch description and benchmark results snipped] Very interesting results, in particular knowing that in the one guest case things do not get (significantly) slower due to the added logic and LOCKed RMW in the unlock path. AFAICR, the problem really became apparent when running multiple guests time sharing the physical CPUs, i.e., two guests with eight vCPUs each on an eight core machine. Did you look at this setup with your tests? Please note that my tests are made on native Linux, without Xen involvement. Maybe you meant that the spinlock paravirtualization became generally useful in the case you mentioned (2 guests, 8 vCPUs + 8 vCPUs)? Attilio
Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
On Wednesday 21 March 2012, 13:49:28 Attilio Rao wrote: On 21/03/12 13:22, Stephan Diestelhorst wrote: [quoted thread snipped] Please note that my tests are made on native Linux, without Xen involvement. Maybe you meant that the spinlock paravirtualization became generally useful in the case you mentioned (2 guests, 8 vCPUs + 8 vCPUs)? Yes, that is what I meant. Just to clarify why you do not see any speed-ups, and were wondering why. If the whole point of the exercise was to see that there are no performance regressions, fine. In that case I misunderstood. Stephan -- Stephan Diestelhorst, AMD Operating System Research Center stephan.diestelho...@amd.com
Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
On 21/03/12 14:25, Stephan Diestelhorst wrote: [quoted thread snipped] Yes, that is what I meant. Just to clarify why you do not see any speed-ups, and were wondering why. If the whole point of the exercise was to see that there are no performance regressions, fine. In that case I misunderstood. Yes, that's right, I just wanted to measure (possible) overhead in native Linux and the cost of leaving CONFIG_PARAVIRT_SPINLOCK on. Thanks, Attilio
[PATCH 0/2] Improve iteration through sptes from rmap
By removing sptep from rmap_iterator, I could achieve a 15% performance improvement without inlining. Takuya Yoshikawa (2): KVM: MMU: Make pte_list_desc fit cache lines well KVM: MMU: Improve iteration through sptes from rmap -- 1.7.5.4
[PATCH 1/2] KVM: MMU: Make pte_list_desc fit cache lines well
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp We have PTE_LIST_EXT + 1 pointers in this structure and these 40/20 bytes do not fit cache lines well. Furthermore, some allocators may use 64/32-byte objects for the pte_list_desc cache. This patch solves this problem by changing PTE_LIST_EXT from 4 to 3. For shadow paging, the new size is still large enough to hold both the kernel and process mappings for usual anonymous pages. For file mappings, there may be a slight change in the cache usage. Note: with EPT/NPT we almost always have a single spte in each reverse mapping and we will not see any change by this. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/x86/kvm/mmu.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index dc5f245..3213348 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -135,8 +135,6 @@ module_param(dbg, bool, 0644); #define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \ | PT64_NX_MASK) -#define PTE_LIST_EXT 4 - #define ACC_EXEC_MASK 1 #define ACC_WRITE_MASK PT_WRITABLE_MASK #define ACC_USER_MASK PT_USER_MASK @@ -151,6 +149,9 @@ module_param(dbg, bool, 0644); #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) +/* make pte_list_desc fit well in cache line */ +#define PTE_LIST_EXT 3 + struct pte_list_desc { u64 *sptes[PTE_LIST_EXT]; struct pte_list_desc *more; }; -- 1.7.5.4
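The cache-line arithmetic is easy to check: with PTE_LIST_EXT at 3, the descriptor is four pointers, i.e. 32 bytes on a 64-bit kernel, so two descriptors share one 64-byte line; the old value of 4 gave 40 bytes, which straddles lines. A standalone sketch (userspace uint64_t stands in for the kernel's u64):

```c
#include <assert.h>
#include <stdint.h>

#define PTE_LIST_EXT 3

/* Same layout as the patched kernel structure: PTE_LIST_EXT spte
 * pointers plus one 'more' link = 4 pointers total. */
struct pte_list_desc {
	uint64_t *sptes[PTE_LIST_EXT];
	struct pte_list_desc *more;
};
```

On an LP64 build, sizeof(struct pte_list_desc) is 32, a power of two that packs exactly into slab objects and cache lines.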
Re: [PATCH RFC V6 1/11] x86/spinlock: replace pv spinlocks with pv ticketlocks
On 03/21/2012 08:03 PM, Attilio Rao wrote: On 21/03/12 14:25, Stephan Diestelhorst wrote: [quoted thread snipped] Yes, that's right, I just wanted to measure (possible) overhead in native Linux and the cost of leaving CONFIG_PARAVIRT_SPINLOCK on. True, even my results only cover native overhead. Until now the main concern in the community was native overhead, and this time we have results proving that CONFIG_PARAVIRT_SPINLOCK is now on par with the corresponding vanilla kernel because of the ticketlock improvements. Coming to the guest scenario, I intend to post the KVM counterpart of the patches with results showing a huge improvement (around 90%) in the contention scenario and almost zero overhead in the normal case.
[PATCH 2/2] KVM: MMU: Improve iteration through sptes from rmap
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Iteration using rmap_next(), whose actual body is pte_list_next(), is inefficient: every time we call it we start from checking whether rmap holds a single spte or points to a descriptor which links more sptes. In the case of shadow paging, this quadratic total iteration cost is a problem. Even for two dimensional paging, with EPT/NPT on, in which we almost always have a single mapping, the extra checks at the end of the iteration should be eliminated. This patch fixes this by introducing rmap_iterator, which keeps the iteration context for the next search. Furthermore, the implementation of rmap_next() is split into two functions, rmap_get_first() and rmap_get_next(), to avoid repeatedly checking whether the rmap being iterated on has only one spte. Although there seemed to be only a slight change for EPT/NPT, the actual improvement was significant: we observed that GET_DIRTY_LOG for 1GB of dirty memory became 15% faster than before. This is probably because the new code is friendlier to branch prediction. Note: we just remove pte_list_next() because we can think of parent_ptes as a reverse mapping.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
 arch/x86/kvm/mmu.c       |  196 --
 arch/x86/kvm/mmu_audit.c |   10 +-
 2 files changed, 124 insertions(+), 82 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3213348..29ad6f9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -842,32 +842,6 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 	return count;
 }
 
-static u64 *pte_list_next(unsigned long *pte_list, u64 *spte)
-{
-	struct pte_list_desc *desc;
-	u64 *prev_spte;
-	int i;
-
-	if (!*pte_list)
-		return NULL;
-	else if (!(*pte_list & 1)) {
-		if (!spte)
-			return (u64 *)*pte_list;
-		return NULL;
-	}
-	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-	prev_spte = NULL;
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i) {
-			if (prev_spte == spte)
-				return desc->sptes[i];
-			prev_spte = desc->sptes[i];
-		}
-		desc = desc->more;
-	}
-	return NULL;
-}
-
 static void pte_list_desc_remove_entry(unsigned long *pte_list,
 				       struct pte_list_desc *desc, int i,
 				       struct pte_list_desc *prev_desc)
@@ -988,11 +962,6 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
 	return pte_list_add(vcpu, spte, rmapp);
 }
 
-static u64 *rmap_next(unsigned long *rmapp, u64 *spte)
-{
-	return pte_list_next(rmapp, spte);
-}
-
 static void rmap_remove(struct kvm *kvm, u64 *spte)
 {
 	struct kvm_mmu_page *sp;
@@ -1005,6 +974,67 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
 	pte_list_remove(spte, rmapp);
 }
 
+/*
+ * Used by the following functions to iterate through the sptes linked by a
+ * rmap.  All fields are private and not assumed to be used outside.
+ */
+struct rmap_iterator {
+	/* private fields */
+	struct pte_list_desc *desc;	/* holds the sptep if not NULL */
+	int pos;			/* index of the sptep */
+};
+
+/*
+ * Iteration must be started by this function.  This should also be used after
+ * removing/dropping sptes from the rmap link because in such cases the
+ * information in the iterator may not be valid.
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
+{
+	if (!rmap)
+		return NULL;
+
+	if (!(rmap & 1)) {
+		iter->desc = NULL;
+		return (u64 *)rmap;
+	}
+
+	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
+	iter->pos = 0;
+	return iter->desc->sptes[iter->pos];
+}
+
+/*
+ * Must be used with a valid iterator: e.g. after rmap_get_first().
+ *
+ * Returns sptep if found, NULL otherwise.
+ */
+static u64 *rmap_get_next(struct rmap_iterator *iter)
+{
+	if (iter->desc) {
+		if (iter->pos < PTE_LIST_EXT - 1) {
+			u64 *sptep;
+
+			++iter->pos;
+			sptep = iter->desc->sptes[iter->pos];
+			if (sptep)
+				return sptep;
+		}
+
+		iter->desc = iter->desc->more;
+
+		if (iter->desc) {
+			iter->pos = 0;
+			/* desc->sptes[0] cannot be NULL */
+			return iter->desc->sptes[iter->pos];
+		}
+	}
+
+	return NULL;
+}
+
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
 	if (mmu_spte_clear_track_bits(sptep))
@@ -1013,23 +1043,27 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
static int
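For readers following along, the control flow of the two new helpers is easier to see outside of C. Below is a small Python model of the same iterator pattern (hypothetical simplified structures, not KVM code: an rmap is either None, a single spte value, or a chain of fixed-size descriptors, and PTE_LIST_EXT is shrunk to 4):

```python
PTE_LIST_EXT = 4  # sptes per descriptor in this simplified model


class RmapIterator:
    """Iteration context, mirroring struct rmap_iterator."""
    def __init__(self):
        self.desc = None  # current descriptor, or None in the single-spte case
        self.pos = 0      # index of the current spte within the descriptor


def rmap_get_first(rmap, it):
    """Start iterating: rmap is None, a single spte, or a descriptor chain."""
    if rmap is None:
        return None
    if not isinstance(rmap, dict):  # single spte stored directly in the rmap
        it.desc = None
        return rmap
    it.desc, it.pos = rmap, 0
    return rmap["sptes"][0]


def rmap_get_next(it):
    """Advance using only the saved context; no re-checking of the rmap head."""
    if it.desc is None:
        return None  # a single-spte rmap is exhausted after the first call
    if it.pos < PTE_LIST_EXT - 1:
        it.pos += 1
        spte = it.desc["sptes"][it.pos]
        if spte is not None:
            return spte
    it.desc = it.desc["more"]
    if it.desc is not None:
        it.pos = 0
        return it.desc["sptes"][0]  # sptes[0] of a live descriptor is never None
    return None


def iterate(rmap):
    """Collect every spte reachable from an rmap, in order."""
    it, out = RmapIterator(), []
    spte = rmap_get_first(rmap, it)
    while spte is not None:
        out.append(spte)
        spte = rmap_get_next(it)
    return out
```

The point of the pattern: rmap_get_first() classifies the rmap exactly once, and rmap_get_next() consults only the saved (desc, pos) context, so a full walk costs O(n) instead of the old restart-from-the-head O(n^2).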
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a panic event over IPMI. The code is pretty complex. Of course if we are going to implement something more complex than a simple hypercall for panic notification we had better do something more interesting with it than just saying a panic happened, like sending stack traces on all cpus for instance. I doubt that's the best example, unfortunately. The IPMI event log has limited space and it has to be sent a little piece at a time since each log entry is 14 bytes. It just prints the panic string, nothing else. Not that it isn't useful, it has saved my butt before. You have lots of interesting options with paravirtualization. You could, for instance, create a console driver that delivered all console output efficiently through a hypercall. That would be really easy. Or, as you mention, a custom way to deliver panic information. Collecting information like stack traces would be harder to accomplish, as I don't think there is currently a way to get it except by sending it to printk. -corey -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 21, 2012 at 11:18:16AM -0500, Corey Minyard wrote: Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a panic event over IPMI. The code is pretty complex. Of course if we are going to implement something more complex than a simple hypercall for panic notification we had better do something more interesting with it than just saying a panic happened, like sending stack traces on all cpus for instance. I doubt that's the best example, unfortunately. The IPMI event log has limited space and it has to be sent a little piece at a time since each log entry is 14 bytes. It just prints the panic string, nothing else. Not that it isn't useful, it has saved my butt before. I gave the ipmi example just to show that others do complex things on panic, not as an example of what we should do on a guest panic. You have lots of interesting options with paravirtualization. You could, for instance, create a console driver that delivered all console output efficiently through a hypercall. That would be really easy. Or, as you mention, a custom way to deliver panic information. Collecting information like stack traces would be harder to accomplish, as I don't think there is currently a way to get it except by sending it to printk. Why use a hypercall for that, though? You can do that with virtio-console. Make it zero config: virtio-console detected -> send console output there. -- Gleb.
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On 03/21/2012 06:18 PM, Corey Minyard wrote: Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a panic event over IPMI. The code is pretty complex. Of course if we are going to implement something more complex than a simple hypercall for panic notification we had better do something more interesting with it than just saying a panic happened, like sending stack traces on all cpus for instance. I doubt that's the best example, unfortunately. The IPMI event log has limited space and it has to be sent a little piece at a time since each log entry is 14 bytes. It just prints the panic string, nothing else. Not that it isn't useful, it has saved my butt before. You have lots of interesting options with paravirtualization. You could, for instance, create a console driver that delivered all console output efficiently through a hypercall. That would be really easy. Or, as you mention, a custom way to deliver panic information. Collecting information like stack traces would be harder to accomplish, as I don't think there is currently a way to get it except by sending it to printk. That already exists; virtio-console (or serial console emulation) can do the job. In fact the feature can be implemented 100% host side by searching for a panic string signature in the console logs. -- error compiling committee.c: too many arguments to function
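Avi's 100%-host-side variant needs nothing from the guest at all. A sketch of the log-scanning idea (the signature list is illustrative, not exhaustive, and real console logs would be streamed rather than passed as one string):

```python
# Common Linux panic banners; a real implementation would make this
# configurable per guest OS.
PANIC_SIGNATURES = (
    "Kernel panic - not syncing",
    "Oops:",
)


def find_panic(console_log):
    """Return the first console line that looks like a guest panic, or None."""
    for line in console_log.splitlines():
        if any(sig in line for sig in PANIC_SIGNATURES):
            return line.strip()
    return None
```

A management daemon could run this over each guest's serial/virtio-console log and raise its own "guest panicked" event, with zero guest-side configuration.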
RE: [Qemu-devel] [PATCH v4 4/7] RTC: Set internal millisecond register to 500ms when reset divider
On Wed, 21 Mar 2012, Zhang, Yang Z wrote:

     struct tm *tm = s->current_tm;
-    int64_t host_usec, guest_sec, guest_usec;
+    int64_t host_usec, guest_sec, guest_usec, offset_usec, old_guest_usec;
     host_usec = qemu_get_clock_ns(host_clock) / NS_PER_USEC;
+    offset_usec = s->offset_sec * USEC_PER_SEC + s->offset_usec;
+    old_guest_usec = (host_usec + offset_usec) % USEC_PER_SEC;
     guest_sec = mktimegm(tm);
-    guest_usec = guest_sec * USEC_PER_SEC;
+    /* start_usec equal 0 means the rtc internal millisecond is
+     * the same as before */
+    if (start_usec == 0) {
+        guest_usec = guest_sec * USEC_PER_SEC + old_guest_usec;
+    } else {
+        guest_usec = guest_sec * USEC_PER_SEC + start_usec;
+    }

This looks like a mistake to me: before, guest_usec was derived exclusively from mktimegm (take or leave USEC_PER_SEC), while now guest_usec is the sum of the value returned by mktimegm and old_guest_usec, even when start_usec is 0. Are you sure that this is correct?

The logic is right. Before this patch, we assumed the rtc internal millisecond register was synchronized with the host time, so we didn't need to consider it and using mktimegm was enough. Now, the rtc internal millisecond can be changed, so I use old_guest_usec to record the current internal millisecond. When start_usec is 0, it means we don't need to change the internal millisecond register and the offset is the same as before.

I am starting to understand the intention of this code, but I am still unconvinced that the start_usec == 0 case is correct. If I am not mistaken you are calculating:

    offset = guest + old_guest - host

while it should be:

    offset = guest + old_start - host

where old_start is start_usec as it was set last time, correct? Am I missing something? In any case please add a comment to explain what the change is trying to accomplish.
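To make the disputed start_usec semantics concrete, here is the arithmetic from the hunk above as a standalone sketch (plain Python modeling the patch's math; the function name is made up, the variable names mirror the patch):

```python
USEC_PER_SEC = 1000 * 1000


def guest_usec(mktimegm_sec, start_usec, old_guest_usec):
    """Guest RTC time in microseconds after a divider change.

    start_usec == 0 means "keep the current internal millisecond", so the
    previously observed sub-second part (old_guest_usec) is carried over.
    A non-zero start_usec forces the first update cycle to begin that much
    later -- 500 ms after a divider reset is removed, per the MC146818 spec.
    """
    if start_usec == 0:
        return mktimegm_sec * USEC_PER_SEC + old_guest_usec
    return mktimegm_sec * USEC_PER_SEC + start_usec
```

Stefano's objection is then about which value plays the role of old_guest_usec when start_usec is 0: the patch uses the sub-second part of the current host-derived guest time, not the start_usec recorded at the last divider change.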
Re: [Qemu-devel] [PATCH v4 5/7] RTC:Add RTC update-ended interrupt support
On Wed, 21 Mar 2012, Paolo Bonzini wrote: On 20/03/2012 19:35, Stefano Stabellini wrote: This is the function that is used to figure out whether we need the timers or not; the condition seems to be: (Not (REG_C_UF | REG_C_AF)) And (Not (REG_B_SET)). Shouldn't it actually check for UIE being enabled? No, you need to set UF in case the code observes it without actually enabling interrupt delivery on the ISA bus. Well, if it is just about updating UF, can we do it only when the user reads REG_C? I am just trying to limit the need for the update_timers...
Re: [PATCH RFC V6 2/11] x86/ticketlock: don't inline _spin_unlock when using paravirt spinlocks
On Wed, Mar 21, 2012 at 3:21 AM, Raghavendra K T raghavendra...@linux.vnet.ibm.com wrote:

From: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com

The code size expands somewhat, and it's probably better to just call a function rather than inline it.

Signed-off-by: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/Kconfig     |  3 +++
 kernel/Kconfig.locks |  2 +-
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5bed94e..10c28ec 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -623,6 +623,9 @@ config PARAVIRT_SPINLOCKS
 	  If you are unsure how to answer this question, answer N.
 
+config ARCH_NOINLINE_SPIN_UNLOCK
+	def_bool PARAVIRT_SPINLOCKS
+
 config PARAVIRT_CLOCK
 	bool
 
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index 5068e2a..584637b 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -125,7 +125,7 @@ config INLINE_SPIN_LOCK_IRQSAVE
 	ARCH_INLINE_SPIN_LOCK_IRQSAVE
 
 config INLINE_SPIN_UNLOCK
-	def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK)
+	def_bool !DEBUG_SPINLOCK && (!PREEMPT || ARCH_INLINE_SPIN_UNLOCK) && !ARCH_NOINLINE_SPIN_UNLOCK
 
 config INLINE_SPIN_UNLOCK_BH
 	def_bool !DEBUG_SPINLOCK && ARCH_INLINE_SPIN_UNLOCK_BH

Ugh. This is getting really ugly. Can we just fix it by
 - getting rid of INLINE_SPIN_UNLOCK entirely
 - replacing it with UNINLINE_SPIN_UNLOCK instead, with the reverse meaning and no def_bool at all, just a simple

    config UNINLINE_SPIN_UNLOCK
        bool

 - making the various people who want to uninline the spinlocks (like spinlock debugging, paravirt etc) all just do

    select UNINLINE_SPIN_UNLOCK

because quite frankly, the whole spin-unlock inlining logic is *already* unreadable, and you just made it worse.
Linus
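For reference, Linus's suggestion amounts to a Kconfig shape roughly like the following (a sketch, not the final patch; the option prompts and exact select sites are assumptions for illustration):

```kconfig
# Plain symbol with no prompt and no def_bool expression:
config UNINLINE_SPIN_UNLOCK
	bool

# Anyone who needs an out-of-line spin_unlock() simply selects it:
config DEBUG_SPINLOCK
	bool "Spinlock and rw-lock debugging: basic checks"
	select UNINLINE_SPIN_UNLOCK

config PARAVIRT_SPINLOCKS
	bool "Paravirtualization layer for spinlocks"
	select UNINLINE_SPIN_UNLOCK
```

The inversion matters: instead of one `def_bool` expression that must enumerate every reason not to inline, each consumer declares its own requirement, and the lock code only tests the single resulting symbol.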
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On Wed, Mar 21, 2012 at 06:25:16PM +0200, Avi Kivity wrote: On 03/21/2012 06:18 PM, Corey Minyard wrote: Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send a panic event over IPMI. The code is pretty complex. Of course if we are going to implement something more complex than a simple hypercall for panic notification we had better do something more interesting with it than just saying a panic happened, like sending stack traces on all cpus for instance. I doubt that's the best example, unfortunately. The IPMI event log has limited space and it has to be sent a little piece at a time since each log entry is 14 bytes. It just prints the panic string, nothing else. Not that it isn't useful, it has saved my butt before. You have lots of interesting options with paravirtualization. You could, for instance, create a console driver that delivered all console output efficiently through a hypercall. That would be really easy. Or, as you mention, a custom way to deliver panic information. Collecting information like stack traces would be harder to accomplish, as I don't think there is currently a way to get it except by sending it to printk. That already exists; virtio-console (or serial console emulation) can do the job. In fact the feature can be implemented 100% host side by searching for a panic string signature in the console logs. You can even go one better and search for the panic string in the guest memory directly, which is what virt-dmesg does :-) http://people.redhat.com/~rjones/virt-dmesg/ Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: performance trouble
On 21.03.2012 12:10, David Cure wrote: hello, On Tue, Mar 20, 2012 at 02:38:22PM +0200, Gleb Natapov wrote: Try to add <feature policy='disable' name='hypervisor'/> to the cpu definition in the XML and check the command line. OK, I tried this, but I can't use <cpu model> to map the host cpu (my libvirt is 0.9.8) so I use:

    <cpu match='exact'>
      <model>Opteron_G3</model>
      <feature policy='disable' name='hypervisor'/>
    </cpu>

(the physical server uses Opteron CPUs). The log is here: http://www.roullier.net/Report/report-3.2-vhost-net-1vcpu-cpu.txt.gz And now with only 1 vcpu, the response time is 8.5s, a great improvement. We keep this configuration for production: we will check the response time when some other users are connected. Please keep in mind that setting -hypervisor, disabling hpet, and using only one vcpu makes Windows use the TSC as its clocksource. You have to make sure that your vm is not switching between physical sockets on your system, and that you have the constant_tsc feature, to have a stable tsc between the cores in the same socket. It's also likely that the vm will crash when live migrated. @gleb: do you know what's the state of in-kernel hyper-v timers? peter David.
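Peter's constant_tsc caveat is easy to check from /proc/cpuinfo. A small helper sketch (shown against a literal sample string so the parsing logic, not the host it happens to run on, is what's demonstrated):

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if every processor entry in cpuinfo lists the given flag."""
    flag_lines = [line for line in cpuinfo_text.splitlines()
                  if line.startswith("flags")]
    # Split on ":" once, then tokenize so "tsc" does not match "constant_tsc".
    return bool(flag_lines) and all(
        flag in line.split(":", 1)[1].split() for line in flag_lines)


# On a real host one would pass open("/proc/cpuinfo").read() instead.
sample = ("processor\t: 0\nflags\t\t: fpu vme tsc constant_tsc\n\n"
          "processor\t: 1\nflags\t\t: fpu vme tsc constant_tsc\n")
```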
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On 03/21/2012 07:04 PM, Daniel P. Berrange wrote: In fact the feature can be implemented 100% host side by searching for a panic string signature in the console logs. You can even go one better and search for the panic string in the guest memory directly, which is what virt-dmesg does :-) http://people.redhat.com/~rjones/virt-dmesg/ -ETOOHACKY Any guest change will break this, no? -- error compiling committee.c: too many arguments to function
Re: [PATCH 0/2 v3] kvm: notify host when guest panicked
On 2012-03-21 18:34, Avi Kivity wrote: On 03/21/2012 07:04 PM, Daniel P. Berrange wrote: In fact the feature can be implemented 100% host side by searching for a panic string signature in the console logs. You can even go one better and search for the panic string in the guest memory directly, which is what virt-dmesg does :-) http://people.redhat.com/~rjones/virt-dmesg/ -ETOOHACKY Any guest change will break this, no? /me has a simple python script (a few tens of lines) to do this during runtime (kgdb, qemu gdbstub) or post-mortem (gdb -c vmcore). Ideally, that will one day come with the Linux sources, where it can be kept in sync with the kernel data structures. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [PATCH 12/38] powerpc/booke: Provide exception macros with interrupt name
On 03/21/2012 01:04 PM, Kumar Gala wrote: On Feb 28, 2012, at 6:09 PM, Alexander Graf wrote: From: Scott Wood scottw...@freescale.com DO_KVM will need to identify the particular exception type. There is an existing set of arbitrary numbers that Linux passes, but it's an undocumented mess that sort of corresponds to server/classic exception vectors but not really. So what do the new names correspond to? The names are Linux-defined. The values are IVOR numbers. What header is defining MACHINE_CHECK, BOOKE_INTERRUPT_EXTERNAL, etc (asm/kvm_asm.h)? Yes, it's asm/kvm_asm.h at the moment. It's actually BOOKE_INTERRUPT_MACHINE_CHECK, etc. The exception macros paste on BOOKE_INTERRUPT_ when they use it, to keep the macro users from having to wrap lines even more often. If so we really should move these out of asm/kvm_asm.h and into something a bit more appropriate. Yes, that was one of the things I had been planning to fix post-RFC. Any preference what header it should be in? asm/reg_booke.h seems to be the least bad option of the existing headers -- or perhaps move head_booke.h to arch/powerpc/include/asm. Long-term it would also be nice for this to replace, rather than supplement, the current numbers, though as Ben pointed out there are a number of places throughout the code that will have to be fixed for that to happen. -Scott
[Bug 42779] KVM domain hangs after loading initrd with Xenomai kernel
https://bugzilla.kernel.org/show_bug.cgi?id=42779 --- Comment #20 from Avi Kivity a...@redhat.com 2012-03-21 18:21:33 --- Please try git://git.kernel.org/pub/scm/virt/kvm/kvm.git emulator-movq; this emulates the two movq instructions used. There may be other unemulated instructions though. -- Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are watching the assignee of the bug.
[PATCH 0/2] Emulate MMX MOVQ
This patchset emulates two instructions: MOVQ mm, mm/m64 and MOVQ mm/m64, mm. Needed by https://bugzilla.kernel.org/show_bug.cgi?id=42779. However it seems to fail sometimes in my unit test, generating a math fault at the fwait instruction (which is duly trapped). Hints appreciated. Avi Kivity (2): KVM: x86 emulator: MMX support KVM: x86 emulator: implement MMX MOVQ (opcodes 0f 6f, 0f 7f) arch/x86/include/asm/kvm_emulate.h | 4 +- arch/x86/kvm/emulate.c | 98 ++-- 2 files changed, 96 insertions(+), 6 deletions(-) -- 1.7.9
[PATCH 1/2] KVM: x86 emulator: MMX support
General support for the MMX instruction set. Special care is taken to trap pending x87 exceptions so that they are properly reflected to the guest.

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/include/asm/kvm_emulate.h |  4 +-
 arch/x86/kvm/emulate.c             | 90 ++--
 2 files changed, 89 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index c222e1a..1ac46c22 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -200,7 +200,7 @@ typedef u32 __attribute__((vector_size(16))) sse128_t;
 /* Type, address-of, and value of an instruction's operand. */
 struct operand {
-	enum { OP_REG, OP_MEM, OP_IMM, OP_XMM, OP_NONE } type;
+	enum { OP_REG, OP_MEM, OP_IMM, OP_XMM, OP_MM, OP_NONE } type;
 	unsigned int bytes;
 	union {
 		unsigned long orig_val;
@@ -213,12 +213,14 @@ struct operand {
 			unsigned seg;
 		} mem;
 		unsigned xmm;
+		unsigned mm;
 	} addr;
 	union {
 		unsigned long val;
 		u64 val64;
 		char valptr[sizeof(unsigned long) + 2];
 		sse128_t vec_val;
+		u64 mm_val;
 	};
 };

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8375622..19aec76 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -142,6 +142,7 @@
 #define Src2FS      (OpFS << Src2Shift)
 #define Src2GS      (OpGS << Src2Shift)
 #define Src2Mask    (OpMask << Src2Shift)
+#define Mmx         (1ull << 40)  /* MMX Vector instruction */
 
 #define X2(x...) x, x
 #define X3(x...) X2(x), x
@@ -859,6 +860,40 @@ static void write_sse_reg(struct x86_emulate_ctxt *ctxt, sse128_t *data,
 	ctxt->ops->put_fpu(ctxt);
 }
 
+static void read_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
+{
+	ctxt->ops->get_fpu(ctxt);
+	switch (reg) {
+	case 0: asm("movq %%mm0, %0" : "=m"(*data)); break;
+	case 1: asm("movq %%mm1, %0" : "=m"(*data)); break;
+	case 2: asm("movq %%mm2, %0" : "=m"(*data)); break;
+	case 3: asm("movq %%mm3, %0" : "=m"(*data)); break;
+	case 4: asm("movq %%mm4, %0" : "=m"(*data)); break;
+	case 5: asm("movq %%mm5, %0" : "=m"(*data)); break;
+	case 6: asm("movq %%mm6, %0" : "=m"(*data)); break;
+	case 7: asm("movq %%mm7, %0" : "=m"(*data)); break;
+	default: BUG();
+	}
+	ctxt->ops->put_fpu(ctxt);
+}
+
+static void write_mmx_reg(struct x86_emulate_ctxt *ctxt, u64 *data, int reg)
+{
+	ctxt->ops->get_fpu(ctxt);
+	switch (reg) {
+	case 0: asm("movq %0, %%mm0" : : "m"(*data)); break;
+	case 1: asm("movq %0, %%mm1" : : "m"(*data)); break;
+	case 2: asm("movq %0, %%mm2" : : "m"(*data)); break;
+	case 3: asm("movq %0, %%mm3" : : "m"(*data)); break;
+	case 4: asm("movq %0, %%mm4" : : "m"(*data)); break;
+	case 5: asm("movq %0, %%mm5" : : "m"(*data)); break;
+	case 6: asm("movq %0, %%mm6" : : "m"(*data)); break;
+	case 7: asm("movq %0, %%mm7" : : "m"(*data)); break;
+	default: BUG();
+	}
+	ctxt->ops->put_fpu(ctxt);
+}
+
 static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
 				    struct operand *op)
 {
@@ -875,6 +910,14 @@ static void decode_register_operand(struct x86_emulate_ctxt *ctxt,
 		read_sse_reg(ctxt, &op->vec_val, reg);
 		return;
 	}
+	if (ctxt->d & Mmx) {
+		reg &= 7;
+		op->type = OP_MM;
+		op->bytes = 8;
+		op->addr.mm = reg;
+		read_mmx_reg(ctxt, &op->mm_val, reg);
+		return;
+	}
 
 	op->type = OP_REG;
 	if (ctxt->d & ByteOp) {
@@ -920,6 +963,13 @@ static int decode_modrm(struct x86_emulate_ctxt *ctxt,
 			read_sse_reg(ctxt, &op->vec_val, ctxt->modrm_rm);
 			return rc;
 		}
+		if (ctxt->d & Mmx) {
+			op->type = OP_MM;
+			op->bytes = 8;
+			op->addr.xmm = ctxt->modrm_rm & 7;
+			read_mmx_reg(ctxt, &op->mm_val, ctxt->modrm_rm);
+			return rc;
+		}
 		fetch_register_operand(op);
 		return rc;
 	}
@@ -1387,6 +1437,9 @@ static int writeback(struct x86_emulate_ctxt *ctxt)
 	case OP_XMM:
 		write_sse_reg(ctxt, &ctxt->dst.vec_val, ctxt->dst.addr.xmm);
 		break;
+	case OP_MM:
+		write_mmx_reg(ctxt, &ctxt->dst.mm_val, ctxt->dst.addr.mm);
+		break;
 	case OP_NONE:
 		/* no writeback */
 		break;
@@ -3960,6 +4013,8 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
 
 	if (ctxt->d & Sse)
 		ctxt->op_bytes = 16;
+	else if (ctxt->d & Mmx)
+		ctxt->op_bytes = 8;
 
 	/* ModRM and
[PATCH 2/2] KVM: x86 emulator: implement MMX MOVQ (opcodes 0f 6f, 0f 7f)
Needed by some framebuffer drivers. See https://bugzilla.kernel.org/show_bug.cgi?id=42779

Signed-off-by: Avi Kivity a...@redhat.com
---
 arch/x86/kvm/emulate.c |  8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 19aec76..5997513 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2929,6 +2929,12 @@ static int em_movdqu(struct x86_emulate_ctxt *ctxt)
 	return X86EMUL_CONTINUE;
 }
 
+static int em_movq(struct x86_emulate_ctxt *ctxt)
+{
+	ctxt->dst.mm_val = ctxt->src.mm_val;
+	return X86EMUL_CONTINUE;
+}
+
 static int em_invlpg(struct x86_emulate_ctxt *ctxt)
 {
 	int rc;
@@ -3468,7 +3474,7 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
 };
 
 static struct gprefix pfx_0f_6f_0f_7f = {
-	N, N, N, I(Sse, em_movdqu),
+	I(Mmx, em_movq), N, N, I(Sse, em_movdqu),
 };
 
 static struct opcode opcode_table[256] = {
-- 
1.7.9
Re: [Qemu-devel] [RESEND][PATCH 2/2 v3] deal with guest panicked event
On 03/09/2012 04:22 PM, Marcelo Tosatti wrote: On Thu, Mar 08, 2012 at 11:56:56AM +0000, Daniel P. Berrange wrote: On Thu, Mar 08, 2012 at 01:52:45PM +0200, Avi Kivity wrote: On 03/08/2012 01:36 PM, Daniel P. Berrange wrote: On Thu, Mar 08, 2012 at 01:28:56PM +0200, Avi Kivity wrote: On 03/08/2012 12:15 PM, Wen Congyang wrote: When the host knows the guest is panicked, it will set exit_reason to KVM_EXIT_GUEST_PANICKED. So if qemu receives this exit_reason, we can send an event to tell the management application that the guest is panicked and set the guest status to RUN_STATE_PANICKED.

Signed-off-by: Wen Congyang we...@cn.fujitsu.com
---
 kvm-all.c        |  5 +
 monitor.c        |  3 +++
 monitor.h        |  1 +
 qapi-schema.json |  2 +-
 qmp.c            |  3 ++-
 vl.c             |  1 +
 6 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 77eadf6..b3c9a83 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1290,6 +1290,11 @@ int kvm_cpu_exec(CPUState *env)
                     (uint64_t)run->hw.hardware_exit_reason);
             ret = -1;
             break;
+        case KVM_EXIT_GUEST_PANICKED:
+            monitor_protocol_event(QEVENT_GUEST_PANICKED, NULL);
+            vm_stop(RUN_STATE_PANICKED);
+            ret = -1;
+            break;

If the management application is not aware of this event, then it will never resume the guest, so it will appear hung. Even if the mgmt app doesn't know about QEVENT_GUEST_PANICKED, it should still see a QEVENT_STOP event emitted by vm_stop(), surely? So it will know the guest CPUs have been stopped, even if it isn't aware of the reason why, which seems fine to me. No. The guest is stopped, and there's no reason to suppose that the management app will restart it. Behaviour has changed. Suppose the guest has reboot_on_panic set; now the behaviour change is even more visible: service will stop completely instead of being interrupted for a bit while the guest reboots. Hmm, so this calls for a new command line argument to control behaviour, similar to what we do for disk werror, eg something like

    --onpanic report|pause|stop|...

where report -> emit QEVENT_GUEST_PANICKED only. Should be the default. Should we just have a mechanism to stop the guest on certain types of QMP events? For instance: -stop-on guest-panicked,block-ioerror Likewise, we could have a: -quit-on guest-panicked. At the very least, we should make what we use for rerror,werror an enumeration that's shared here. Regards, Anthony Liguori pause -> emit QEVENT_GUEST_PANICKED and pause VM; stop -> emit QEVENT_GUEST_PANICKED and quit VM. quit is a better name than stop. This would map fairly well into libvirt, where we already have config parameters for controlling what to do with a guest when it panics. Regards, Daniel
Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked
On 03/13/2012 05:47 AM, Avi Kivity wrote: On 03/13/2012 11:18 AM, Daniel P. Berrange wrote: On Mon, Mar 12, 2012 at 12:33:33PM +0200, Avi Kivity wrote: On 03/12/2012 11:04 AM, Wen Congyang wrote: Do you have any other comments about this patch? Not really, but I'm not 100% convinced the patch is worthwhile. It's likely to only be used by Linux, which has kexec facilities, and you can talk to management via virtio-serial and describe the crash in more detail than a simple hypercall. As mentioned before, I don't think virtio-serial is a good fit for this. We want something that is simple and guaranteed always available. Using virtio-serial requires significant setup work on both the host and guest. So what? It needs to be done anyway for the guest agent. Many management applications won't know to make a vioserial device available to all guests they create. Then they won't know to deal with the panic event either. Most administrators won't even configure kexec, let alone virtio serial on top of it. It should be done by the OS vendor, not the individual admin. The hypercall requires zero host-side config, and zero guest-side config, which IMHO is what we need for this feature. If it was this one feature, yes. But we keep getting more and more features like that and we bloat the hypervisor. There's a reason we have a host-to-guest channel, we should use it. The problem is that virtio-serial sucks for something like this. We have two options, I think: 1) We could reserve a portion of the hypercall space to be deferred to userspace for stuff like this. 2) We could invent a new hypercall-like facility that was less bloated than virtio-serial for stuff like this, using MMIO/PIO. Regards, Anthony Liguori
[PATCH 1/3][Autotest][virt] autotest.base_utils: Move virt.utils.Thread to common_lib
Move the class virt.utils.Thread to base_utils.InterruptedThread so it can be used in general utils.

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/common_lib/base_utils.py                    |   65 ++
 .../kvm/tests/migration_with_file_transfer.py      |    6 +-
 client/tests/kvm/tests/migration_with_reboot.py    |    4 +-
 client/tests/kvm/tests/nic_bonding.py              |    9 ++-
 client/tests/kvm/tests/vmstop.py                   |    6 +-
 client/virt/tests/nic_promisc.py                   |    5 +-
 client/virt/tests/nicdriver_unload.py              |    4 +-
 client/virt/tests/ntttcp.py                        |    2 +-
 client/virt/virt_test_utils.py                     |    2 +-
 client/virt/virt_utils.py                          |   69 +---
 10 files changed, 87 insertions(+), 85 deletions(-)

diff --git a/client/common_lib/base_utils.py b/client/common_lib/base_utils.py
index 972d18a..c40e5dc 100644
--- a/client/common_lib/base_utils.py
+++ b/client/common_lib/base_utils.py
@@ -817,6 +817,71 @@ def run_parallel(commands, timeout=None, ignore_status=False,
     return [bg_job.result for bg_job in bg_jobs]
 
 
+class InterruptedThread(Thread):
+    """
+    Run a function in a background thread.
+    """
+    def __init__(self, target, args=(), kwargs={}):
+        """
+        Initialize the instance.
+
+        @param target: Function to run in the thread.
+        @param args: Arguments to pass to target.
+        @param kwargs: Keyword arguments to pass to target.
+        """
+        Thread.__init__(self)
+        self._target = target
+        self._args = args
+        self._kwargs = kwargs
+
+
+    def run(self):
+        """
+        Run target (passed to the constructor).  No point in calling this
+        function directly.  Call start() to make this function run in a new
+        thread.
+        """
+        self._e = None
+        self._retval = None
+        try:
+            try:
+                self._retval = self._target(*self._args, **self._kwargs)
+            except Exception:
+                self._e = sys.exc_info()
+                raise
+        finally:
+            # Avoid circular references (start() may be called only once so
+            # it's OK to delete these)
+            del self._target, self._args, self._kwargs
+
+
+    def join(self, timeout=None, suppress_exception=False):
+        """
+        Join the thread.  If target raised an exception, re-raise it.
+        Otherwise, return the value returned by target.
+
+        @param timeout: Timeout value to pass to threading.Thread.join().
+        @param suppress_exception: If True, don't re-raise the exception.
+        """
+        Thread.join(self, timeout)
+        try:
+            if self._e:
+                if not suppress_exception:
+                    # Because the exception was raised in another thread, we
+                    # need to explicitly insert the current context into it
+                    s = error.exception_context(self._e[1])
+                    s = error.join_contexts(error.get_context(), s)
+                    error.set_exception_context(self._e[1], s)
+                raise self._e[0], self._e[1], self._e[2]
+            else:
+                return self._retval
+        finally:
+            # Avoid circular references (join() may be called multiple times
+            # so we can't delete these)
+            self._e = None
+            self._retval = None
+
+
 @deprecated
 def run_bg(command):
     """Function deprecated. Please use BgJob class instead."""

diff --git a/client/tests/kvm/tests/migration_with_file_transfer.py b/client/tests/kvm/tests/migration_with_file_transfer.py
index 075148d..073b87e 100644
--- a/client/tests/kvm/tests/migration_with_file_transfer.py
+++ b/client/tests/kvm/tests/migration_with_file_transfer.py
@@ -56,13 +56,13 @@ def run_migration_with_file_transfer(test, params, env):
 
     error.context("transferring file to guest while migrating",
                   logging.info)
-    bg = virt_utils.Thread(vm.copy_files_to, (host_path, guest_path),
-                           dict(verbose=True, timeout=transfer_timeout))
+    bg = utils.InterruptedThread(vm.copy_files_to, (host_path, guest_path),
+                                 dict(verbose=True, timeout=transfer_timeout))
     run_and_migrate(bg)
 
     error.context("transferring file back to host while migrating",
                   logging.info)
-    bg = virt_utils.Thread(vm.copy_files_from,
+    bg = utils.InterruptedThread(vm.copy_files_from,
                            (guest_path, host_path_returned),
                            dict(verbose=True, timeout=transfer_timeout))
     run_and_migrate(bg)

diff --git a/client/tests/kvm/tests/migration_with_reboot.py b/client/tests/kvm/tests/migration_with_reboot.py
index b291a83..6ee2da5 100644
--- a/client/tests/kvm/tests/migration_with_reboot.py
+++
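Independent of the code move itself, the contract of the class above is worth spelling out: it behaves like a simple future, where join() returns the target's value or re-raises its exception. A minimal standalone model of that contract (without autotest's error-context plumbing, and with private fields renamed to avoid clashing with threading.Thread internals):

```python
import sys
import threading


class InterruptedThread(threading.Thread):
    """Run target in a thread; join() returns its value or re-raises."""

    def __init__(self, target, args=(), kwargs=None):
        threading.Thread.__init__(self)
        self._target_fn = target
        self._fn_args = args
        self._fn_kwargs = kwargs or {}
        self._exc_info = None
        self._retval = None

    def run(self):
        try:
            self._retval = self._target_fn(*self._fn_args, **self._fn_kwargs)
        except Exception:
            self._exc_info = sys.exc_info()  # saved for join() to re-raise

    def join(self, timeout=None, suppress_exception=False):
        threading.Thread.join(self, timeout)
        if self._exc_info and not suppress_exception:
            raise self._exc_info[1]
        return self._retval


# A successful target: join() acts like a future's result().
ok = InterruptedThread(lambda a, b: a + b, (2, 3))
ok.start()
result = ok.join()

# A failing target: with suppress_exception the error is swallowed and
# join() returns None instead of re-raising ZeroDivisionError.
bad = InterruptedThread(lambda: 1 // 0)
bad.start()
silenced = bad.join(suppress_exception=True)
```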
Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked
On 03/15/2012 06:46 AM, Avi Kivity wrote: On 03/15/2012 01:25 PM, Jan Kiszka wrote: There was such vm exit (KVM_EXIT_HYPERCALL), but it was deemed to be a bad idea. BTW, this would help a lot in emulating hypercalls of other hypervisors (or of KVM's VAPIC in the absence of in-kernel irqchip - I had to jump through hoops therefore) in user space. Not all those hypercall handlers actually have to reside in the KVM module. That is true. On the other hand the hypercall ABI might go to pieces if there was no central implementation. Just declare that outl 0x505 is a megaultracall and s/vmcall/outb/g and call it a day. The performance difference between vmcall and outl is so tiny compared to the cost of dropping to userspace that it really doesn't matter which instruction is used. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked
On 03/21/2012 11:25 AM, Avi Kivity wrote: On 03/21/2012 06:18 PM, Corey Minyard wrote: Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send panic event over IMPI. The code is pretty complex. Of course if we a going to implement something more complex than simple hypercall for panic notification we better do something more interesting with it than just saying panic happened, like sending stack traces on all cpus for instance. I doubt that's the best example, unfortunately. The IPMI event log has limited space and it has to be send a little piece at a time since each log entry is 14 bytes. It just prints the panic string, nothing else. Not that it isn't useful, it has saved my butt before. You have lots of interesting options with paravirtualization. You could, for instance, create a console driver that delivered all console output efficiently through a hypercall. That would be really easy. Or, as you mention, a custom way to deliver panic information. Collecting information like stack traces would be harder to accomplish, as I don't think there is currently a way to get it except by sending it to printk. That already exists; virtio-console (or serial console emulation) can do the job. I think the use case here is pretty straight forward: if the guest finds itself in bad place, it wants to indicate that to the host. We shouldn't rely on any device drivers or complex code. It should be as close to a single instruction as possible that can run even if interrupts are disabled. An out instruction fits this very well. I think a simple protocol like: inl PORT - returns a magic number indicating the presence of qemucalls inl PORT+1 - returns a bitmap of supported features outl PORT+1 - data reg1 outl PORT+2 - data reg2 outl PORT+N - data regN outl PORT - qemucall of index value with arguments 1..N Regards, Anthony Liguori In fact the feature can be implemented 100% host side by searching for a panic string signature in the console logs. 
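A guest-side sketch of the port protocol proposed above could look like the following. This is userspace C standing in for guest kernel code: the magic value, feature bit, and event number are all invented for the sketch, and the port accessors are mocked out (a real x86 guest would use raw inl/outl with I/O privilege, which is exactly what makes this safe to call with interrupts disabled):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register layout, loosely following the proposal above:
 * read PORT for a magic, read PORT+1 for a feature bitmap, write the
 * event arguments to PORT+1..PORT+N, then write the event index to
 * PORT to fire the notification.  All values here are invented. */
#define QEMUCALL_MAGIC    0x51454d55u  /* "QEMU" */
#define QEMUCALL_F_PANIC  (1u << 0)
#define QEMUCALL_EV_PANIC 1

struct qemucall_port {
    uint32_t (*inl)(struct qemucall_port *p, uint16_t off);
    void (*outl)(struct qemucall_port *p, uint16_t off, uint32_t val);
    /* mock state so the sketch is self-contained */
    uint32_t features;
    uint32_t last_event;
    uint32_t arg1;
};

/* Guest-side notifier: no driver infrastructure, just a handful of
 * port accesses, so it can run from a panic path. */
static int notify_panic(struct qemucall_port *p, uint32_t panic_code)
{
    if (p->inl(p, 0) != QEMUCALL_MAGIC)
        return -1;                      /* no hypervisor support */
    if (!(p->inl(p, 1) & QEMUCALL_F_PANIC))
        return -1;                      /* feature not offered */
    p->outl(p, 1, panic_code);          /* data reg1 */
    p->outl(p, 0, QEMUCALL_EV_PANIC);   /* fire the event */
    return 0;
}

/* Mock backend standing in for real port I/O. */
static uint32_t mock_inl(struct qemucall_port *p, uint16_t off)
{
    return off == 0 ? QEMUCALL_MAGIC : p->features;
}

static void mock_outl(struct qemucall_port *p, uint16_t off, uint32_t val)
{
    if (off == 0)
        p->last_event = val;
    else
        p->arg1 = val;
}
```

Note how the feature-bitmap probe lets the guest degrade gracefully when the host side does not handle the event, which addresses the "if userspace does not process panicked events" concern.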
Re: [PATCH 12/38] powerpc/booke: Provide exception macros with interrupt name
On Mar 21, 2012, at 1:19 PM, Scott Wood wrote: On 03/21/2012 01:04 PM, Kumar Gala wrote: On Feb 28, 2012, at 6:09 PM, Alexander Graf wrote: From: Scott Wood scottw...@freescale.com DO_KVM will need to identify the particular exception type. There is an existing set of arbitrary numbers that Linux passes, but it's an undocumented mess that sort of corresponds to server/classic exception vectors but not really. So what do the new names correspond to? The names are Linux-defined. The values are IVOR numbers. What header is defining MACHINE_CHECK, BOOKE_INTERRUPT_EXTERNAL, etc (asm/kvm_asm.h)? Yes, it's asm/kvm_asm.h at the moment. It's actually BOOKE_INTERRUPT_MACHINE_CHECK, etc. The exception macros paste on BOOKE_INTERRUPT_ when they use it, to keep the macro users from having to wrap lines even more often. If so we really should move these out of asm/kvm_asm.h and into something a bit more appropriate. Yes, that was one of the things I had been planning to fix post-RFC. Any preference what header it should be in? asm/reg_booke.h seems to be the least bad option of the existing headers -- or perhaps move head_booke.h to arch/powerpc/include/asm. asm/reg_booke.h seems the least painful right now. head_booke.h is only used on 32-bit so that's not the best choice at this point. We could create an asm/head_booke.h in addition to the one in arch/powerpc/kernel/ Long-term it would also be nice for this to replace, rather than supplement, the current numbers, though as Ben pointed out there are a number of places throughout the code that will have to be fixed for that to happen. Yeah, not too worried about that. More just wanting it to be clear what one has to 'update' if adding support for a new exception. - k-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] Isolation groups
On Sat, 2012-03-17 at 15:57 +1100, David Gibson wrote: On Fri, Mar 16, 2012 at 01:31:18PM -0600, Alex Williamson wrote: On Fri, 2012-03-16 at 14:45 +1100, David Gibson wrote: On Thu, Mar 15, 2012 at 02:15:01PM -0600, Alex Williamson wrote: On Wed, 2012-03-14 at 20:58 +1100, David Gibson wrote: On Tue, Mar 13, 2012 at 10:49:47AM -0600, Alex Williamson wrote: On Wed, 2012-03-14 at 01:33 +1100, David Gibson wrote: On Mon, Mar 12, 2012 at 04:32:54PM -0600, Alex Williamson wrote: +/* + * Add a device to an isolation group. Isolation groups start empty and + * must be told about the devices they contain. Expect this to be called + * from isolation group providers via notifier. + */ Doesn't necessarily have to be from a notifier, particularly if the provider is integrated into host bridge code. Sure, a provider could do this on it's own if it wants. This just provides some infrastructure for a common path. Also note that this helps to eliminate all the #ifdef CONFIG_ISOLATION in the provider. Yet to be seen whether that can reasonably be the case once isolation groups are added to streaming DMA paths. Right, but other than the #ifdef safety, which could be achieved more simply, I'm not seeing what benefit the infrastructure provides over directly calling the bus notifier function. The infrastructure groups the notifiers by bus type internally, but AFAICT exactly one bus notifier call would become exactly one isolation notifier call, and the notifier callback itself would be almost identical. I guess I don't see this as a fundamental design point of the proposal, it's just a convenient way to initialize groups as a side-band addition until isolation groups become a more fundamental part of the iommu infrastructure. If you want to do that level of integration in your provider, do it and make the callbacks w/o the notifier. If nobody ends up using them, we'll remove them. Maybe it will just end up being a bootstrap. 
In the typical case, yes, one bus notifier is one isolation notifier. It does however also allow one bus notifier to become multiple isolation notifiers, and includes some filtering that would just be duplicated if every provider decided to implement their own bus notifier. Uh.. I didn't notice any filtering? That's why I'm asking. Not much, but a little: + switch (action) { + case BUS_NOTIFY_ADD_DEVICE: + if (!dev-isolation_group) + blocking_notifier_call_chain(notifier-notifier, + ISOLATION_NOTIFY_ADD_DEVICE, dev); + break; + case BUS_NOTIFY_DEL_DEVICE: + if (dev-isolation_group) + blocking_notifier_call_chain(notifier-notifier, + ISOLATION_NOTIFY_DEL_DEVICE, dev); + break; + } Ah, I see, fair enough. A couple of tangential observations. First, I suspect using BUS_NOTIFY_DEL_DEVICE is a very roundabout way of handling hot-unplug, it might be better to have an unplug callback in the group instead. There's really one already. Assuming the device is attached to a group driver, the .remove entry point on the driver will get called first. It doesn't get to return error, but it can block. The DEL_DEVICE will only happen after the group driver has relinquished the device, or at least returned from remove(). For a device with no driver, the group would only learn about it through the notifier, but it probably doesn't need anything more direct. Second, I don't think aborting the call chain early for hot-plug is actually a good idea. I can't see a clear guarantee on the order, so individual providers couldn't rely on that short-cut behaviour. Which means that if two providers would have attempted to claim the same device, something is seriously wrong and we should probably report that. Yeah, that seems like a reasonable safety check. ... So, somewhere, I think we need a fallback path, but I'm not sure exactly where. If an isolation provider doesn't explicitly put a device into a group, the device should go into the group of its parent bridge. 
This covers the case of a bus with IOMMU which has below it a bridge to a different type of DMA capable bus (which the IOMMU isn't explicitly aware of). DMAs from devices on the subordinate bus can be translated by the top-level IOMMU (assuming it sees them as coming from the bridge), but they all need to be treated as one group. Why would the top level IOMMU provider not set the isolation group in this case. Because it knows
Re: [Qemu-devel] [PATCH v4 5/7] RTC:Add RTC update-ended interrupt support
Il 21/03/2012 17:54, Stefano Stabellini ha scritto: No, you need to set UF in case the code observes it without actually enabling interrupt delivery on the ISA bus. Well, if it is just about updating UF, can we do it only when the user reads REG_C? Perhaps, but for AF it is much harder. Note that if the guest does not care, this will run just once. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: x86 emulator: MMX support
On 03/21/2012 08:40 PM, Avi Kivity wrote: +static int flush_pending_x87_faults(struct x86_emulate_ctxt *ctxt) +{ + bool fault = false; + + ctxt->ops->get_fpu(ctxt); + asm volatile("1: fwait \n\t" + "2: \n\t" + ".pushsection .fixup,\"ax\" \n\t" + "3: \n\t" + "movb $1, %[fault] \n\t" + "jmp 2b \n\t" + ".popsection \n\t" + _ASM_EXTABLE(1b, 3b) + : [fault]"=rm"(fault)); "=" tells gcc that the asm section unconditionally writes 'fault', so it skips the initialization. Use "+" instead. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 uq/master] kvm: Drop unused kvm_pit_in_kernel
From: Jan Kiszka jan.kis...@siemens.com This is now implied by kvm_irqchip_in_kernel. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Rebased over latest uq/master. kvm-all.c |6 -- kvm-stub.c |6 -- kvm.h |2 -- 3 files changed, 0 insertions(+), 14 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index ba2cee1..3a1fa40 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -75,7 +75,6 @@ struct KVMState #ifdef KVM_CAP_SET_GUEST_DEBUG struct kvm_sw_breakpoint_head kvm_sw_breakpoints; #endif -int pit_in_kernel; int pit_state2; int xsave, xcrs; int many_ioeventfds; @@ -198,11 +197,6 @@ static void kvm_reset_vcpu(void *opaque) kvm_arch_reset_vcpu(env); } -int kvm_pit_in_kernel(void) -{ -return kvm_state-pit_in_kernel; -} - int kvm_init_vcpu(CPUArchState *env) { KVMState *s = kvm_state; diff --git a/kvm-stub.c b/kvm-stub.c index 69a1228..ec8fe74 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -16,12 +16,6 @@ #include gdbstub.h #include kvm.h -int kvm_pit_in_kernel(void) -{ -return 0; -} - - int kvm_init_vcpu(CPUArchState *env) { return -ENOSYS; diff --git a/kvm.h b/kvm.h index 330f17b..55f107d 100644 --- a/kvm.h +++ b/kvm.h @@ -83,8 +83,6 @@ int kvm_update_guest_debug(CPUArchState *env, unsigned long reinject_trap); int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset); #endif -int kvm_pit_in_kernel(void); - int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr); int kvm_on_sigbus(int code, void *addr); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH 0/2] uq/master: Basic MSI support for in-kernel irqchip mode
Some half a year ago when I posted my first attempt to refactor MSI for KVM support, we came to the conclusion that it might suffice to do transparent dynamic routing for user-space injected MSI messages. These two patches now implement such an approach for upstream. As QEMU does not yet include irqfd support (for vhost) or pci device assignment, this is already enough to enable MSI over the in-kernel irqchip. Still, this is only RFC as it is just lightly tested and should primarily collect feedback regarding the direction. If it's fine, I'd like to base further qemu-kvm refactorings and upstream preparations on top of such a series. Also, I'd like to reanimate my KVM patch to provide direct MSI injection in future kernels so that we do not need to take this long path here forever. Jan Kiszka (2): kvm: Introduce basic MSI support in-kernel irqchips KVM: x86: Wire up MSI support for in-kernel irqchip hw/apic.c |3 + hw/kvm/apic.c | 33 ++- hw/pc.c |5 -- kvm-all.c | 171 - kvm.h |1 + 5 files changed, 205 insertions(+), 8 deletions(-) -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH 2/2] KVM: x86: Wire up MSI support for in-kernel irqchip
From: Jan Kiszka jan.kis...@siemens.com Catch writes to the MSI MMIO region in the KVM APIC and forward them to the kernel. Provide the kernel support GSI routing, this allows to enable MSI support also for in-kernel irqchip mode. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- hw/apic.c |3 +++ hw/kvm/apic.c | 33 +++-- hw/pc.c |5 - 3 files changed, 34 insertions(+), 7 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index 4eeaf88..5fbf01c 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -19,6 +19,7 @@ #include apic_internal.h #include apic.h #include ioapic.h +#include msi.h #include host-utils.h #include trace.h #include pc.h @@ -862,6 +863,8 @@ static void apic_init(APICCommonState *s) s-timer = qemu_new_timer_ns(vm_clock, apic_timer, s); local_apics[s-idx] = s; + +msi_supported = true; } static void apic_class_init(ObjectClass *klass, void *data) diff --git a/hw/kvm/apic.c b/hw/kvm/apic.c index ffe7a52..7d83b1a 100644 --- a/hw/kvm/apic.c +++ b/hw/kvm/apic.c @@ -10,6 +10,7 @@ * See the COPYING file in the top-level directory. 
*/ #include hw/apic_internal.h +#include hw/msi.h #include kvm.h static inline void kvm_apic_set_reg(struct kvm_lapic_state *kapic, @@ -145,10 +146,38 @@ static void kvm_apic_external_nmi(APICCommonState *s) run_on_cpu(s-cpu_env, do_inject_external_nmi, s); } +static uint64_t kvm_apic_mem_read(void *opaque, target_phys_addr_t addr, + unsigned size) +{ +return -1U; +} + +static void kvm_apic_mem_write(void *opaque, target_phys_addr_t addr, + uint64_t data, unsigned size) +{ +int ret; + +ret = kvm_irqchip_send_msi(kvm_state, addr, data); +if (ret 0) { +fprintf(stderr, KVM: injection failed, MSI lost (%s)\n, +strerror(-ret)); +} +} + +static const MemoryRegionOps kvm_apic_io_ops = { +.read = kvm_apic_mem_read, +.write = kvm_apic_mem_write, +.endianness = DEVICE_NATIVE_ENDIAN, +}; + static void kvm_apic_init(APICCommonState *s) { -memory_region_init_reservation(s-io_memory, kvm-apic-msi, - MSI_SPACE_SIZE); +memory_region_init_io(s-io_memory, kvm_apic_io_ops, s, kvm-apic-msi, + MSI_SPACE_SIZE); + +if (kvm_has_gsi_routing()) { +msi_supported = true; +} } static void kvm_apic_class_init(ObjectClass *klass, void *data) diff --git a/hw/pc.c b/hw/pc.c index 83a1b5b..fab620a 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -907,11 +907,6 @@ static DeviceState *apic_init(void *env, uint8_t apic_id) apic_mapped = 1; } -/* KVM does not support MSI yet. */ -if (!kvm_irqchip_in_kernel()) { -msi_supported = true; -} - return dev; } -- 1.7.3.4 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC][PATCH 1/2] kvm: Introduce basic MSI support in-kernel irqchips
From: Jan Kiszka jan.kis...@siemens.com This patch basically adds kvm_irqchip_send_msi, a service for sending arbitrary MSI messages to KVM's in-kernel irqchip models. As the current KVI API requires us to establish a static route from a pseudo GSI to the target MSI message and inject the MSI via toggling that GSI, we need to play some tricks to make this unfortunately interface transparent. We create those routes on demand and keep them in a hash table. Succeeding messages can then search for an existing route in the table first and reuse it whenever possible. If we should run out of limited GSIs, we simply flush the table and rebuild it as messages are sent. This approach is rather simple and could be optimized further. However, it is more efficient to enhance the KVM API so that we do not need this clumsy dynamic routing over futures kernels. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- kvm-all.c | 171 - kvm.h |1 + 2 files changed, 171 insertions(+), 1 deletions(-) diff --git a/kvm-all.c b/kvm-all.c index 3a1fa40..11ed19c 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -48,6 +48,8 @@ do { } while (0) #endif +#define KVM_MSI_HASHTAB_SIZE256 + typedef struct KVMSlot { target_phys_addr_t start_addr; @@ -59,6 +61,11 @@ typedef struct KVMSlot typedef struct kvm_dirty_log KVMDirtyLog; +typedef struct KVMMSIRoute { +struct kvm_irq_routing_entry kroute; +QTAILQ_ENTRY(KVMMSIRoute) entry; +} KVMMSIRoute; + struct KVMState { KVMSlot slots[32]; @@ -87,6 +94,7 @@ struct KVMState int nr_allocated_irq_routes; uint32_t *used_gsi_bitmap; unsigned int max_gsi; +QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE]; #endif }; @@ -862,9 +870,16 @@ static void set_gsi(KVMState *s, unsigned int gsi) s-used_gsi_bitmap[gsi / 32] |= 1U (gsi % 32); } +static void clear_gsi(KVMState *s, unsigned int gsi) +{ +assert(gsi s-max_gsi); + +s-used_gsi_bitmap[gsi / 32] = ~(1U (gsi % 32)); +} + static void kvm_init_irq_routing(KVMState *s) { -int gsi_count; +int gsi_count, i; 
gsi_count = kvm_check_extension(s, KVM_CAP_IRQ_ROUTING); if (gsi_count 0) { @@ -884,6 +899,10 @@ static void kvm_init_irq_routing(KVMState *s) s-irq_routes = g_malloc0(sizeof(*s-irq_routes)); s-nr_allocated_irq_routes = 0; +for (i = 0; i KVM_MSI_HASHTAB_SIZE; i++) { +QTAILQ_INIT(s-msi_hashtab[i]); +} + kvm_arch_init_irq_routing(s); } @@ -914,6 +933,54 @@ static void kvm_add_routing_entry(KVMState *s, set_gsi(s, entry-gsi); } +static void kvm_remove_routing_entry(KVMState *s, + struct kvm_irq_routing_entry *entry) +{ +struct kvm_irq_routing_entry *e; +int gsi = entry-gsi; +int i; + +for (i = 0; i s-irq_routes-nr; ++i) { +e = s-irq_routes-entries[i]; +if (e-type == entry-type e-gsi == gsi) { +switch (e-type) { +case KVM_IRQ_ROUTING_IRQCHIP: +if (e-u.irqchip.irqchip == entry-u.irqchip.irqchip +e-u.irqchip.pin == entry-u.irqchip.pin) { +goto found; +} +break; +case KVM_IRQ_ROUTING_MSI: +if (e-u.msi.address_lo == entry-u.msi.address_lo +e-u.msi.address_hi == entry-u.msi.address_hi +e-u.msi.data == entry-u.msi.data) { +goto found; +} +break; +default: +break; +} +} +} +/* route not found */ +return; + +found: +s-irq_routes-nr--; +*e = s-irq_routes-entries[s-irq_routes-nr]; + +/* If there are no other users of this GSI, release it. */ +for (i = 0; i s-irq_routes-nr; i++) { +e = s-irq_routes-entries[i]; +if (e-gsi == gsi) { +break; +} +} +if (i == s-irq_routes-nr) { +clear_gsi(s, gsi); +} +} + void kvm_irqchip_add_route(KVMState *s, int irq, int irqchip, int pin) { struct kvm_irq_routing_entry e; @@ -932,11 +999,113 @@ int kvm_irqchip_commit_routes(KVMState *s) return kvm_vm_ioctl(s, KVM_SET_GSI_ROUTING, s-irq_routes); } +static unsigned int kvm_hash_msi(uint32_t data) +{ +/* This is optimized for IA32 MSI layout. However, no other arch shall + * repeat the mistake of not providing a direct MSI injection API. 
*/ +return data 0xff; +} + +static void kvm_flush_dynamic_msi_routes(KVMState *s) +{ +KVMMSIRoute *route, *next; +unsigned int hash; + +for (hash = 0; hash KVM_MSI_HASHTAB_SIZE; hash++) { +QTAILQ_FOREACH_SAFE(route, s-msi_hashtab[hash], entry, next) { +kvm_remove_routing_entry(s, route-kroute); +QTAILQ_REMOVE(s-msi_hashtab[hash], route, entry); +} +} +} + +static int kvm_get_pseudo_gsi(KVMState *s) +{ +uint32_t
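The dynamic-routing trick in this patch — hash the MSI data, look the message up in a table of previously established pseudo-GSI routes, and flush the whole table when GSIs run out — can be illustrated with a toy userspace version. Names and sizes are invented for the sketch, and collisions are ignored (the real code chains KVMMSIRoute entries per hash bucket):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASHTAB_SIZE 256
#define MAX_GSIS     8      /* tiny on purpose, to exercise the flush */

/* One cached route: an MSI message bound to a pseudo GSI. */
struct msi_route {
    int in_use;
    uint32_t data;
    int gsi;
};

static struct msi_route table[HASHTAB_SIZE];
static int used_gsis;

/* Same idea as kvm_hash_msi(): the low byte of the data (the vector
 * on IA32) spreads messages well enough. */
static unsigned int hash_msi(uint32_t data)
{
    return data & 0xff;
}

static void flush_routes(void)
{
    memset(table, 0, sizeof(table));
    used_gsis = 0;
}

/* Return the pseudo GSI for this message, creating (or, on GSI
 * exhaustion, flushing the table and re-creating) the route on
 * demand, as kvm_irqchip_send_msi() does with real routes. */
static int route_for_msi(uint32_t data)
{
    struct msi_route *r = &table[hash_msi(data)];

    if (r->in_use && r->data == data)
        return r->gsi;              /* cache hit: reuse the route */
    if (used_gsis == MAX_GSIS)
        flush_routes();             /* out of GSIs: rebuild on demand */
    r->in_use = 1;
    r->data = data;
    r->gsi = used_gsis++;
    return r->gsi;
}
```

Repeated messages hit the cache and cost nothing; only a pathological stream of distinct vectors triggers the flush-and-rebuild path, which is why the patch calls the approach "simple but could be optimized further".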
Re: [PATCH v4 1/2] qemu-socket: change inet_connect() to to support nonblock socket
On Mon, Mar 19, 2012 at 10:11:54PM +0800, Amos Kong wrote: Change inet_connect(const char *str, int socktype) to inet_connect(const char *str, bool block, int *sock_err), socktype is unused, block is used to assign if set socket to block/nonblock, sock_err is used to restore socket error. Connect's successful for nonblock socket when following errors are returned: -EINPROGRESS or -EWOULDBLOCK -WSAEALREADY or -WSAEINVAL (win32) Also change the wrap function inet_connect_opts(QemuOpts *opts) to inet_connect_opts(QemuOpts *opts, int *sock_err). Add a bool entry(block) for dummy_opts to tag block type. Change nbd, vnc to use new interface. Signed-off-by: Amos Kong ak...@redhat.com Looks good, and non-blocking support for qemu-socket interfaces is pretty useful in and of itself. One general suggestion I would make is to use Error to pass back errors to the callers rather than an int*. Some additional comments below. --- nbd.c |2 +- qemu-char.c|2 +- qemu-sockets.c | 73 qemu_socket.h |4 ++- ui/vnc.c |2 +- 5 files changed, 62 insertions(+), 21 deletions(-) diff --git a/nbd.c b/nbd.c index 567e94e..ad4de06 100644 --- a/nbd.c +++ b/nbd.c @@ -146,7 +146,7 @@ int tcp_socket_outgoing(const char *address, uint16_t port) int tcp_socket_outgoing_spec(const char *address_and_port) { -return inet_connect(address_and_port, SOCK_STREAM); +return inet_connect(address_and_port, true, NULL); } int tcp_socket_incoming(const char *address, uint16_t port) diff --git a/qemu-char.c b/qemu-char.c index bb9e3f5..d3543ea 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2443,7 +2443,7 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts *opts) if (is_listen) { fd = inet_listen_opts(opts, 0); } else { -fd = inet_connect_opts(opts); +fd = inet_connect_opts(opts, NULL); } } if (fd 0) { diff --git a/qemu-sockets.c b/qemu-sockets.c index 6bcb8e3..8ed45f8 100644 --- a/qemu-sockets.c +++ b/qemu-sockets.c @@ -51,6 +51,9 @@ static QemuOptsList dummy_opts = { },{ .name = ipv6, .type = QEMU_OPT_BOOL, +},{ 
+.name = block, +.type = QEMU_OPT_BOOL, }, { /* end if list */ } }, @@ -194,14 +197,15 @@ listen: return slisten; } -int inet_connect_opts(QemuOpts *opts) +int inet_connect_opts(QemuOpts *opts, int *sock_err) { struct addrinfo ai,*res,*e; const char *addr; const char *port; char uaddr[INET6_ADDRSTRLEN+1]; char uport[33]; -int sock,rc; +int sock, rc, err; +bool block; memset(ai,0, sizeof(ai)); ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG; @@ -210,9 +214,11 @@ int inet_connect_opts(QemuOpts *opts) addr = qemu_opt_get(opts, host); port = qemu_opt_get(opts, port); +block = qemu_opt_get_bool(opts, block, 0); if (addr == NULL || port == NULL) { fprintf(stderr, inet_connect: host and/or port not specified\n); -return -1; +err = -EINVAL; +goto err; } if (qemu_opt_get_bool(opts, ipv4, 0)) @@ -224,7 +230,8 @@ int inet_connect_opts(QemuOpts *opts) if (0 != (rc = getaddrinfo(addr, port, ai, res))) { fprintf(stderr,getaddrinfo(%s,%s): %s\n, addr, port, gai_strerror(rc)); - return -1; +err = -EINVAL; +goto err; } for (e = res; e != NULL; e = e-ai_next) { @@ -241,21 +248,52 @@ int inet_connect_opts(QemuOpts *opts) continue; } setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(void*)on,sizeof(on)); - +if (!block) { +socket_set_nonblock(sock); +} /* connect to peer */ -if (connect(sock,e-ai_addr,e-ai_addrlen) 0) { -if (NULL == e-ai_next) -fprintf(stderr, %s: connect(%s,%s,%s,%s): %s\n, __FUNCTION__, -inet_strfamily(e-ai_family), -e-ai_canonname, uaddr, uport, strerror(errno)); -closesocket(sock); -continue; +do { +err = 0; +if (connect(sock, e-ai_addr, e-ai_addrlen) 0) { +err = -socket_error(); +if (block) { +break; +} Can still get an EINTR even in blocking mode, is breaking in that case intentional? +} +} while (err == -EINTR); + +if (err = 0) { +goto success; +} else if (!block (err == -EINPROGRESS || err == -EWOULDBLOCK)) { +goto success; EWOULDBLOCK should actually be an error: EWOULDBLOCK == EAGAIN: No more free local ports or insufficient entries in the routing cache. For AF_INET see the
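The review comments boil down to a small state machine: retry connect() on EINTR in both modes, and in non-blocking mode treat only EINPROGRESS as "connection in flight" — EWOULDBLOCK/EAGAIN from connect() on AF_INET is a real error (no free local ports), not a reason to wait. A self-contained sketch of that decision logic, separated from the socket call so it can be checked in isolation (the real code, and Windows, have extra cases such as WSAEALREADY):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum conn_result { CONN_OK, CONN_RETRY, CONN_PENDING, CONN_FAIL };

/* Classify the outcome of one connect() attempt.  'rc' is connect()'s
 * return value, 'err' the errno it left behind. */
static enum conn_result classify_connect(int rc, int err, bool block)
{
    if (rc == 0)
        return CONN_OK;
    if (err == EINTR)
        return CONN_RETRY;       /* interrupted: retry in both modes */
    if (!block && err == EINPROGRESS)
        return CONN_PENDING;     /* non-blocking connect in flight */
    /* EWOULDBLOCK/EAGAIN from connect() on AF_INET means resource
     * exhaustion, not "try again later" -- report it as a failure. */
    return CONN_FAIL;
}
```

The caller loops while the classification is CONN_RETRY, returns the fd on CONN_OK or CONN_PENDING (completing the handshake later via a writability notification), and moves to the next getaddrinfo() result on CONN_FAIL.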
RE: [Qemu-devel] [PATCH v4 5/7] RTC:Add RTC update-ended interrupt support
-Original Message- From: Stefano Stabellini [mailto:stefano.stabell...@eu.citrix.com] Sent: Wednesday, March 21, 2012 2:04 AM On Mon, 19 Mar 2012, Zhang, Yang Z wrote: Use a timer to emulate update cycle. When update cycle ended and UIE is setting, then raise an interrupt. The timer runs only when UF or AF is cleared. The idea is that if the user requests the update-ended interrupt (UIE) we setup a timer to inject it at the right time into the guest and another timer to update the UIP bit, correct? No, the timer runs whenever the UF and AF is cleared, not only UIE or AIE is set. But do we actually need the second timer? Why can't we just update the UIP bit whenever the user tries to read reg A, as we do when UIE is not set? The purpose of using two timer is trying to keep the UF, AF and UIP synchronous. User can poll UIP to check UF and AF bit. If we use timer for UF/AF bit track and check UIP by another way, since the timer will be fired with delay, then the problem is encountered: the UIP is cleared, but due to the delay of timer, the UF/AF bit is not set. So we need to check them on a same level. Although we can update UF/AF when reading it, the logic is too complicated, especially for AF bit. Also I am not sure it is a great idea to rename rtc_timer_update into periodic_timer_update in the same patch where you introduce this new logic. Ok, it make sense. best regards yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH v4 0/7] RTC: New logic to emulate RTC
-Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] I attach a patch that fixes some problems with divider reset and in general simplifies the logic. Even with the patch, however, I still see failures in my test case unfortunately. Probably there are rounding errors somewhere. Actually, I also see some failures during testing. And most of them fail to pass the 244us update cycle check. Since we are in an emulation environment, we cannot ensure it is always greater than 244us. So I think it should not be a bug. Also, maybe there are some other bugs, I will do more testing. Also (minor change) you need to use rtc_clock instead of host_clock. I'm afraid that we're hitting a wall. :( The problem I have is that the logic in your patch is very complex and, without understanding it well, I'm afraid we'll never be able to fix it completely (while the old inefficient one is buggy but it _can_ be fixed). By the way, in general the SET bit and the divider should be handled in the same way, because both stop the updates. That is, in general there should be a function like static inline bool rtc_running(RTCState *s) { return (!(s->cmos_data[RTC_REG_B] & REG_B_SET) && (s->cmos_data[RTC_REG_A] & 0x70) == 0x20); } that is used instead of testing REG_B_SET. I pushed some work along these lines at git://github.com/bonzini/qemu.git (branch rtc-intel), but it does not really improve the situation with respect to rounding errors, so the bug must be elsewhere. You are right, I missed this logic. Will add it in next version. Best regards yang -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 0/7] RTC: New logic to emulate RTC
Il 22/03/2012 01:23, Zhang, Yang Z ha scritto: Actually, I also see some failures during testing. And most of them are fail to pass the 244us update cycle checking. Since we are in emulation environment, we cannot ensure it always greater than 244us. So I think it should not be a bug. Yes, that's fine. But I also see some cases in which the first upgrade cycle after divider reset is not reported. The second, 1.5 seconds after divider reset, is fine. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
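The behaviour under discussion — after the divider is switched from reset back to the normal time base, the first update cycle ends one-half second later and subsequent ones every second (so the second one lands 1.5 s after release) — can be expressed as a pure function, which is one way to pin down rounding errors independently of the timer machinery. A sketch, not the QEMU code:

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000LL

/* Given the time (in us) at which divider reset was removed, return
 * the deadline of the n-th update-cycle end (n >= 1).  Per the
 * MC146818 behaviour the internal millisecond register starts at
 * 500 ms, so the first update completes half a second later. */
static int64_t update_cycle_end(int64_t divider_release_us, int n)
{
    return divider_release_us + USEC_PER_SEC / 2
           + (int64_t)(n - 1) * USEC_PER_SEC;
}
```

A test case can then compare the observed UF-setting times against these deadlines directly, instead of inferring them from UIP polling.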
Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked
At 03/22/2012 03:19 AM, Anthony Liguori Wrote: On 03/21/2012 11:25 AM, Avi Kivity wrote: On 03/21/2012 06:18 PM, Corey Minyard wrote: Look at drivers/char/ipmi/ipmi_msghandler.c. It has code to send panic event over IMPI. The code is pretty complex. Of course if we a going to implement something more complex than simple hypercall for panic notification we better do something more interesting with it than just saying panic happened, like sending stack traces on all cpus for instance. I doubt that's the best example, unfortunately. The IPMI event log has limited space and it has to be send a little piece at a time since each log entry is 14 bytes. It just prints the panic string, nothing else. Not that it isn't useful, it has saved my butt before. You have lots of interesting options with paravirtualization. You could, for instance, create a console driver that delivered all console output efficiently through a hypercall. That would be really easy. Or, as you mention, a custom way to deliver panic information. Collecting information like stack traces would be harder to accomplish, as I don't think there is currently a way to get it except by sending it to printk. That already exists; virtio-console (or serial console emulation) can do the job. I think the use case here is pretty straight forward: if the guest finds itself in bad place, it wants to indicate that to the host. We shouldn't rely on any device drivers or complex code. It should be as close to a single instruction as possible that can run even if interrupts are disabled. An out instruction fits this very well. I think a simple protocol like: This solution is more simple than using virtio-serial. inl PORT - returns a magic number indicating the presence of qemucalls I donot understantd this instruction's purpose. inl PORT+1 - returns a bitmap of supported features Hmm, we can execute this instruction when guest starts. If the userspace does not process panicked event, there is no need to notify it. 
outl PORT+1 - data reg1 outl PORT+2 - data reg2 outl PORT+N - data regN We can get the register value from vmcs. So there is no need to tell the register value to the host. If we decide to avoid touching hypervisor, I agree with this solution. Thanks Wen Congyang outl PORT - qemucall of index value with arguments 1..N Regards, Anthony Liguori In fact the feature can be implemented 100% host side by searching for a panic string signature in the console logs. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost question
Just some further information about my earlier question concerning vhost and virtio. I'm using virtio to implement an emulated mac80211 device in the guest. A simple network simulation will be used to control delivery of frames between guests, and for this I am using the vhost approach. A simple first-cut attempt at the tx and rx kick handlers is given below. When the guest transmits frames, the vhost's TX kick handler is executed and copies the buffers onto a queue for the intended recipient(s). When the vhost's RX kick handler is run, it copies the buffer from the queue and notifies the client that the buffers have been used. The problem is that if there are no frames in the queue when the guest rx kick handler runs, then it has to exit and I have to arrange for it to run again. That's done in the current prototype by having the guests poll using a timer - which is ugly and inefficient. Can I get the vhost tx kick handler to wake the appropriate vhost rx kick handler? How can I achieve this? 
Many thanks, Steve

static void handle_rx(struct vhost_work *work)
{
    int n;
    unsigned out, in, frames = 0;
    struct transmission *t;
    struct vhost_poll *p = container_of(work, struct vhost_poll, work);
    struct vhost_virtqueue *vq = container_of(p, struct vhost_virtqueue, poll);
    struct vhost_node *node = container_of(vq, struct vhost_node, vqs[WLAN_VQ_RX]);
    struct vhost_dev *dev = node->vdev;

    mutex_lock(&vq->mutex);
    vhost_disable_notify(dev, vq);
    while (!queue_empty(&node->rxq)) {
        n = vhost_get_vq_desc(dev, vq, vq->iov, ARRAY_SIZE(vq->iov),
                              &out, &in, NULL, NULL);
        if (0 > n || n == vq->num)
            break;
        t = queue_pop(&node->rxq);
        BUG_ON(copy_to_user(vq->iov[0].iov_base, t->buf, t->buf_sz));
        vq->iov[0].iov_len = t->buf_sz;
        vhost_add_used(vq, n, out);
        transmission_free(t);
        ++frames;
    }
    if (frames)
        vhost_signal(dev, vq);
    vhost_enable_notify(dev, vq);
    mutex_unlock(&vq->mutex);
}

static void handle_tx(struct vhost_work *work)
{
    int n;
    unsigned out, in;
    struct transmission *t;
    struct vhost_node *receiver;
    struct vhost_poll *p = container_of(work, struct vhost_poll, work);
    struct vhost_virtqueue *vq = container_of(p, struct vhost_virtqueue, poll);
    struct vhost_node *w = container_of(vq, struct vhost_node, vqs[WLAN_VQ_TX]);
    struct vhost_dev *dev = w->vdev;

    mutex_lock(&vq->mutex);
    do {
        vhost_disable_notify(dev, vq);
        n = vhost_get_vq_desc(dev, vq, vq->iov, ARRAY_SIZE(vq->iov),
                              &out, &in, NULL, NULL);
        while (n >= 0 && n != vq->num) {
            receiver = net_get_receiver(w);
            if (receiver && (t = transmission_alloc())) {
                BUG_ON(copy_from_user(t->buf, vq->iov[1].iov_base,
                                      vq->iov[1].iov_len));
                t->buf_sz = vq->iov[1].iov_len;
                queue_push(&receiver->rxq, t);
                /* ToDo: kick receiver's handle_rx */
            }
            vhost_add_used(vq, n, out);
            n = vhost_get_vq_desc(dev, vq, vq->iov, ARRAY_SIZE(vq->iov),
                                  &out, &in, NULL, NULL);
        }
        vhost_signal(dev, vq);
    } while (vhost_enable_notify(dev, vq));
    mutex_unlock(&vq->mutex);
}
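One answer to the wake-up question: vhost already lets one handler schedule another virtqueue's handler on the vhost worker thread via vhost_poll_queue(). The following is an untested sketch, assuming the vhost_node/vqs[] layout from the code above; vhost_poll_queue() is the real vhost API, everything else here is from the poster's prototype.

```c
/* Sketch, not tested: how handle_tx() could wake the receiver's
 * handle_rx() instead of relying on a guest-side polling timer.
 * vhost_poll_queue() queues the poll's work item on the vhost
 * worker thread, which then runs the rx kick handler. */
receiver = net_get_receiver(w);
if (receiver && (t = transmission_alloc())) {
        t->buf_sz = vq->iov[1].iov_len;
        queue_push(&receiver->rxq, t);
        /* fills in the "ToDo: kick receiver's handle_rx": */
        vhost_poll_queue(&receiver->vqs[WLAN_VQ_RX].poll);
}
```

One caveat with this approach: the queued rx work runs on the receiver's vhost worker and takes the receiver's vq mutex, so the tx path must not already hold it when queueing.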
RE: [PATCH v4 0/7] RTC: New logic to emulate RTC
-Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Il 22/03/2012 01:23, Zhang, Yang Z ha scritto: Actually, I also see some failures during testing. And most of them fail to pass the 244us update cycle check. Since we are in an emulation environment, we cannot ensure it is always greater than 244us. So I think it should not be a bug. Yes, that's fine. But I also see some cases in which the first update cycle after divider reset is not reported. The second, 1.5 seconds after divider reset, is fine. I think your patch can fix it. Right? Best regards yang
Re: [PATCH v4 0/7] RTC: New logic to emulate RTC
Il 22/03/2012 04:03, Zhang, Yang Z ha scritto: Yes, that's fine. But I also see some cases in which the first update cycle after divider reset is not reported. The second, 1.5 seconds after divider reset, is fine. I think your patch can fix it. Right? No. :( It makes it a lot less likely, but it still occurs every 2-3 runs. Paolo
RE: [PATCH v4 0/7] RTC: New logic to emulate RTC
-Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Sent: Thursday, March 22, 2012 11:05 AM To: Zhang, Yang Z Cc: qemu-de...@nongnu.org; aligu...@us.ibm.com; kvm@vger.kernel.org Subject: Re: [PATCH v4 0/7] RTC: New logic to emulate RTC Il 22/03/2012 04:03, Zhang, Yang Z ha scritto: Yes, that's fine. But I also see some cases in which the first update cycle after divider reset is not reported. The second, 1.5 seconds after divider reset, is fine. I think your patch can fix it. Right? No. :( It makes it a lot less likely, but it still occurs every 2-3 runs. Ok, I will look into it. Best regards yang
Re: [PATCH v4 1/2] qemu-socket: change inet_connect() to support nonblock socket
- Original Message - On Mon, Mar 19, 2012 at 10:11:54PM +0800, Amos Kong wrote: Change inet_connect(const char *str, int socktype) to inet_connect(const char *str, bool block, int *sock_err), socktype is unused, block is used to assign if set socket to block/nonblock, sock_err is used to restore socket error. Connect's successful for nonblock socket when following errors are returned: -EINPROGRESS or -EWOULDBLOCK -WSAEALREADY or -WSAEINVAL (win32) Also change the wrap function inet_connect_opts(QemuOpts *opts) to inet_connect_opts(QemuOpts *opts, int *sock_err). Add a bool entry(block) for dummy_opts to tag block type. Change nbd, vnc to use new interface. Signed-off-by: Amos Kong ak...@redhat.com Looks good, and non-blocking support for qemu-socket interfaces is pretty useful in and of itself. One general suggestion I would make is to use Error to pass back errors to the callers rather than an int*. I have implemented it, thanks Some additional comments below. --- nbd.c |2 +- qemu-char.c|2 +- qemu-sockets.c | 73 qemu_socket.h |4 ++- ui/vnc.c |2 +- 5 files changed, 62 insertions(+), 21 deletions(-) diff --git a/nbd.c b/nbd.c index 567e94e..ad4de06 100644 --- a/nbd.c +++ b/nbd.c @@ -146,7 +146,7 @@ int tcp_socket_outgoing(const char *address, uint16_t port) int tcp_socket_outgoing_spec(const char *address_and_port) { -return inet_connect(address_and_port, SOCK_STREAM); +return inet_connect(address_and_port, true, NULL); } int tcp_socket_incoming(const char *address, uint16_t port) diff --git a/qemu-char.c b/qemu-char.c index bb9e3f5..d3543ea 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2443,7 +2443,7 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts *opts) if (is_listen) { fd = inet_listen_opts(opts, 0); } else { -fd = inet_connect_opts(opts); +fd = inet_connect_opts(opts, NULL); } } if (fd 0) { diff --git a/qemu-sockets.c b/qemu-sockets.c index 6bcb8e3..8ed45f8 100644 --- a/qemu-sockets.c +++ b/qemu-sockets.c @@ -51,6 +51,9 @@ static QemuOptsList 
dummy_opts = { },{ .name = ipv6, .type = QEMU_OPT_BOOL, +},{ +.name = block, +.type = QEMU_OPT_BOOL, }, { /* end if list */ } }, @@ -194,14 +197,15 @@ listen: return slisten; } -int inet_connect_opts(QemuOpts *opts) +int inet_connect_opts(QemuOpts *opts, int *sock_err) { struct addrinfo ai,*res,*e; const char *addr; const char *port; char uaddr[INET6_ADDRSTRLEN+1]; char uport[33]; -int sock,rc; +int sock, rc, err; +bool block; memset(ai,0, sizeof(ai)); ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG; @@ -210,9 +214,11 @@ int inet_connect_opts(QemuOpts *opts) addr = qemu_opt_get(opts, host); port = qemu_opt_get(opts, port); +block = qemu_opt_get_bool(opts, block, 0); if (addr == NULL || port == NULL) { fprintf(stderr, inet_connect: host and/or port not specified\n); -return -1; +err = -EINVAL; +goto err; } if (qemu_opt_get_bool(opts, ipv4, 0)) @@ -224,7 +230,8 @@ int inet_connect_opts(QemuOpts *opts) if (0 != (rc = getaddrinfo(addr, port, ai, res))) { fprintf(stderr,getaddrinfo(%s,%s): %s\n, addr, port, gai_strerror(rc)); - return -1; +err = -EINVAL; +goto err; } for (e = res; e != NULL; e = e-ai_next) { @@ -241,21 +248,52 @@ int inet_connect_opts(QemuOpts *opts) continue; } setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(void*)on,sizeof(on)); - +if (!block) { +socket_set_nonblock(sock); +} /* connect to peer */ -if (connect(sock,e-ai_addr,e-ai_addrlen) 0) { -if (NULL == e-ai_next) -fprintf(stderr, %s: connect(%s,%s,%s,%s): %s\n, __FUNCTION__, -inet_strfamily(e-ai_family), -e-ai_canonname, uaddr, uport, strerror(errno)); -closesocket(sock); -continue; +do { +err = 0; +if (connect(sock, e-ai_addr, e-ai_addrlen) 0) { +err = -socket_error(); +if (block) { +break; Above code is used to keep behavior same as original, return this error(connect will be failed) analysis about process EINTR : http://www.madore.org/~david/computers/connect-intr.html If connect() is interrupted by a signal that is caught while blocked waiting to establish a connection,
[PATCH v5 0/2] support to migrate with IPv6 address
These patches make tcp migration use the helper functions in qemu-socket.c to support IPv6 migration. Changes from v1: - split the changes into small patches; it will be easier to review - fixed some problems according to Kevin's comments Changes from v2: - fix the issue of returning the real error - set s->fd to -1 when parsing fails, so migrate_fd_error() won't be called Changes from v3: - try to use the helper functions in qemu-socket.c Changes from v4: - introduce set_socket_error() to restore the real errno - fix connect error processing --- Amos Kong (4): sockets: introduce set_socket_error() qemu-socket: change inet_connect() to support nonblock socket sockets: pass back errors in inet_listen() use inet_listen()/inet_connect() to support ipv6 migration migration-tcp.c | 74 +-- nbd.c |2 + qemu-sockets.c | 87 ++- qemu_socket.h |4 ++- ui/vnc.c|2 + 5 files changed, 92 insertions(+), 77 deletions(-) -- Amos Kong
[PATCH v5 1/4] sockets: introduce set_socket_error()
Introduce set_socket_error() to set errno; use WSASetLastError() for win32. Sometimes cleanup work overwrites errno on the error path; we can use this function to restore the real errno. Signed-off-by: Amos Kong ak...@redhat.com --- qemu_socket.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/qemu_socket.h b/qemu_socket.h index fe4cf6c..a4c5170 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ -8,6 +8,7 @@ #include <ws2tcpip.h> #define socket_error() WSAGetLastError() +#define set_socket_error(e) WSASetLastError(e) #undef EINTR #define EWOULDBLOCK WSAEWOULDBLOCK #define EINTR WSAEINTR @@ -26,6 +27,7 @@ int inet_aton(const char *cp, struct in_addr *ia); #include <sys/un.h> #define socket_error() errno +#define set_socket_error(e) errno = e #define closesocket(s) close(s) #endif /* !_WIN32 */ --
[PATCH v5 2/4] qemu-socket: change inet_connect() to support nonblock socket
Change inet_connect(const char *str, int socktype) to inet_connect(const char *str, bool block), socktype is unused, block is used to assign if set socket to block/nonblock. Retry to connect when -EINTR/-EWOULDBLOCK is got. Connect's successful for nonblock socket when following errors are got: -EINPROGRESS -WSAEALREADY/-WSAEINVAL (win32) Add a bool entry(block) for dummy_opts to tag block type. Use set_socket_error() to set real errno, set errno to EINVAL for parse error. Change nbd, vnc to use new interface. Signed-off-by: Amos Kong ak...@redhat.com --- nbd.c |2 +- qemu-sockets.c | 66 +++- qemu_socket.h |2 +- ui/vnc.c |2 +- 4 files changed, 54 insertions(+), 18 deletions(-) diff --git a/nbd.c b/nbd.c index 567e94e..3618344 100644 --- a/nbd.c +++ b/nbd.c @@ -146,7 +146,7 @@ int tcp_socket_outgoing(const char *address, uint16_t port) int tcp_socket_outgoing_spec(const char *address_and_port) { -return inet_connect(address_and_port, SOCK_STREAM); +return inet_connect(address_and_port, true); } int tcp_socket_incoming(const char *address, uint16_t port) diff --git a/qemu-sockets.c b/qemu-sockets.c index 6bcb8e3..908479e 100644 --- a/qemu-sockets.c +++ b/qemu-sockets.c @@ -51,6 +51,9 @@ static QemuOptsList dummy_opts = { },{ .name = ipv6, .type = QEMU_OPT_BOOL, +},{ +.name = block, +.type = QEMU_OPT_BOOL, }, { /* end if list */ } }, @@ -201,7 +204,8 @@ int inet_connect_opts(QemuOpts *opts) const char *port; char uaddr[INET6_ADDRSTRLEN+1]; char uport[33]; -int sock,rc; +int sock, rc, err; +bool block; memset(ai,0, sizeof(ai)); ai.ai_flags = AI_CANONNAME | AI_ADDRCONFIG; @@ -210,9 +214,11 @@ int inet_connect_opts(QemuOpts *opts) addr = qemu_opt_get(opts, host); port = qemu_opt_get(opts, port); +block = qemu_opt_get_bool(opts, block, 0); if (addr == NULL || port == NULL) { fprintf(stderr, inet_connect: host and/or port not specified\n); -return -1; +err = -EINVAL; +goto err; } if (qemu_opt_get_bool(opts, ipv4, 0)) @@ -224,7 +230,8 @@ int inet_connect_opts(QemuOpts *opts) 
if (0 != (rc = getaddrinfo(addr, port, ai, res))) { fprintf(stderr,getaddrinfo(%s,%s): %s\n, addr, port, gai_strerror(rc)); - return -1; +err = -EINVAL; +goto err; } for (e = res; e != NULL; e = e-ai_next) { @@ -241,21 +248,44 @@ int inet_connect_opts(QemuOpts *opts) continue; } setsockopt(sock,SOL_SOCKET,SO_REUSEADDR,(void*)on,sizeof(on)); - +if (!block) { +socket_set_nonblock(sock); +} /* connect to peer */ -if (connect(sock,e-ai_addr,e-ai_addrlen) 0) { -if (NULL == e-ai_next) -fprintf(stderr, %s: connect(%s,%s,%s,%s): %s\n, __FUNCTION__, -inet_strfamily(e-ai_family), -e-ai_canonname, uaddr, uport, strerror(errno)); -closesocket(sock); -continue; +do { +err = 0; +if (connect(sock, e-ai_addr, e-ai_addrlen) 0) { +err = -socket_error(); +} +} while (err == -EINTR || err == -EWOULDBLOCK); + +if (err = 0) { +goto success; +} else if (!block err == -EINPROGRESS) { +goto success; +#ifdef _WIN32 +} else if (!block (err == -WSAEALREADY || err == -WSAEINVAL)) { +goto success; +#endif } -freeaddrinfo(res); -return sock; + +if (NULL == e-ai_next) { +fprintf(stderr, %s: connect(%s,%s,%s,%s): %s\n, __func__, +inet_strfamily(e-ai_family), +e-ai_canonname, uaddr, uport, strerror(errno)); +} +closesocket(sock); } freeaddrinfo(res); + +err: +set_socket_error(-err); return -1; + +success: +freeaddrinfo(res); +set_socket_error(-err); +return sock; } int inet_dgram_opts(QemuOpts *opts) @@ -449,14 +479,20 @@ int inet_listen(const char *str, char *ostr, int olen, return sock; } -int inet_connect(const char *str, int socktype) +int inet_connect(const char *str, bool block) { QemuOpts *opts; int sock = -1; opts = qemu_opts_create(dummy_opts, NULL, 0); -if (inet_parse(opts, str) == 0) +if (inet_parse(opts, str) == 0) { +if (block) { +qemu_opt_set(opts, block, on); +} sock = inet_connect_opts(opts); +} else { +set_socket_error(EINVAL); +} qemu_opts_del(opts); return sock; } diff --git a/qemu_socket.h b/qemu_socket.h index a4c5170..f86cd3f 100644 --- a/qemu_socket.h +++ b/qemu_socket.h @@ 
-47,7 +47,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset); int
[PATCH v5 3/4] sockets: pass back errors in inet_listen()
Use set_socket_error() to restore real erron, set errno to EINVAL for parse error. Signed-off-by: Amos Kong ak...@redhat.com --- qemu-sockets.c | 21 - 1 files changed, 16 insertions(+), 5 deletions(-) diff --git a/qemu-sockets.c b/qemu-sockets.c index 908479e..f1c6524 100644 --- a/qemu-sockets.c +++ b/qemu-sockets.c @@ -110,7 +110,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset) char port[33]; char uaddr[INET6_ADDRSTRLEN+1]; char uport[33]; -int slisten, rc, to, port_min, port_max, p; +int slisten, rc, to, port_min, port_max, p, err; memset(ai,0, sizeof(ai)); ai.ai_flags = AI_PASSIVE | AI_ADDRCONFIG; @@ -120,7 +120,8 @@ int inet_listen_opts(QemuOpts *opts, int port_offset) if ((qemu_opt_get(opts, host) == NULL) || (qemu_opt_get(opts, port) == NULL)) { fprintf(stderr, %s: host and/or port not specified\n, __FUNCTION__); -return -1; +err = -EINVAL; +goto err; } pstrcpy(port, sizeof(port), qemu_opt_get(opts, port)); addr = qemu_opt_get(opts, host); @@ -138,7 +139,8 @@ int inet_listen_opts(QemuOpts *opts, int port_offset) if (rc != 0) { fprintf(stderr,getaddrinfo(%s,%s): %s\n, addr, port, gai_strerror(rc)); -return -1; +err = -EINVAL; +goto err; } /* create socket + bind */ @@ -150,6 +152,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset) if (slisten 0) { fprintf(stderr,%s: socket(%s): %s\n, __FUNCTION__, inet_strfamily(e-ai_family), strerror(errno)); +err = -socket_error(); continue; } @@ -169,6 +172,7 @@ int inet_listen_opts(QemuOpts *opts, int port_offset) if (bind(slisten, e-ai_addr, e-ai_addrlen) == 0) { goto listen; } +err = -socket_error(); if (p == port_max) { fprintf(stderr,%s: bind(%s,%s,%d): %s\n, __FUNCTION__, inet_strfamily(e-ai_family), uaddr, inet_getport(e), @@ -179,14 +183,15 @@ int inet_listen_opts(QemuOpts *opts, int port_offset) } fprintf(stderr, %s: FAILED\n, __FUNCTION__); freeaddrinfo(res); -return -1; +goto err; listen: if (listen(slisten,1) != 0) { +err = -socket_error(); perror(listen); closesocket(slisten); 
freeaddrinfo(res); -return -1; +goto err; } snprintf(uport, sizeof(uport), %d, inet_getport(e) - port_offset); qemu_opt_set(opts, host, uaddr); @@ -195,6 +200,10 @@ listen: qemu_opt_set(opts, ipv4, (e-ai_family != PF_INET6) ? on : off); freeaddrinfo(res); return slisten; + +err: +set_socket_error(-err); +return -1; } int inet_connect_opts(QemuOpts *opts) @@ -474,6 +483,8 @@ int inet_listen(const char *str, char *ostr, int olen, optstr ? optstr : ); } } +} else { +set_socket_error(EINVAL); } qemu_opts_del(opts); return sock; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 4/4] use inet_listen()/inet_connect() to support ipv6 migration
Use help functions in qemu-socket.c for tcp migration, which already support ipv6 addresses. For IPv6 brackets must be mandatory if you require a port. Referencing to RFC5952, the recommended format is: [2312::8274]:5200 test status: Successed listen side: qemu-kvm -incoming tcp:[2312::8274]:5200 client side: qemu-kvm ... (qemu) migrate -d tcp:[2312::8274]:5200 Signed-off-by: Amos Kong ak...@redhat.com --- migration-tcp.c | 74 +++ 1 files changed, 20 insertions(+), 54 deletions(-) diff --git a/migration-tcp.c b/migration-tcp.c index 35a5781..251d955 100644 --- a/migration-tcp.c +++ b/migration-tcp.c @@ -81,43 +81,32 @@ static void tcp_wait_for_connect(void *opaque) int tcp_start_outgoing_migration(MigrationState *s, const char *host_port) { -struct sockaddr_in addr; -int ret; - -ret = parse_host_port(addr, host_port); -if (ret 0) { -return ret; -} +int err; s-get_error = socket_errno; s-write = socket_write; s-close = tcp_close; -s-fd = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (s-fd == -1) { -DPRINTF(Unable to open socket); -return -socket_error(); -} - -socket_set_nonblock(s-fd); +s-fd = inet_connect(host_port, false); +err = -socket_error(); -do { -ret = connect(s-fd, (struct sockaddr *)addr, sizeof(addr)); -if (ret == -1) { -ret = -socket_error(); -} -if (ret == -EINPROGRESS || ret == -EWOULDBLOCK) { -qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); -return 0; +if (err == -EINPROGRESS) { +DPRINTF(connect in progress); +qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); +#ifdef _WIN32 +} else if (err == -WSAEALREADY || err == -WSAEINVAL) { +DPRINTF(connect in progress); +qemu_set_fd_handler2(s-fd, NULL, NULL, tcp_wait_for_connect, s); +#endif +} else if (err 0) { +DPRINTF(connect failed: %s\n, strerror(-err)); +if (s-fd != -1) { +migrate_fd_error(s); } -} while (ret == -EINTR); - -if (ret 0) { -DPRINTF(connect failed\n); -migrate_fd_error(s); -return ret; +return err; +} else { +migrate_fd_connect(s); } -migrate_fd_connect(s); 
return 0; } @@ -157,38 +146,15 @@ out2: int tcp_start_incoming_migration(const char *host_port) { -struct sockaddr_in addr; -int val; int s; -DPRINTF(Attempting to start an incoming migration\n); - -if (parse_host_port(addr, host_port) 0) { -fprintf(stderr, invalid host/port combination: %s\n, host_port); -return -EINVAL; -} - -s = qemu_socket(PF_INET, SOCK_STREAM, 0); -if (s == -1) { +s = inet_listen(host_port, NULL, 256, SOCK_STREAM, 0); +if (s 0) { return -socket_error(); } -val = 1; -setsockopt(s, SOL_SOCKET, SO_REUSEADDR, (const char *)val, sizeof(val)); - -if (bind(s, (struct sockaddr *)addr, sizeof(addr)) == -1) { -goto err; -} -if (listen(s, 1) == -1) { -goto err; -} - qemu_set_fd_handler2(s, NULL, tcp_accept_incoming_migration, NULL, (void *)(intptr_t)s); return 0; - -err: -close(s); -return -socket_error(); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/38] powerpc/booke: Provide exception macros with interrupt name
On 03/21/2012 01:04 PM, Kumar Gala wrote: On Feb 28, 2012, at 6:09 PM, Alexander Graf wrote: From: Scott Wood scottw...@freescale.com DO_KVM will need to identify the particular exception type. There is an existing set of arbitrary numbers that Linux passes, but it's an undocumented mess that sort of corresponds to server/classic exception vectors but not really. So what do the new names correspond to? The names are Linux-defined. The values are IVOR numbers. What header is defining MACHINE_CHECK, BOOKE_INTERRUPT_EXTERNAL, etc (asm/kvm_asm.h)? Yes, it's asm/kvm_asm.h at the moment. It's actually BOOKE_INTERRUPT_MACHINE_CHECK, etc. The exception macros paste on BOOKE_INTERRUPT_ when they use it, to keep the macro users from having to wrap lines even more often. If so we really should move these out of asm/kvm_asm.h and into something a bit more appropriate. Yes, that was one of the things I had been planning to fix post-RFC. Any preference what header it should be in? asm/reg_booke.h seems to be the least bad option of the existing headers -- or perhaps move head_booke.h to arch/powerpc/include/asm. Long-term it would also be nice for this to replace, rather than supplement, the current numbers, though as Ben pointed out there are a number of places throughout the code that will have to be fixed for that to happen. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 12/38] powerpc/booke: Provide exception macros with interrupt name
On Mar 21, 2012, at 1:19 PM, Scott Wood wrote: On 03/21/2012 01:04 PM, Kumar Gala wrote: On Feb 28, 2012, at 6:09 PM, Alexander Graf wrote: From: Scott Wood scottw...@freescale.com DO_KVM will need to identify the particular exception type. There is an existing set of arbitrary numbers that Linux passes, but it's an undocumented mess that sort of corresponds to server/classic exception vectors but not really. So what do the new names correspond to? The names are Linux-defined. The values are IVOR numbers. What header is defining MACHINE_CHECK, BOOKE_INTERRUPT_EXTERNAL, etc (asm/kvm_asm.h)? Yes, it's asm/kvm_asm.h at the moment. It's actually BOOKE_INTERRUPT_MACHINE_CHECK, etc. The exception macros paste on BOOKE_INTERRUPT_ when they use it, to keep the macro users from having to wrap lines even more often. If so we really should move these out of asm/kvm_asm.h and into something a bit more appropriate. Yes, that was one of the things I had been planning to fix post-RFC. Any preference what header it should be in? asm/reg_booke.h seems to be the least bad option of the existing headers -- or perhaps move head_booke.h to arch/powerpc/include/asm. asm/reg_booke.h seems the least painful right now. head_booke.h is only used on 32-bit so that's not the best choice at this point. We could create an asm/head_booke.h in addition to the one in arch/powerpc/kernel/ Long-term it would also be nice for this to replace, rather than supplement, the current numbers, though as Ben pointed out there are a number of places throughout the code that will have to be fixed for that to happen. Yeah, not too worried about that. More just wanting it to be clear what one has to 'update' if adding support for a new exception. - k-- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html