Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 06:23:34PM -0300, Marcelo Tosatti wrote: > On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote: > > > turn on lockdep to remember why I couldn't sleep there. > > > > switching to a mutex results in: > > > > BUG: sleeping function called from invalid context at kernel/mutex.c:269 > > in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86 > > INFO: lockdep is turned off. > > Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109 > > Call Trace: > > [] __might_sleep+0xf5/0x130 > > [] mutex_lock_nested+0x2f/0x60 > > [] eoifd_event+0x25/0x70 [kvm] > > [] kvm_notify_acked_irq+0xa4/0x140 [kvm] > > [] ? kvm_notify_acked_irq+0x2a/0x140 [kvm] > > [] kvm_ioapic_update_eoi+0x84/0xf0 [kvm] > > [] apic_set_eoi+0x123/0x130 [kvm] > > [] apic_reg_write+0x388/0x670 [kvm] > > [] ? vcpu_enter_guest+0x32c/0x740 [kvm] > > [] kvm_lapic_set_eoi+0x21/0x30 [kvm] > > [] handle_apic_access+0x69/0x80 [kvm_intel] > > [] vmx_handle_exit+0xaa/0x260 [kvm_intel] > > Its RCU from ack notifiers, OK. I'm testing a patch that moves all bitmap handling to under pic/ioapic lock. After this we can teach kvm_set_irq to report when a bit that is cleared/set is already clear/set, without races. And then no tracking will be necessary in irqfd - we can just call kvm_set_irq(..., 0) and look at the return status. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote: > > turn on lockdep to remember why I couldn't sleep there. > > switching to a mutex results in: > > BUG: sleeping function called from invalid context at kernel/mutex.c:269 > in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86 > INFO: lockdep is turned off. > Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109 > Call Trace: > [] __might_sleep+0xf5/0x130 > [] mutex_lock_nested+0x2f/0x60 > [] eoifd_event+0x25/0x70 [kvm] > [] kvm_notify_acked_irq+0xa4/0x140 [kvm] > [] ? kvm_notify_acked_irq+0x2a/0x140 [kvm] > [] kvm_ioapic_update_eoi+0x84/0xf0 [kvm] > [] apic_set_eoi+0x123/0x130 [kvm] > [] apic_reg_write+0x388/0x670 [kvm] > [] ? vcpu_enter_guest+0x32c/0x740 [kvm] > [] kvm_lapic_set_eoi+0x21/0x30 [kvm] > [] handle_apic_access+0x69/0x80 [kvm_intel] > [] vmx_handle_exit+0xaa/0x260 [kvm_intel] Its RCU from ack notifiers, OK. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 13:13 -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote: > > On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: > > > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > Back to original point though current > > > > > > > > > situation is that calling kvm_set_irq() under spinlock is not > > > > > > > > > worse for > > > > > > > > > scalability than calling it not under one. > > > > > > > > > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > > > > > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > > > > > > > > > CPU0 (inject) CPU1 (EOI) > > > > > > > atomic_cmpxchg(, 0, 1) > > > > > > > atomic_cmpxchg(, 1, > > > > > > > 0) > > > > > > > kvm_set_irq(0) > > > > > > > kvm_set_irq(1) > > > > > > > eventfd_signal > > > > > > > > > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > > > > > > > > > > > > > Well EOI somehow happened here before interrupt so it's a bug > > > > > > somewhere > > > > > > else? > > > > > > > > > > Interrupts can be shared. We also can't guarantee that the guest > > > > > won't > > > > > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter > > > > > on > > > > > irq source id... I'm not sure it can. > > > > > > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's > > > > the best we can do for now. > > > > > > Why can't a mutex be used instead of a spinlock again? > > > > eventfd_signal calls the inject function from atomic context. > > Actually, that's called from a workq. I'll have to switch it back and > turn on lockdep to remember why I couldn't sleep there. switching to a mutex results in: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86 INFO: lockdep is turned off. Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109 Call Trace: [] __might_sleep+0xf5/0x130 [] mutex_lock_nested+0x2f/0x60 [] eoifd_event+0x25/0x70 [kvm] [] kvm_notify_acked_irq+0xa4/0x140 [kvm] [] ? kvm_notify_acked_irq+0x2a/0x140 [kvm] [] kvm_ioapic_update_eoi+0x84/0xf0 [kvm] [] apic_set_eoi+0x123/0x130 [kvm] [] apic_reg_write+0x388/0x670 [kvm] [] ? vcpu_enter_guest+0x32c/0x740 [kvm] [] kvm_lapic_set_eoi+0x21/0x30 [kvm] [] handle_apic_access+0x69/0x80 [kvm_intel] [] vmx_handle_exit+0xaa/0x260 [kvm_intel] -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:13:06PM -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote: > > On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: > > > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > Back to original point though current > > > > > > > > > situation is that calling kvm_set_irq() under spinlock is not > > > > > > > > > worse for > > > > > > > > > scalability than calling it not under one. > > > > > > > > > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > > > > > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > > > > > > > > > CPU0 (inject) CPU1 (EOI) > > > > > > > atomic_cmpxchg(, 0, 1) > > > > > > > atomic_cmpxchg(, 1, > > > > > > > 0) > > > > > > > kvm_set_irq(0) > > > > > > > kvm_set_irq(1) > > > > > > > eventfd_signal > > > > > > > > > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > > > > > > > > > > > > > Well EOI somehow happened here before interrupt so it's a bug > > > > > > somewhere > > > > > > else? > > > > > > > > > > Interrupts can be shared. We also can't guarantee that the guest > > > > > won't > > > > > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter > > > > > on > > > > > irq source id... I'm not sure it can. > > > > > > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's > > > > the best we can do for now. > > > > > > Why can't a mutex be used instead of a spinlock again? > > > > eventfd_signal calls the inject function from atomic context. > > Actually, that's called from a workq. I'll have to switch it back and > turn on lockdep to remember why I couldn't sleep there. I'll try to fix kvm_set_irq so it returns 0 if level was already 0. Then you do not need extra state. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: > > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: > > > > > > > > Back to original point though current > > > > > > > > situation is that calling kvm_set_irq() under spinlock is not > > > > > > > > worse for > > > > > > > > scalability than calling it not under one. > > > > > > > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > > > > > > > CPU0 (inject) CPU1 (EOI) > > > > > > atomic_cmpxchg(, 0, 1) > > > > > > atomic_cmpxchg(, 1, 0) > > > > > > kvm_set_irq(0) > > > > > > kvm_set_irq(1) > > > > > > eventfd_signal > > > > > > > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > > > > > > > > > > Well EOI somehow happened here before interrupt so it's a bug > > > > > somewhere > > > > > else? > > > > > > > > Interrupts can be shared. We also can't guarantee that the guest won't > > > > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on > > > > irq source id... I'm not sure it can. > > > > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's > > > the best we can do for now. > > > > Why can't a mutex be used instead of a spinlock again? > > eventfd_signal calls the inject function from atomic context. Actually, that's called from a workq. I'll have to switch it back and turn on lockdep to remember why I couldn't sleep there. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: > > > > > > > Back to original point though current > > > > > > > situation is that calling kvm_set_irq() under spinlock is not > > > > > > > worse for > > > > > > > scalability than calling it not under one. > > > > > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > > > > > CPU0 (inject) CPU1 (EOI) > > > > > atomic_cmpxchg(, 0, 1) > > > > > atomic_cmpxchg(, 1, 0) > > > > > kvm_set_irq(0) > > > > > kvm_set_irq(1) > > > > > eventfd_signal > > > > > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > > > > > > > Well EOI somehow happened here before interrupt so it's a bug somewhere > > > > else? > > > > > > Interrupts can be shared. We also can't guarantee that the guest won't > > > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on > > > irq source id... I'm not sure it can. > > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's > > the best we can do for now. > > Why can't a mutex be used instead of a spinlock again? eventfd_signal calls the inject function from atomic context. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 03:42:09PM -0300, Marcelo Tosatti wrote: > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: > > > > > > > Back to original point though current > > > > > > > situation is that calling kvm_set_irq() under spinlock is not > > > > > > > worse for > > > > > > > scalability than calling it not under one. > > > > > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > > > > > CPU0 (inject) CPU1 (EOI) > > > > > atomic_cmpxchg(, 0, 1) > > > > > atomic_cmpxchg(, 1, 0) > > > > > kvm_set_irq(0) > > > > > kvm_set_irq(1) > > > > > eventfd_signal > > > > > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > > > > > > > Well EOI somehow happened here before interrupt so it's a bug somewhere > > > > else? > > > > > > Interrupts can be shared. We also can't guarantee that the guest won't > > > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on > > > irq source id... I'm not sure it can. > > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's > > the best we can do for now. > > Why can't a mutex be used instead of a spinlock again? > Why was it changed at the first place? Commit says that the function is called from unsleepable context, but no stack trace. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: > > > > > > Back to original point though current > > > > > > situation is that calling kvm_set_irq() under spinlock is not worse > > > > > > for > > > > > > scalability than calling it not under one. > > > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > > > CPU0 (inject) CPU1 (EOI) > > > > atomic_cmpxchg(, 0, 1) > > > > atomic_cmpxchg(, 1, 0) > > > > kvm_set_irq(0) > > > > kvm_set_irq(1) > > > > eventfd_signal > > > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > > > > Well EOI somehow happened here before interrupt so it's a bug somewhere > > > else? > > > > Interrupts can be shared. We also can't guarantee that the guest won't > > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on > > irq source id... I'm not sure it can. > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's > the best we can do for now. Why can't a mutex be used instead of a spinlock again? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 09:48:01AM -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote: > > On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote: > > > On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: > > > > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: > > > > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: > > > > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad > > > > > > > > > > > > > for scalability > > > > > > > > > > > > > with multiple VCPUs. Why do we need a spinlock > > > > > > > > > > > > > simply to protect > > > > > > > > > > > > > level_asserted? Let's use an atomic test and > > > > > > > > > > > > > set/test and clear and the > > > > > > > > > > > > > problem goes away. > > > > > > > > > > > > > > > > > > > > > > > > > That sad reality is that for level interrupt we already > > > > > > > > > > > > scan all vcpus > > > > > > > > > > > > under spinlock. > > > > > > > > > > > > > > > > > > > > > > Where? > > > > > > > > > > > > > > > > > > > > > ioapic > > > > > > > > > > > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > > > > > > > $ > > > > > > > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > > > > > > > > > > Come on Michael. You can do better than grep and actually look > > > > > > > > at what > > > > > > > > code does. The code that loops over all vcpus while delivering > > > > > > > > an irq is > > > > > > > > in kvm_irq_delivery_to_apic(). Now grep for that. > > > > > > > > > > > > > > Hmm, I see, it's actually done for edge if injected from ioapic > > > > > > > too, > > > > > > > right? > > > > > > > > > > > > > > So set_irq does a linear scan, and for each matching CPU it calls > > > > > > > kvm_irq_delivery_to_apic which is another scan? > > > > > > > So it's actually N^2 worst case for a broadcast? > > > > > > > > > > > > No it isn't, I misread the code. > > > > > > > > > > > > > > > > > > Anyway, maybe not trivially but this looks fixable to me: we could > > > > > > drop > > > > > > the ioapic lock before calling kvm_irq_delivery_to_apic. > > > > > > > > > > > May be, may be not. Just saying "lets drop lock whenever we don't feel > > > > > like holding one" does not cut it. > > > > > > > > One thing we do is set remote_irr if interrupt was injected. > > > > I agree these things are tricky. > > > > > > > > One other question: > > > > > > > > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) > > > > { > > > > union kvm_ioapic_redirect_entry *pent; > > > > int injected = -1; > > > > > > > > pent = >redirtbl[idx]; > > > > > > > > if (!pent->fields.mask) { > > > > injected = ioapic_deliver(ioapic, idx); > > > > if (injected && pent->fields.trig_mode == > > > > IOAPIC_LEVEL_TRIG) > > > > pent->fields.remote_irr = 1; > > > > } > > > > > > > > return injected; > > > > } > > > > > > > > > > > > This if (injected) looks a bit strange since ioapic_deliver returns > > > > -1 if no matching destinations. Should be if (injected > 0)? > > > > > > > > > > > > > > > > > Back to original point though current > > > > > situation is that calling kvm_set_irq() under spinlock is not worse > > > > > for > > > > > scalability than calling it not under one. > > > > > > > > Yes. Still the specific use can just use an atomic flag, > > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > > > > Actually, no, replacing it with an atomic is racy. > > > > > > CPU0 (inject) CPU1 (EOI) > > > atomic_cmpxchg(, 0, 1) > > > atomic_cmpxchg(, 1, 0) > > > kvm_set_irq(0) > > > kvm_set_irq(1) > > > eventfd_signal > > > > > > The interrupt is now stuck on until another interrupt is injected. > > > > > > > Well EOI somehow happened here before interrupt so it's a bug somewhere > > else? > > Interrupts can be shared. We also can't guarantee that the guest won't > write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on > irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. If not, maybe we can teach kvm_set_irq to return an indication of the previous status. Specifically kvm_irq_line_state could do test_and_set/test_and_clear and if already set/clear we return 0 immediately. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote: > On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote: > > On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: > > > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: > > > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: > > > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad > > > > > > > > > > > > for scalability > > > > > > > > > > > > with multiple VCPUs. Why do we need a spinlock simply > > > > > > > > > > > > to protect > > > > > > > > > > > > level_asserted? Let's use an atomic test and set/test > > > > > > > > > > > > and clear and the > > > > > > > > > > > > problem goes away. > > > > > > > > > > > > > > > > > > > > > > > That sad reality is that for level interrupt we already > > > > > > > > > > > scan all vcpus > > > > > > > > > > > under spinlock. > > > > > > > > > > > > > > > > > > > > Where? > > > > > > > > > > > > > > > > > > > ioapic > > > > > > > > > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > > > > > > $ > > > > > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > > > > > > > Come on Michael. You can do better than grep and actually look at > > > > > > > what > > > > > > > code does. The code that loops over all vcpus while delivering an > > > > > > > irq is > > > > > > > in kvm_irq_delivery_to_apic(). Now grep for that. > > > > > > > > > > > > Hmm, I see, it's actually done for edge if injected from ioapic too, > > > > > > right? > > > > > > > > > > > > So set_irq does a linear scan, and for each matching CPU it calls > > > > > > kvm_irq_delivery_to_apic which is another scan? > > > > > > So it's actually N^2 worst case for a broadcast? > > > > > > > > > > No it isn't, I misread the code. > > > > > > > > > > > > > > > Anyway, maybe not trivially but this looks fixable to me: we could > > > > > drop > > > > > the ioapic lock before calling kvm_irq_delivery_to_apic. > > > > > > > > > May be, may be not. Just saying "lets drop lock whenever we don't feel > > > > like holding one" does not cut it. > > > > > > One thing we do is set remote_irr if interrupt was injected. > > > I agree these things are tricky. > > > > > > One other question: > > > > > > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) > > > { > > > union kvm_ioapic_redirect_entry *pent; > > > int injected = -1; > > > > > > pent = >redirtbl[idx]; > > > > > > if (!pent->fields.mask) { > > > injected = ioapic_deliver(ioapic, idx); > > > if (injected && pent->fields.trig_mode == > > > IOAPIC_LEVEL_TRIG) > > > pent->fields.remote_irr = 1; > > > } > > > > > > return injected; > > > } > > > > > > > > > This if (injected) looks a bit strange since ioapic_deliver returns > > > -1 if no matching destinations. Should be if (injected > 0)? > > > > > > > > > > > > > Back to original point though current > > > > situation is that calling kvm_set_irq() under spinlock is not worse for > > > > scalability than calling it not under one. > > > > > > Yes. Still the specific use can just use an atomic flag, > > > lock+bool is not needed, and we won't need to undo it later. > > > > > > Actually, no, replacing it with an atomic is racy. > > > > CPU0 (inject) CPU1 (EOI) > > atomic_cmpxchg(, 0, 1) > > atomic_cmpxchg(, 1, 0) > > kvm_set_irq(0) > > kvm_set_irq(1) > > eventfd_signal > > > > The interrupt is now stuck on until another interrupt is injected. > > > > Well EOI somehow happened here before interrupt so it's a bug somewhere > else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: > > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: > > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: > > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for > > > > > > > > > > > scalability > > > > > > > > > > > with multiple VCPUs. Why do we need a spinlock simply to > > > > > > > > > > > protect > > > > > > > > > > > level_asserted? Let's use an atomic test and set/test > > > > > > > > > > > and clear and the > > > > > > > > > > > problem goes away. > > > > > > > > > > > > > > > > > > > > > That sad reality is that for level interrupt we already > > > > > > > > > > scan all vcpus > > > > > > > > > > under spinlock. > > > > > > > > > > > > > > > > > > Where? > > > > > > > > > > > > > > > > > ioapic > > > > > > > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > > > > > $ > > > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > > > > Come on Michael. You can do better than grep and actually look at > > > > > > what > > > > > > code does. The code that loops over all vcpus while delivering an > > > > > > irq is > > > > > > in kvm_irq_delivery_to_apic(). Now grep for that. > > > > > > > > > > Hmm, I see, it's actually done for edge if injected from ioapic too, > > > > > right? > > > > > > > > > > So set_irq does a linear scan, and for each matching CPU it calls > > > > > kvm_irq_delivery_to_apic which is another scan? > > > > > So it's actually N^2 worst case for a broadcast? > > > > > > > > No it isn't, I misread the code. > > > > > > > > > > > > Anyway, maybe not trivially but this looks fixable to me: we could drop > > > > the ioapic lock before calling kvm_irq_delivery_to_apic. > > > > > > > May be, may be not. Just saying "lets drop lock whenever we don't feel > > > like holding one" does not cut it. > > > > One thing we do is set remote_irr if interrupt was injected. > > I agree these things are tricky. > > > > One other question: > > > > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) > > { > > union kvm_ioapic_redirect_entry *pent; > > int injected = -1; > > > > pent = >redirtbl[idx]; > > > > if (!pent->fields.mask) { > > injected = ioapic_deliver(ioapic, idx); > > if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG) > > pent->fields.remote_irr = 1; > > } > > > > return injected; > > } > > > > > > This if (injected) looks a bit strange since ioapic_deliver returns > > -1 if no matching destinations. Should be if (injected > 0)? > > > > > > > > > Back to original point though current > > > situation is that calling kvm_set_irq() under spinlock is not worse for > > > scalability than calling it not under one. > > > > Yes. Still the specific use can just use an atomic flag, > > lock+bool is not needed, and we won't need to undo it later. > > > Actually, no, replacing it with an atomic is racy. > > CPU0 (inject) CPU1 (EOI) > atomic_cmpxchg(, 0, 1) > atomic_cmpxchg(, 1, 0) > kvm_set_irq(0) > kvm_set_irq(1) > eventfd_signal > > The interrupt is now stuck on until another interrupt is injected. > Well EOI somehow happened here before interrupt so it's a bug somewhere else? -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for > > > > > > > > > > scalability > > > > > > > > > > with multiple VCPUs. Why do we need a spinlock simply to > > > > > > > > > > protect > > > > > > > > > > level_asserted? Let's use an atomic test and set/test and > > > > > > > > > > clear and the > > > > > > > > > > problem goes away. > > > > > > > > > > > > > > > > > > > That sad reality is that for level interrupt we already scan > > > > > > > > > all vcpus > > > > > > > > > under spinlock. > > > > > > > > > > > > > > > > Where? > > > > > > > > > > > > > > > ioapic > > > > > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > > > > $ > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > Come on Michael. You can do better than grep and actually look at what > > > > > code does. The code that loops over all vcpus while delivering an irq > > > > > is > > > > > in kvm_irq_delivery_to_apic(). Now grep for that. > > > > > > > > Hmm, I see, it's actually done for edge if injected from ioapic too, > > > > right? > > > > > > > > So set_irq does a linear scan, and for each matching CPU it calls > > > > kvm_irq_delivery_to_apic which is another scan? > > > > So it's actually N^2 worst case for a broadcast? > > > > > > No it isn't, I misread the code. > > > > > > > > > Anyway, maybe not trivially but this looks fixable to me: we could drop > > > the ioapic lock before calling kvm_irq_delivery_to_apic. > > > > > May be, may be not. Just saying "lets drop lock whenever we don't feel > > like holding one" does not cut it. > > One thing we do is set remote_irr if interrupt was injected. > I agree these things are tricky. > > One other question: > > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) > { > union kvm_ioapic_redirect_entry *pent; > int injected = -1; > > pent = >redirtbl[idx]; > > if (!pent->fields.mask) { > injected = ioapic_deliver(ioapic, idx); > if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG) > pent->fields.remote_irr = 1; > } > > return injected; > } > > > This if (injected) looks a bit strange since ioapic_deliver returns > -1 if no matching destinations. Should be if (injected > 0)? > > > > > Back to original point though current > > situation is that calling kvm_set_irq() under spinlock is not worse for > > scalability than calling it not under one. > > Yes. Still the specific use can just use an atomic flag, > lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(, 0, 1) atomic_cmpxchg(, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for > > > > > > > > > scalability > > > > > > > > > with multiple VCPUs. Why do we need a spinlock simply to > > > > > > > > > protect > > > > > > > > > level_asserted? Let's use an atomic test and set/test and > > > > > > > > > clear and the > > > > > > > > > problem goes away. > > > > > > > > > > > > > > > > > That sad reality is that for level interrupt we already scan > > > > > > > > all vcpus > > > > > > > > under spinlock. > > > > > > > > > > > > > > Where? > > > > > > > > > > > > > ioapic > > > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > > > $ > > > > > > > > > > ? > > > > > > > > > > > > > Come on Michael. You can do better than grep and actually look at what > > > > code does. The code that loops over all vcpus while delivering an irq is > > > > in kvm_irq_delivery_to_apic(). Now grep for that. > > > > > > Hmm, I see, it's actually done for edge if injected from ioapic too, > > > right? > > > > > > So set_irq does a linear scan, and for each matching CPU it calls > > > kvm_irq_delivery_to_apic which is another scan? > > > So it's actually N^2 worst case for a broadcast? > > > > No it isn't, I misread the code. > > > > > > Anyway, maybe not trivially but this looks fixable to me: we could drop > > the ioapic lock before calling kvm_irq_delivery_to_apic. > > > May be, may be not. Just saying "lets drop lock whenever we don't feel > like holding one" does not cut it. One thing we do is set remote_irr if interrupt was injected. I agree these things are tricky. One other question: static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) { union kvm_ioapic_redirect_entry *pent; int injected = -1; pent = >redirtbl[idx]; if (!pent->fields.mask) { injected = ioapic_deliver(ioapic, idx); if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG) pent->fields.remote_irr = 1; } return injected; } This if (injected) looks a bit strange since ioapic_deliver returns -1 if no matching destinations. Should be if (injected > 0)? > Back to original point though current > situation is that calling kvm_set_irq() under spinlock is not worse for > scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. > -- > Gleb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for > > > > > > > > scalability > > > > > > > > with multiple VCPUs. Why do we need a spinlock simply to > > > > > > > > protect > > > > > > > > level_asserted? Let's use an atomic test and set/test and > > > > > > > > clear and the > > > > > > > > problem goes away. > > > > > > > > > > > > > > > That sad reality is that for level interrupt we already scan all > > > > > > > vcpus > > > > > > > under spinlock. > > > > > > > > > > > > Where? > > > > > > > > > > > ioapic > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > > $ > > > > > > > > ? > > > > > > > > > > Come on Michael. You can do better than grep and actually look at what > > > code does. The code that loops over all vcpus while delivering an irq is > > > in kvm_irq_delivery_to_apic(). Now grep for that. > > > > Hmm, I see, it's actually done for edge if injected from ioapic too, > > right? > > > > So set_irq does a linear scan, and for each matching CPU it calls > > kvm_irq_delivery_to_apic which is another scan? > > So it's actually N^2 worst case for a broadcast? > > No it isn't, I misread the code. > > > Anyway, maybe not trivially but this looks fixable to me: we could drop > the ioapic lock before calling kvm_irq_delivery_to_apic. > May be, may be not. Just saying "lets drop lock whenever we don't feel like holding one" does not cut it. Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for > > > > > > > scalability > > > > > > > with multiple VCPUs. Why do we need a spinlock simply to protect > > > > > > > level_asserted? Let's use an atomic test and set/test and clear > > > > > > > and the > > > > > > > problem goes away. > > > > > > > > > > > > > That sad reality is that for level interrupt we already scan all > > > > > > vcpus > > > > > > under spinlock. > > > > > > > > > > Where? > > > > > > > > > ioapic > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c > > > $ > > > > > > ? > > > > > > > Come on Michael. You can do better than grep and actually look at what > > code does. The code that loops over all vcpus while delivering an irq is > > in kvm_irq_delivery_to_apic(). Now grep for that. > > Hmm, I see, it's actually done for edge if injected from ioapic too, > right? > > So set_irq does a linear scan, and for each matching CPU it calls > kvm_irq_delivery_to_apic which is another scan? > So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. > > -- > > Gleb. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:55:30PM +0300, Gleb Natapov wrote: > On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote: > > On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote: > > > On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: > > > > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: > > > > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: > > > > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > > > > > In order to inject a level interrupt from an external source > > > > > > > using an > > > > > > > irqfd, we need to allocate a new irq_source_id. This allows us to > > > > > > > assert and (later) de-assert an interrupt line independently from > > > > > > > users of KVM_IRQ_LINE and avoid lost interrupts. > > > > > > > > > > > > > > We also add what may appear like a bit of excessive infrastructure > > > > > > > around an object for storing this irq_source_id. However, notice > > > > > > > that we only provide a way to assert the interrupt here. A > > > > > > > follow-on > > > > > > > interface will make use of the same irq_source_id to allow > > > > > > > de-assert. > > > > > > > > > > > > > > Signed-off-by: Alex Williamson > > > > > > > --- > > > > > > > > > > > > > > Documentation/virtual/kvm/api.txt |6 ++ > > > > > > > arch/x86/kvm/x86.c|1 > > > > > > > include/linux/kvm.h |3 + > > > > > > > virt/kvm/eventfd.c| 114 > > > > > > > - > > > > > > > 4 files changed, 120 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt > > > > > > > b/Documentation/virtual/kvm/api.txt > > > > > > > index 100acde..c7267d5 100644 > > > > > > > --- a/Documentation/virtual/kvm/api.txt > > > > > > > +++ b/Documentation/virtual/kvm/api.txt > > > > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. > > > > > > > The irqfd is removed using > > > > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > > > > > > and kvm_irqfd.gsi. > > > > > > > > > > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a > > > > > > > level > > > > > > > +triggered interrupt. In this case a new irqchip input is > > > > > > > allocated > > > > > > > +which is logically OR'd with other inputs allowing multiple > > > > > > > sources > > > > > > > +to independently assert level interrupts. The > > > > > > > KVM_IRQFD_FLAG_LEVEL > > > > > > > +is only necessary on setup, teardown is identical to that above. > > > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > > > > > > > > > > > 5. The kvm_run structure > > > > > > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > > > > index a01a424..80bed07 100644 > > > > > > > --- a/arch/x86/kvm/x86.c > > > > > > > +++ b/arch/x86/kvm/x86.c > > > > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > > > > case KVM_CAP_GET_TSC_KHZ: > > > > > > > case KVM_CAP_PCI_2_3: > > > > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > > > > + case KVM_CAP_IRQFD_LEVEL: > > > > > > > r = 1; > > > > > > > break; > > > > > > > case KVM_CAP_COALESCED_MMIO: > > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > > > > index 2ce09aa..b2e6e4f 100644 > > > > > > > --- a/include/linux/kvm.h > > > > > > > +++ b/include/linux/kvm.h > > > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > > > > #define KVM_CAP_S390_COW 79 > > > > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > > > > +#define KVM_CAP_IRQFD_LEVEL 81 > > > > > > > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > > > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > > > > > > > #endif > > > > > > > > > > > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > > > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */ > > > > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > > > > > > > > > > > > > struct kvm_irqfd { > > > > > > > __u32 fd; > > > > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > > > > > index 7d7e2aa..ecdbfea 100644 > > > > > > > --- a/virt/kvm/eventfd.c > > > > > > > +++ b/virt/kvm/eventfd.c > > > > > > > @@ -36,6 +36,68 @@ > > > > > > > #include "iodev.h" > > > > > > > > > > > > > > /* > > > > > > > + * An irq_source_id can be created from KVM_IRQFD for level > > > > > > > interrupt > > > > > > > + * injections and shared with other interfaces for EOI or > > > > > > > de-assert. > > > > > > > + * Create an object with reference counting to make it easy to > > > > > > > use. > > > > > > > + */ > > > > > > > +struct _irq_source { > > > > > > > + int id; /* the IRQ source ID */ > > > > > > > + bool level_asserted; /* Track assertion state and protect with > > > > > > >
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote: > On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote: > > On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: > > > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: > > > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: > > > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > > > > In order to inject a level interrupt from an external source using > > > > > > an > > > > > > irqfd, we need to allocate a new irq_source_id. This allows us to > > > > > > assert and (later) de-assert an interrupt line independently from > > > > > > users of KVM_IRQ_LINE and avoid lost interrupts. > > > > > > > > > > > > We also add what may appear like a bit of excessive infrastructure > > > > > > around an object for storing this irq_source_id. However, notice > > > > > > that we only provide a way to assert the interrupt here. A > > > > > > follow-on > > > > > > interface will make use of the same irq_source_id to allow > > > > > > de-assert. > > > > > > > > > > > > Signed-off-by: Alex Williamson > > > > > > --- > > > > > > > > > > > > Documentation/virtual/kvm/api.txt |6 ++ > > > > > > arch/x86/kvm/x86.c|1 > > > > > > include/linux/kvm.h |3 + > > > > > > virt/kvm/eventfd.c| 114 > > > > > > - > > > > > > 4 files changed, 120 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt > > > > > > b/Documentation/virtual/kvm/api.txt > > > > > > index 100acde..c7267d5 100644 > > > > > > --- a/Documentation/virtual/kvm/api.txt > > > > > > +++ b/Documentation/virtual/kvm/api.txt > > > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The > > > > > > irqfd is removed using > > > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > > > > > and kvm_irqfd.gsi. > > > > > > > > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a > > > > > > level > > > > > > +triggered interrupt. In this case a new irqchip input is allocated > > > > > > +which is logically OR'd with other inputs allowing multiple sources > > > > > > +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL > > > > > > +is only necessary on setup, teardown is identical to that above. > > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > > > > > > > > > 5. The kvm_run structure > > > > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > > > index a01a424..80bed07 100644 > > > > > > --- a/arch/x86/kvm/x86.c > > > > > > +++ b/arch/x86/kvm/x86.c > > > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > > > case KVM_CAP_GET_TSC_KHZ: > > > > > > case KVM_CAP_PCI_2_3: > > > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > > > + case KVM_CAP_IRQFD_LEVEL: > > > > > > r = 1; > > > > > > break; > > > > > > case KVM_CAP_COALESCED_MMIO: > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > > > index 2ce09aa..b2e6e4f 100644 > > > > > > --- a/include/linux/kvm.h > > > > > > +++ b/include/linux/kvm.h > > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > > > #define KVM_CAP_S390_COW 79 > > > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > > > +#define KVM_CAP_IRQFD_LEVEL 81 > > > > > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > > > > > > #endif > > > > > > > > > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */ > > > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > > > > > > > > > > > struct kvm_irqfd { > > > > > > __u32 fd; > > > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > > > > index 7d7e2aa..ecdbfea 100644 > > > > > > --- a/virt/kvm/eventfd.c > > > > > > +++ b/virt/kvm/eventfd.c > > > > > > @@ -36,6 +36,68 @@ > > > > > > #include "iodev.h" > > > > > > > > > > > > /* > > > > > > + * An irq_source_id can be created from KVM_IRQFD for level > > > > > > interrupt > > > > > > + * injections and shared with other interfaces for EOI or > > > > > > de-assert. > > > > > > + * Create an object with reference counting to make it easy to use. > > > > > > + */ > > > > > > +struct _irq_source { > > > > > > + int id; /* the IRQ source ID */ > > > > > > + bool level_asserted; /* Track assertion state and protect with > > > > > > lock */ > > > > > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious > > > > > > eoi. */ > > > > > > + struct kvm *kvm; > > > > > > + struct kref kref; > > > > > > +}; > > > > > > + > > > > > > +static void _irq_source_release(struct kref *kref) > > > > > > +{ > > > > > > +
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote: > On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: > > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: > > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: > > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > > > In order to inject a level interrupt from an external source using an > > > > > irqfd, we need to allocate a new irq_source_id. This allows us to > > > > > assert and (later) de-assert an interrupt line independently from > > > > > users of KVM_IRQ_LINE and avoid lost interrupts. > > > > > > > > > > We also add what may appear like a bit of excessive infrastructure > > > > > around an object for storing this irq_source_id. However, notice > > > > > that we only provide a way to assert the interrupt here. A follow-on > > > > > interface will make use of the same irq_source_id to allow de-assert. > > > > > > > > > > Signed-off-by: Alex Williamson > > > > > --- > > > > > > > > > > Documentation/virtual/kvm/api.txt |6 ++ > > > > > arch/x86/kvm/x86.c|1 > > > > > include/linux/kvm.h |3 + > > > > > virt/kvm/eventfd.c| 114 > > > > > - > > > > > 4 files changed, 120 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt > > > > > b/Documentation/virtual/kvm/api.txt > > > > > index 100acde..c7267d5 100644 > > > > > --- a/Documentation/virtual/kvm/api.txt > > > > > +++ b/Documentation/virtual/kvm/api.txt > > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The > > > > > irqfd is removed using > > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > > > > and kvm_irqfd.gsi. > > > > > > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level > > > > > +triggered interrupt. In this case a new irqchip input is allocated > > > > > +which is logically OR'd with other inputs allowing multiple sources > > > > > +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL > > > > > +is only necessary on setup, teardown is identical to that above. > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > > > > > > > 5. The kvm_run structure > > > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > > index a01a424..80bed07 100644 > > > > > --- a/arch/x86/kvm/x86.c > > > > > +++ b/arch/x86/kvm/x86.c > > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > > case KVM_CAP_GET_TSC_KHZ: > > > > > case KVM_CAP_PCI_2_3: > > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > > + case KVM_CAP_IRQFD_LEVEL: > > > > > r = 1; > > > > > break; > > > > > case KVM_CAP_COALESCED_MMIO: > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > > index 2ce09aa..b2e6e4f 100644 > > > > > --- a/include/linux/kvm.h > > > > > +++ b/include/linux/kvm.h > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > > #define KVM_CAP_S390_COW 79 > > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > > +#define KVM_CAP_IRQFD_LEVEL 81 > > > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > > > > > #endif > > > > > > > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */ > > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > > > > > > > > > struct kvm_irqfd { > > > > > __u32 fd; > > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > > > index 7d7e2aa..ecdbfea 100644 > > > > > --- a/virt/kvm/eventfd.c > > > > > +++ b/virt/kvm/eventfd.c > > > > > @@ -36,6 +36,68 @@ > > > > > #include "iodev.h" > > > > > > > > > > /* > > > > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt > > > > > + * injections and shared with other interfaces for EOI or de-assert. > > > > > + * Create an object with reference counting to make it easy to use. > > > > > + */ > > > > > +struct _irq_source { > > > > > + int id; /* the IRQ source ID */ > > > > > + bool level_asserted; /* Track assertion state and protect with > > > > > lock */ > > > > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious > > > > > eoi. */ > > > > > + struct kvm *kvm; > > > > > + struct kref kref; > > > > > +}; > > > > > + > > > > > +static void _irq_source_release(struct kref *kref) > > > > > +{ > > > > > + struct _irq_source *source; > > > > > + > > > > > + source = container_of(kref, struct _irq_source, kref); > > > > > + > > > > > + /* This also de-asserts */ > > > > > + kvm_free_irq_source_id(source->kvm, source->id); > > > > > + kfree(source); > > > > > +} > > > > > + > > > > > +static void
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > > In order to inject a level interrupt from an external source using an > > > > irqfd, we need to allocate a new irq_source_id. This allows us to > > > > assert and (later) de-assert an interrupt line independently from > > > > users of KVM_IRQ_LINE and avoid lost interrupts. > > > > > > > > We also add what may appear like a bit of excessive infrastructure > > > > around an object for storing this irq_source_id. However, notice > > > > that we only provide a way to assert the interrupt here. A follow-on > > > > interface will make use of the same irq_source_id to allow de-assert. > > > > > > > > Signed-off-by: Alex Williamson > > > > --- > > > > > > > > Documentation/virtual/kvm/api.txt |6 ++ > > > > arch/x86/kvm/x86.c|1 > > > > include/linux/kvm.h |3 + > > > > virt/kvm/eventfd.c| 114 > > > > - > > > > 4 files changed, 120 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/Documentation/virtual/kvm/api.txt > > > > b/Documentation/virtual/kvm/api.txt > > > > index 100acde..c7267d5 100644 > > > > --- a/Documentation/virtual/kvm/api.txt > > > > +++ b/Documentation/virtual/kvm/api.txt > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The > > > > irqfd is removed using > > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > > > and kvm_irqfd.gsi. > > > > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level > > > > +triggered interrupt. In this case a new irqchip input is allocated > > > > +which is logically OR'd with other inputs allowing multiple sources > > > > +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL > > > > +is only necessary on setup, teardown is identical to that above. > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > > > > > 5. The kvm_run structure > > > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > > index a01a424..80bed07 100644 > > > > --- a/arch/x86/kvm/x86.c > > > > +++ b/arch/x86/kvm/x86.c > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > > case KVM_CAP_GET_TSC_KHZ: > > > > case KVM_CAP_PCI_2_3: > > > > case KVM_CAP_KVMCLOCK_CTRL: > > > > + case KVM_CAP_IRQFD_LEVEL: > > > > r = 1; > > > > break; > > > > case KVM_CAP_COALESCED_MMIO: > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > > index 2ce09aa..b2e6e4f 100644 > > > > --- a/include/linux/kvm.h > > > > +++ b/include/linux/kvm.h > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > > #define KVM_CAP_S390_COW 79 > > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > > +#define KVM_CAP_IRQFD_LEVEL 81 > > > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > > > > #endif > > > > > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */ > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > > > > > > > struct kvm_irqfd { > > > > __u32 fd; > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > > index 7d7e2aa..ecdbfea 100644 > > > > --- a/virt/kvm/eventfd.c > > > > +++ b/virt/kvm/eventfd.c > > > > @@ -36,6 +36,68 @@ > > > > #include "iodev.h" > > > > > > > > /* > > > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt > > > > + * injections and shared with other interfaces for EOI or de-assert. > > > > + * Create an object with reference counting to make it easy to use. > > > > + */ > > > > +struct _irq_source { > > > > + int id; /* the IRQ source ID */ > > > > + bool level_asserted; /* Track assertion state and protect with > > > > lock */ > > > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious > > > > eoi. */ > > > > + struct kvm *kvm; > > > > + struct kref kref; > > > > +}; > > > > + > > > > +static void _irq_source_release(struct kref *kref) > > > > +{ > > > > + struct _irq_source *source; > > > > + > > > > + source = container_of(kref, struct _irq_source, kref); > > > > + > > > > + /* This also de-asserts */ > > > > + kvm_free_irq_source_id(source->kvm, source->id); > > > > + kfree(source); > > > > +} > > > > + > > > > +static void _irq_source_put(struct _irq_source *source) > > > > +{ > > > > + if (source) > > > > + kref_put(>kref, _irq_source_release); > > > > +} > > > > + > > > > +static struct _irq_source *__attribute__ ((used)) /* white lie for now > > > > */ >
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > In order to inject a level interrupt from an external source using an > > > irqfd, we need to allocate a new irq_source_id. This allows us to > > > assert and (later) de-assert an interrupt line independently from > > > users of KVM_IRQ_LINE and avoid lost interrupts. > > > > > > We also add what may appear like a bit of excessive infrastructure > > > around an object for storing this irq_source_id. However, notice > > > that we only provide a way to assert the interrupt here. A follow-on > > > interface will make use of the same irq_source_id to allow de-assert. > > > > > > Signed-off-by: Alex Williamson > > > --- > > > > > > Documentation/virtual/kvm/api.txt |6 ++ > > > arch/x86/kvm/x86.c|1 > > > include/linux/kvm.h |3 + > > > virt/kvm/eventfd.c| 114 > > > - > > > 4 files changed, 120 insertions(+), 4 deletions(-) > > > > > > diff --git a/Documentation/virtual/kvm/api.txt > > > b/Documentation/virtual/kvm/api.txt > > > index 100acde..c7267d5 100644 > > > --- a/Documentation/virtual/kvm/api.txt > > > +++ b/Documentation/virtual/kvm/api.txt > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd > > > is removed using > > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > > and kvm_irqfd.gsi. > > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level > > > +triggered interrupt. In this case a new irqchip input is allocated > > > +which is logically OR'd with other inputs allowing multiple sources > > > +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL > > > +is only necessary on setup, teardown is identical to that above. > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > > > 5. The kvm_run structure > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index a01a424..80bed07 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > > case KVM_CAP_GET_TSC_KHZ: > > > case KVM_CAP_PCI_2_3: > > > case KVM_CAP_KVMCLOCK_CTRL: > > > + case KVM_CAP_IRQFD_LEVEL: > > > r = 1; > > > break; > > > case KVM_CAP_COALESCED_MMIO: > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > > index 2ce09aa..b2e6e4f 100644 > > > --- a/include/linux/kvm.h > > > +++ b/include/linux/kvm.h > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > > #define KVM_CAP_S390_COW 79 > > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > > +#define KVM_CAP_IRQFD_LEVEL 81 > > > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > > > #endif > > > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > > +/* Available with KVM_CAP_IRQFD_LEVEL */ > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > > > > > struct kvm_irqfd { > > > __u32 fd; > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > > index 7d7e2aa..ecdbfea 100644 > > > --- a/virt/kvm/eventfd.c > > > +++ b/virt/kvm/eventfd.c > > > @@ -36,6 +36,68 @@ > > > #include "iodev.h" > > > > > > /* > > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt > > > + * injections and shared with other interfaces for EOI or de-assert. > > > + * Create an object with reference counting to make it easy to use. > > > + */ > > > +struct _irq_source { > > > + int id; /* the IRQ source ID */ > > > + bool level_asserted; /* Track assertion state and protect with lock */ > > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ > > > + struct kvm *kvm; > > > + struct kref kref; > > > +}; > > > + > > > +static void _irq_source_release(struct kref *kref) > > > +{ > > > + struct _irq_source *source; > > > + > > > + source = container_of(kref, struct _irq_source, kref); > > > + > > > + /* This also de-asserts */ > > > + kvm_free_irq_source_id(source->kvm, source->id); > > > + kfree(source); > > > +} > > > + > > > +static void _irq_source_put(struct _irq_source *source) > > > +{ > > > + if (source) > > > + kref_put(>kref, _irq_source_release); > > > +} > > > + > > > +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ > > > +_irq_source_get(struct _irq_source *source) > > > +{ > > > + if (source) > > > + kref_get(>kref); > > > + > > > + return source; > > > +} > > > + > > > +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) > > > +{ > > > + struct _irq_source *source; > > > + int id; > > > + > > > + source = kzalloc(sizeof(*source), GFP_KERNEL); > > > + if (!source) > > > + return ERR_PTR(-ENOMEM); > > > + > > > + id =
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > In order to inject a level interrupt from an external source using an > > irqfd, we need to allocate a new irq_source_id. This allows us to > > assert and (later) de-assert an interrupt line independently from > > users of KVM_IRQ_LINE and avoid lost interrupts. > > > > We also add what may appear like a bit of excessive infrastructure > > around an object for storing this irq_source_id. However, notice > > that we only provide a way to assert the interrupt here. A follow-on > > interface will make use of the same irq_source_id to allow de-assert. > > > > Signed-off-by: Alex Williamson > > --- > > > > Documentation/virtual/kvm/api.txt |6 ++ > > arch/x86/kvm/x86.c|1 > > include/linux/kvm.h |3 + > > virt/kvm/eventfd.c| 114 > > - > > 4 files changed, 120 insertions(+), 4 deletions(-) > > > > diff --git a/Documentation/virtual/kvm/api.txt > > b/Documentation/virtual/kvm/api.txt > > index 100acde..c7267d5 100644 > > --- a/Documentation/virtual/kvm/api.txt > > +++ b/Documentation/virtual/kvm/api.txt > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is > > removed using > > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > > and kvm_irqfd.gsi. > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level > > +triggered interrupt. In this case a new irqchip input is allocated > > +which is logically OR'd with other inputs allowing multiple sources > > +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL > > +is only necessary on setup, teardown is identical to that above. > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > > > 5. The kvm_run structure > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index a01a424..80bed07 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > > case KVM_CAP_GET_TSC_KHZ: > > case KVM_CAP_PCI_2_3: > > case KVM_CAP_KVMCLOCK_CTRL: > > + case KVM_CAP_IRQFD_LEVEL: > > r = 1; > > break; > > case KVM_CAP_COALESCED_MMIO: > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > > index 2ce09aa..b2e6e4f 100644 > > --- a/include/linux/kvm.h > > +++ b/include/linux/kvm.h > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > > #define KVM_CAP_S390_COW 79 > > #define KVM_CAP_PPC_ALLOC_HTAB 80 > > +#define KVM_CAP_IRQFD_LEVEL 81 > > > > #ifdef KVM_CAP_IRQ_ROUTING > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > > #endif > > > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > > +/* Available with KVM_CAP_IRQFD_LEVEL */ > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > > > struct kvm_irqfd { > > __u32 fd; > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > > index 7d7e2aa..ecdbfea 100644 > > --- a/virt/kvm/eventfd.c > > +++ b/virt/kvm/eventfd.c > > @@ -36,6 +36,68 @@ > > #include "iodev.h" > > > > /* > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt > > + * injections and shared with other interfaces for EOI or de-assert. > > + * Create an object with reference counting to make it easy to use. > > + */ > > +struct _irq_source { > > + int id; /* the IRQ source ID */ > > + bool level_asserted; /* Track assertion state and protect with lock */ > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ > > + struct kvm *kvm; > > + struct kref kref; > > +}; > > + > > +static void _irq_source_release(struct kref *kref) > > +{ > > + struct _irq_source *source; > > + > > + source = container_of(kref, struct _irq_source, kref); > > + > > + /* This also de-asserts */ > > + kvm_free_irq_source_id(source->kvm, source->id); > > + kfree(source); > > +} > > + > > +static void _irq_source_put(struct _irq_source *source) > > +{ > > + if (source) > > + kref_put(>kref, _irq_source_release); > > +} > > + > > +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ > > +_irq_source_get(struct _irq_source *source) > > +{ > > + if (source) > > + kref_get(>kref); > > + > > + return source; > > +} > > + > > +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) > > +{ > > + struct _irq_source *source; > > + int id; > > + > > + source = kzalloc(sizeof(*source), GFP_KERNEL); > > + if (!source) > > + return ERR_PTR(-ENOMEM); > > + > > + id = kvm_request_irq_source_id(kvm); > > + if (id < 0) { > > + kfree(source); > > + return ERR_PTR(id); > > + } > > + > > + kref_init(>kref); > > + spin_lock_init(>lock); > > + source->kvm = kvm; > > + source->id = id; > > + > > + return
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > In order to inject a level interrupt from an external source using an > irqfd, we need to allocate a new irq_source_id. This allows us to > assert and (later) de-assert an interrupt line independently from > users of KVM_IRQ_LINE and avoid lost interrupts. > > We also add what may appear like a bit of excessive infrastructure > around an object for storing this irq_source_id. However, notice > that we only provide a way to assert the interrupt here. A follow-on > interface will make use of the same irq_source_id to allow de-assert. > > Signed-off-by: Alex Williamson > --- > > Documentation/virtual/kvm/api.txt |6 ++ > arch/x86/kvm/x86.c|1 > include/linux/kvm.h |3 + > virt/kvm/eventfd.c| 114 > - > 4 files changed, 120 insertions(+), 4 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 100acde..c7267d5 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is > removed using > the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd > and kvm_irqfd.gsi. > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level > +triggered interrupt. In this case a new irqchip input is allocated > +which is logically OR'd with other inputs allowing multiple sources > +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL > +is only necessary on setup, teardown is identical to that above. > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. > > 5. The kvm_run structure > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index a01a424..80bed07 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) > case KVM_CAP_GET_TSC_KHZ: > case KVM_CAP_PCI_2_3: > case KVM_CAP_KVMCLOCK_CTRL: > + case KVM_CAP_IRQFD_LEVEL: > r = 1; > break; > case KVM_CAP_COALESCED_MMIO: > diff --git a/include/linux/kvm.h b/include/linux/kvm.h > index 2ce09aa..b2e6e4f 100644 > --- a/include/linux/kvm.h > +++ b/include/linux/kvm.h > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { > #define KVM_CAP_PPC_GET_SMMU_INFO 78 > #define KVM_CAP_S390_COW 79 > #define KVM_CAP_PPC_ALLOC_HTAB 80 > +#define KVM_CAP_IRQFD_LEVEL 81 > > #ifdef KVM_CAP_IRQ_ROUTING > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { > #endif > > #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0) > +/* Available with KVM_CAP_IRQFD_LEVEL */ > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1) > > struct kvm_irqfd { > __u32 fd; > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 7d7e2aa..ecdbfea 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -36,6 +36,68 @@ > #include "iodev.h" > > /* > + * An irq_source_id can be created from KVM_IRQFD for level interrupt > + * injections and shared with other interfaces for EOI or de-assert. > + * Create an object with reference counting to make it easy to use. > + */ > +struct _irq_source { > + int id; /* the IRQ source ID */ > + bool level_asserted; /* Track assertion state and protect with lock */ > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ > + struct kvm *kvm; > + struct kref kref; > +}; > + > +static void _irq_source_release(struct kref *kref) > +{ > + struct _irq_source *source; > + > + source = container_of(kref, struct _irq_source, kref); > + > + /* This also de-asserts */ > + kvm_free_irq_source_id(source->kvm, source->id); > + kfree(source); > +} > + > +static void _irq_source_put(struct _irq_source *source) > +{ > + if (source) > + kref_put(>kref, _irq_source_release); > +} > + > +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ > +_irq_source_get(struct _irq_source *source) > +{ > + if (source) > + kref_get(>kref); > + > + return source; > +} > + > +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) > +{ > + struct _irq_source *source; > + int id; > + > + source = kzalloc(sizeof(*source), GFP_KERNEL); > + if (!source) > + return ERR_PTR(-ENOMEM); > + > + id = kvm_request_irq_source_id(kvm); > + if (id < 0) { > + kfree(source); > + return ERR_PTR(id); > + } > + > + kref_init(>kref); > + spin_lock_init(>lock); > + source->kvm = kvm; > + source->id = id; > + > + return source; > +} > + > +/* > * > * irqfd: Allows an fd to be used to inject an interrupt to the guest > * > @@ -52,6 +114,8 @@ struct _irqfd { > /* Used for level IRQ fast-path */ > int gsi;
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ +_irq_source_get(struct _irq_source *source) +{ + if (source) + kref_get(source-kref); + + return source; +} + +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) +{ + struct _irq_source *source; + int id; + + source = kzalloc(sizeof(*source), GFP_KERNEL); + if (!source) + return ERR_PTR(-ENOMEM); + + id = kvm_request_irq_source_id(kvm); + if (id 0) { + kfree(source); + return ERR_PTR(id); + } + + kref_init(source-kref); + spin_lock_init(source-lock); + source-kvm = kvm; + source-id = id; + + return source; +} + +/* * * irqfd: Allows an fd to be used to inject an interrupt to the guest * @@ -52,6 +114,8 @@ struct _irqfd { /* Used for level IRQ fast-path */ int gsi; struct work_struct inject; + /* IRQ source ID for level triggered irqfds */ + struct
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ +_irq_source_get(struct _irq_source *source) +{ + if (source) + kref_get(source-kref); + + return source; +} + +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) +{ + struct _irq_source *source; + int id; + + source = kzalloc(sizeof(*source), GFP_KERNEL); + if (!source) + return ERR_PTR(-ENOMEM); + + id = kvm_request_irq_source_id(kvm); + if (id 0) { + kfree(source); + return ERR_PTR(id); + } + + kref_init(source-kref); + spin_lock_init(source-lock); + source-kvm = kvm; + source-id = id; + + return source; +} + +/* * * irqfd: Allows an fd to be used to inject an interrupt to the guest * @@ -52,6 +114,8 @@ struct _irqfd { /* Used for level IRQ
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ +_irq_source_get(struct _irq_source *source) +{ + if (source) + kref_get(source-kref); + + return source; +} + +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) +{ + struct _irq_source *source; + int id; + + source = kzalloc(sizeof(*source), GFP_KERNEL); + if (!source) + return ERR_PTR(-ENOMEM); + + id = kvm_request_irq_source_id(kvm); + if (id 0) { + kfree(source); + return ERR_PTR(id); + } + + kref_init(source-kref); + spin_lock_init(source-lock); + source-kvm = kvm; + source-id = id; + + return source; +} + +/* * * irqfd:
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ +_irq_source_get(struct _irq_source *source) +{ + if (source) + kref_get(source-kref); + + return source; +} + +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) +{ + struct _irq_source *source; + int id; + + source = kzalloc(sizeof(*source), GFP_KERNEL); + if (!source) + return
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ +_irq_source_get(struct _irq_source *source) +{ + if (source) + kref_get(source-kref); + + return source; +} + +static struct _irq_source *_irq_source_alloc(struct kvm *kvm) +{ + struct _irq_source *source; +
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static struct _irq_source *__attribute__ ((used)) /* white lie for now */ +_irq_source_get(struct _irq_source *source) +{ + if (source) +
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:55:30PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts. We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Documentation/virtual/kvm/api.txt |6 ++ arch/x86/kvm/x86.c|1 include/linux/kvm.h |3 + virt/kvm/eventfd.c| 114 - 4 files changed, 120 insertions(+), 4 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 100acde..c7267d5 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin. The irqfd is removed using the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and kvm_irqfd.gsi. +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level +triggered interrupt. In this case a new irqchip input is allocated +which is logically OR'd with other inputs allowing multiple sources +to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL +is only necessary on setup, teardown is identical to that above. +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL. 5. The kvm_run structure diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index a01a424..80bed07 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: case KVM_CAP_KVMCLOCK_CTRL: + case KVM_CAP_IRQFD_LEVEL: r = 1; break; case KVM_CAP_COALESCED_MMIO: diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 2ce09aa..b2e6e4f 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info { #define KVM_CAP_PPC_GET_SMMU_INFO 78 #define KVM_CAP_S390_COW 79 #define KVM_CAP_PPC_ALLOC_HTAB 80 +#define KVM_CAP_IRQFD_LEVEL 81 #ifdef KVM_CAP_IRQ_ROUTING @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config { #endif #define KVM_IRQFD_FLAG_DEASSIGN (1 0) +/* Available with KVM_CAP_IRQFD_LEVEL */ +#define KVM_IRQFD_FLAG_LEVEL (1 1) struct kvm_irqfd { __u32 fd; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 7d7e2aa..ecdbfea 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -36,6 +36,68 @@ #include iodev.h /* + * An irq_source_id can be created from KVM_IRQFD for level interrupt + * injections and shared with other interfaces for EOI or de-assert. + * Create an object with reference counting to make it easy to use. + */ +struct _irq_source { + int id; /* the IRQ source ID */ + bool level_asserted; /* Track assertion state and protect with lock */ + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */ + struct kvm *kvm; + struct kref kref; +}; + +static void _irq_source_release(struct kref *kref) +{ + struct _irq_source *source; + + source = container_of(kref, struct _irq_source, kref); + + /* This also de-asserts */ + kvm_free_irq_source_id(source-kvm, source-id); + kfree(source); +} + +static void _irq_source_put(struct _irq_source *source) +{ + if (source) + kref_put(source-kref, _irq_source_release); +} + +static
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. May be, may be not. Just saying lets drop lock whenever we don't feel like holding one does not cut it. Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. May be, may be not. Just saying lets drop lock whenever we don't feel like holding one does not cut it. One thing we do is set remote_irr if interrupt was injected. I agree these things are tricky. One other question: static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) { union kvm_ioapic_redirect_entry *pent; int injected = -1; pent = ioapic-redirtbl[idx]; if (!pent-fields.mask) { injected = ioapic_deliver(ioapic, idx); if (injected pent-fields.trig_mode == IOAPIC_LEVEL_TRIG) pent-fields.remote_irr = 1; } return injected; } This if (injected) looks a bit strange since ioapic_deliver returns -1 if no matching destinations. Should be if (injected 0)? Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. May be, may be not. Just saying lets drop lock whenever we don't feel like holding one does not cut it. One thing we do is set remote_irr if interrupt was injected. I agree these things are tricky. One other question: static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) { union kvm_ioapic_redirect_entry *pent; int injected = -1; pent = ioapic-redirtbl[idx]; if (!pent-fields.mask) { injected = ioapic_deliver(ioapic, idx); if (injected pent-fields.trig_mode == IOAPIC_LEVEL_TRIG) pent-fields.remote_irr = 1; } return injected; } This if (injected) looks a bit strange since ioapic_deliver returns -1 if no matching destinations. Should be if (injected 0)? Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. May be, may be not. Just saying lets drop lock whenever we don't feel like holding one does not cut it. One thing we do is set remote_irr if interrupt was injected. I agree these things are tricky. One other question: static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) { union kvm_ioapic_redirect_entry *pent; int injected = -1; pent = ioapic-redirtbl[idx]; if (!pent-fields.mask) { injected = ioapic_deliver(ioapic, idx); if (injected pent-fields.trig_mode == IOAPIC_LEVEL_TRIG) pent-fields.remote_irr = 1; } return injected; } This if (injected) looks a bit strange since ioapic_deliver returns -1 if no matching destinations. Should be if (injected 0)? Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. May be, may be not. Just saying lets drop lock whenever we don't feel like holding one does not cut it. One thing we do is set remote_irr if interrupt was injected. I agree these things are tricky. One other question: static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) { union kvm_ioapic_redirect_entry *pent; int injected = -1; pent = ioapic-redirtbl[idx]; if (!pent-fields.mask) { injected = ioapic_deliver(ioapic, idx); if (injected pent-fields.trig_mode == IOAPIC_LEVEL_TRIG) pent-fields.remote_irr = 1; } return injected; } This if (injected) looks a bit strange since ioapic_deliver returns -1 if no matching destinations. Should be if (injected 0)? Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 09:48:01AM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote: On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote: On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote: So as was discussed kvm_set_irq under spinlock is bad for scalability with multiple VCPUs. Why do we need a spinlock simply to protect level_asserted? Let's use an atomic test and set/test and clear and the problem goes away. That sad reality is that for level interrupt we already scan all vcpus under spinlock. Where? ioapic $ grep kvm_for_each_vcpu virt/kvm/ioapic.c $ ? Come on Michael. You can do better than grep and actually look at what code does. The code that loops over all vcpus while delivering an irq is in kvm_irq_delivery_to_apic(). Now grep for that. Hmm, I see, it's actually done for edge if injected from ioapic too, right? So set_irq does a linear scan, and for each matching CPU it calls kvm_irq_delivery_to_apic which is another scan? So it's actually N^2 worst case for a broadcast? No it isn't, I misread the code. Anyway, maybe not trivially but this looks fixable to me: we could drop the ioapic lock before calling kvm_irq_delivery_to_apic. May be, may be not. Just saying lets drop lock whenever we don't feel like holding one does not cut it. One thing we do is set remote_irr if interrupt was injected. I agree these things are tricky. One other question: static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx) { union kvm_ioapic_redirect_entry *pent; int injected = -1; pent = ioapic-redirtbl[idx]; if (!pent-fields.mask) { injected = ioapic_deliver(ioapic, idx); if (injected pent-fields.trig_mode == IOAPIC_LEVEL_TRIG) pent-fields.remote_irr = 1; } return injected; } This if (injected) looks a bit strange since ioapic_deliver returns -1 if no matching destinations. Should be if (injected 0)? Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. If not, maybe we can teach kvm_set_irq to return an indication of the previous status. Specifically kvm_irq_line_state could do test_and_set/test_and_clear and if already set/clear we return 0 immediately. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. Why can't a mutex be used instead of a spinlock again? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 03:42:09PM -0300, Marcelo Tosatti wrote: On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. Why can't a mutex be used instead of a spinlock again? Why was it changed at the first place? Commit says that the function is called from unsleepable context, but no stack trace. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. Why can't a mutex be used instead of a spinlock again? eventfd_signal calls the inject function from atomic context. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. Why can't a mutex be used instead of a spinlock again? eventfd_signal calls the inject function from atomic context. Actually, that's called from a workq. I'll have to switch it back and turn on lockdep to remember why I couldn't sleep there. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 01:13:06PM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. Why can't a mutex be used instead of a spinlock again? eventfd_signal calls the inject function from atomic context. Actually, that's called from a workq. I'll have to switch it back and turn on lockdep to remember why I couldn't sleep there. I'll try to fix kvm_set_irq so it returns 0 if level was already 0. Then you do not need extra state. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 13:13 -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote: On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote: Back to original point though current situation is that calling kvm_set_irq() under spinlock is not worse for scalability than calling it not under one. Yes. Still the specific use can just use an atomic flag, lock+bool is not needed, and we won't need to undo it later. Actually, no, replacing it with an atomic is racy. CPU0 (inject) CPU1 (EOI) atomic_cmpxchg(asserted, 0, 1) atomic_cmpxchg(asserted, 1, 0) kvm_set_irq(0) kvm_set_irq(1) eventfd_signal The interrupt is now stuck on until another interrupt is injected. Well EOI somehow happened here before interrupt so it's a bug somewhere else? Interrupts can be shared. We also can't guarantee that the guest won't write a bogus EOI to the ioapic. The irq ack notifier doesn't filter on irq source id... I'm not sure it can. I guess if Avi OKs adding another kvm_set_irq under spinlock that's the best we can do for now. Why can't a mutex be used instead of a spinlock again? eventfd_signal calls the inject function from atomic context. Actually, that's called from a workq. I'll have to switch it back and turn on lockdep to remember why I couldn't sleep there. switching to a mutex results in: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86 INFO: lockdep is turned off. Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109 Call Trace: [81088425] __might_sleep+0xf5/0x130 [81564c6f] mutex_lock_nested+0x2f/0x60 [a07db7d5] eoifd_event+0x25/0x70 [kvm] [a07daea4] kvm_notify_acked_irq+0xa4/0x140 [kvm] [a07dae2a] ? kvm_notify_acked_irq+0x2a/0x140 [kvm] [a07d9bb4] kvm_ioapic_update_eoi+0x84/0xf0 [kvm] [a0806c43] apic_set_eoi+0x123/0x130 [kvm] [a0806fd8] apic_reg_write+0x388/0x670 [kvm] [a07eb03c] ? vcpu_enter_guest+0x32c/0x740 [kvm] [a0807481] kvm_lapic_set_eoi+0x21/0x30 [kvm] [a04ba3f9] handle_apic_access+0x69/0x80 [kvm_intel] [a04ba02a] vmx_handle_exit+0xaa/0x260 [kvm_intel] -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote: turn on lockdep to remember why I couldn't sleep there. switching to a mutex results in: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86 INFO: lockdep is turned off. Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109 Call Trace: [81088425] __might_sleep+0xf5/0x130 [81564c6f] mutex_lock_nested+0x2f/0x60 [a07db7d5] eoifd_event+0x25/0x70 [kvm] [a07daea4] kvm_notify_acked_irq+0xa4/0x140 [kvm] [a07dae2a] ? kvm_notify_acked_irq+0x2a/0x140 [kvm] [a07d9bb4] kvm_ioapic_update_eoi+0x84/0xf0 [kvm] [a0806c43] apic_set_eoi+0x123/0x130 [kvm] [a0806fd8] apic_reg_write+0x388/0x670 [kvm] [a07eb03c] ? vcpu_enter_guest+0x32c/0x740 [kvm] [a0807481] kvm_lapic_set_eoi+0x21/0x30 [kvm] [a04ba3f9] handle_apic_access+0x69/0x80 [kvm_intel] [a04ba02a] vmx_handle_exit+0xaa/0x260 [kvm_intel] Its RCU from ack notifiers, OK. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, Jul 18, 2012 at 06:23:34PM -0300, Marcelo Tosatti wrote: On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote: turn on lockdep to remember why I couldn't sleep there. switching to a mutex results in: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86 INFO: lockdep is turned off. Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109 Call Trace: [81088425] __might_sleep+0xf5/0x130 [81564c6f] mutex_lock_nested+0x2f/0x60 [a07db7d5] eoifd_event+0x25/0x70 [kvm] [a07daea4] kvm_notify_acked_irq+0xa4/0x140 [kvm] [a07dae2a] ? kvm_notify_acked_irq+0x2a/0x140 [kvm] [a07d9bb4] kvm_ioapic_update_eoi+0x84/0xf0 [kvm] [a0806c43] apic_set_eoi+0x123/0x130 [kvm] [a0806fd8] apic_reg_write+0x388/0x670 [kvm] [a07eb03c] ? vcpu_enter_guest+0x32c/0x740 [kvm] [a0807481] kvm_lapic_set_eoi+0x21/0x30 [kvm] [a04ba3f9] handle_apic_access+0x69/0x80 [kvm_intel] [a04ba02a] vmx_handle_exit+0xaa/0x260 [kvm_intel] Its RCU from ack notifiers, OK. I'm testing a patch that moves all bitmap handling to under pic/ioapic lock. After this we can teach kvm_set_irq to report when a bit that is cleared/set is already clear/set, without races. And then no tracking will be necessary in irqfd - we can just call kvm_set_irq(..., 0) and look at the return status. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Tue, Jul 17, 2012 at 04:16:04PM -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote: > > On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote: > > > On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: > > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) > > > > >* It is now safe to release the object's resources > > > > >*/ > > > > > eventfd_ctx_put(irqfd->eventfd); > > > > > + > > > > > + _irq_source_put(irqfd->source); > > > > > + > > > > > kfree(irqfd); > > > > > } > > > > > > > > > > > > > Hang on, this is a bug I think. This is done asynchronously. So this > > > > means that I can assign MAX number of irqfds, then close one, and now > > > What is this? > > > Do you mean max irq source ids? > > > > Yes, this is what I meant. Sorry about being unclear. > > > > > > assign will fail because deassign did not get freed. > > > > > > > > Maybe we can solve this by flushing wq before assign? > > > > Looks a bit fragile but may be enough - need to document well. > > Why is this fragile? We could even make it part of a retry so we don't > call it unless we need to. > It just ties in assign and deassign. Maybe it's ok - but pls add a comment explaining the whole design. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote: > On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote: > > On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) > > > > * It is now safe to release the object's resources > > > > */ > > > > eventfd_ctx_put(irqfd->eventfd); > > > > + > > > > + _irq_source_put(irqfd->source); > > > > + > > > > kfree(irqfd); > > > > } > > > > > > > > > > Hang on, this is a bug I think. This is done asynchronously. So this > > > means that I can assign MAX number of irqfds, then close one, and now > > What is this? > > Do you mean max irq source ids? > > Yes, this is what I meant. Sorry about being unclear. > > > > assign will fail because deassign did not get freed. > > > > > > Maybe we can solve this by flushing wq before assign? > > > Looks a bit fragile but may be enough - need to document well. Why is this fragile? We could even make it part of a retry so we don't call it unless we need to. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote: > On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) > > >* It is now safe to release the object's resources > > >*/ > > > eventfd_ctx_put(irqfd->eventfd); > > > + > > > + _irq_source_put(irqfd->source); > > > + > > > kfree(irqfd); > > > } > > > > > > > Hang on, this is a bug I think. This is done asynchronously. So this > > means that I can assign MAX number of irqfds, then close one, and now > What is this? > Do you mean max irq source ids? Yes, this is what I meant. Sorry about being unclear. > > assign will fail because deassign did not get freed. > > > > Maybe we can solve this by flushing wq before assign? > > Looks a bit fragile but may be enough - need to document well. > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) > > * It is now safe to release the object's resources > > */ > > eventfd_ctx_put(irqfd->eventfd); > > + > > + _irq_source_put(irqfd->source); > > + > > kfree(irqfd); > > } > > > > Hang on, this is a bug I think. This is done asynchronously. So this > means that I can assign MAX number of irqfds, then close one, and now What is this? Do you mean max irq source ids? > assign will fail because deassign did not get freed. > > Maybe we can solve this by flushing wq before assign? > Looks a bit fragile but may be enough - need to document well. > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) >* It is now safe to release the object's resources >*/ > eventfd_ctx_put(irqfd->eventfd); > + > + _irq_source_put(irqfd->source); > + > kfree(irqfd); > } > Hang on, this is a bug I think. This is done asynchronously. So this means that I can assign MAX number of irqfds, then close one, and now assign will fail because deassign did not get freed. Maybe we can solve this by flushing wq before assign? Looks a bit fragile but may be enough - need to document well. -- MST -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) * It is now safe to release the object's resources */ eventfd_ctx_put(irqfd-eventfd); + + _irq_source_put(irqfd-source); + kfree(irqfd); } Hang on, this is a bug I think. This is done asynchronously. So this means that I can assign MAX number of irqfds, then close one, and now assign will fail because deassign did not get freed. Maybe we can solve this by flushing wq before assign? Looks a bit fragile but may be enough - need to document well. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) * It is now safe to release the object's resources */ eventfd_ctx_put(irqfd-eventfd); + + _irq_source_put(irqfd-source); + kfree(irqfd); } Hang on, this is a bug I think. This is done asynchronously. So this means that I can assign MAX number of irqfds, then close one, and now What is this? Do you mean max irq source ids? assign will fail because deassign did not get freed. Maybe we can solve this by flushing wq before assign? Looks a bit fragile but may be enough - need to document well. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) * It is now safe to release the object's resources */ eventfd_ctx_put(irqfd-eventfd); + + _irq_source_put(irqfd-source); + kfree(irqfd); } Hang on, this is a bug I think. This is done asynchronously. So this means that I can assign MAX number of irqfds, then close one, and now What is this? Do you mean max irq source ids? Yes, this is what I meant. Sorry about being unclear. assign will fail because deassign did not get freed. Maybe we can solve this by flushing wq before assign? Looks a bit fragile but may be enough - need to document well. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote: On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) * It is now safe to release the object's resources */ eventfd_ctx_put(irqfd-eventfd); + + _irq_source_put(irqfd-source); + kfree(irqfd); } Hang on, this is a bug I think. This is done asynchronously. So this means that I can assign MAX number of irqfds, then close one, and now What is this? Do you mean max irq source ids? Yes, this is what I meant. Sorry about being unclear. assign will fail because deassign did not get freed. Maybe we can solve this by flushing wq before assign? Looks a bit fragile but may be enough - need to document well. Why is this fragile? We could even make it part of a retry so we don't call it unless we need to. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts
On Tue, Jul 17, 2012 at 04:16:04PM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote: On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote: On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote: @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work) * It is now safe to release the object's resources */ eventfd_ctx_put(irqfd-eventfd); + + _irq_source_put(irqfd-source); + kfree(irqfd); } Hang on, this is a bug I think. This is done asynchronously. So this means that I can assign MAX number of irqfds, then close one, and now What is this? Do you mean max irq source ids? Yes, this is what I meant. Sorry about being unclear. assign will fail because deassign did not get freed. Maybe we can solve this by flushing wq before assign? Looks a bit fragile but may be enough - need to document well. Why is this fragile? We could even make it part of a retry so we don't call it unless we need to. It just ties in assign and deassign. Maybe it's ok - but pls add a comment explaining the whole design. -- MST -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/