Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 06:23:34PM -0300, Marcelo Tosatti wrote:
> On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote:
> > > turn on lockdep to remember why I couldn't sleep there.
> > 
> > switching to a mutex results in:
> > 
> > BUG: sleeping function called from invalid context at kernel/mutex.c:269
> > in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86
> > INFO: lockdep is turned off.
> > Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109
> > Call Trace:
> >  [] __might_sleep+0xf5/0x130
> >  [] mutex_lock_nested+0x2f/0x60
> >  [] eoifd_event+0x25/0x70 [kvm]
> >  [] kvm_notify_acked_irq+0xa4/0x140 [kvm]
> >  [] ? kvm_notify_acked_irq+0x2a/0x140 [kvm]
> >  [] kvm_ioapic_update_eoi+0x84/0xf0 [kvm]
> >  [] apic_set_eoi+0x123/0x130 [kvm]
> >  [] apic_reg_write+0x388/0x670 [kvm]
> >  [] ? vcpu_enter_guest+0x32c/0x740 [kvm]
> >  [] kvm_lapic_set_eoi+0x21/0x30 [kvm]
> >  [] handle_apic_access+0x69/0x80 [kvm_intel]
> >  [] vmx_handle_exit+0xaa/0x260 [kvm_intel]
> 
> Its RCU from ack notifiers, OK. 

I'm testing a patch that moves all bitmap handling to under
pic/ioapic lock. After this we can teach kvm_set_irq to report
when a bit that is cleared/set is already clear/set, without
races.

And then no tracking will be necessary in irqfd - we can just
call kvm_set_irq(..., 0) and look at the return status.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Marcelo Tosatti
On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote:
> > turn on lockdep to remember why I couldn't sleep there.
> 
> switching to a mutex results in:
> 
> BUG: sleeping function called from invalid context at kernel/mutex.c:269
> in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86
> INFO: lockdep is turned off.
> Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109
> Call Trace:
>  [] __might_sleep+0xf5/0x130
>  [] mutex_lock_nested+0x2f/0x60
>  [] eoifd_event+0x25/0x70 [kvm]
>  [] kvm_notify_acked_irq+0xa4/0x140 [kvm]
>  [] ? kvm_notify_acked_irq+0x2a/0x140 [kvm]
>  [] kvm_ioapic_update_eoi+0x84/0xf0 [kvm]
>  [] apic_set_eoi+0x123/0x130 [kvm]
>  [] apic_reg_write+0x388/0x670 [kvm]
>  [] ? vcpu_enter_guest+0x32c/0x740 [kvm]
>  [] kvm_lapic_set_eoi+0x21/0x30 [kvm]
>  [] handle_apic_access+0x69/0x80 [kvm_intel]
>  [] vmx_handle_exit+0xaa/0x260 [kvm_intel]

Its RCU from ack notifiers, OK. 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 13:13 -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote:
> > On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
> > > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > Back to original point though current
> > > > > > > > > situation is that calling kvm_set_irq() under spinlock is not 
> > > > > > > > > worse for
> > > > > > > > > scalability than calling it not under one.
> > > > > > > > 
> > > > > > > > Yes. Still the specific use can just use an atomic flag,
> > > > > > > > lock+bool is not needed, and we won't need to undo it later.
> > > > > > > 
> > > > > > > 
> > > > > > > Actually, no, replacing it with an atomic is racy.
> > > > > > > 
> > > > > > > CPU0 (inject)   CPU1 (EOI)
> > > > > > > atomic_cmpxchg(, 0, 1)
> > > > > > > atomic_cmpxchg(, 1, 
> > > > > > > 0)
> > > > > > > kvm_set_irq(0)
> > > > > > > kvm_set_irq(1)
> > > > > > > eventfd_signal
> > > > > > > 
> > > > > > > The interrupt is now stuck on until another interrupt is injected.
> > > > > > > 
> > > > > > 
> > > > > > Well EOI somehow happened here before interrupt so it's a bug 
> > > > > > somewhere
> > > > > > else?
> > > > > 
> > > > > Interrupts can be shared.  We also can't guarantee that the guest 
> > > > > won't
> > > > > write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter 
> > > > > on
> > > > > irq source id... I'm not sure it can.
> > > > 
> > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's
> > > > the best we can do for now.
> > > 
> > > Why can't a mutex be used instead of a spinlock again?
> > 
> > eventfd_signal calls the inject function from atomic context.
> 
> Actually, that's called from a workq.  I'll have to switch it back and
> turn on lockdep to remember why I couldn't sleep there.

switching to a mutex results in:

BUG: sleeping function called from invalid context at kernel/mutex.c:269
in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86
INFO: lockdep is turned off.
Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109
Call Trace:
 [] __might_sleep+0xf5/0x130
 [] mutex_lock_nested+0x2f/0x60
 [] eoifd_event+0x25/0x70 [kvm]
 [] kvm_notify_acked_irq+0xa4/0x140 [kvm]
 [] ? kvm_notify_acked_irq+0x2a/0x140 [kvm]
 [] kvm_ioapic_update_eoi+0x84/0xf0 [kvm]
 [] apic_set_eoi+0x123/0x130 [kvm]
 [] apic_reg_write+0x388/0x670 [kvm]
 [] ? vcpu_enter_guest+0x32c/0x740 [kvm]
 [] kvm_lapic_set_eoi+0x21/0x30 [kvm]
 [] handle_apic_access+0x69/0x80 [kvm_intel]
 [] vmx_handle_exit+0xaa/0x260 [kvm_intel]


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:13:06PM -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote:
> > On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
> > > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > Back to original point though current
> > > > > > > > > situation is that calling kvm_set_irq() under spinlock is not 
> > > > > > > > > worse for
> > > > > > > > > scalability than calling it not under one.
> > > > > > > > 
> > > > > > > > Yes. Still the specific use can just use an atomic flag,
> > > > > > > > lock+bool is not needed, and we won't need to undo it later.
> > > > > > > 
> > > > > > > 
> > > > > > > Actually, no, replacing it with an atomic is racy.
> > > > > > > 
> > > > > > > CPU0 (inject)   CPU1 (EOI)
> > > > > > > atomic_cmpxchg(, 0, 1)
> > > > > > > atomic_cmpxchg(, 1, 
> > > > > > > 0)
> > > > > > > kvm_set_irq(0)
> > > > > > > kvm_set_irq(1)
> > > > > > > eventfd_signal
> > > > > > > 
> > > > > > > The interrupt is now stuck on until another interrupt is injected.
> > > > > > > 
> > > > > > 
> > > > > > Well EOI somehow happened here before interrupt so it's a bug 
> > > > > > somewhere
> > > > > > else?
> > > > > 
> > > > > Interrupts can be shared.  We also can't guarantee that the guest 
> > > > > won't
> > > > > write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter 
> > > > > on
> > > > > irq source id... I'm not sure it can.
> > > > 
> > > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's
> > > > the best we can do for now.
> > > 
> > > Why can't a mutex be used instead of a spinlock again?
> > 
> > eventfd_signal calls the inject function from atomic context.
> 
> Actually, that's called from a workq.  I'll have to switch it back and
> turn on lockdep to remember why I couldn't sleep there.


I'll try to fix kvm_set_irq so it returns 0 if level was already 0.
Then you do not need extra state.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
> > On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > Back to original point though current
> > > > > > > > situation is that calling kvm_set_irq() under spinlock is not 
> > > > > > > > worse for
> > > > > > > > scalability than calling it not under one.
> > > > > > > 
> > > > > > > Yes. Still the specific use can just use an atomic flag,
> > > > > > > lock+bool is not needed, and we won't need to undo it later.
> > > > > > 
> > > > > > 
> > > > > > Actually, no, replacing it with an atomic is racy.
> > > > > > 
> > > > > > CPU0 (inject)   CPU1 (EOI)
> > > > > > atomic_cmpxchg(, 0, 1)
> > > > > > atomic_cmpxchg(, 1, 0)
> > > > > > kvm_set_irq(0)
> > > > > > kvm_set_irq(1)
> > > > > > eventfd_signal
> > > > > > 
> > > > > > The interrupt is now stuck on until another interrupt is injected.
> > > > > > 
> > > > > 
> > > > > Well EOI somehow happened here before interrupt so it's a bug 
> > > > > somewhere
> > > > > else?
> > > > 
> > > > Interrupts can be shared.  We also can't guarantee that the guest won't
> > > > write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
> > > > irq source id... I'm not sure it can.
> > > 
> > > I guess if Avi OKs adding another kvm_set_irq under spinlock that's
> > > the best we can do for now.
> > 
> > Why can't a mutex be used instead of a spinlock again?
> 
> eventfd_signal calls the inject function from atomic context.

Actually, that's called from a workq.  I'll have to switch it back and
turn on lockdep to remember why I couldn't sleep there.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
> On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
> > > > > > > Back to original point though current
> > > > > > > situation is that calling kvm_set_irq() under spinlock is not 
> > > > > > > worse for
> > > > > > > scalability than calling it not under one.
> > > > > > 
> > > > > > Yes. Still the specific use can just use an atomic flag,
> > > > > > lock+bool is not needed, and we won't need to undo it later.
> > > > > 
> > > > > 
> > > > > Actually, no, replacing it with an atomic is racy.
> > > > > 
> > > > > CPU0 (inject)   CPU1 (EOI)
> > > > > atomic_cmpxchg(, 0, 1)
> > > > > atomic_cmpxchg(, 1, 0)
> > > > > kvm_set_irq(0)
> > > > > kvm_set_irq(1)
> > > > > eventfd_signal
> > > > > 
> > > > > The interrupt is now stuck on until another interrupt is injected.
> > > > > 
> > > > 
> > > > Well EOI somehow happened here before interrupt so it's a bug somewhere
> > > > else?
> > > 
> > > Interrupts can be shared.  We also can't guarantee that the guest won't
> > > write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
> > > irq source id... I'm not sure it can.
> > 
> > I guess if Avi OKs adding another kvm_set_irq under spinlock that's
> > the best we can do for now.
> 
> Why can't a mutex be used instead of a spinlock again?

eventfd_signal calls the inject function from atomic context.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 03:42:09PM -0300, Marcelo Tosatti wrote:
> On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
> > > > > > > Back to original point though current
> > > > > > > situation is that calling kvm_set_irq() under spinlock is not 
> > > > > > > worse for
> > > > > > > scalability than calling it not under one.
> > > > > > 
> > > > > > Yes. Still the specific use can just use an atomic flag,
> > > > > > lock+bool is not needed, and we won't need to undo it later.
> > > > > 
> > > > > 
> > > > > Actually, no, replacing it with an atomic is racy.
> > > > > 
> > > > > CPU0 (inject)   CPU1 (EOI)
> > > > > atomic_cmpxchg(, 0, 1)
> > > > > atomic_cmpxchg(, 1, 0)
> > > > > kvm_set_irq(0)
> > > > > kvm_set_irq(1)
> > > > > eventfd_signal
> > > > > 
> > > > > The interrupt is now stuck on until another interrupt is injected.
> > > > > 
> > > > 
> > > > Well EOI somehow happened here before interrupt so it's a bug somewhere
> > > > else?
> > > 
> > > Interrupts can be shared.  We also can't guarantee that the guest won't
> > > write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
> > > irq source id... I'm not sure it can.
> > 
> > I guess if Avi OKs adding another kvm_set_irq under spinlock that's
> > the best we can do for now.
> 
> Why can't a mutex be used instead of a spinlock again?
> 
Why was it changed at the first place? Commit says that the function is
called from unsleepable context, but no stack trace.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Marcelo Tosatti
On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
> > > > > > Back to original point though current
> > > > > > situation is that calling kvm_set_irq() under spinlock is not worse 
> > > > > > for
> > > > > > scalability than calling it not under one.
> > > > > 
> > > > > Yes. Still the specific use can just use an atomic flag,
> > > > > lock+bool is not needed, and we won't need to undo it later.
> > > > 
> > > > 
> > > > Actually, no, replacing it with an atomic is racy.
> > > > 
> > > > CPU0 (inject)   CPU1 (EOI)
> > > > atomic_cmpxchg(, 0, 1)
> > > > atomic_cmpxchg(, 1, 0)
> > > > kvm_set_irq(0)
> > > > kvm_set_irq(1)
> > > > eventfd_signal
> > > > 
> > > > The interrupt is now stuck on until another interrupt is injected.
> > > > 
> > > 
> > > Well EOI somehow happened here before interrupt so it's a bug somewhere
> > > else?
> > 
> > Interrupts can be shared.  We also can't guarantee that the guest won't
> > write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
> > irq source id... I'm not sure it can.
> 
> I guess if Avi OKs adding another kvm_set_irq under spinlock that's
> the best we can do for now.

Why can't a mutex be used instead of a spinlock again?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 09:48:01AM -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote:
> > > On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
> > > > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad 
> > > > > > > > > > > > > for scalability
> > > > > > > > > > > > > with multiple VCPUs.  Why do we need a spinlock 
> > > > > > > > > > > > > simply to protect
> > > > > > > > > > > > > level_asserted?  Let's use an atomic test and 
> > > > > > > > > > > > > set/test and clear and the
> > > > > > > > > > > > > problem goes away.
> > > > > > > > > > > > > 
> > > > > > > > > > > > That sad reality is that for level interrupt we already 
> > > > > > > > > > > > scan all vcpus
> > > > > > > > > > > > under spinlock.
> > > > > > > > > > > 
> > > > > > > > > > > Where?
> > > > > > > > > > > 
> > > > > > > > > > ioapic
> > > > > > > > > 
> > > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > > > > > > > $
> > > > > > > > > 
> > > > > > > > > ?
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Come on Michael. You can do better than grep and actually look 
> > > > > > > > at what
> > > > > > > > code does. The code that loops over all vcpus while delivering 
> > > > > > > > an irq is
> > > > > > > > in kvm_irq_delivery_to_apic(). Now grep for that.
> > > > > > > 
> > > > > > > Hmm, I see, it's actually done for edge if injected from ioapic 
> > > > > > > too,
> > > > > > > right?
> > > > > > > 
> > > > > > > So set_irq does a linear scan, and for each matching CPU it calls
> > > > > > > kvm_irq_delivery_to_apic which is another scan?
> > > > > > > So it's actually N^2 worst case for a broadcast?
> > > > > > 
> > > > > > No it isn't, I misread the code.
> > > > > > 
> > > > > > 
> > > > > > Anyway, maybe not trivially but this looks fixable to me: we could 
> > > > > > drop
> > > > > > the ioapic lock before calling kvm_irq_delivery_to_apic.
> > > > > > 
> > > > > May be, may be not. Just saying "lets drop lock whenever we don't feel
> > > > > like holding one" does not cut it.
> > > > 
> > > > One thing we do is set remote_irr if interrupt was injected.
> > > > I agree these things are tricky.
> > > > 
> > > > One other question:
> > > > 
> > > > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
> > > > {
> > > > union kvm_ioapic_redirect_entry *pent;
> > > > int injected = -1;
> > > > 
> > > > pent = >redirtbl[idx];
> > > > 
> > > > if (!pent->fields.mask) {
> > > > injected = ioapic_deliver(ioapic, idx);
> > > > if (injected && pent->fields.trig_mode == 
> > > > IOAPIC_LEVEL_TRIG)
> > > > pent->fields.remote_irr = 1;
> > > > }
> > > > 
> > > > return injected;
> > > > }
> > > > 
> > > > 
> > > > This if (injected) looks a bit strange since ioapic_deliver returns
> > > > -1 if no matching destinations. Should be if (injected > 0)?
> > > > 
> > > > 
> > > > 
> > > > > Back to original point though current
> > > > > situation is that calling kvm_set_irq() under spinlock is not worse 
> > > > > for
> > > > > scalability than calling it not under one.
> > > > 
> > > > Yes. Still the specific use can just use an atomic flag,
> > > > lock+bool is not needed, and we won't need to undo it later.
> > > 
> > > 
> > > Actually, no, replacing it with an atomic is racy.
> > > 
> > > CPU0 (inject)   CPU1 (EOI)
> > > atomic_cmpxchg(, 0, 1)
> > > atomic_cmpxchg(, 1, 0)
> > > kvm_set_irq(0)
> > > kvm_set_irq(1)
> > > eventfd_signal
> > > 
> > > The interrupt is now stuck on until another interrupt is injected.
> > > 
> > 
> > Well EOI somehow happened here before interrupt so it's a bug somewhere
> > else?
> 
> Interrupts can be shared.  We also can't guarantee that the guest won't
> write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
> irq source id... I'm not sure it can.

I guess if Avi OKs adding another kvm_set_irq under spinlock that's
the best we can do for now.

If not, maybe we can teach kvm_set_irq to return an indication
of the previous status. Specifically kvm_irq_line_state
could do test_and_set/test_and_clear and if already set/clear
we return 0 immediately.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote:
> > On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
> > > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad 
> > > > > > > > > > > > for scalability
> > > > > > > > > > > > with multiple VCPUs.  Why do we need a spinlock simply 
> > > > > > > > > > > > to protect
> > > > > > > > > > > > level_asserted?  Let's use an atomic test and set/test 
> > > > > > > > > > > > and clear and the
> > > > > > > > > > > > problem goes away.
> > > > > > > > > > > > 
> > > > > > > > > > > That sad reality is that for level interrupt we already 
> > > > > > > > > > > scan all vcpus
> > > > > > > > > > > under spinlock.
> > > > > > > > > > 
> > > > > > > > > > Where?
> > > > > > > > > > 
> > > > > > > > > ioapic
> > > > > > > > 
> > > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > > > > > > $
> > > > > > > > 
> > > > > > > > ?
> > > > > > > > 
> > > > > > > 
> > > > > > > Come on Michael. You can do better than grep and actually look at 
> > > > > > > what
> > > > > > > code does. The code that loops over all vcpus while delivering an 
> > > > > > > irq is
> > > > > > > in kvm_irq_delivery_to_apic(). Now grep for that.
> > > > > > 
> > > > > > Hmm, I see, it's actually done for edge if injected from ioapic too,
> > > > > > right?
> > > > > > 
> > > > > > So set_irq does a linear scan, and for each matching CPU it calls
> > > > > > kvm_irq_delivery_to_apic which is another scan?
> > > > > > So it's actually N^2 worst case for a broadcast?
> > > > > 
> > > > > No it isn't, I misread the code.
> > > > > 
> > > > > 
> > > > > Anyway, maybe not trivially but this looks fixable to me: we could 
> > > > > drop
> > > > > the ioapic lock before calling kvm_irq_delivery_to_apic.
> > > > > 
> > > > May be, may be not. Just saying "lets drop lock whenever we don't feel
> > > > like holding one" does not cut it.
> > > 
> > > One thing we do is set remote_irr if interrupt was injected.
> > > I agree these things are tricky.
> > > 
> > > One other question:
> > > 
> > > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
> > > {
> > > union kvm_ioapic_redirect_entry *pent;
> > > int injected = -1;
> > > 
> > > pent = >redirtbl[idx];
> > > 
> > > if (!pent->fields.mask) {
> > > injected = ioapic_deliver(ioapic, idx);
> > > if (injected && pent->fields.trig_mode == 
> > > IOAPIC_LEVEL_TRIG)
> > > pent->fields.remote_irr = 1;
> > > }
> > > 
> > > return injected;
> > > }
> > > 
> > > 
> > > This if (injected) looks a bit strange since ioapic_deliver returns
> > > -1 if no matching destinations. Should be if (injected > 0)?
> > > 
> > > 
> > > 
> > > > Back to original point though current
> > > > situation is that calling kvm_set_irq() under spinlock is not worse for
> > > > scalability than calling it not under one.
> > > 
> > > Yes. Still the specific use can just use an atomic flag,
> > > lock+bool is not needed, and we won't need to undo it later.
> > 
> > 
> > Actually, no, replacing it with an atomic is racy.
> > 
> > CPU0 (inject)   CPU1 (EOI)
> > atomic_cmpxchg(, 0, 1)
> > atomic_cmpxchg(, 1, 0)
> > kvm_set_irq(0)
> > kvm_set_irq(1)
> > eventfd_signal
> > 
> > The interrupt is now stuck on until another interrupt is injected.
> > 
> 
> Well EOI somehow happened here before interrupt so it's a bug somewhere
> else?

Interrupts can be shared.  We also can't guarantee that the guest won't
write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
irq source id... I'm not sure it can.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
> > > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for 
> > > > > > > > > > > scalability
> > > > > > > > > > > with multiple VCPUs.  Why do we need a spinlock simply to 
> > > > > > > > > > > protect
> > > > > > > > > > > level_asserted?  Let's use an atomic test and set/test 
> > > > > > > > > > > and clear and the
> > > > > > > > > > > problem goes away.
> > > > > > > > > > > 
> > > > > > > > > > That sad reality is that for level interrupt we already 
> > > > > > > > > > scan all vcpus
> > > > > > > > > > under spinlock.
> > > > > > > > > 
> > > > > > > > > Where?
> > > > > > > > > 
> > > > > > > > ioapic
> > > > > > > 
> > > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > > > > > $
> > > > > > > 
> > > > > > > ?
> > > > > > > 
> > > > > > 
> > > > > > Come on Michael. You can do better than grep and actually look at 
> > > > > > what
> > > > > > code does. The code that loops over all vcpus while delivering an 
> > > > > > irq is
> > > > > > in kvm_irq_delivery_to_apic(). Now grep for that.
> > > > > 
> > > > > Hmm, I see, it's actually done for edge if injected from ioapic too,
> > > > > right?
> > > > > 
> > > > > So set_irq does a linear scan, and for each matching CPU it calls
> > > > > kvm_irq_delivery_to_apic which is another scan?
> > > > > So it's actually N^2 worst case for a broadcast?
> > > > 
> > > > No it isn't, I misread the code.
> > > > 
> > > > 
> > > > Anyway, maybe not trivially but this looks fixable to me: we could drop
> > > > the ioapic lock before calling kvm_irq_delivery_to_apic.
> > > > 
> > > May be, may be not. Just saying "lets drop lock whenever we don't feel
> > > like holding one" does not cut it.
> > 
> > One thing we do is set remote_irr if interrupt was injected.
> > I agree these things are tricky.
> > 
> > One other question:
> > 
> > static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
> > {
> > union kvm_ioapic_redirect_entry *pent;
> > int injected = -1;
> > 
> > pent = >redirtbl[idx];
> > 
> > if (!pent->fields.mask) {
> > injected = ioapic_deliver(ioapic, idx);
> > if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG)
> > pent->fields.remote_irr = 1;
> > }
> > 
> > return injected;
> > }
> > 
> > 
> > This if (injected) looks a bit strange since ioapic_deliver returns
> > -1 if no matching destinations. Should be if (injected > 0)?
> > 
> > 
> > 
> > > Back to original point though current
> > > situation is that calling kvm_set_irq() under spinlock is not worse for
> > > scalability than calling it not under one.
> > 
> > Yes. Still the specific use can just use an atomic flag,
> > lock+bool is not needed, and we won't need to undo it later.
> 
> 
> Actually, no, replacing it with an atomic is racy.
> 
> CPU0 (inject)   CPU1 (EOI)
> atomic_cmpxchg(, 0, 1)
> atomic_cmpxchg(, 1, 0)
> kvm_set_irq(0)
> kvm_set_irq(1)
> eventfd_signal
> 
> The interrupt is now stuck on until another interrupt is injected.
> 

Well EOI somehow happened here before interrupt so it's a bug somewhere
else?


-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
> > On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for 
> > > > > > > > > > scalability
> > > > > > > > > > with multiple VCPUs.  Why do we need a spinlock simply to 
> > > > > > > > > > protect
> > > > > > > > > > level_asserted?  Let's use an atomic test and set/test and 
> > > > > > > > > > clear and the
> > > > > > > > > > problem goes away.
> > > > > > > > > > 
> > > > > > > > > That sad reality is that for level interrupt we already scan 
> > > > > > > > > all vcpus
> > > > > > > > > under spinlock.
> > > > > > > > 
> > > > > > > > Where?
> > > > > > > > 
> > > > > > > ioapic
> > > > > > 
> > > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > > > > $
> > > > > > 
> > > > > > ?
> > > > > > 
> > > > > 
> > > > > Come on Michael. You can do better than grep and actually look at what
> > > > > code does. The code that loops over all vcpus while delivering an irq 
> > > > > is
> > > > > in kvm_irq_delivery_to_apic(). Now grep for that.
> > > > 
> > > > Hmm, I see, it's actually done for edge if injected from ioapic too,
> > > > right?
> > > > 
> > > > So set_irq does a linear scan, and for each matching CPU it calls
> > > > kvm_irq_delivery_to_apic which is another scan?
> > > > So it's actually N^2 worst case for a broadcast?
> > > 
> > > No it isn't, I misread the code.
> > > 
> > > 
> > > Anyway, maybe not trivially but this looks fixable to me: we could drop
> > > the ioapic lock before calling kvm_irq_delivery_to_apic.
> > > 
> > May be, may be not. Just saying "lets drop lock whenever we don't feel
> > like holding one" does not cut it.
> 
> One thing we do is set remote_irr if interrupt was injected.
> I agree these things are tricky.
> 
> One other question:
> 
> static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
> {
> union kvm_ioapic_redirect_entry *pent;
> int injected = -1;
> 
> pent = >redirtbl[idx];
> 
> if (!pent->fields.mask) {
> injected = ioapic_deliver(ioapic, idx);
> if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG)
> pent->fields.remote_irr = 1;
> }
> 
> return injected;
> }
> 
> 
> This if (injected) looks a bit strange since ioapic_deliver returns
> -1 if no matching destinations. Should be if (injected > 0)?
> 
> 
> 
> > Back to original point though current
> > situation is that calling kvm_set_irq() under spinlock is not worse for
> > scalability than calling it not under one.
> 
> Yes. Still the specific use can just use an atomic flag,
> lock+bool is not needed, and we won't need to undo it later.


Actually, no, replacing it with an atomic is racy.

CPU0 (inject)   CPU1 (EOI)
atomic_cmpxchg(, 0, 1)
atomic_cmpxchg(, 1, 0)
kvm_set_irq(0)
kvm_set_irq(1)
eventfd_signal

The interrupt is now stuck on until another interrupt is injected.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for 
> > > > > > > > > scalability
> > > > > > > > > with multiple VCPUs.  Why do we need a spinlock simply to 
> > > > > > > > > protect
> > > > > > > > > level_asserted?  Let's use an atomic test and set/test and 
> > > > > > > > > clear and the
> > > > > > > > > problem goes away.
> > > > > > > > > 
> > > > > > > > That sad reality is that for level interrupt we already scan 
> > > > > > > > all vcpus
> > > > > > > > under spinlock.
> > > > > > > 
> > > > > > > Where?
> > > > > > > 
> > > > > > ioapic
> > > > > 
> > > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > > > $
> > > > > 
> > > > > ?
> > > > > 
> > > > 
> > > > Come on Michael. You can do better than grep and actually look at what
> > > > code does. The code that loops over all vcpus while delivering an irq is
> > > > in kvm_irq_delivery_to_apic(). Now grep for that.
> > > 
> > > Hmm, I see, it's actually done for edge if injected from ioapic too,
> > > right?
> > > 
> > > So set_irq does a linear scan, and for each matching CPU it calls
> > > kvm_irq_delivery_to_apic which is another scan?
> > > So it's actually N^2 worst case for a broadcast?
> > 
> > No it isn't, I misread the code.
> > 
> > 
> > Anyway, maybe not trivially but this looks fixable to me: we could drop
> > the ioapic lock before calling kvm_irq_delivery_to_apic.
> > 
> May be, may be not. Just saying "lets drop lock whenever we don't feel
> like holding one" does not cut it.

One thing we do is set remote_irr if interrupt was injected.
I agree these things are tricky.

One other question:

static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
{
union kvm_ioapic_redirect_entry *pent;
int injected = -1;

pent = >redirtbl[idx];

if (!pent->fields.mask) {
injected = ioapic_deliver(ioapic, idx);
if (injected && pent->fields.trig_mode == IOAPIC_LEVEL_TRIG)
pent->fields.remote_irr = 1;
}

return injected;
}


This if (injected) looks a bit strange since ioapic_deliver returns
-1 if no matching destinations. Should be if (injected > 0)?



> Back to original point though current
> situation is that calling kvm_set_irq() under spinlock is not worse for
> scalability than calling it not under one.

Yes. Still the specific use can just use an atomic flag,
lock+bool is not needed, and we won't need to undo it later.

> --
>   Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > > So as was discussed kvm_set_irq under spinlock is bad for 
> > > > > > > > scalability
> > > > > > > > with multiple VCPUs.  Why do we need a spinlock simply to 
> > > > > > > > protect
> > > > > > > > level_asserted?  Let's use an atomic test and set/test and 
> > > > > > > > clear and the
> > > > > > > > problem goes away.
> > > > > > > > 
> > > > > > > That sad reality is that for level interrupt we already scan all 
> > > > > > > vcpus
> > > > > > > under spinlock.
> > > > > > 
> > > > > > Where?
> > > > > > 
> > > > > ioapic
> > > > 
> > > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > > $
> > > > 
> > > > ?
> > > > 
> > > 
> > > Come on Michael. You can do better than grep and actually look at what
> > > code does. The code that loops over all vcpus while delivering an irq is
> > > in kvm_irq_delivery_to_apic(). Now grep for that.
> > 
> > Hmm, I see, it's actually done for edge if injected from ioapic too,
> > right?
> > 
> > So set_irq does a linear scan, and for each matching CPU it calls
> > kvm_irq_delivery_to_apic which is another scan?
> > So it's actually N^2 worst case for a broadcast?
> 
> No it isn't, I misread the code.
> 
> 
> Anyway, maybe not trivially but this looks fixable to me: we could drop
> the ioapic lock before calling kvm_irq_delivery_to_apic.
> 
May be, may be not. Just saying "lets drop lock whenever we don't feel
like holding one" does not cut it. Back to original point though current
situation is that calling kvm_set_irq() under spinlock is not worse for
scalability than calling it not under one.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
> > > > > > > So as was discussed kvm_set_irq under spinlock is bad for 
> > > > > > > scalability
> > > > > > > with multiple VCPUs.  Why do we need a spinlock simply to protect
> > > > > > > level_asserted?  Let's use an atomic test and set/test and clear 
> > > > > > > and the
> > > > > > > problem goes away.
> > > > > > > 
> > > > > > That sad reality is that for level interrupt we already scan all 
> > > > > > vcpus
> > > > > > under spinlock.
> > > > > 
> > > > > Where?
> > > > > 
> > > > ioapic
> > > 
> > > $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
> > > $
> > > 
> > > ?
> > > 
> > 
> > Come on Michael. You can do better than grep and actually look at what
> > code does. The code that loops over all vcpus while delivering an irq is
> > in kvm_irq_delivery_to_apic(). Now grep for that.
> 
> Hmm, I see, it's actually done for edge if injected from ioapic too,
> right?
> 
> So set_irq does a linear scan, and for each matching CPU it calls
> kvm_irq_delivery_to_apic which is another scan?
> So it's actually N^2 worst case for a broadcast?

No it isn't, I misread the code.


Anyway, maybe not trivially but this looks fixable to me: we could drop
the ioapic lock before calling kvm_irq_delivery_to_apic.

> > --
> > Gleb.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:55:30PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote:
> > > On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
> > > > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > > > > > In order to inject a level interrupt from an external source 
> > > > > > > using an
> > > > > > > irqfd, we need to allocate a new irq_source_id.  This allows us to
> > > > > > > assert and (later) de-assert an interrupt line independently from
> > > > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > > > 
> > > > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > > > around an object for storing this irq_source_id.  However, notice
> > > > > > > that we only provide a way to assert the interrupt here.  A 
> > > > > > > follow-on
> > > > > > > interface will make use of the same irq_source_id to allow 
> > > > > > > de-assert.
> > > > > > > 
> > > > > > > Signed-off-by: Alex Williamson 
> > > > > > > ---
> > > > > > > 
> > > > > > >  Documentation/virtual/kvm/api.txt |6 ++
> > > > > > >  arch/x86/kvm/x86.c|1 
> > > > > > >  include/linux/kvm.h   |3 +
> > > > > > >  virt/kvm/eventfd.c|  114 
> > > > > > > -
> > > > > > >  4 files changed, 120 insertions(+), 4 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > > > > b/Documentation/virtual/kvm/api.txt
> > > > > > > index 100acde..c7267d5 100644
> > > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  
> > > > > > > The irqfd is removed using
> > > > > > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > > >  and kvm_irqfd.gsi.
> > > > > > >  
> > > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a 
> > > > > > > level
> > > > > > > +triggered interrupt.  In this case a new irqchip input is 
> > > > > > > allocated
> > > > > > > +which is logically OR'd with other inputs allowing multiple 
> > > > > > > sources
> > > > > > > +to independently assert level interrupts.  The 
> > > > > > > KVM_IRQFD_FLAG_LEVEL
> > > > > > > +is only necessary on setup, teardown is identical to that above.
> > > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > > > >  
> > > > > > >  5. The kvm_run structure
> > > > > > >  
> > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > index a01a424..80bed07 100644
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > > > >   case KVM_CAP_GET_TSC_KHZ:
> > > > > > >   case KVM_CAP_PCI_2_3:
> > > > > > >   case KVM_CAP_KVMCLOCK_CTRL:
> > > > > > > + case KVM_CAP_IRQFD_LEVEL:
> > > > > > >   r = 1;
> > > > > > >   break;
> > > > > > >   case KVM_CAP_COALESCED_MMIO:
> > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > > > index 2ce09aa..b2e6e4f 100644
> > > > > > > --- a/include/linux/kvm.h
> > > > > > > +++ b/include/linux/kvm.h
> > > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > > > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > > > > >  #define KVM_CAP_S390_COW 79
> > > > > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > > > > +#define KVM_CAP_IRQFD_LEVEL 81
> > > > > > >  
> > > > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > > > >  
> > > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> > > > > > >  #endif
> > > > > > >  
> > > > > > >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > > > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > > > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> > > > > > >  
> > > > > > >  struct kvm_irqfd {
> > > > > > >   __u32 fd;
> > > > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > > > > index 7d7e2aa..ecdbfea 100644
> > > > > > > --- a/virt/kvm/eventfd.c
> > > > > > > +++ b/virt/kvm/eventfd.c
> > > > > > > @@ -36,6 +36,68 @@
> > > > > > >  #include "iodev.h"
> > > > > > >  
> > > > > > >  /*
> > > > > > > + * An irq_source_id can be created from KVM_IRQFD for level 
> > > > > > > interrupt
> > > > > > > + * injections and shared with other interfaces for EOI or 
> > > > > > > de-assert.
> > > > > > > + * Create an object with reference counting to make it easy to 
> > > > > > > use.
> > > > > > > + */
> > > > > > > +struct _irq_source {
> > > > > > > + int id; /* the IRQ source ID */
> > > > > > > + bool level_asserted; /* Track assertion state and protect with 
> > > > > > > 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote:
> > On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
> > > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > > > > In order to inject a level interrupt from an external source using 
> > > > > > an
> > > > > > irqfd, we need to allocate a new irq_source_id.  This allows us to
> > > > > > assert and (later) de-assert an interrupt line independently from
> > > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > > 
> > > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > > around an object for storing this irq_source_id.  However, notice
> > > > > > that we only provide a way to assert the interrupt here.  A 
> > > > > > follow-on
> > > > > > interface will make use of the same irq_source_id to allow 
> > > > > > de-assert.
> > > > > > 
> > > > > > Signed-off-by: Alex Williamson 
> > > > > > ---
> > > > > > 
> > > > > >  Documentation/virtual/kvm/api.txt |6 ++
> > > > > >  arch/x86/kvm/x86.c|1 
> > > > > >  include/linux/kvm.h   |3 +
> > > > > >  virt/kvm/eventfd.c|  114 
> > > > > > -
> > > > > >  4 files changed, 120 insertions(+), 4 deletions(-)
> > > > > > 
> > > > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > > > b/Documentation/virtual/kvm/api.txt
> > > > > > index 100acde..c7267d5 100644
> > > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The 
> > > > > > irqfd is removed using
> > > > > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > > >  and kvm_irqfd.gsi.
> > > > > >  
> > > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a 
> > > > > > level
> > > > > > +triggered interrupt.  In this case a new irqchip input is allocated
> > > > > > +which is logically OR'd with other inputs allowing multiple sources
> > > > > > +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> > > > > > +is only necessary on setup, teardown is identical to that above.
> > > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > > >  
> > > > > >  5. The kvm_run structure
> > > > > >  
> > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > index a01a424..80bed07 100644
> > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > > > case KVM_CAP_GET_TSC_KHZ:
> > > > > > case KVM_CAP_PCI_2_3:
> > > > > > case KVM_CAP_KVMCLOCK_CTRL:
> > > > > > +   case KVM_CAP_IRQFD_LEVEL:
> > > > > > r = 1;
> > > > > > break;
> > > > > > case KVM_CAP_COALESCED_MMIO:
> > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > > index 2ce09aa..b2e6e4f 100644
> > > > > > --- a/include/linux/kvm.h
> > > > > > +++ b/include/linux/kvm.h
> > > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > > > >  #define KVM_CAP_S390_COW 79
> > > > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > > > +#define KVM_CAP_IRQFD_LEVEL 81
> > > > > >  
> > > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > > >  
> > > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> > > > > >  #endif
> > > > > >  
> > > > > >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> > > > > >  
> > > > > >  struct kvm_irqfd {
> > > > > > __u32 fd;
> > > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > > > index 7d7e2aa..ecdbfea 100644
> > > > > > --- a/virt/kvm/eventfd.c
> > > > > > +++ b/virt/kvm/eventfd.c
> > > > > > @@ -36,6 +36,68 @@
> > > > > >  #include "iodev.h"
> > > > > >  
> > > > > >  /*
> > > > > > + * An irq_source_id can be created from KVM_IRQFD for level 
> > > > > > interrupt
> > > > > > + * injections and shared with other interfaces for EOI or 
> > > > > > de-assert.
> > > > > > + * Create an object with reference counting to make it easy to use.
> > > > > > + */
> > > > > > +struct _irq_source {
> > > > > > +   int id; /* the IRQ source ID */
> > > > > > +   bool level_asserted; /* Track assertion state and protect with 
> > > > > > lock */
> > > > > > +   spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
> > > > > > eoi. */
> > > > > > +   struct kvm *kvm;
> > > > > > +   struct kref kref;
> > > > > > +};
> > > > > > +
> > > > > > +static void _irq_source_release(struct kref *kref)
> > > > > > +{
> > > > > > +   

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
> > > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > > > In order to inject a level interrupt from an external source using an
> > > > > irqfd, we need to allocate a new irq_source_id.  This allows us to
> > > > > assert and (later) de-assert an interrupt line independently from
> > > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > > 
> > > > > We also add what may appear like a bit of excessive infrastructure
> > > > > around an object for storing this irq_source_id.  However, notice
> > > > > that we only provide a way to assert the interrupt here.  A follow-on
> > > > > interface will make use of the same irq_source_id to allow de-assert.
> > > > > 
> > > > > Signed-off-by: Alex Williamson 
> > > > > ---
> > > > > 
> > > > >  Documentation/virtual/kvm/api.txt |6 ++
> > > > >  arch/x86/kvm/x86.c|1 
> > > > >  include/linux/kvm.h   |3 +
> > > > >  virt/kvm/eventfd.c|  114 
> > > > > -
> > > > >  4 files changed, 120 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > > b/Documentation/virtual/kvm/api.txt
> > > > > index 100acde..c7267d5 100644
> > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The 
> > > > > irqfd is removed using
> > > > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > > >  and kvm_irqfd.gsi.
> > > > >  
> > > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
> > > > > +triggered interrupt.  In this case a new irqchip input is allocated
> > > > > +which is logically OR'd with other inputs allowing multiple sources
> > > > > +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> > > > > +is only necessary on setup, teardown is identical to that above.
> > > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > > >  
> > > > >  5. The kvm_run structure
> > > > >  
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index a01a424..80bed07 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > >   case KVM_CAP_GET_TSC_KHZ:
> > > > >   case KVM_CAP_PCI_2_3:
> > > > >   case KVM_CAP_KVMCLOCK_CTRL:
> > > > > + case KVM_CAP_IRQFD_LEVEL:
> > > > >   r = 1;
> > > > >   break;
> > > > >   case KVM_CAP_COALESCED_MMIO:
> > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > index 2ce09aa..b2e6e4f 100644
> > > > > --- a/include/linux/kvm.h
> > > > > +++ b/include/linux/kvm.h
> > > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > > >  #define KVM_CAP_S390_COW 79
> > > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > > +#define KVM_CAP_IRQFD_LEVEL 81
> > > > >  
> > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > >  
> > > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> > > > >  #endif
> > > > >  
> > > > >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > > > > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> > > > >  
> > > > >  struct kvm_irqfd {
> > > > >   __u32 fd;
> > > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > > index 7d7e2aa..ecdbfea 100644
> > > > > --- a/virt/kvm/eventfd.c
> > > > > +++ b/virt/kvm/eventfd.c
> > > > > @@ -36,6 +36,68 @@
> > > > >  #include "iodev.h"
> > > > >  
> > > > >  /*
> > > > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> > > > > + * injections and shared with other interfaces for EOI or de-assert.
> > > > > + * Create an object with reference counting to make it easy to use.
> > > > > + */
> > > > > +struct _irq_source {
> > > > > + int id; /* the IRQ source ID */
> > > > > + bool level_asserted; /* Track assertion state and protect with 
> > > > > lock */
> > > > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
> > > > > eoi. */
> > > > > + struct kvm *kvm;
> > > > > + struct kref kref;
> > > > > +};
> > > > > +
> > > > > +static void _irq_source_release(struct kref *kref)
> > > > > +{
> > > > > + struct _irq_source *source;
> > > > > +
> > > > > + source = container_of(kref, struct _irq_source, kref);
> > > > > +
> > > > > + /* This also de-asserts */
> > > > > + kvm_free_irq_source_id(source->kvm, source->id);
> > > > > + kfree(source);
> > > > > +}
> > > > > +
> > > > > +static void 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
> > On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > > In order to inject a level interrupt from an external source using an
> > > > irqfd, we need to allocate a new irq_source_id.  This allows us to
> > > > assert and (later) de-assert an interrupt line independently from
> > > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > > 
> > > > We also add what may appear like a bit of excessive infrastructure
> > > > around an object for storing this irq_source_id.  However, notice
> > > > that we only provide a way to assert the interrupt here.  A follow-on
> > > > interface will make use of the same irq_source_id to allow de-assert.
> > > > 
> > > > Signed-off-by: Alex Williamson 
> > > > ---
> > > > 
> > > >  Documentation/virtual/kvm/api.txt |6 ++
> > > >  arch/x86/kvm/x86.c|1 
> > > >  include/linux/kvm.h   |3 +
> > > >  virt/kvm/eventfd.c|  114 
> > > > -
> > > >  4 files changed, 120 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > b/Documentation/virtual/kvm/api.txt
> > > > index 100acde..c7267d5 100644
> > > > --- a/Documentation/virtual/kvm/api.txt
> > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The 
> > > > irqfd is removed using
> > > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > > >  and kvm_irqfd.gsi.
> > > >  
> > > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
> > > > +triggered interrupt.  In this case a new irqchip input is allocated
> > > > +which is logically OR'd with other inputs allowing multiple sources
> > > > +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> > > > +is only necessary on setup, teardown is identical to that above.
> > > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > > >  
> > > >  5. The kvm_run structure
> > > >  
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index a01a424..80bed07 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > case KVM_CAP_GET_TSC_KHZ:
> > > > case KVM_CAP_PCI_2_3:
> > > > case KVM_CAP_KVMCLOCK_CTRL:
> > > > +   case KVM_CAP_IRQFD_LEVEL:
> > > > r = 1;
> > > > break;
> > > > case KVM_CAP_COALESCED_MMIO:
> > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > index 2ce09aa..b2e6e4f 100644
> > > > --- a/include/linux/kvm.h
> > > > +++ b/include/linux/kvm.h
> > > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > > >  #define KVM_CAP_S390_COW 79
> > > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > > +#define KVM_CAP_IRQFD_LEVEL 81
> > > >  
> > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > >  
> > > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> > > >  #endif
> > > >  
> > > >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > > > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> > > >  
> > > >  struct kvm_irqfd {
> > > > __u32 fd;
> > > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > > index 7d7e2aa..ecdbfea 100644
> > > > --- a/virt/kvm/eventfd.c
> > > > +++ b/virt/kvm/eventfd.c
> > > > @@ -36,6 +36,68 @@
> > > >  #include "iodev.h"
> > > >  
> > > >  /*
> > > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> > > > + * injections and shared with other interfaces for EOI or de-assert.
> > > > + * Create an object with reference counting to make it easy to use.
> > > > + */
> > > > +struct _irq_source {
> > > > +   int id; /* the IRQ source ID */
> > > > +   bool level_asserted; /* Track assertion state and protect with 
> > > > lock */
> > > > +   spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
> > > > eoi. */
> > > > +   struct kvm *kvm;
> > > > +   struct kref kref;
> > > > +};
> > > > +
> > > > +static void _irq_source_release(struct kref *kref)
> > > > +{
> > > > +   struct _irq_source *source;
> > > > +
> > > > +   source = container_of(kref, struct _irq_source, kref);
> > > > +
> > > > +   /* This also de-asserts */
> > > > +   kvm_free_irq_source_id(source->kvm, source->id);
> > > > +   kfree(source);
> > > > +}
> > > > +
> > > > +static void _irq_source_put(struct _irq_source *source)
> > > > +{
> > > > +   if (source)
> > > > +   kref_put(>kref, _irq_source_release);
> > > > +}
> > > > +
> > > > +static struct _irq_source *__attribute__ ((used)) /* white lie for now 
> > > > */
> 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
> On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > In order to inject a level interrupt from an external source using an
> > > irqfd, we need to allocate a new irq_source_id.  This allows us to
> > > assert and (later) de-assert an interrupt line independently from
> > > users of KVM_IRQ_LINE and avoid lost interrupts.
> > > 
> > > We also add what may appear like a bit of excessive infrastructure
> > > around an object for storing this irq_source_id.  However, notice
> > > that we only provide a way to assert the interrupt here.  A follow-on
> > > interface will make use of the same irq_source_id to allow de-assert.
> > > 
> > > Signed-off-by: Alex Williamson 
> > > ---
> > > 
> > >  Documentation/virtual/kvm/api.txt |6 ++
> > >  arch/x86/kvm/x86.c|1 
> > >  include/linux/kvm.h   |3 +
> > >  virt/kvm/eventfd.c|  114 
> > > -
> > >  4 files changed, 120 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > b/Documentation/virtual/kvm/api.txt
> > > index 100acde..c7267d5 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd 
> > > is removed using
> > >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> > >  and kvm_irqfd.gsi.
> > >  
> > > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
> > > +triggered interrupt.  In this case a new irqchip input is allocated
> > > +which is logically OR'd with other inputs allowing multiple sources
> > > +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> > > +is only necessary on setup, teardown is identical to that above.
> > > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> > >  
> > >  5. The kvm_run structure
> > >  
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index a01a424..80bed07 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > >   case KVM_CAP_GET_TSC_KHZ:
> > >   case KVM_CAP_PCI_2_3:
> > >   case KVM_CAP_KVMCLOCK_CTRL:
> > > + case KVM_CAP_IRQFD_LEVEL:
> > >   r = 1;
> > >   break;
> > >   case KVM_CAP_COALESCED_MMIO:
> > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > index 2ce09aa..b2e6e4f 100644
> > > --- a/include/linux/kvm.h
> > > +++ b/include/linux/kvm.h
> > > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> > >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> > >  #define KVM_CAP_S390_COW 79
> > >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > > +#define KVM_CAP_IRQFD_LEVEL 81
> > >  
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > >  
> > > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> > >  #endif
> > >  
> > >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> > >  
> > >  struct kvm_irqfd {
> > >   __u32 fd;
> > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > index 7d7e2aa..ecdbfea 100644
> > > --- a/virt/kvm/eventfd.c
> > > +++ b/virt/kvm/eventfd.c
> > > @@ -36,6 +36,68 @@
> > >  #include "iodev.h"
> > >  
> > >  /*
> > > + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> > > + * injections and shared with other interfaces for EOI or de-assert.
> > > + * Create an object with reference counting to make it easy to use.
> > > + */
> > > +struct _irq_source {
> > > + int id; /* the IRQ source ID */
> > > + bool level_asserted; /* Track assertion state and protect with lock */
> > > + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */
> > > + struct kvm *kvm;
> > > + struct kref kref;
> > > +};
> > > +
> > > +static void _irq_source_release(struct kref *kref)
> > > +{
> > > + struct _irq_source *source;
> > > +
> > > + source = container_of(kref, struct _irq_source, kref);
> > > +
> > > + /* This also de-asserts */
> > > + kvm_free_irq_source_id(source->kvm, source->id);
> > > + kfree(source);
> > > +}
> > > +
> > > +static void _irq_source_put(struct _irq_source *source)
> > > +{
> > > + if (source)
> > > + kref_put(>kref, _irq_source_release);
> > > +}
> > > +
> > > +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> > > +_irq_source_get(struct _irq_source *source)
> > > +{
> > > + if (source)
> > > + kref_get(>kref);
> > > +
> > > + return source;
> > > +}
> > > +
> > > +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
> > > +{
> > > + struct _irq_source *source;
> > > + int id;
> > > +
> > > + source = kzalloc(sizeof(*source), GFP_KERNEL);
> > > + if (!source)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + id = 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > In order to inject a level interrupt from an external source using an
> > irqfd, we need to allocate a new irq_source_id.  This allows us to
> > assert and (later) de-assert an interrupt line independently from
> > users of KVM_IRQ_LINE and avoid lost interrupts.
> > 
> > We also add what may appear like a bit of excessive infrastructure
> > around an object for storing this irq_source_id.  However, notice
> > that we only provide a way to assert the interrupt here.  A follow-on
> > interface will make use of the same irq_source_id to allow de-assert.
> > 
> > Signed-off-by: Alex Williamson 
> > ---
> > 
> >  Documentation/virtual/kvm/api.txt |6 ++
> >  arch/x86/kvm/x86.c|1 
> >  include/linux/kvm.h   |3 +
> >  virt/kvm/eventfd.c|  114 
> > -
> >  4 files changed, 120 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt 
> > b/Documentation/virtual/kvm/api.txt
> > index 100acde..c7267d5 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd is 
> > removed using
> >  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
> >  and kvm_irqfd.gsi.
> >  
> > +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
> > +triggered interrupt.  In this case a new irqchip input is allocated
> > +which is logically OR'd with other inputs allowing multiple sources
> > +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> > +is only necessary on setup, teardown is identical to that above.
> > +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
> >  
> >  5. The kvm_run structure
> >  
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a01a424..80bed07 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > case KVM_CAP_GET_TSC_KHZ:
> > case KVM_CAP_PCI_2_3:
> > case KVM_CAP_KVMCLOCK_CTRL:
> > +   case KVM_CAP_IRQFD_LEVEL:
> > r = 1;
> > break;
> > case KVM_CAP_COALESCED_MMIO:
> > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > index 2ce09aa..b2e6e4f 100644
> > --- a/include/linux/kvm.h
> > +++ b/include/linux/kvm.h
> > @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
> >  #define KVM_CAP_PPC_GET_SMMU_INFO 78
> >  #define KVM_CAP_S390_COW 79
> >  #define KVM_CAP_PPC_ALLOC_HTAB 80
> > +#define KVM_CAP_IRQFD_LEVEL 81
> >  
> >  #ifdef KVM_CAP_IRQ_ROUTING
> >  
> > @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
> >  #endif
> >  
> >  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> > +/* Available with KVM_CAP_IRQFD_LEVEL */
> > +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
> >  
> >  struct kvm_irqfd {
> > __u32 fd;
> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > index 7d7e2aa..ecdbfea 100644
> > --- a/virt/kvm/eventfd.c
> > +++ b/virt/kvm/eventfd.c
> > @@ -36,6 +36,68 @@
> >  #include "iodev.h"
> >  
> >  /*
> > + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> > + * injections and shared with other interfaces for EOI or de-assert.
> > + * Create an object with reference counting to make it easy to use.
> > + */
> > +struct _irq_source {
> > +   int id; /* the IRQ source ID */
> > +   bool level_asserted; /* Track assertion state and protect with lock */
> > +   spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */
> > +   struct kvm *kvm;
> > +   struct kref kref;
> > +};
> > +
> > +static void _irq_source_release(struct kref *kref)
> > +{
> > +   struct _irq_source *source;
> > +
> > +   source = container_of(kref, struct _irq_source, kref);
> > +
> > +   /* This also de-asserts */
> > +   kvm_free_irq_source_id(source->kvm, source->id);
> > +   kfree(source);
> > +}
> > +
> > +static void _irq_source_put(struct _irq_source *source)
> > +{
> > +   if (source)
> > +   kref_put(>kref, _irq_source_release);
> > +}
> > +
> > +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> > +_irq_source_get(struct _irq_source *source)
> > +{
> > +   if (source)
> > +   kref_get(>kref);
> > +
> > +   return source;
> > +}
> > +
> > +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
> > +{
> > +   struct _irq_source *source;
> > +   int id;
> > +
> > +   source = kzalloc(sizeof(*source), GFP_KERNEL);
> > +   if (!source)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   id = kvm_request_irq_source_id(kvm);
> > +   if (id < 0) {
> > +   kfree(source);
> > +   return ERR_PTR(id);
> > +   }
> > +
> > +   kref_init(>kref);
> > +   spin_lock_init(>lock);
> > +   source->kvm = kvm;
> > +   source->id = id;
> > +
> > +   return 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> In order to inject a level interrupt from an external source using an
> irqfd, we need to allocate a new irq_source_id.  This allows us to
> assert and (later) de-assert an interrupt line independently from
> users of KVM_IRQ_LINE and avoid lost interrupts.
> 
> We also add what may appear like a bit of excessive infrastructure
> around an object for storing this irq_source_id.  However, notice
> that we only provide a way to assert the interrupt here.  A follow-on
> interface will make use of the same irq_source_id to allow de-assert.
> 
> Signed-off-by: Alex Williamson 
> ---
> 
>  Documentation/virtual/kvm/api.txt |6 ++
>  arch/x86/kvm/x86.c|1 
>  include/linux/kvm.h   |3 +
>  virt/kvm/eventfd.c|  114 
> -
>  4 files changed, 120 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 100acde..c7267d5 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd is 
> removed using
>  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
>  and kvm_irqfd.gsi.
>  
> +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
> +triggered interrupt.  In this case a new irqchip input is allocated
> +which is logically OR'd with other inputs allowing multiple sources
> +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
> +is only necessary on setup, teardown is identical to that above.
> +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
>  
>  5. The kvm_run structure
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a01a424..80bed07 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
>   case KVM_CAP_GET_TSC_KHZ:
>   case KVM_CAP_PCI_2_3:
>   case KVM_CAP_KVMCLOCK_CTRL:
> + case KVM_CAP_IRQFD_LEVEL:
>   r = 1;
>   break;
>   case KVM_CAP_COALESCED_MMIO:
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 2ce09aa..b2e6e4f 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
>  #define KVM_CAP_PPC_GET_SMMU_INFO 78
>  #define KVM_CAP_S390_COW 79
>  #define KVM_CAP_PPC_ALLOC_HTAB 80
> +#define KVM_CAP_IRQFD_LEVEL 81
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
>  #endif
>  
>  #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
> +/* Available with KVM_CAP_IRQFD_LEVEL */
> +#define KVM_IRQFD_FLAG_LEVEL (1 << 1)
>  
>  struct kvm_irqfd {
>   __u32 fd;
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 7d7e2aa..ecdbfea 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -36,6 +36,68 @@
>  #include "iodev.h"
>  
>  /*
> + * An irq_source_id can be created from KVM_IRQFD for level interrupt
> + * injections and shared with other interfaces for EOI or de-assert.
> + * Create an object with reference counting to make it easy to use.
> + */
> +struct _irq_source {
> + int id; /* the IRQ source ID */
> + bool level_asserted; /* Track assertion state and protect with lock */
> + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */
> + struct kvm *kvm;
> + struct kref kref;
> +};
> +
> +static void _irq_source_release(struct kref *kref)
> +{
> + struct _irq_source *source;
> +
> + source = container_of(kref, struct _irq_source, kref);
> +
> + /* This also de-asserts */
> + kvm_free_irq_source_id(source->kvm, source->id);
> + kfree(source);
> +}
> +
> +static void _irq_source_put(struct _irq_source *source)
> +{
> + if (source)
> + kref_put(>kref, _irq_source_release);
> +}
> +
> +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
> +_irq_source_get(struct _irq_source *source)
> +{
> + if (source)
> + kref_get(>kref);
> +
> + return source;
> +}
> +
> +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
> +{
> + struct _irq_source *source;
> + int id;
> +
> + source = kzalloc(sizeof(*source), GFP_KERNEL);
> + if (!source)
> + return ERR_PTR(-ENOMEM);
> +
> + id = kvm_request_irq_source_id(kvm);
> + if (id < 0) {
> + kfree(source);
> + return ERR_PTR(id);
> + }
> +
> + kref_init(>kref);
> + spin_lock_init(>lock);
> + source->kvm = kvm;
> + source->id = id;
> +
> + return source;
> +}
> +
> +/*
>   * 
>   * irqfd: Allows an fd to be used to inject an interrupt to the guest
>   *
> @@ -52,6 +114,8 @@ struct _irqfd {
>   /* Used for level IRQ fast-path */
>   int gsi;

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
 In order to inject a level interrupt from an external source using an
 irqfd, we need to allocate a new irq_source_id.  This allows us to
 assert and (later) de-assert an interrupt line independently from
 users of KVM_IRQ_LINE and avoid lost interrupts.
 
 We also add what may appear like a bit of excessive infrastructure
 around an object for storing this irq_source_id.  However, notice
 that we only provide a way to assert the interrupt here.  A follow-on
 interface will make use of the same irq_source_id to allow de-assert.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  Documentation/virtual/kvm/api.txt |6 ++
  arch/x86/kvm/x86.c|1 
  include/linux/kvm.h   |3 +
  virt/kvm/eventfd.c|  114 
 -
  4 files changed, 120 insertions(+), 4 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 100acde..c7267d5 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd is 
 removed using
  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
  and kvm_irqfd.gsi.
  
 +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
 +triggered interrupt.  In this case a new irqchip input is allocated
 +which is logically OR'd with other inputs allowing multiple sources
 +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
 +is only necessary on setup, teardown is identical to that above.
 +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
  
  5. The kvm_run structure
  
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index a01a424..80bed07 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_GET_TSC_KHZ:
   case KVM_CAP_PCI_2_3:
   case KVM_CAP_KVMCLOCK_CTRL:
 + case KVM_CAP_IRQFD_LEVEL:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 2ce09aa..b2e6e4f 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_GET_SMMU_INFO 78
  #define KVM_CAP_S390_COW 79
  #define KVM_CAP_PPC_ALLOC_HTAB 80
 +#define KVM_CAP_IRQFD_LEVEL 81
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
  #endif
  
  #define KVM_IRQFD_FLAG_DEASSIGN (1  0)
 +/* Available with KVM_CAP_IRQFD_LEVEL */
 +#define KVM_IRQFD_FLAG_LEVEL (1  1)
  
  struct kvm_irqfd {
   __u32 fd;
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 7d7e2aa..ecdbfea 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -36,6 +36,68 @@
  #include iodev.h
  
  /*
 + * An irq_source_id can be created from KVM_IRQFD for level interrupt
 + * injections and shared with other interfaces for EOI or de-assert.
 + * Create an object with reference counting to make it easy to use.
 + */
 +struct _irq_source {
 + int id; /* the IRQ source ID */
 + bool level_asserted; /* Track assertion state and protect with lock */
 + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */
 + struct kvm *kvm;
 + struct kref kref;
 +};
 +
 +static void _irq_source_release(struct kref *kref)
 +{
 + struct _irq_source *source;
 +
 + source = container_of(kref, struct _irq_source, kref);
 +
 + /* This also de-asserts */
 + kvm_free_irq_source_id(source-kvm, source-id);
 + kfree(source);
 +}
 +
 +static void _irq_source_put(struct _irq_source *source)
 +{
 + if (source)
 + kref_put(source-kref, _irq_source_release);
 +}
 +
 +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
 +_irq_source_get(struct _irq_source *source)
 +{
 + if (source)
 + kref_get(source-kref);
 +
 + return source;
 +}
 +
 +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
 +{
 + struct _irq_source *source;
 + int id;
 +
 + source = kzalloc(sizeof(*source), GFP_KERNEL);
 + if (!source)
 + return ERR_PTR(-ENOMEM);
 +
 + id = kvm_request_irq_source_id(kvm);
 + if (id  0) {
 + kfree(source);
 + return ERR_PTR(id);
 + }
 +
 + kref_init(source-kref);
 + spin_lock_init(source-lock);
 + source-kvm = kvm;
 + source-id = id;
 +
 + return source;
 +}
 +
 +/*
   * 
   * irqfd: Allows an fd to be used to inject an interrupt to the guest
   *
 @@ -52,6 +114,8 @@ struct _irqfd {
   /* Used for level IRQ fast-path */
   int gsi;
   struct work_struct inject;
 + /* IRQ source ID for level triggered irqfds */
 + struct 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
 On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
  In order to inject a level interrupt from an external source using an
  irqfd, we need to allocate a new irq_source_id.  This allows us to
  assert and (later) de-assert an interrupt line independently from
  users of KVM_IRQ_LINE and avoid lost interrupts.
  
  We also add what may appear like a bit of excessive infrastructure
  around an object for storing this irq_source_id.  However, notice
  that we only provide a way to assert the interrupt here.  A follow-on
  interface will make use of the same irq_source_id to allow de-assert.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   Documentation/virtual/kvm/api.txt |6 ++
   arch/x86/kvm/x86.c|1 
   include/linux/kvm.h   |3 +
   virt/kvm/eventfd.c|  114 
  -
   4 files changed, 120 insertions(+), 4 deletions(-)
  
  diff --git a/Documentation/virtual/kvm/api.txt 
  b/Documentation/virtual/kvm/api.txt
  index 100acde..c7267d5 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd is 
  removed using
   the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
   and kvm_irqfd.gsi.
   
  +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
  +triggered interrupt.  In this case a new irqchip input is allocated
  +which is logically OR'd with other inputs allowing multiple sources
  +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
  +is only necessary on setup, teardown is identical to that above.
  +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
   
   5. The kvm_run structure
   
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index a01a424..80bed07 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  case KVM_CAP_GET_TSC_KHZ:
  case KVM_CAP_PCI_2_3:
  case KVM_CAP_KVMCLOCK_CTRL:
  +   case KVM_CAP_IRQFD_LEVEL:
  r = 1;
  break;
  case KVM_CAP_COALESCED_MMIO:
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index 2ce09aa..b2e6e4f 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_PPC_GET_SMMU_INFO 78
   #define KVM_CAP_S390_COW 79
   #define KVM_CAP_PPC_ALLOC_HTAB 80
  +#define KVM_CAP_IRQFD_LEVEL 81
   
   #ifdef KVM_CAP_IRQ_ROUTING
   
  @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
   #endif
   
   #define KVM_IRQFD_FLAG_DEASSIGN (1  0)
  +/* Available with KVM_CAP_IRQFD_LEVEL */
  +#define KVM_IRQFD_FLAG_LEVEL (1  1)
   
   struct kvm_irqfd {
  __u32 fd;
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index 7d7e2aa..ecdbfea 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -36,6 +36,68 @@
   #include iodev.h
   
   /*
  + * An irq_source_id can be created from KVM_IRQFD for level interrupt
  + * injections and shared with other interfaces for EOI or de-assert.
  + * Create an object with reference counting to make it easy to use.
  + */
  +struct _irq_source {
  +   int id; /* the IRQ source ID */
  +   bool level_asserted; /* Track assertion state and protect with lock */
  +   spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */
  +   struct kvm *kvm;
  +   struct kref kref;
  +};
  +
  +static void _irq_source_release(struct kref *kref)
  +{
  +   struct _irq_source *source;
  +
  +   source = container_of(kref, struct _irq_source, kref);
  +
  +   /* This also de-asserts */
  +   kvm_free_irq_source_id(source-kvm, source-id);
  +   kfree(source);
  +}
  +
  +static void _irq_source_put(struct _irq_source *source)
  +{
  +   if (source)
  +   kref_put(source-kref, _irq_source_release);
  +}
  +
  +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
  +_irq_source_get(struct _irq_source *source)
  +{
  +   if (source)
  +   kref_get(source-kref);
  +
  +   return source;
  +}
  +
  +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
  +{
  +   struct _irq_source *source;
  +   int id;
  +
  +   source = kzalloc(sizeof(*source), GFP_KERNEL);
  +   if (!source)
  +   return ERR_PTR(-ENOMEM);
  +
  +   id = kvm_request_irq_source_id(kvm);
  +   if (id  0) {
  +   kfree(source);
  +   return ERR_PTR(id);
  +   }
  +
  +   kref_init(source-kref);
  +   spin_lock_init(source-lock);
  +   source-kvm = kvm;
  +   source-id = id;
  +
  +   return source;
  +}
  +
  +/*
* 
* irqfd: Allows an fd to be used to inject an interrupt to the guest
*
  @@ -52,6 +114,8 @@ struct _irqfd {
  /* Used for level IRQ 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
 On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
  On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
   In order to inject a level interrupt from an external source using an
   irqfd, we need to allocate a new irq_source_id.  This allows us to
   assert and (later) de-assert an interrupt line independently from
   users of KVM_IRQ_LINE and avoid lost interrupts.
   
   We also add what may appear like a bit of excessive infrastructure
   around an object for storing this irq_source_id.  However, notice
   that we only provide a way to assert the interrupt here.  A follow-on
   interface will make use of the same irq_source_id to allow de-assert.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
   ---
   
Documentation/virtual/kvm/api.txt |6 ++
arch/x86/kvm/x86.c|1 
include/linux/kvm.h   |3 +
virt/kvm/eventfd.c|  114 
   -
4 files changed, 120 insertions(+), 4 deletions(-)
   
   diff --git a/Documentation/virtual/kvm/api.txt 
   b/Documentation/virtual/kvm/api.txt
   index 100acde..c7267d5 100644
   --- a/Documentation/virtual/kvm/api.txt
   +++ b/Documentation/virtual/kvm/api.txt
   @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The irqfd 
   is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

   +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
   +triggered interrupt.  In this case a new irqchip input is allocated
   +which is logically OR'd with other inputs allowing multiple sources
   +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
   +is only necessary on setup, teardown is identical to that above.
   +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.

5. The kvm_run structure

   diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
   index a01a424..80bed07 100644
   --- a/arch/x86/kvm/x86.c
   +++ b/arch/x86/kvm/x86.c
   @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 case KVM_CAP_GET_TSC_KHZ:
 case KVM_CAP_PCI_2_3:
 case KVM_CAP_KVMCLOCK_CTRL:
   + case KVM_CAP_IRQFD_LEVEL:
 r = 1;
 break;
 case KVM_CAP_COALESCED_MMIO:
   diff --git a/include/linux/kvm.h b/include/linux/kvm.h
   index 2ce09aa..b2e6e4f 100644
   --- a/include/linux/kvm.h
   +++ b/include/linux/kvm.h
   @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_GET_SMMU_INFO 78
#define KVM_CAP_S390_COW 79
#define KVM_CAP_PPC_ALLOC_HTAB 80
   +#define KVM_CAP_IRQFD_LEVEL 81

#ifdef KVM_CAP_IRQ_ROUTING

   @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
#endif

#define KVM_IRQFD_FLAG_DEASSIGN (1  0)
   +/* Available with KVM_CAP_IRQFD_LEVEL */
   +#define KVM_IRQFD_FLAG_LEVEL (1  1)

struct kvm_irqfd {
 __u32 fd;
   diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
   index 7d7e2aa..ecdbfea 100644
   --- a/virt/kvm/eventfd.c
   +++ b/virt/kvm/eventfd.c
   @@ -36,6 +36,68 @@
#include iodev.h

/*
   + * An irq_source_id can be created from KVM_IRQFD for level interrupt
   + * injections and shared with other interfaces for EOI or de-assert.
   + * Create an object with reference counting to make it easy to use.
   + */
   +struct _irq_source {
   + int id; /* the IRQ source ID */
   + bool level_asserted; /* Track assertion state and protect with lock */
   + spinlock_t lock; /* to avoid unnecessary re-assert/spurious eoi. */
   + struct kvm *kvm;
   + struct kref kref;
   +};
   +
   +static void _irq_source_release(struct kref *kref)
   +{
   + struct _irq_source *source;
   +
   + source = container_of(kref, struct _irq_source, kref);
   +
   + /* This also de-asserts */
   + kvm_free_irq_source_id(source-kvm, source-id);
   + kfree(source);
   +}
   +
   +static void _irq_source_put(struct _irq_source *source)
   +{
   + if (source)
   + kref_put(source-kref, _irq_source_release);
   +}
   +
   +static struct _irq_source *__attribute__ ((used)) /* white lie for now */
   +_irq_source_get(struct _irq_source *source)
   +{
   + if (source)
   + kref_get(source-kref);
   +
   + return source;
   +}
   +
   +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
   +{
   + struct _irq_source *source;
   + int id;
   +
   + source = kzalloc(sizeof(*source), GFP_KERNEL);
   + if (!source)
   + return ERR_PTR(-ENOMEM);
   +
   + id = kvm_request_irq_source_id(kvm);
   + if (id  0) {
   + kfree(source);
   + return ERR_PTR(id);
   + }
   +
   + kref_init(source-kref);
   + spin_lock_init(source-lock);
   + source-kvm = kvm;
   + source-id = id;
   +
   + return source;
   +}
   +
   +/*
 * 
 * irqfd: 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
  On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
   On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
In order to inject a level interrupt from an external source using an
irqfd, we need to allocate a new irq_source_id.  This allows us to
assert and (later) de-assert an interrupt line independently from
users of KVM_IRQ_LINE and avoid lost interrupts.

We also add what may appear like a bit of excessive infrastructure
around an object for storing this irq_source_id.  However, notice
that we only provide a way to assert the interrupt here.  A follow-on
interface will make use of the same irq_source_id to allow de-assert.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

 Documentation/virtual/kvm/api.txt |6 ++
 arch/x86/kvm/x86.c|1 
 include/linux/kvm.h   |3 +
 virt/kvm/eventfd.c|  114 
-
 4 files changed, 120 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 100acde..c7267d5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The 
irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
 and kvm_irqfd.gsi.
 
+The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
+triggered interrupt.  In this case a new irqchip input is allocated
+which is logically OR'd with other inputs allowing multiple sources
+to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
+is only necessary on setup, teardown is identical to that above.
+KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
 
 5. The kvm_run structure
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a01a424..80bed07 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_GET_TSC_KHZ:
case KVM_CAP_PCI_2_3:
case KVM_CAP_KVMCLOCK_CTRL:
+   case KVM_CAP_IRQFD_LEVEL:
r = 1;
break;
case KVM_CAP_COALESCED_MMIO:
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..b2e6e4f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_IRQFD_LEVEL 81
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
 #endif
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1  0)
+/* Available with KVM_CAP_IRQFD_LEVEL */
+#define KVM_IRQFD_FLAG_LEVEL (1  1)
 
 struct kvm_irqfd {
__u32 fd;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..ecdbfea 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -36,6 +36,68 @@
 #include iodev.h
 
 /*
+ * An irq_source_id can be created from KVM_IRQFD for level interrupt
+ * injections and shared with other interfaces for EOI or de-assert.
+ * Create an object with reference counting to make it easy to use.
+ */
+struct _irq_source {
+   int id; /* the IRQ source ID */
+   bool level_asserted; /* Track assertion state and protect with 
lock */
+   spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
eoi. */
+   struct kvm *kvm;
+   struct kref kref;
+};
+
+static void _irq_source_release(struct kref *kref)
+{
+   struct _irq_source *source;
+
+   source = container_of(kref, struct _irq_source, kref);
+
+   /* This also de-asserts */
+   kvm_free_irq_source_id(source-kvm, source-id);
+   kfree(source);
+}
+
+static void _irq_source_put(struct _irq_source *source)
+{
+   if (source)
+   kref_put(source-kref, _irq_source_release);
+}
+
+static struct _irq_source *__attribute__ ((used)) /* white lie for now 
*/
+_irq_source_get(struct _irq_source *source)
+{
+   if (source)
+   kref_get(source-kref);
+
+   return source;
+}
+
+static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
+{
+   struct _irq_source *source;
+   int id;
+
+   source = kzalloc(sizeof(*source), GFP_KERNEL);
+   if (!source)
+   return 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote:
 On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
   On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
 In order to inject a level interrupt from an external source using an
 irqfd, we need to allocate a new irq_source_id.  This allows us to
 assert and (later) de-assert an interrupt line independently from
 users of KVM_IRQ_LINE and avoid lost interrupts.
 
 We also add what may appear like a bit of excessive infrastructure
 around an object for storing this irq_source_id.  However, notice
 that we only provide a way to assert the interrupt here.  A follow-on
 interface will make use of the same irq_source_id to allow de-assert.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 ---
 
  Documentation/virtual/kvm/api.txt |6 ++
  arch/x86/kvm/x86.c|1 
  include/linux/kvm.h   |3 +
  virt/kvm/eventfd.c|  114 
 -
  4 files changed, 120 insertions(+), 4 deletions(-)
 
 diff --git a/Documentation/virtual/kvm/api.txt 
 b/Documentation/virtual/kvm/api.txt
 index 100acde..c7267d5 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The 
 irqfd is removed using
  the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
  and kvm_irqfd.gsi.
  
 +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
 +triggered interrupt.  In this case a new irqchip input is allocated
 +which is logically OR'd with other inputs allowing multiple sources
 +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
 +is only necessary on setup, teardown is identical to that above.
 +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
  
  5. The kvm_run structure
  
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index a01a424..80bed07 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
   case KVM_CAP_GET_TSC_KHZ:
   case KVM_CAP_PCI_2_3:
   case KVM_CAP_KVMCLOCK_CTRL:
 + case KVM_CAP_IRQFD_LEVEL:
   r = 1;
   break;
   case KVM_CAP_COALESCED_MMIO:
 diff --git a/include/linux/kvm.h b/include/linux/kvm.h
 index 2ce09aa..b2e6e4f 100644
 --- a/include/linux/kvm.h
 +++ b/include/linux/kvm.h
 @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
  #define KVM_CAP_PPC_GET_SMMU_INFO 78
  #define KVM_CAP_S390_COW 79
  #define KVM_CAP_PPC_ALLOC_HTAB 80
 +#define KVM_CAP_IRQFD_LEVEL 81
  
  #ifdef KVM_CAP_IRQ_ROUTING
  
 @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
  #endif
  
  #define KVM_IRQFD_FLAG_DEASSIGN (1  0)
 +/* Available with KVM_CAP_IRQFD_LEVEL */
 +#define KVM_IRQFD_FLAG_LEVEL (1  1)
  
  struct kvm_irqfd {
   __u32 fd;
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 7d7e2aa..ecdbfea 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -36,6 +36,68 @@
  #include iodev.h
  
  /*
 + * An irq_source_id can be created from KVM_IRQFD for level interrupt
 + * injections and shared with other interfaces for EOI or de-assert.
 + * Create an object with reference counting to make it easy to use.
 + */
 +struct _irq_source {
 + int id; /* the IRQ source ID */
 + bool level_asserted; /* Track assertion state and protect with 
 lock */
 + spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
 eoi. */
 + struct kvm *kvm;
 + struct kref kref;
 +};
 +
 +static void _irq_source_release(struct kref *kref)
 +{
 + struct _irq_source *source;
 +
 + source = container_of(kref, struct _irq_source, kref);
 +
 + /* This also de-asserts */
 + kvm_free_irq_source_id(source-kvm, source-id);
 + kfree(source);
 +}
 +
 +static void _irq_source_put(struct _irq_source *source)
 +{
 + if (source)
 + kref_put(source-kref, _irq_source_release);
 +}
 +
 +static struct _irq_source *__attribute__ ((used)) /* white lie for 
 now */
 +_irq_source_get(struct _irq_source *source)
 +{
 + if (source)
 + kref_get(source-kref);
 +
 + return source;
 +}
 +
 +static struct _irq_source *_irq_source_alloc(struct kvm *kvm)
 +{
 + struct _irq_source *source;
 + 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote:
  On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
   On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
 On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
  In order to inject a level interrupt from an external source using 
  an
  irqfd, we need to allocate a new irq_source_id.  This allows us to
  assert and (later) de-assert an interrupt line independently from
  users of KVM_IRQ_LINE and avoid lost interrupts.
  
  We also add what may appear like a bit of excessive infrastructure
  around an object for storing this irq_source_id.  However, notice
  that we only provide a way to assert the interrupt here.  A 
  follow-on
  interface will make use of the same irq_source_id to allow 
  de-assert.
  
  Signed-off-by: Alex Williamson alex.william...@redhat.com
  ---
  
   Documentation/virtual/kvm/api.txt |6 ++
   arch/x86/kvm/x86.c|1 
   include/linux/kvm.h   |3 +
   virt/kvm/eventfd.c|  114 
  -
   4 files changed, 120 insertions(+), 4 deletions(-)
  
  diff --git a/Documentation/virtual/kvm/api.txt 
  b/Documentation/virtual/kvm/api.txt
  index 100acde..c7267d5 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  The 
  irqfd is removed using
   the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
   and kvm_irqfd.gsi.
   
  +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a 
  level
  +triggered interrupt.  In this case a new irqchip input is allocated
  +which is logically OR'd with other inputs allowing multiple sources
  +to independently assert level interrupts.  The KVM_IRQFD_FLAG_LEVEL
  +is only necessary on setup, teardown is identical to that above.
  +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
   
   5. The kvm_run structure
   
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index a01a424..80bed07 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
  case KVM_CAP_GET_TSC_KHZ:
  case KVM_CAP_PCI_2_3:
  case KVM_CAP_KVMCLOCK_CTRL:
  +   case KVM_CAP_IRQFD_LEVEL:
  r = 1;
  break;
  case KVM_CAP_COALESCED_MMIO:
  diff --git a/include/linux/kvm.h b/include/linux/kvm.h
  index 2ce09aa..b2e6e4f 100644
  --- a/include/linux/kvm.h
  +++ b/include/linux/kvm.h
  @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
   #define KVM_CAP_PPC_GET_SMMU_INFO 78
   #define KVM_CAP_S390_COW 79
   #define KVM_CAP_PPC_ALLOC_HTAB 80
  +#define KVM_CAP_IRQFD_LEVEL 81
   
   #ifdef KVM_CAP_IRQ_ROUTING
   
  @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
   #endif
   
   #define KVM_IRQFD_FLAG_DEASSIGN (1  0)
  +/* Available with KVM_CAP_IRQFD_LEVEL */
  +#define KVM_IRQFD_FLAG_LEVEL (1  1)
   
   struct kvm_irqfd {
  __u32 fd;
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index 7d7e2aa..ecdbfea 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -36,6 +36,68 @@
   #include iodev.h
   
   /*
  + * An irq_source_id can be created from KVM_IRQFD for level 
  interrupt
  + * injections and shared with other interfaces for EOI or 
  de-assert.
  + * Create an object with reference counting to make it easy to use.
  + */
  +struct _irq_source {
  +   int id; /* the IRQ source ID */
  +   bool level_asserted; /* Track assertion state and protect with 
  lock */
  +   spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
  eoi. */
  +   struct kvm *kvm;
  +   struct kref kref;
  +};
  +
  +static void _irq_source_release(struct kref *kref)
  +{
  +   struct _irq_source *source;
  +
  +   source = container_of(kref, struct _irq_source, kref);
  +
  +   /* This also de-asserts */
  +   kvm_free_irq_source_id(source-kvm, source-id);
  +   kfree(source);
  +}
  +
  +static void _irq_source_put(struct _irq_source *source)
  +{
  +   if (source)
  +   kref_put(source-kref, _irq_source_release);
  +}
  +
  +static struct _irq_source *__attribute__ ((used)) /* white lie for 
  now */
  +_irq_source_get(struct _irq_source *source)
  +{
  +   if (source)
  +   

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:55:30PM +0300, Gleb Natapov wrote:
 On Wed, Jul 18, 2012 at 01:53:11PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 01:49:06PM +0300, Gleb Natapov wrote:
   On Wed, Jul 18, 2012 at 01:48:44PM +0300, Michael S. Tsirkin wrote:
On Wed, Jul 18, 2012 at 01:44:29PM +0300, Gleb Natapov wrote:
 On Wed, Jul 18, 2012 at 01:41:14PM +0300, Michael S. Tsirkin wrote:
  On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
   In order to inject a level interrupt from an external source 
   using an
   irqfd, we need to allocate a new irq_source_id.  This allows us to
   assert and (later) de-assert an interrupt line independently from
   users of KVM_IRQ_LINE and avoid lost interrupts.
   
   We also add what may appear like a bit of excessive infrastructure
   around an object for storing this irq_source_id.  However, notice
   that we only provide a way to assert the interrupt here.  A 
   follow-on
   interface will make use of the same irq_source_id to allow 
   de-assert.
   
   Signed-off-by: Alex Williamson alex.william...@redhat.com
   ---
   
Documentation/virtual/kvm/api.txt |6 ++
arch/x86/kvm/x86.c|1 
include/linux/kvm.h   |3 +
virt/kvm/eventfd.c|  114 
   -
4 files changed, 120 insertions(+), 4 deletions(-)
   
   diff --git a/Documentation/virtual/kvm/api.txt 
   b/Documentation/virtual/kvm/api.txt
   index 100acde..c7267d5 100644
   --- a/Documentation/virtual/kvm/api.txt
   +++ b/Documentation/virtual/kvm/api.txt
   @@ -1981,6 +1981,12 @@ the guest using the specified gsi pin.  
   The irqfd is removed using
the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd
and kvm_irqfd.gsi.

   +The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a 
   level
   +triggered interrupt.  In this case a new irqchip input is 
   allocated
   +which is logically OR'd with other inputs allowing multiple 
   sources
   +to independently assert level interrupts.  The 
   KVM_IRQFD_FLAG_LEVEL
   +is only necessary on setup, teardown is identical to that above.
   +KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.

5. The kvm_run structure

   diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
   index a01a424..80bed07 100644
   --- a/arch/x86/kvm/x86.c
   +++ b/arch/x86/kvm/x86.c
   @@ -2148,6 +2148,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 case KVM_CAP_GET_TSC_KHZ:
 case KVM_CAP_PCI_2_3:
 case KVM_CAP_KVMCLOCK_CTRL:
   + case KVM_CAP_IRQFD_LEVEL:
 r = 1;
 break;
 case KVM_CAP_COALESCED_MMIO:
   diff --git a/include/linux/kvm.h b/include/linux/kvm.h
   index 2ce09aa..b2e6e4f 100644
   --- a/include/linux/kvm.h
   +++ b/include/linux/kvm.h
   @@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_GET_SMMU_INFO 78
#define KVM_CAP_S390_COW 79
#define KVM_CAP_PPC_ALLOC_HTAB 80
   +#define KVM_CAP_IRQFD_LEVEL 81

#ifdef KVM_CAP_IRQ_ROUTING

   @@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
#endif

#define KVM_IRQFD_FLAG_DEASSIGN (1  0)
   +/* Available with KVM_CAP_IRQFD_LEVEL */
   +#define KVM_IRQFD_FLAG_LEVEL (1  1)

struct kvm_irqfd {
 __u32 fd;
   diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
   index 7d7e2aa..ecdbfea 100644
   --- a/virt/kvm/eventfd.c
   +++ b/virt/kvm/eventfd.c
   @@ -36,6 +36,68 @@
#include iodev.h

/*
   + * An irq_source_id can be created from KVM_IRQFD for level 
   interrupt
   + * injections and shared with other interfaces for EOI or 
   de-assert.
   + * Create an object with reference counting to make it easy to 
   use.
   + */
   +struct _irq_source {
   + int id; /* the IRQ source ID */
   + bool level_asserted; /* Track assertion state and protect with 
   lock */
   + spinlock_t lock; /* to avoid unnecessary re-assert/spurious 
   eoi. */
   + struct kvm *kvm;
   + struct kref kref;
   +};
   +
   +static void _irq_source_release(struct kref *kref)
   +{
   + struct _irq_source *source;
   +
   + source = container_of(kref, struct _irq_source, kref);
   +
   + /* This also de-asserts */
   + kvm_free_irq_source_id(source-kvm, source-id);
   + kfree(source);
   +}
   +
   +static void _irq_source_put(struct _irq_source *source)
   +{
   + if (source)
   + kref_put(source-kref, _irq_source_release);
   +}
   +
   +static 

Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
   So as was discussed kvm_set_irq under spinlock is bad for 
   scalability
   with multiple VCPUs.  Why do we need a spinlock simply to protect
   level_asserted?  Let's use an atomic test and set/test and clear 
   and the
   problem goes away.
   
  That sad reality is that for level interrupt we already scan all 
  vcpus
  under spinlock.
 
 Where?
 
ioapic
   
   $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
   $
   
   ?
   
  
  Come on Michael. You can do better than grep and actually look at what
  code does. The code that loops over all vcpus while delivering an irq is
  in kvm_irq_delivery_to_apic(). Now grep for that.
 
 Hmm, I see, it's actually done for edge if injected from ioapic too,
 right?
 
 So set_irq does a linear scan, and for each matching CPU it calls
 kvm_irq_delivery_to_apic which is another scan?
 So it's actually N^2 worst case for a broadcast?

No it isn't, I misread the code.


Anyway, maybe not trivially but this looks fixable to me: we could drop
the ioapic lock before calling kvm_irq_delivery_to_apic.

  --
  Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
So as was discussed kvm_set_irq under spinlock is bad for 
scalability
with multiple VCPUs.  Why do we need a spinlock simply to 
protect
level_asserted?  Let's use an atomic test and set/test and 
clear and the
problem goes away.

   That sad reality is that for level interrupt we already scan all 
   vcpus
   under spinlock.
  
  Where?
  
 ioapic

$ grep kvm_for_each_vcpu virt/kvm/ioapic.c
$

?

   
   Come on Michael. You can do better than grep and actually look at what
   code does. The code that loops over all vcpus while delivering an irq is
   in kvm_irq_delivery_to_apic(). Now grep for that.
  
  Hmm, I see, it's actually done for edge if injected from ioapic too,
  right?
  
  So set_irq does a linear scan, and for each matching CPU it calls
  kvm_irq_delivery_to_apic which is another scan?
  So it's actually N^2 worst case for a broadcast?
 
 No it isn't, I misread the code.
 
 
 Anyway, maybe not trivially but this looks fixable to me: we could drop
 the ioapic lock before calling kvm_irq_delivery_to_apic.
 
May be, may be not. Just saying lets drop lock whenever we don't feel
like holding one does not cut it. Back to original point though current
situation is that calling kvm_set_irq() under spinlock is not worse for
scalability than calling it not under one.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
 On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
 So as was discussed kvm_set_irq under spinlock is bad for 
 scalability
 with multiple VCPUs.  Why do we need a spinlock simply to 
 protect
 level_asserted?  Let's use an atomic test and set/test and 
 clear and the
 problem goes away.
 
That sad reality is that for level interrupt we already scan 
all vcpus
under spinlock.
   
   Where?
   
  ioapic
 
 $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
 $
 
 ?
 

Come on Michael. You can do better than grep and actually look at what
code does. The code that loops over all vcpus while delivering an irq is
in kvm_irq_delivery_to_apic(). Now grep for that.
   
   Hmm, I see, it's actually done for edge if injected from ioapic too,
   right?
   
   So set_irq does a linear scan, and for each matching CPU it calls
   kvm_irq_delivery_to_apic which is another scan?
   So it's actually N^2 worst case for a broadcast?
  
  No it isn't, I misread the code.
  
  
  Anyway, maybe not trivially but this looks fixable to me: we could drop
  the ioapic lock before calling kvm_irq_delivery_to_apic.
  
 May be, may be not. Just saying lets drop lock whenever we don't feel
 like holding one does not cut it.

One thing we do is set remote_irr if interrupt was injected.
I agree these things are tricky.

One other question:

static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
{
union kvm_ioapic_redirect_entry *pent;
int injected = -1;

pent = ioapic-redirtbl[idx];

if (!pent-fields.mask) {
injected = ioapic_deliver(ioapic, idx);
if (injected  pent-fields.trig_mode == IOAPIC_LEVEL_TRIG)
pent-fields.remote_irr = 1;
}

return injected;
}


This if (injected) looks a bit strange since ioapic_deliver returns
-1 if no matching destinations. Should be if (injected  0)?



 Back to original point though current
 situation is that calling kvm_set_irq() under spinlock is not worse for
 scalability than calling it not under one.

Yes. Still the specific use can just use an atomic flag,
lock+bool is not needed, and we won't need to undo it later.

 --
   Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
  On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
   On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
  So as was discussed kvm_set_irq under spinlock is bad for 
  scalability
  with multiple VCPUs.  Why do we need a spinlock simply to 
  protect
  level_asserted?  Let's use an atomic test and set/test and 
  clear and the
  problem goes away.
  
 That sad reality is that for level interrupt we already scan 
 all vcpus
 under spinlock.

Where?

   ioapic
  
  $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
  $
  
  ?
  
 
 Come on Michael. You can do better than grep and actually look at what
 code does. The code that loops over all vcpus while delivering an irq 
 is
 in kvm_irq_delivery_to_apic(). Now grep for that.

Hmm, I see, it's actually done for edge if injected from ioapic too,
right?

So set_irq does a linear scan, and for each matching CPU it calls
kvm_irq_delivery_to_apic which is another scan?
So it's actually N^2 worst case for a broadcast?
   
   No it isn't, I misread the code.
   
   
   Anyway, maybe not trivially but this looks fixable to me: we could drop
   the ioapic lock before calling kvm_irq_delivery_to_apic.
   
  May be, may be not. Just saying lets drop lock whenever we don't feel
  like holding one does not cut it.
 
 One thing we do is set remote_irr if interrupt was injected.
 I agree these things are tricky.
 
 One other question:
 
 static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
 {
 union kvm_ioapic_redirect_entry *pent;
 int injected = -1;
 
 pent = ioapic-redirtbl[idx];
 
 if (!pent-fields.mask) {
 injected = ioapic_deliver(ioapic, idx);
 if (injected  pent-fields.trig_mode == IOAPIC_LEVEL_TRIG)
 pent-fields.remote_irr = 1;
 }
 
 return injected;
 }
 
 
 This if (injected) looks a bit strange since ioapic_deliver returns
 -1 if no matching destinations. Should be if (injected  0)?
 
 
 
  Back to original point though current
  situation is that calling kvm_set_irq() under spinlock is not worse for
  scalability than calling it not under one.
 
 Yes. Still the specific use can just use an atomic flag,
 lock+bool is not needed, and we won't need to undo it later.


Actually, no, replacing it with an atomic is racy.

CPU0 (inject)   CPU1 (EOI)
atomic_cmpxchg(asserted, 0, 1)
atomic_cmpxchg(asserted, 1, 0)
kvm_set_irq(0)
kvm_set_irq(1)
eventfd_signal

The interrupt is now stuck on until another interrupt is injected.



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
   On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
   So as was discussed kvm_set_irq under spinlock is bad for 
   scalability
   with multiple VCPUs.  Why do we need a spinlock simply to 
   protect
   level_asserted?  Let's use an atomic test and set/test 
   and clear and the
   problem goes away.
   
  That sad reality is that for level interrupt we already 
  scan all vcpus
  under spinlock.
 
 Where?
 
ioapic
   
   $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
   $
   
   ?
   
  
  Come on Michael. You can do better than grep and actually look at 
  what
  code does. The code that loops over all vcpus while delivering an 
  irq is
  in kvm_irq_delivery_to_apic(). Now grep for that.
 
 Hmm, I see, it's actually done for edge if injected from ioapic too,
 right?
 
 So set_irq does a linear scan, and for each matching CPU it calls
 kvm_irq_delivery_to_apic which is another scan?
 So it's actually N^2 worst case for a broadcast?

No it isn't, I misread the code.


Anyway, maybe not trivially but this looks fixable to me: we could drop
the ioapic lock before calling kvm_irq_delivery_to_apic.

   May be, may be not. Just saying lets drop lock whenever we don't feel
   like holding one does not cut it.
  
  One thing we do is set remote_irr if interrupt was injected.
  I agree these things are tricky.
  
  One other question:
  
  static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
  {
  union kvm_ioapic_redirect_entry *pent;
  int injected = -1;
  
  pent = ioapic-redirtbl[idx];
  
  if (!pent-fields.mask) {
  injected = ioapic_deliver(ioapic, idx);
  if (injected  pent-fields.trig_mode == IOAPIC_LEVEL_TRIG)
  pent-fields.remote_irr = 1;
  }
  
  return injected;
  }
  
  
  This if (injected) looks a bit strange since ioapic_deliver returns
  -1 if no matching destinations. Should be if (injected  0)?
  
  
  
   Back to original point though current
   situation is that calling kvm_set_irq() under spinlock is not worse for
   scalability than calling it not under one.
  
  Yes. Still the specific use can just use an atomic flag,
  lock+bool is not needed, and we won't need to undo it later.
 
 
 Actually, no, replacing it with an atomic is racy.
 
 CPU0 (inject)   CPU1 (EOI)
 atomic_cmpxchg(asserted, 0, 1)
 atomic_cmpxchg(asserted, 1, 0)
 kvm_set_irq(0)
 kvm_set_irq(1)
 eventfd_signal
 
 The interrupt is now stuck on until another interrupt is injected.
 

Well EOI somehow happened here before interrupt so it's a bug somewhere
else?


-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote:
  On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
   On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
 On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
So as was discussed kvm_set_irq under spinlock is bad 
for scalability
with multiple VCPUs.  Why do we need a spinlock simply 
to protect
level_asserted?  Let's use an atomic test and set/test 
and clear and the
problem goes away.

   That sad reality is that for level interrupt we already 
   scan all vcpus
   under spinlock.
  
  Where?
  
 ioapic

$ grep kvm_for_each_vcpu virt/kvm/ioapic.c
$

?

   
   Come on Michael. You can do better than grep and actually look at 
   what
   code does. The code that loops over all vcpus while delivering an 
   irq is
   in kvm_irq_delivery_to_apic(). Now grep for that.
  
  Hmm, I see, it's actually done for edge if injected from ioapic too,
  right?
  
  So set_irq does a linear scan, and for each matching CPU it calls
  kvm_irq_delivery_to_apic which is another scan?
  So it's actually N^2 worst case for a broadcast?
 
 No it isn't, I misread the code.
 
 
 Anyway, maybe not trivially but this looks fixable to me: we could 
 drop
 the ioapic lock before calling kvm_irq_delivery_to_apic.
 
May be, may be not. Just saying lets drop lock whenever we don't feel
like holding one does not cut it.
   
   One thing we do is set remote_irr if interrupt was injected.
   I agree these things are tricky.
   
   One other question:
   
   static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
   {
   union kvm_ioapic_redirect_entry *pent;
   int injected = -1;
   
   pent = ioapic-redirtbl[idx];
   
   if (!pent-fields.mask) {
   injected = ioapic_deliver(ioapic, idx);
   if (injected  pent-fields.trig_mode == 
   IOAPIC_LEVEL_TRIG)
   pent-fields.remote_irr = 1;
   }
   
   return injected;
   }
   
   
   This if (injected) looks a bit strange since ioapic_deliver returns
   -1 if no matching destinations. Should be if (injected  0)?
   
   
   
Back to original point though current
situation is that calling kvm_set_irq() under spinlock is not worse for
scalability than calling it not under one.
   
   Yes. Still the specific use can just use an atomic flag,
   lock+bool is not needed, and we won't need to undo it later.
  
  
  Actually, no, replacing it with an atomic is racy.
  
  CPU0 (inject)   CPU1 (EOI)
  atomic_cmpxchg(asserted, 0, 1)
  atomic_cmpxchg(asserted, 1, 0)
  kvm_set_irq(0)
  kvm_set_irq(1)
  eventfd_signal
  
  The interrupt is now stuck on until another interrupt is injected.
  
 
 Well EOI somehow happened here before interrupt so it's a bug somewhere
 else?

Interrupts can be shared.  We also can't guarantee that the guest won't
write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
irq source id... I'm not sure it can.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 09:48:01AM -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 18:38 +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 08:47:23AM -0600, Alex Williamson wrote:
   On Wed, 2012-07-18 at 15:07 +0300, Michael S. Tsirkin wrote:
On Wed, Jul 18, 2012 at 02:48:44PM +0300, Gleb Natapov wrote:
 On Wed, Jul 18, 2012 at 02:39:10PM +0300, Michael S. Tsirkin wrote:
  On Wed, Jul 18, 2012 at 02:22:19PM +0300, Michael S. Tsirkin wrote:
 So as was discussed kvm_set_irq under spinlock is bad 
 for scalability
 with multiple VCPUs.  Why do we need a spinlock 
 simply to protect
 level_asserted?  Let's use an atomic test and 
 set/test and clear and the
 problem goes away.
 
That sad reality is that for level interrupt we already 
scan all vcpus
under spinlock.
   
   Where?
   
  ioapic
 
 $ grep kvm_for_each_vcpu virt/kvm/ioapic.c
 $
 
 ?
 

Come on Michael. You can do better than grep and actually look 
at what
code does. The code that loops over all vcpus while delivering 
an irq is
in kvm_irq_delivery_to_apic(). Now grep for that.
   
   Hmm, I see, it's actually done for edge if injected from ioapic 
   too,
   right?
   
   So set_irq does a linear scan, and for each matching CPU it calls
   kvm_irq_delivery_to_apic which is another scan?
   So it's actually N^2 worst case for a broadcast?
  
  No it isn't, I misread the code.
  
  
  Anyway, maybe not trivially but this looks fixable to me: we could 
  drop
  the ioapic lock before calling kvm_irq_delivery_to_apic.
  
 May be, may be not. Just saying lets drop lock whenever we don't feel
 like holding one does not cut it.

One thing we do is set remote_irr if interrupt was injected.
I agree these things are tricky.

One other question:

static int ioapic_service(struct kvm_ioapic *ioapic, unsigned int idx)
{
union kvm_ioapic_redirect_entry *pent;
int injected = -1;

pent = ioapic-redirtbl[idx];

if (!pent-fields.mask) {
injected = ioapic_deliver(ioapic, idx);
if (injected  pent-fields.trig_mode == 
IOAPIC_LEVEL_TRIG)
pent-fields.remote_irr = 1;
}

return injected;
}


This if (injected) looks a bit strange since ioapic_deliver returns
-1 if no matching destinations. Should be if (injected  0)?



 Back to original point though current
 situation is that calling kvm_set_irq() under spinlock is not worse 
 for
 scalability than calling it not under one.

Yes. Still the specific use can just use an atomic flag,
lock+bool is not needed, and we won't need to undo it later.
   
   
   Actually, no, replacing it with an atomic is racy.
   
   CPU0 (inject)   CPU1 (EOI)
   atomic_cmpxchg(asserted, 0, 1)
   atomic_cmpxchg(asserted, 1, 0)
   kvm_set_irq(0)
   kvm_set_irq(1)
   eventfd_signal
   
   The interrupt is now stuck on until another interrupt is injected.
   
  
  Well EOI somehow happened here before interrupt so it's a bug somewhere
  else?
 
 Interrupts can be shared.  We also can't guarantee that the guest won't
 write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
 irq source id... I'm not sure it can.

I guess if Avi OKs adding another kvm_set_irq under spinlock that's
the best we can do for now.

If not, maybe we can teach kvm_set_irq to return an indication
of the previous status. Specifically kvm_irq_line_state
could do test_and_set/test_and_clear and if already set/clear
we return 0 immediately.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Marcelo Tosatti
On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
  Back to original point though current
  situation is that calling kvm_set_irq() under spinlock is not worse 
  for
  scalability than calling it not under one.
 
 Yes. Still the specific use can just use an atomic flag,
 lock+bool is not needed, and we won't need to undo it later.


Actually, no, replacing it with an atomic is racy.

CPU0 (inject)   CPU1 (EOI)
atomic_cmpxchg(asserted, 0, 1)
atomic_cmpxchg(asserted, 1, 0)
kvm_set_irq(0)
kvm_set_irq(1)
eventfd_signal

The interrupt is now stuck on until another interrupt is injected.

   
   Well EOI somehow happened here before interrupt so it's a bug somewhere
   else?
  
  Interrupts can be shared.  We also can't guarantee that the guest won't
  write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
  irq source id... I'm not sure it can.
 
 I guess if Avi OKs adding another kvm_set_irq under spinlock that's
 the best we can do for now.

Why can't a mutex be used instead of a spinlock again?


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Gleb Natapov
On Wed, Jul 18, 2012 at 03:42:09PM -0300, Marcelo Tosatti wrote:
 On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
   Back to original point though current
   situation is that calling kvm_set_irq() under spinlock is not 
   worse for
   scalability than calling it not under one.
  
  Yes. Still the specific use can just use an atomic flag,
  lock+bool is not needed, and we won't need to undo it later.
 
 
 Actually, no, replacing it with an atomic is racy.
 
 CPU0 (inject)   CPU1 (EOI)
 atomic_cmpxchg(asserted, 0, 1)
 atomic_cmpxchg(asserted, 1, 0)
 kvm_set_irq(0)
 kvm_set_irq(1)
 eventfd_signal
 
 The interrupt is now stuck on until another interrupt is injected.
 

Well EOI somehow happened here before interrupt so it's a bug somewhere
else?
   
   Interrupts can be shared.  We also can't guarantee that the guest won't
   write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
   irq source id... I'm not sure it can.
  
  I guess if Avi OKs adding another kvm_set_irq under spinlock that's
  the best we can do for now.
 
 Why can't a mutex be used instead of a spinlock again?
 
Why was it changed at the first place? Commit says that the function is
called from unsleepable context, but no stack trace.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
 On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
   Back to original point though current
   situation is that calling kvm_set_irq() under spinlock is not 
   worse for
   scalability than calling it not under one.
  
  Yes. Still the specific use can just use an atomic flag,
  lock+bool is not needed, and we won't need to undo it later.
 
 
 Actually, no, replacing it with an atomic is racy.
 
 CPU0 (inject)   CPU1 (EOI)
 atomic_cmpxchg(asserted, 0, 1)
 atomic_cmpxchg(asserted, 1, 0)
 kvm_set_irq(0)
 kvm_set_irq(1)
 eventfd_signal
 
 The interrupt is now stuck on until another interrupt is injected.
 

Well EOI somehow happened here before interrupt so it's a bug somewhere
else?
   
   Interrupts can be shared.  We also can't guarantee that the guest won't
   write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
   irq source id... I'm not sure it can.
  
  I guess if Avi OKs adding another kvm_set_irq under spinlock that's
  the best we can do for now.
 
 Why can't a mutex be used instead of a spinlock again?

eventfd_signal calls the inject function from atomic context.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
  On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
Back to original point though current
situation is that calling kvm_set_irq() under spinlock is not 
worse for
scalability than calling it not under one.
   
   Yes. Still the specific use can just use an atomic flag,
   lock+bool is not needed, and we won't need to undo it later.
  
  
  Actually, no, replacing it with an atomic is racy.
  
  CPU0 (inject)   CPU1 (EOI)
  atomic_cmpxchg(asserted, 0, 1)
  atomic_cmpxchg(asserted, 1, 0)
  kvm_set_irq(0)
  kvm_set_irq(1)
  eventfd_signal
  
  The interrupt is now stuck on until another interrupt is injected.
  
 
 Well EOI somehow happened here before interrupt so it's a bug 
 somewhere
 else?

Interrupts can be shared.  We also can't guarantee that the guest won't
write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter on
irq source id... I'm not sure it can.
   
   I guess if Avi OKs adding another kvm_set_irq under spinlock that's
   the best we can do for now.
  
  Why can't a mutex be used instead of a spinlock again?
 
 eventfd_signal calls the inject function from atomic context.

Actually, that's called from a workq.  I'll have to switch it back and
turn on lockdep to remember why I couldn't sleep there.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 01:13:06PM -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote:
  On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
   On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
 Back to original point though current
 situation is that calling kvm_set_irq() under spinlock is not 
 worse for
 scalability than calling it not under one.

Yes. Still the specific use can just use an atomic flag,
lock+bool is not needed, and we won't need to undo it later.
   
   
   Actually, no, replacing it with an atomic is racy.
   
   CPU0 (inject)   CPU1 (EOI)
   atomic_cmpxchg(asserted, 0, 1)
   atomic_cmpxchg(asserted, 1, 
   0)
   kvm_set_irq(0)
   kvm_set_irq(1)
   eventfd_signal
   
   The interrupt is now stuck on until another interrupt is injected.
   
  
  Well EOI somehow happened here before interrupt so it's a bug 
  somewhere
  else?
 
 Interrupts can be shared.  We also can't guarantee that the guest 
 won't
 write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter 
 on
 irq source id... I'm not sure it can.

I guess if Avi OKs adding another kvm_set_irq under spinlock that's
the best we can do for now.
   
   Why can't a mutex be used instead of a spinlock again?
  
  eventfd_signal calls the inject function from atomic context.
 
 Actually, that's called from a workq.  I'll have to switch it back and
 turn on lockdep to remember why I couldn't sleep there.


I'll try to fix kvm_set_irq so it returns 0 if level was already 0.
Then you do not need extra state.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Alex Williamson
On Wed, 2012-07-18 at 13:13 -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 13:07 -0600, Alex Williamson wrote:
  On Wed, 2012-07-18 at 15:42 -0300, Marcelo Tosatti wrote:
   On Wed, Jul 18, 2012 at 06:58:24PM +0300, Michael S. Tsirkin wrote:
 Back to original point though current
 situation is that calling kvm_set_irq() under spinlock is not 
 worse for
 scalability than calling it not under one.

Yes. Still the specific use can just use an atomic flag,
lock+bool is not needed, and we won't need to undo it later.
   
   
   Actually, no, replacing it with an atomic is racy.
   
   CPU0 (inject)   CPU1 (EOI)
   atomic_cmpxchg(asserted, 0, 1)
   atomic_cmpxchg(asserted, 1, 
   0)
   kvm_set_irq(0)
   kvm_set_irq(1)
   eventfd_signal
   
   The interrupt is now stuck on until another interrupt is injected.
   
  
  Well EOI somehow happened here before interrupt so it's a bug 
  somewhere
  else?
 
 Interrupts can be shared.  We also can't guarantee that the guest 
 won't
 write a bogus EOI to the ioapic.  The irq ack notifier doesn't filter 
 on
 irq source id... I'm not sure it can.

I guess if Avi OKs adding another kvm_set_irq under spinlock that's
the best we can do for now.
   
   Why can't a mutex be used instead of a spinlock again?
  
  eventfd_signal calls the inject function from atomic context.
 
 Actually, that's called from a workq.  I'll have to switch it back and
 turn on lockdep to remember why I couldn't sleep there.

switching to a mutex results in:

BUG: sleeping function called from invalid context at kernel/mutex.c:269
in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86
INFO: lockdep is turned off.
Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109
Call Trace:
 [81088425] __might_sleep+0xf5/0x130
 [81564c6f] mutex_lock_nested+0x2f/0x60
 [a07db7d5] eoifd_event+0x25/0x70 [kvm]
 [a07daea4] kvm_notify_acked_irq+0xa4/0x140 [kvm]
 [a07dae2a] ? kvm_notify_acked_irq+0x2a/0x140 [kvm]
 [a07d9bb4] kvm_ioapic_update_eoi+0x84/0xf0 [kvm]
 [a0806c43] apic_set_eoi+0x123/0x130 [kvm]
 [a0806fd8] apic_reg_write+0x388/0x670 [kvm]
 [a07eb03c] ? vcpu_enter_guest+0x32c/0x740 [kvm]
 [a0807481] kvm_lapic_set_eoi+0x21/0x30 [kvm]
 [a04ba3f9] handle_apic_access+0x69/0x80 [kvm_intel]
 [a04ba02a] vmx_handle_exit+0xaa/0x260 [kvm_intel]


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Marcelo Tosatti
On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote:
  turn on lockdep to remember why I couldn't sleep there.
 
 switching to a mutex results in:
 
 BUG: sleeping function called from invalid context at kernel/mutex.c:269
 in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86
 INFO: lockdep is turned off.
 Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109
 Call Trace:
  [81088425] __might_sleep+0xf5/0x130
  [81564c6f] mutex_lock_nested+0x2f/0x60
  [a07db7d5] eoifd_event+0x25/0x70 [kvm]
  [a07daea4] kvm_notify_acked_irq+0xa4/0x140 [kvm]
  [a07dae2a] ? kvm_notify_acked_irq+0x2a/0x140 [kvm]
  [a07d9bb4] kvm_ioapic_update_eoi+0x84/0xf0 [kvm]
  [a0806c43] apic_set_eoi+0x123/0x130 [kvm]
  [a0806fd8] apic_reg_write+0x388/0x670 [kvm]
  [a07eb03c] ? vcpu_enter_guest+0x32c/0x740 [kvm]
  [a0807481] kvm_lapic_set_eoi+0x21/0x30 [kvm]
  [a04ba3f9] handle_apic_access+0x69/0x80 [kvm_intel]
  [a04ba02a] vmx_handle_exit+0xaa/0x260 [kvm_intel]

Its RCU from ack notifiers, OK. 


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-18 Thread Michael S. Tsirkin
On Wed, Jul 18, 2012 at 06:23:34PM -0300, Marcelo Tosatti wrote:
 On Wed, Jul 18, 2012 at 02:28:34PM -0600, Alex Williamson wrote:
   turn on lockdep to remember why I couldn't sleep there.
  
  switching to a mutex results in:
  
  BUG: sleeping function called from invalid context at kernel/mutex.c:269
  in_atomic(): 1, irqs_disabled(): 0, pid: 30025, name: qemu-system-x86
  INFO: lockdep is turned off.
  Pid: 30025, comm: qemu-system-x86 Not tainted 3.5.0-rc4+ #109
  Call Trace:
   [81088425] __might_sleep+0xf5/0x130
   [81564c6f] mutex_lock_nested+0x2f/0x60
   [a07db7d5] eoifd_event+0x25/0x70 [kvm]
   [a07daea4] kvm_notify_acked_irq+0xa4/0x140 [kvm]
   [a07dae2a] ? kvm_notify_acked_irq+0x2a/0x140 [kvm]
   [a07d9bb4] kvm_ioapic_update_eoi+0x84/0xf0 [kvm]
   [a0806c43] apic_set_eoi+0x123/0x130 [kvm]
   [a0806fd8] apic_reg_write+0x388/0x670 [kvm]
   [a07eb03c] ? vcpu_enter_guest+0x32c/0x740 [kvm]
   [a0807481] kvm_lapic_set_eoi+0x21/0x30 [kvm]
   [a04ba3f9] handle_apic_access+0x69/0x80 [kvm_intel]
   [a04ba02a] vmx_handle_exit+0xaa/0x260 [kvm_intel]
 
 Its RCU from ack notifiers, OK. 

I'm testing a patch that moves all bitmap handling to under
pic/ioapic lock. After this we can teach kvm_set_irq to report
when a bit that is cleared/set is already clear/set, without
races.

And then no tracking will be necessary in irqfd - we can just
call kvm_set_irq(..., 0) and look at the return status.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Michael S. Tsirkin
On Tue, Jul 17, 2012 at 04:16:04PM -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote:
> > > On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
> > > > >* It is now safe to release the object's resources
> > > > >*/
> > > > >   eventfd_ctx_put(irqfd->eventfd);
> > > > > +
> > > > > + _irq_source_put(irqfd->source);
> > > > > +
> > > > >   kfree(irqfd);
> > > > >  }
> > > > >  
> > > > 
> > > > Hang on, this is a bug I think. This is done asynchronously.  So this
> > > > means that I can assign MAX number of irqfds, then close one, and now
> > >   What is this?
> > > Do you mean max irq source ids?
> > 
> > Yes, this is what I meant. Sorry about being unclear.
> > 
> > > > assign will fail because deassign did not get freed.
> > > > 
> > > > Maybe we can solve this by flushing wq before assign?
> > > > Looks a bit fragile but may be enough - need to document well.
> 
> Why is this fragile?  We could even make it part of a retry so we don't
> call it unless we need to.
> 

It just ties in assign and deassign. Maybe it's ok - but pls
add a comment explaining the whole design.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Alex Williamson
On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote:
> On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote:
> > On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
> > > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
> > > >  * It is now safe to release the object's resources
> > > >  */
> > > > eventfd_ctx_put(irqfd->eventfd);
> > > > +
> > > > +   _irq_source_put(irqfd->source);
> > > > +
> > > > kfree(irqfd);
> > > >  }
> > > >  
> > > 
> > > Hang on, this is a bug I think. This is done asynchronously.  So this
> > > means that I can assign MAX number of irqfds, then close one, and now
> >   What is this?
> > Do you mean max irq source ids?
> 
> Yes, this is what I meant. Sorry about being unclear.
> 
> > > assign will fail because deassign did not get freed.
> > > 
> > > Maybe we can solve this by flushing wq before assign?
> > > Looks a bit fragile but may be enough - need to document well.

Why is this fragile?  We could even make it part of a retry so we don't
call it unless we need to.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Michael S. Tsirkin
On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote:
> On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
> > On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
> > >* It is now safe to release the object's resources
> > >*/
> > >   eventfd_ctx_put(irqfd->eventfd);
> > > +
> > > + _irq_source_put(irqfd->source);
> > > +
> > >   kfree(irqfd);
> > >  }
> > >  
> > 
> > Hang on, this is a bug I think. This is done asynchronously.  So this
> > means that I can assign MAX number of irqfds, then close one, and now
>   What is this?
> Do you mean max irq source ids?

Yes, this is what I meant. Sorry about being unclear.

> > assign will fail because deassign did not get freed.
> > 
> > Maybe we can solve this by flushing wq before assign?
> > Looks a bit fragile but may be enough - need to document well.
> > 
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Alex Williamson
On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
> On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> > @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
> >  * It is now safe to release the object's resources
> >  */
> > eventfd_ctx_put(irqfd->eventfd);
> > +
> > +   _irq_source_put(irqfd->source);
> > +
> > kfree(irqfd);
> >  }
> >  
> 
> Hang on, this is a bug I think. This is done asynchronously.  So this
> means that I can assign MAX number of irqfds, then close one, and now
  What is this?
Do you mean max irq source ids?

> assign will fail because deassign did not get freed.
> 
> Maybe we can solve this by flushing wq before assign?
> Looks a bit fragile but may be enough - need to document well.
> 



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Michael S. Tsirkin
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
> @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
>* It is now safe to release the object's resources
>*/
>   eventfd_ctx_put(irqfd->eventfd);
> +
> + _irq_source_put(irqfd->source);
> +
>   kfree(irqfd);
>  }
>  

Hang on, this is a bug I think. This is done asynchronously.  So this
means that I can assign MAX number of irqfds, then close one, and now
assign will fail because deassign did not get freed.

Maybe we can solve this by flushing wq before assign?
Looks a bit fragile but may be enough - need to document well.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Michael S. Tsirkin
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
 @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
* It is now safe to release the object's resources
*/
   eventfd_ctx_put(irqfd-eventfd);
 +
 + _irq_source_put(irqfd-source);
 +
   kfree(irqfd);
  }
  

Hang on, this is a bug I think. This is done asynchronously.  So this
means that I can assign MAX number of irqfds, then close one, and now
assign will fail because deassign did not get freed.

Maybe we can solve this by flushing wq before assign?
Looks a bit fragile but may be enough - need to document well.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Alex Williamson
On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
 On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
  @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
   * It is now safe to release the object's resources
   */
  eventfd_ctx_put(irqfd-eventfd);
  +
  +   _irq_source_put(irqfd-source);
  +
  kfree(irqfd);
   }
   
 
 Hang on, this is a bug I think. This is done asynchronously.  So this
 means that I can assign MAX number of irqfds, then close one, and now
  What is this?
Do you mean max irq source ids?

 assign will fail because deassign did not get freed.
 
 Maybe we can solve this by flushing wq before assign?
 Looks a bit fragile but may be enough - need to document well.
 



--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Michael S. Tsirkin
On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
  On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
   @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
  * It is now safe to release the object's resources
  */
 eventfd_ctx_put(irqfd-eventfd);
   +
   + _irq_source_put(irqfd-source);
   +
 kfree(irqfd);
}

  
  Hang on, this is a bug I think. This is done asynchronously.  So this
  means that I can assign MAX number of irqfds, then close one, and now
   What is this?
 Do you mean max irq source ids?

Yes, this is what I meant. Sorry about being unclear.

  assign will fail because deassign did not get freed.
  
  Maybe we can solve this by flushing wq before assign?
  Looks a bit fragile but may be enough - need to document well.
  
 
 
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Alex Williamson
On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote:
 On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote:
  On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
   On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
@@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
 * It is now safe to release the object's resources
 */
eventfd_ctx_put(irqfd-eventfd);
+
+   _irq_source_put(irqfd-source);
+
kfree(irqfd);
 }
 
   
   Hang on, this is a bug I think. This is done asynchronously.  So this
   means that I can assign MAX number of irqfds, then close one, and now
    What is this?
  Do you mean max irq source ids?
 
 Yes, this is what I meant. Sorry about being unclear.
 
   assign will fail because deassign did not get freed.
   
   Maybe we can solve this by flushing wq before assign?
   Looks a bit fragile but may be enough - need to document well.

Why is this fragile?  We could even make it part of a retry so we don't
call it unless we need to.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v5 1/4] kvm: Extend irqfd to support level interrupts

2012-07-17 Thread Michael S. Tsirkin
On Tue, Jul 17, 2012 at 04:16:04PM -0600, Alex Williamson wrote:
 On Wed, 2012-07-18 at 01:00 +0300, Michael S. Tsirkin wrote:
  On Tue, Jul 17, 2012 at 03:57:41PM -0600, Alex Williamson wrote:
   On Wed, 2012-07-18 at 00:26 +0300, Michael S. Tsirkin wrote:
On Mon, Jul 16, 2012 at 02:33:47PM -0600, Alex Williamson wrote:
 @@ -96,6 +183,9 @@ irqfd_shutdown(struct work_struct *work)
* It is now safe to release the object's resources
*/
   eventfd_ctx_put(irqfd-eventfd);
 +
 + _irq_source_put(irqfd-source);
 +
   kfree(irqfd);
  }
  

Hang on, this is a bug I think. This is done asynchronously.  So this
means that I can assign MAX number of irqfds, then close one, and now
 What is this?
   Do you mean max irq source ids?
  
  Yes, this is what I meant. Sorry about being unclear.
  
assign will fail because deassign did not get freed.

Maybe we can solve this by flushing wq before assign?
Looks a bit fragile but may be enough - need to document well.
 
 Why is this fragile?  We could even make it part of a retry so we don't
 call it unless we need to.
 

It just ties in assign and deassign. Maybe it's ok - but pls
add a comment explaining the whole design.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/