Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 12:47:36PM +0800, Jason Wang wrote:
> 
> 
> On 09/01/2015 12:31 PM, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 11:33:43AM +0800, Jason Wang wrote:
> >>
> >> On 08/31/2015 07:33 PM, Michael S. Tsirkin wrote:
> >>> On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
> >
> > On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
> >>> Thinking more about this, invoking the 0-length write after
> >>> the != 0 length one would be better: it would mean we 
> >>> only
> >>> handle the userspace MMIO like this.
> >>> Right.
> >>>
> >>> Using current unittest. This patch is about 2.9% slower than 
> >>> before, and
> >>> invoking 0-length write after is still 1.1% slower 
> >>> (mmio-datamatch-eventfd).
> >>>
> >>> /patch/result/-+%/
> >>> /base/2957/0/
> >>> /V3/3043/+2.9%/
> >>> /V3+invoking != 0 length first/2990/+1.1%/
> >>>
> >>> So looks like the best method is not searching KVM_FAST_MMIO_BUS 
> >>> during
> >>> KVM_MMIO_BUS. Instead, let userspace to register both datamatch 
> >>> and
> >>> wildcard in this case. Does this sound good to you?
> >>> No - we can't change userspace.
> > Actually, the change was as simple as following. So I don't get the
> > reason why.
> >>> Because it's too late - we committed to a specific userspace ABI
> >>> when this was merged in kernel, we must maintain it.
> >> Ok ( Though I don't think it has real users for this now because it was
> >> actually broken).
> > It actually worked most of the time - you only trigger a use after free
> > on deregister.
> >
> 
> It doesn't work for amd and intel machine without ept.

I thought it does :(

> >>> Even if I thought yours is a good API (and I don't BTW - it's exposing
> >>> internal implementation details) it's too late to change it.
> >> I believe we should document the special treatment in kernel of zero
> >> length mmio eventfd in api.txt? If yes, is this an exposing? If not, how
> >> can userspace know the advantages of this and use it? For better API,
> >> probably we need another new flag just for fast mmio and obsolete
> >> current one by failing the assigning for zero length mmio eventfd.
> > I sent a patch to update api.txt already as part of
> > kvm: add KVM_CAP_IOEVENTFD_PF capability.
> > I should probably split it out.
> >
> > Sorry, I don't think the api change you propose makes sense - just fix the
> > crash in the existing one.
> >
> 
> Ok, so I believe the fix should go:
> 
> - having two ioeventfds when we want to assign zero length mmio eventfd

You mean the in-kernel data structures?

> - change the kvm_io_bus_sort_cmp() and can handle zero length correctly

This one's for amd/non ept, right? I'd rather we implemented the
fast mmio optimization for these.

> What's your thought?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-09-01 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 04:22:22PM +0800, Jason Wang wrote:
> 
> 
> On 09/01/2015 02:54 PM, Michael S. Tsirkin wrote:
> > On Tue, Sep 01, 2015 at 12:47:36PM +0800, Jason Wang wrote:
> >>
> >> On 09/01/2015 12:31 PM, Michael S. Tsirkin wrote:
> >>> On Tue, Sep 01, 2015 at 11:33:43AM +0800, Jason Wang wrote:
>  On 08/31/2015 07:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
> >>> On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
> > Thinking more about this, invoking the 0-length write after
> > the != 0 length one would be better: it would mean 
> > we only
> > handle the userspace MMIO like this.
> > Right.
> >
> > Using current unittest. This patch is about 2.9% slower than 
> > before, and
> > invoking 0-length write after is still 1.1% slower 
> > (mmio-datamatch-eventfd).
> >
> > /patch/result/-+%/
> > /base/2957/0/
> > /V3/3043/+2.9%/
> > /V3+invoking != 0 length first/2990/+1.1%/
> >
> > So looks like the best method is not searching 
> > KVM_FAST_MMIO_BUS during
> > KVM_MMIO_BUS. Instead, let userspace to register both datamatch 
> > and
> > wildcard in this case. Does this sound good to you?
> > No - we can't change userspace.
> >>> Actually, the change was as simple as following. So I don't get the
> >>> reason why.
> > Because it's too late - we committed to a specific userspace ABI
> > when this was merged in kernel, we must maintain it.
>  Ok ( Though I don't think it has real users for this now because it was
>  actually broken).
> >>> It actually worked most of the time - you only trigger a use after free
> >>> on deregister.
> >>>
> >> It doesn't work for amd and intel machine without ept.
> > I thought it does :(
> >
> > Even if I thought yours is a good API (and I don't BTW - it's exposing
> > internal implementation details) it's too late to change it.
>  I believe we should document the special treatment in kernel of zero
>  length mmio eventfd in api.txt? If yes, is this an exposing? If not, how
>  can userspace know the advantages of this and use it? For better API,
>  probably we need another new flag just for fast mmio and obsolete
>  current one by failing the assigning for zero length mmio eventfd.
> >>> I sent a patch to update api.txt already as part of
> >>> kvm: add KVM_CAP_IOEVENTFD_PF capability.
> >>> I should probably split it out.
> >>>
> >>> Sorry, I don't think the api change you propose makes sense - just fix the
> >>> crash in the existing one.
> >>>
> >> Ok, so I believe the fix should go:
> >>
> >> - having two ioeventfds when we want to assign zero length mmio eventfd
> > You mean the in-kernel data structures?
> 
> Yes.
> 
> >
> >> - change the kvm_io_bus_sort_cmp() and can handle zero length correctly
> > This one's for amd/non ept, right? I'd rather we implemented the
> > fast mmio optimization for these.
> 
> Agree, but we'd better fix it and backport it to stable first?

I would say fix it upstream first. Worry about stable later.  And I
don't see a lot of value in adding a temporary hack - it's not too much
work to just do the optimal thing directly.

But I won't nack a temporary solution if you insist.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-09-01 Thread Jason Wang


On 09/01/2015 02:54 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 12:47:36PM +0800, Jason Wang wrote:
>>
>> On 09/01/2015 12:31 PM, Michael S. Tsirkin wrote:
>>> On Tue, Sep 01, 2015 at 11:33:43AM +0800, Jason Wang wrote:
 On 08/31/2015 07:33 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
>>> On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
> Thinking more about this, invoking the 0-length write after
> the != 0 length one would be better: it would mean we 
> only
> handle the userspace MMIO like this.
> Right.
>
> Using current unittest. This patch is about 2.9% slower than 
> before, and
> invoking 0-length write after is still 1.1% slower 
> (mmio-datamatch-eventfd).
>
> /patch/result/-+%/
> /base/2957/0/
> /V3/3043/+2.9%/
> /V3+invoking != 0 length first/2990/+1.1%/
>
> So looks like the best method is not searching KVM_FAST_MMIO_BUS 
> during
> KVM_MMIO_BUS. Instead, let userspace to register both datamatch 
> and
> wildcard in this case. Does this sound good to you?
> No - we can't change userspace.
>>> Actually, the change was as simple as following. So I don't get the
>>> reason why.
> Because it's too late - we committed to a specific userspace ABI
> when this was merged in kernel, we must maintain it.
 Ok ( Though I don't think it has real users for this now because it was
 actually broken).
>>> It actually worked most of the time - you only trigger a use after free
>>> on deregister.
>>>
>> It doesn't work for amd and intel machine without ept.
> I thought it does :(
>
> Even if I thought yours is a good API (and I don't BTW - it's exposing
> internal implementation details) it's too late to change it.
 I believe we should document the special treatment in kernel of zero
 length mmio eventfd in api.txt? If yes, is this an exposing? If not, how
 can userspace know the advantages of this and use it? For better API,
 probably we need another new flag just for fast mmio and obsolete
 current one by failing the assigning for zero length mmio eventfd.
>>> I sent a patch to update api.txt already as part of
>>> kvm: add KVM_CAP_IOEVENTFD_PF capability.
>>> I should probably split it out.
>>>
>>> Sorry, I don't think the api change you propose makes sense - just fix the
>>> crash in the existing one.
>>>
>> Ok, so I believe the fix should go:
>>
>> - having two ioeventfds when we want to assign zero length mmio eventfd
> You mean the in-kernel data structures?

Yes.

>
>> - change the kvm_io_bus_sort_cmp() and can handle zero length correctly
> This one's for amd/non ept, right? I'd rather we implemented the
> fast mmio optimization for these.

Agree, but we'd better fix it and backport it to stable first?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-31 Thread Michael S. Tsirkin
On Mon, Aug 31, 2015 at 11:12:07AM +0800, Jason Wang wrote:
> 
> 
> On 08/26/2015 01:10 PM, Jason Wang wrote:
> > On 08/25/2015 07:51 PM, Michael S. Tsirkin wrote:
> >> > On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
>  >> > We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
>  >> > and another is KVM_FAST_MMIO_BUS. This leads to issue:
>  >> > 
>  >> > - kvm_io_bus_destroy() knows nothing about the devices on two buses
>  >> >   points to a single dev. Which will lead double free [1] during 
>  >> > exit.
>  >> > - wildcard eventfd ignores data len, so it was registered as a
>  >> >   kvm_io_range with zero length. This will fail the binary search in
>  >> >   kvm_io_bus_get_first_dev() when we try to emulate through
>  >> >   KVM_MMIO_BUS. This will cause userspace io emulation request 
>  >> > instead
>  >> >   of a eventfd notification (virtqueue kick will be trapped by qemu
>  >> >   instead of vhost in this case).
>  >> > 
>  >> > Fixing this by don't register wildcard mmio eventfd on two
>  >> > buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes 
>  >> > the
>  >> > double free issue of kvm_io_bus_destroy(). For the arch/setups that
>  >> > does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, 
>  >> > try
>  >> > KVM_FAST_MMIO_BUS first to see it it has a match.
>  >> > 
>  >> > [1] Panic caused by double free:
>  >> > 
>  >> > CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 
>  >> > 3.19.0-26-generic #28-Ubuntu
>  >> > Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 
>  >> > 09/12/2013
>  >> > task: 88009ae0c4b0 ti: 88020e7f task.ti: 
>  >> > 88020e7f
>  >> > RIP: 0010:[]  [] 
>  >> > ioeventfd_release+0x28/0x60 [kvm]
>  >> > RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
>  >> > RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
>  >> > RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
>  >> > RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
>  >> > R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
>  >> > R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
>  >> > FS:  7fc1ee3e6700() GS:88023e24() 
>  >> > knlGS:
>  >> > CS:  0010 DS:  ES:  CR0: 80050033
>  >> > CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
>  >> > Stack:
>  >> > 88021e7cc000  88020e7f3be8 c07e2622
>  >> > 88020e7f3c38 c07df69a 880232524160 88020e792d80
>  >> >   880219b78c00 0008 8802321686a8
>  >> > Call Trace:
>  >> > [] ioeventfd_destructor+0x12/0x20 [kvm]
>  >> > [] kvm_put_kvm+0xca/0x210 [kvm]
>  >> > [] kvm_vcpu_release+0x18/0x20 [kvm]
>  >> > [] __fput+0xe7/0x250
>  >> > [] fput+0xe/0x10
>  >> > [] task_work_run+0xd4/0xf0
>  >> > [] do_exit+0x368/0xa50
>  >> > [] ? recalc_sigpending+0x1f/0x60
>  >> > [] do_group_exit+0x45/0xb0
>  >> > [] get_signal+0x291/0x750
>  >> > [] do_signal+0x28/0xab0
>  >> > [] ? do_futex+0xdb/0x5d0
>  >> > [] ? __wake_up_locked_key+0x18/0x20
>  >> > [] ? SyS_futex+0x76/0x170
>  >> > [] do_notify_resume+0x69/0xb0
>  >> > [] int_signal+0x12/0x17
>  >> > Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 
>  >> > 48 8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 
>  >> > 08 <48> 89 10 48 b8 00 01 10 00 00
>  >> > RIP  [] ioeventfd_release+0x28/0x60 [kvm]
>  >> > RSP 
>  >> > 
>  >> > Cc: Gleb Natapov 
>  >> > Cc: Paolo Bonzini 
>  >> > Cc: Michael S. Tsirkin 
>  >> > Signed-off-by: Jason Wang 
>  >> > ---
>  >> > Changes from V2:
>  >> > - Tweak styles and comment suggested by Cornelia.
>  >> > Changes from v1:
>  >> > - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
>  >> >   needed to save lots of unnecessary changes.
>  >> > ---
>  >> >  virt/kvm/eventfd.c  | 31 +--
>  >> >  virt/kvm/kvm_main.c | 16 ++--
>  >> >  2 files changed, 23 insertions(+), 24 deletions(-)
>  >> > 
>  >> > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
>  >> > index 9ff4193..c3ffdc3 100644
>  >> > --- a/virt/kvm/eventfd.c
>  >> > +++ b/virt/kvm/eventfd.c
>  >> > @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, 
>  >> > struct _ioeventfd *p)
>  >> > return false;
>  >> >  }
>  >> >  
>  >> > -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
>  >> > +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd 
>  

Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-31 Thread Jason Wang


On 08/31/2015 07:33 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
>> > 
>> > 
>> > On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
>>> > > Thinking more about this, invoking the 0-length write after
>>> > > > >> > the != 0 length one would be better: it would mean 
>>> > > > >> > we only
>>> > > > >> > handle the userspace MMIO like this.
>>> > >>> > > Right.
>>> > >>> > >
> > >> > 
> > >> > Using current unittest. This patch is about 2.9% slower than 
> > >> > before, and
> > >> > invoking 0-length write after is still 1.1% slower 
> > >> > (mmio-datamatch-eventfd).
> > >> > 
> > >> > /patch/result/-+%/
> > >> > /base/2957/0/
> > >> > /V3/3043/+2.9%/
> > >> > /V3+invoking != 0 length first/2990/+1.1%/
> > >> > 
> > >> > So looks like the best method is not searching KVM_FAST_MMIO_BUS 
> > >> > during
> > >> > KVM_MMIO_BUS. Instead, let userspace to register both datamatch and
> > >> > wildcard in this case. Does this sound good to you?
>>> > > No - we can't change userspace.
>> > 
>> > Actually, the change was as simple as following. So I don't get the
>> > reason why.
> Because it's too late - we committed to a specific userspace ABI
> when this was merged in kernel, we must maintain it.

Ok ( Though I don't think it has real users for this now because it was
actually broken).

> Even if I thought yours is a good API (and I don't BTW - it's exposing
> internal implementation details) it's too late to change it.

I believe we should document the special treatment in kernel of zero
length mmio eventfd in api.txt? If yes, is this an exposing? If not, how
can userspace know the advantages of this and use it? For better API,
probably we need another new flag just for fast mmio and obsolete
current one by failing the assigning for zero length mmio eventfd.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-31 Thread Jason Wang


On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
> Thinking more about this, invoking the 0-length write after
> > >> > the != 0 length one would be better: it would mean we only
> > >> > handle the userspace MMIO like this.
>>> > > Right.
>>> > >
>> > 
>> > Using current unittest. This patch is about 2.9% slower than before, and
>> > invoking 0-length write after is still 1.1% slower 
>> > (mmio-datamatch-eventfd).
>> > 
>> > /patch/result/-+%/
>> > /base/2957/0/
>> > /V3/3043/+2.9%/
>> > /V3+invoking != 0 length first/2990/+1.1%/
>> > 
>> > So looks like the best method is not searching KVM_FAST_MMIO_BUS during
>> > KVM_MMIO_BUS. Instead, let userspace to register both datamatch and
>> > wildcard in this case. Does this sound good to you?
> No - we can't change userspace.

Actually, the change was as simple as following. So I don't get the
reason why.

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 9935029..42ee986 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -288,6 +288,8 @@ static int
virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 if (modern) {
 memory_region_add_eventfd(modern_mr, modern_addr, 2,
   true, n, notifier);
+memory_region_add_eventfd(modern_mr, modern_addr, 0,
+  false, n, notifier);
 }
 if (legacy) {
 memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
@@ -297,6 +299,8 @@ static int
virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
 if (modern) {
 memory_region_del_eventfd(modern_mr, modern_addr, 2,
   true, n, notifier);
+memory_region_del_eventfd(modern_mr, modern_addr, 0,
+  false, n, notifier);
 }
 if (legacy) {
 memory_region_del_eventfd(legacy_mr, legacy_addr, 2,

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-31 Thread Michael S. Tsirkin
On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
> 
> 
> On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
> > Thinking more about this, invoking the 0-length write after
> > > >> > the != 0 length one would be better: it would mean we only
> > > >> > handle the userspace MMIO like this.
> >>> > > Right.
> >>> > >
> >> > 
> >> > Using current unittest. This patch is about 2.9% slower than before, and
> >> > invoking 0-length write after is still 1.1% slower 
> >> > (mmio-datamatch-eventfd).
> >> > 
> >> > /patch/result/-+%/
> >> > /base/2957/0/
> >> > /V3/3043/+2.9%/
> >> > /V3+invoking != 0 length first/2990/+1.1%/
> >> > 
> >> > So looks like the best method is not searching KVM_FAST_MMIO_BUS during
> >> > KVM_MMIO_BUS. Instead, let userspace to register both datamatch and
> >> > wildcard in this case. Does this sound good to you?
> > No - we can't change userspace.
> 
> Actually, the change was as simple as following. So I don't get the
> reason why.

Because it's too late - we committed to a specific userspace ABI
when this was merged in kernel, we must maintain it.
Even if I thought yours is a good API (and I don't BTW - it's exposing
internal implementation details) it's too late to change it.

> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 9935029..42ee986 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -288,6 +288,8 @@ static int
> virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
>  if (modern) {
>  memory_region_add_eventfd(modern_mr, modern_addr, 2,
>true, n, notifier);
> +memory_region_add_eventfd(modern_mr, modern_addr, 0,
> +  false, n, notifier);
>  }
>  if (legacy) {
>  memory_region_add_eventfd(legacy_mr, legacy_addr, 2,
> @@ -297,6 +299,8 @@ static int
> virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
>  if (modern) {
>  memory_region_del_eventfd(modern_mr, modern_addr, 2,
>true, n, notifier);
> +memory_region_del_eventfd(modern_mr, modern_addr, 0,
> +  false, n, notifier);
>  }
>  if (legacy) {
>  memory_region_del_eventfd(legacy_mr, legacy_addr, 2,
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-31 Thread Michael S. Tsirkin
On Tue, Sep 01, 2015 at 11:33:43AM +0800, Jason Wang wrote:
> 
> 
> On 08/31/2015 07:33 PM, Michael S. Tsirkin wrote:
> > On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
> >> > 
> >> > 
> >> > On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
> >>> > > Thinking more about this, invoking the 0-length write after
> >>> > > > >> > the != 0 length one would be better: it would mean 
> >>> > > > >> > we only
> >>> > > > >> > handle the userspace MMIO like this.
> >>> > >>> > > Right.
> >>> > >>> > >
> > > >> > 
> > > >> > Using current unittest. This patch is about 2.9% slower than 
> > > >> > before, and
> > > >> > invoking 0-length write after is still 1.1% slower 
> > > >> > (mmio-datamatch-eventfd).
> > > >> > 
> > > >> > /patch/result/-+%/
> > > >> > /base/2957/0/
> > > >> > /V3/3043/+2.9%/
> > > >> > /V3+invoking != 0 length first/2990/+1.1%/
> > > >> > 
> > > >> > So looks like the best method is not searching KVM_FAST_MMIO_BUS 
> > > >> > during
> > > >> > KVM_MMIO_BUS. Instead, let userspace to register both datamatch 
> > > >> > and
> > > >> > wildcard in this case. Does this sound good to you?
> >>> > > No - we can't change userspace.
> >> > 
> >> > Actually, the change was as simple as following. So I don't get the
> >> > reason why.
> > Because it's too late - we committed to a specific userspace ABI
> > when this was merged in kernel, we must maintain it.
> 
> Ok ( Though I don't think it has real users for this now because it was
> actually broken).

It actually worked most of the time - you only trigger a use after free
on deregister.

> > Even if I thought yours is a good API (and I don't BTW - it's exposing
> > internal implementation details) it's too late to change it.
> 
> I believe we should document the special treatment in kernel of zero
> length mmio eventfd in api.txt? If yes, is this an exposing? If not, how
> can userspace know the advantages of this and use it? For better API,
> probably we need another new flag just for fast mmio and obsolete
> current one by failing the assigning for zero length mmio eventfd.

I sent a patch to update api.txt already as part of
kvm: add KVM_CAP_IOEVENTFD_PF capability.
I should probably split it out.

Sorry, I don't think the api change you propose makes sense - just fix the
crash in the existing one.

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-31 Thread Jason Wang


On 09/01/2015 12:31 PM, Michael S. Tsirkin wrote:
> On Tue, Sep 01, 2015 at 11:33:43AM +0800, Jason Wang wrote:
>>
>> On 08/31/2015 07:33 PM, Michael S. Tsirkin wrote:
>>> On Mon, Aug 31, 2015 at 04:03:59PM +0800, Jason Wang wrote:
>
> On 08/31/2015 03:29 PM, Michael S. Tsirkin wrote:
>>> Thinking more about this, invoking the 0-length write after
>>> the != 0 length one would be better: it would mean we 
>>> only
>>> handle the userspace MMIO like this.
>>> Right.
>>>
>>> Using current unittest. This patch is about 2.9% slower than 
>>> before, and
>>> invoking 0-length write after is still 1.1% slower 
>>> (mmio-datamatch-eventfd).
>>>
>>> /patch/result/-+%/
>>> /base/2957/0/
>>> /V3/3043/+2.9%/
>>> /V3+invoking != 0 length first/2990/+1.1%/
>>>
>>> So looks like the best method is not searching KVM_FAST_MMIO_BUS 
>>> during
>>> KVM_MMIO_BUS. Instead, let userspace to register both datamatch and
>>> wildcard in this case. Does this sound good to you?
>>> No - we can't change userspace.
> Actually, the change was as simple as following. So I don't get the
> reason why.
>>> Because it's too late - we committed to a specific userspace ABI
>>> when this was merged in kernel, we must maintain it.
>> Ok ( Though I don't think it has real users for this now because it was
>> actually broken).
> It actually worked most of the time - you only trigger a use after free
> on deregister.
>

It doesn't work for amd and intel machine without ept.

>>> Even if I thought yours is a good API (and I don't BTW - it's exposing
>>> internal implementation details) it's too late to change it.
>> I believe we should document the special treatment in kernel of zero
>> length mmio eventfd in api.txt? If yes, is this an exposing? If not, how
>> can userspace know the advantages of this and use it? For better API,
>> probably we need another new flag just for fast mmio and obsolete
>> current one by failing the assigning for zero length mmio eventfd.
> I sent a patch to update api.txt already as part of
> kvm: add KVM_CAP_IOEVENTFD_PF capability.
> I should probably split it out.
>
> Sorry, I don't think the api change you propose makes sense - just fix the
> crash in the existing one.
>

Ok, so I believe the fix should go:

- having two ioeventfds when we want to assign zero length mmio eventfd
- change the kvm_io_bus_sort_cmp() and can handle zero length correctly

What's your thought?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-30 Thread Jason Wang


On 08/26/2015 01:10 PM, Jason Wang wrote:
 On 08/25/2015 07:51 PM, Michael S. Tsirkin wrote:
  On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
   We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
   and another is KVM_FAST_MMIO_BUS. This leads to issue:
   
   - kvm_io_bus_destroy() knows nothing about the devices on two buses
 points to a single dev. Which will lead double free [1] during exit.
   - wildcard eventfd ignores data len, so it was registered as a
 kvm_io_range with zero length. This will fail the binary search in
 kvm_io_bus_get_first_dev() when we try to emulate through
 KVM_MMIO_BUS. This will cause userspace io emulation request instead
 of a eventfd notification (virtqueue kick will be trapped by qemu
 instead of vhost in this case).
   
   Fixing this by don't register wildcard mmio eventfd on two
   buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
   double free issue of kvm_io_bus_destroy(). For the arch/setups that
   does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
   KVM_FAST_MMIO_BUS first to see it it has a match.
   
   [1] Panic caused by double free:
   
   CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
   #28-Ubuntu
   Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 
   09/12/2013
   task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
   RIP: 0010:[c07e25d8]  [c07e25d8] 
   ioeventfd_release+0x28/0x60 [kvm]
   RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
   RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
   RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
   RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
   R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
   R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
   FS:  7fc1ee3e6700() GS:88023e24() 
   knlGS:
   CS:  0010 DS:  ES:  CR0: 80050033
   CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
   Stack:
   88021e7cc000  88020e7f3be8 c07e2622
   88020e7f3c38 c07df69a 880232524160 88020e792d80
 880219b78c00 0008 8802321686a8
   Call Trace:
   [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
   [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
   [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
   [811f69f7] __fput+0xe7/0x250
   [811f6bae] fput+0xe/0x10
   [81093f04] task_work_run+0xd4/0xf0
   [81079358] do_exit+0x368/0xa50
   [81082c8f] ? recalc_sigpending+0x1f/0x60
   [81079ad5] do_group_exit+0x45/0xb0
   [81085c71] get_signal+0x291/0x750
   [810144d8] do_signal+0x28/0xab0
   [810f3a3b] ? do_futex+0xdb/0x5d0
   [810b7028] ? __wake_up_locked_key+0x18/0x20
   [810f3fa6] ? SyS_futex+0x76/0x170
   [81014fc9] do_notify_resume+0x69/0xb0
   [817cb9af] int_signal+0x12/0x17
   Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 
   8b 7f 20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 
   48 89 10 48 b8 00 01 10 00 00
   RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
   RSP 88020e7f3bc8
   
   Cc: Gleb Natapov g...@kernel.org
   Cc: Paolo Bonzini pbonz...@redhat.com
   Cc: Michael S. Tsirkin m...@redhat.com
   Signed-off-by: Jason Wang jasow...@redhat.com
   ---
   Changes from V2:
   - Tweak styles and comment suggested by Cornelia.
   Changes from v1:
   - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
 needed to save lots of unnecessary changes.
   ---
virt/kvm/eventfd.c  | 31 +--
virt/kvm/kvm_main.c | 16 ++--
2 files changed, 23 insertions(+), 24 deletions(-)
   
   diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
   index 9ff4193..c3ffdc3 100644
   --- a/virt/kvm/eventfd.c
   +++ b/virt/kvm/eventfd.c
   @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, 
   struct _ioeventfd *p)
 return false;
}

   -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
   +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd 
   *args)
{
   - if (flags  KVM_IOEVENTFD_FLAG_PIO)
   + if (args-flags  KVM_IOEVENTFD_FLAG_PIO)
 return KVM_PIO_BUS;
   - if (flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
   + if (args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 return KVM_VIRTIO_CCW_NOTIFY_BUS;
   - return KVM_MMIO_BUS;
   + /* When length is ignored, MMIO is put on a separate bus, for
   +  * faster lookups.
   +  */
   + return args-len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
}

static int
   @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
   kvm_ioeventfd *args)
 

Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang


On 08/25/2015 07:51 PM, Michael S. Tsirkin wrote:
 On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
  We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
  and another is KVM_FAST_MMIO_BUS. This leads to issue:
  
  - kvm_io_bus_destroy() knows nothing about the devices on two buses
points to a single dev. Which will lead double free [1] during exit.
  - wildcard eventfd ignores data len, so it was registered as a
kvm_io_range with zero length. This will fail the binary search in
kvm_io_bus_get_first_dev() when we try to emulate through
KVM_MMIO_BUS. This will cause userspace io emulation request instead
of a eventfd notification (virtqueue kick will be trapped by qemu
instead of vhost in this case).
  
  Fixing this by don't register wildcard mmio eventfd on two
  buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
  double free issue of kvm_io_bus_destroy(). For the arch/setups that
  does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
  KVM_FAST_MMIO_BUS first to see it it has a match.
  
  [1] Panic caused by double free:
  
  CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
  #28-Ubuntu
  Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
  task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
  RIP: 0010:[c07e25d8]  [c07e25d8] 
  ioeventfd_release+0x28/0x60 [kvm]
  RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
  RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
  RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
  RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
  R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
  R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
  FS:  7fc1ee3e6700() GS:88023e24() 
  knlGS:
  CS:  0010 DS:  ES:  CR0: 80050033
  CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
  Stack:
  88021e7cc000  88020e7f3be8 c07e2622
  88020e7f3c38 c07df69a 880232524160 88020e792d80
    880219b78c00 0008 8802321686a8
  Call Trace:
  [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
  [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
  [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
  [811f69f7] __fput+0xe7/0x250
  [811f6bae] fput+0xe/0x10
  [81093f04] task_work_run+0xd4/0xf0
  [81079358] do_exit+0x368/0xa50
  [81082c8f] ? recalc_sigpending+0x1f/0x60
  [81079ad5] do_group_exit+0x45/0xb0
  [81085c71] get_signal+0x291/0x750
  [810144d8] do_signal+0x28/0xab0
  [810f3a3b] ? do_futex+0xdb/0x5d0
  [810b7028] ? __wake_up_locked_key+0x18/0x20
  [810f3fa6] ? SyS_futex+0x76/0x170
  [81014fc9] do_notify_resume+0x69/0xb0
  [817cb9af] int_signal+0x12/0x17
  Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 
  20 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 
  b8 00 01 10 00 00
  RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
  RSP 88020e7f3bc8
  
  Cc: Gleb Natapov g...@kernel.org
  Cc: Paolo Bonzini pbonz...@redhat.com
  Cc: Michael S. Tsirkin m...@redhat.com
  Signed-off-by: Jason Wang jasow...@redhat.com
  ---
  Changes from V2:
  - Tweak styles and comment suggested by Cornelia.
  Changes from v1:
  - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
needed to save lots of unnecessary changes.
  ---
   virt/kvm/eventfd.c  | 31 +--
   virt/kvm/kvm_main.c | 16 ++--
   2 files changed, 23 insertions(+), 24 deletions(-)
  
  diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
  index 9ff4193..c3ffdc3 100644
  --- a/virt/kvm/eventfd.c
  +++ b/virt/kvm/eventfd.c
  @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
  _ioeventfd *p)
 return false;
   }
   
  -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
  +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
   {
  -  if (flags  KVM_IOEVENTFD_FLAG_PIO)
  +  if (args-flags  KVM_IOEVENTFD_FLAG_PIO)
 return KVM_PIO_BUS;
  -  if (flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
  +  if (args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 return KVM_VIRTIO_CCW_NOTIFY_BUS;
  -  return KVM_MMIO_BUS;
  +  /* When length is ignored, MMIO is put on a separate bus, for
  +   * faster lookups.
  +   */
  +  return args-len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
   }
   
   static int
  @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
  kvm_ioeventfd *args)
 struct eventfd_ctx   *eventfd;
 int   ret;
   
  -  bus_idx = ioeventfd_bus_from_flags(args-flags);
  +  bus_idx = ioeventfd_bus_from_args(args);
 /* must be 

[PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Jason Wang
We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
and another is KVM_FAST_MMIO_BUS. This leads to issue:

- kvm_io_bus_destroy() knows nothing about the devices on two buses
  points to a single dev. Which will lead double free [1] during exit.
- wildcard eventfd ignores data len, so it was registered as a
  kvm_io_range with zero length. This will fail the binary search in
  kvm_io_bus_get_first_dev() when we try to emulate through
  KVM_MMIO_BUS. This will cause userspace io emulation request instead
  of a eventfd notification (virtqueue kick will be trapped by qemu
  instead of vhost in this case).

Fixing this by don't register wildcard mmio eventfd on two
buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
double free issue of kvm_io_bus_destroy(). For the arch/setups that
does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
KVM_FAST_MMIO_BUS first to see it it has a match.

[1] Panic caused by double free:

CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic #28-Ubuntu
Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
RIP: 0010:[c07e25d8]  [c07e25d8] 
ioeventfd_release+0x28/0x60 [kvm]
RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
FS:  7fc1ee3e6700() GS:88023e24() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
Stack:
88021e7cc000  88020e7f3be8 c07e2622
88020e7f3c38 c07df69a 880232524160 88020e792d80
  880219b78c00 0008 8802321686a8
Call Trace:
[c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
[c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
[c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
[811f69f7] __fput+0xe7/0x250
[811f6bae] fput+0xe/0x10
[81093f04] task_work_run+0xd4/0xf0
[81079358] do_exit+0x368/0xa50
[81082c8f] ? recalc_sigpending+0x1f/0x60
[81079ad5] do_group_exit+0x45/0xb0
[81085c71] get_signal+0x291/0x750
[810144d8] do_signal+0x28/0xab0
[810f3a3b] ? do_futex+0xdb/0x5d0
[810b7028] ? __wake_up_locked_key+0x18/0x20
[810f3fa6] ? SyS_futex+0x76/0x170
[81014fc9] do_notify_resume+0x69/0xb0
[817cb9af] int_signal+0x12/0x17
Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 01 
10 00 00
RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
RSP 88020e7f3bc8

Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
Cc: Michael S. Tsirkin m...@redhat.com
Signed-off-by: Jason Wang jasow...@redhat.com
---
Changes from V2:
- Tweak styles and comment suggested by Cornelia.
Changes from v1:
- change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
  needed to save lots of unnecessary changes.
---
 virt/kvm/eventfd.c  | 31 +--
 virt/kvm/kvm_main.c | 16 ++--
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 9ff4193..c3ffdc3 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
_ioeventfd *p)
return false;
 }
 
-static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
 {
-   if (flags  KVM_IOEVENTFD_FLAG_PIO)
+   if (args-flags  KVM_IOEVENTFD_FLAG_PIO)
return KVM_PIO_BUS;
-   if (flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
+   if (args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
return KVM_VIRTIO_CCW_NOTIFY_BUS;
-   return KVM_MMIO_BUS;
+   /* When length is ignored, MMIO is put on a separate bus, for
+* faster lookups.
+*/
+   return args-len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
 }
 
 static int
@@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
struct eventfd_ctx   *eventfd;
int   ret;
 
-   bus_idx = ioeventfd_bus_from_flags(args-flags);
+   bus_idx = ioeventfd_bus_from_args(args);
/* must be natural-word sized, or 0 to ignore length */
switch (args-len) {
case 0:
@@ -843,16 +846,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd 
*args)
if (ret  0)
goto unlock_fail;
 
-   /* When length 

Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Cornelia Huck
On Tue, 25 Aug 2015 17:05:47 +0800
Jason Wang jasow...@redhat.com wrote:

 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another is KVM_FAST_MMIO_BUS. This leads to issue:
 
 - kvm_io_bus_destroy() knows nothing about the devices on two buses
   points to a single dev. Which will lead double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause userspace io emulation request instead
   of a eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fixing this by don't register wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see it it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 Changes from V2:
 - Tweak styles and comment suggested by Cornelia.
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 31 +--
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 23 insertions(+), 24 deletions(-)

Acked-by: Cornelia Huck cornelia.h...@de.ibm.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another is KVM_FAST_MMIO_BUS. This leads to issue:
 
 - kvm_io_bus_destroy() knows nothing about the devices on two buses
   points to a single dev. Which will lead double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause userspace io emulation request instead
   of a eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fixing this by don't register wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see it it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com
 ---
 Changes from V2:
 - Tweak styles and comment suggested by Cornelia.
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 31 +--
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 23 insertions(+), 24 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..c3ffdc3 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   return false;
  }
  
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
  {
 - if (flags  KVM_IOEVENTFD_FLAG_PIO)
 + if (args-flags  KVM_IOEVENTFD_FLAG_PIO)
   return KVM_PIO_BUS;
 - if (flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 + if (args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
   return KVM_VIRTIO_CCW_NOTIFY_BUS;
 - return KVM_MMIO_BUS;
 + /* When length is ignored, MMIO is put on a separate bus, for
 +  * faster lookups.
 +  */
 + return args-len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
  }
  
  static int
 @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   struct eventfd_ctx   *eventfd;
   int   ret;
  
 - bus_idx = ioeventfd_bus_from_flags(args-flags);
 + bus_idx = ioeventfd_bus_from_args(args);
   /* must be natural-word sized, or 0 to ignore length */
   switch (args-len) {
   case 0:
 @@ -843,16 +846,6 @@ 

Re: [PATCH V3 2/3] kvm: don't register wildcard MMIO EVENTFD on two buses

2015-08-25 Thread Michael S. Tsirkin
On Tue, Aug 25, 2015 at 05:05:47PM +0800, Jason Wang wrote:
 We register wildcard mmio eventfd on two buses, one for KVM_MMIO_BUS
 and another is KVM_FAST_MMIO_BUS. This leads to issue:
 
 - kvm_io_bus_destroy() knows nothing about the devices on two buses
   points to a single dev. Which will lead double free [1] during exit.
 - wildcard eventfd ignores data len, so it was registered as a
   kvm_io_range with zero length. This will fail the binary search in
   kvm_io_bus_get_first_dev() when we try to emulate through
   KVM_MMIO_BUS. This will cause userspace io emulation request instead
   of a eventfd notification (virtqueue kick will be trapped by qemu
   instead of vhost in this case).
 
 Fixing this by don't register wildcard mmio eventfd on two
 buses. Instead, only register it in KVM_FAST_MMIO_BUS. This fixes the
 double free issue of kvm_io_bus_destroy(). For the arch/setups that
 does not utilize KVM_FAST_MMIO_BUS, before searching KVM_MMIO_BUS, try
 KVM_FAST_MMIO_BUS first to see it it has a match.
 
 [1] Panic caused by double free:
 
 CPU: 1 PID: 2894 Comm: qemu-system-x86 Not tainted 3.19.0-26-generic 
 #28-Ubuntu
 Hardware name: LENOVO 2356BG6/2356BG6, BIOS G7ET96WW (2.56 ) 09/12/2013
 task: 88009ae0c4b0 ti: 88020e7f task.ti: 88020e7f
 RIP: 0010:[c07e25d8]  [c07e25d8] 
 ioeventfd_release+0x28/0x60 [kvm]
 RSP: 0018:88020e7f3bc8  EFLAGS: 00010292
 RAX: dead00200200 RBX: 8801ec19c900 RCX: 00018200016d
 RDX: 8801ec19cf80 RSI: ea0008bf1d40 RDI: 8801ec19c900
 RBP: 88020e7f3bd8 R08: 2fc75a01 R09: 00018200016d
 R10: c07df6ae R11: 88022fc75a98 R12: 88021e7cc000
 R13: 88021e7cca48 R14: 88021e7cca50 R15: 8801ec19c880
 FS:  7fc1ee3e6700() GS:88023e24() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 7f8f389d8000 CR3: 00023dc13000 CR4: 001427e0
 Stack:
 88021e7cc000  88020e7f3be8 c07e2622
 88020e7f3c38 c07df69a 880232524160 88020e792d80
   880219b78c00 0008 8802321686a8
 Call Trace:
 [c07e2622] ioeventfd_destructor+0x12/0x20 [kvm]
 [c07df69a] kvm_put_kvm+0xca/0x210 [kvm]
 [c07df818] kvm_vcpu_release+0x18/0x20 [kvm]
 [811f69f7] __fput+0xe7/0x250
 [811f6bae] fput+0xe/0x10
 [81093f04] task_work_run+0xd4/0xf0
 [81079358] do_exit+0x368/0xa50
 [81082c8f] ? recalc_sigpending+0x1f/0x60
 [81079ad5] do_group_exit+0x45/0xb0
 [81085c71] get_signal+0x291/0x750
 [810144d8] do_signal+0x28/0xab0
 [810f3a3b] ? do_futex+0xdb/0x5d0
 [810b7028] ? __wake_up_locked_key+0x18/0x20
 [810f3fa6] ? SyS_futex+0x76/0x170
 [81014fc9] do_notify_resume+0x69/0xb0
 [817cb9af] int_signal+0x12/0x17
 Code: 5d c3 90 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 7f 20 
 e8 06 d6 a5 c0 48 8b 43 08 48 8b 13 48 89 df 48 89 42 08 48 89 10 48 b8 00 
 01 10 00 00
 RIP  [c07e25d8] ioeventfd_release+0x28/0x60 [kvm]
 RSP 88020e7f3bc8
 
 Cc: Gleb Natapov g...@kernel.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Michael S. Tsirkin m...@redhat.com
 Signed-off-by: Jason Wang jasow...@redhat.com

Saw v3 too late. Pls see my comments on v2.

 ---
 Changes from V2:
 - Tweak styles and comment suggested by Cornelia.
 Changes from v1:
 - change ioeventfd_bus_from_flags() to return KVM_FAST_MMIO_BUS when
   needed to save lots of unnecessary changes.
 ---
  virt/kvm/eventfd.c  | 31 +--
  virt/kvm/kvm_main.c | 16 ++--
  2 files changed, 23 insertions(+), 24 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 9ff4193..c3ffdc3 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -762,13 +762,16 @@ ioeventfd_check_collision(struct kvm *kvm, struct 
 _ioeventfd *p)
   return false;
  }
  
 -static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
 +static enum kvm_bus ioeventfd_bus_from_args(struct kvm_ioeventfd *args)
  {
 - if (flags  KVM_IOEVENTFD_FLAG_PIO)
 + if (args-flags  KVM_IOEVENTFD_FLAG_PIO)
   return KVM_PIO_BUS;
 - if (flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
 + if (args-flags  KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY)
   return KVM_VIRTIO_CCW_NOTIFY_BUS;
 - return KVM_MMIO_BUS;
 + /* When length is ignored, MMIO is put on a separate bus, for
 +  * faster lookups.
 +  */
 + return args-len ? KVM_MMIO_BUS : KVM_FAST_MMIO_BUS;
  }
  
  static int
 @@ -779,7 +782,7 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct 
 kvm_ioeventfd *args)
   struct eventfd_ctx   *eventfd;
   int   ret;
  
 - bus_idx = ioeventfd_bus_from_flags(args-flags);
 + bus_idx = ioeventfd_bus_from_args(args);
   /* must be natural-word sized, or 0 to ignore length */
   switch (args-len) {