Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-11 Thread Gonglei (Arei)

> -Original Message-
> From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> Behalf Of David Hildenbrand
> Sent: Monday, June 11, 2018 8:36 PM
> To: Gonglei (Arei) ; 浙大邮箱 
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after 
> the
> VM start
> 
> On 11.06.2018 14:25, Gonglei (Arei) wrote:
> >
> > Hi David and Paolo,
> >
> >> -Original Message-
> >> From: David Hildenbrand [mailto:da...@redhat.com]
> >> Sent: Monday, June 11, 2018 6:44 PM
> >> To: 浙大邮箱 
> >> Cc: Paolo Bonzini ; Gonglei (Arei)
> >> ; Igor Mammedov ;
> >> xuyandong ; Zhanghailiang
> >> ; wangxin (U)
> >> ; lidonglin ;
> >> k...@vger.kernel.org; qemu-devel@nongnu.org; Huangweidong (C)
> >> 
> >> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately 
> >> after
> the
> >> VM start
> >>
> >> On 07.06.2018 18:03, 浙大邮箱 wrote:
> >>> Hi,all
> >>> I still have a question after reading your discussion: Will seabios 
> >>> detect the
> >> change of address space even if we add_region and del_region
> automatically? I
> >> guess that seabios may not take this change into consideration.
> >>
> >> Hi,
> >>
> >> We would just change how KVM memory slots are updated. This is
> >> not atomic right now, but would be later on. It should not have any
> >> other effect.
> >>
> > Yes. Do you have any plans to do that?
> 
> Well, I have plans to work on atomically resizable memory regions
> (atomic del + add), and what Paolo described could also work for that
> use case. However, I won't have time to look into that in the near
> future. So if somebody else wants to jump in, perfect. If not, it will
> unfortunately have to wait.
> 
Got it. :)

Thanks,
-Gonglei


Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-11 Thread David Hildenbrand
On 11.06.2018 14:25, Gonglei (Arei) wrote:
> 
> Hi David and Paolo,
> 
>> -Original Message-
>> From: David Hildenbrand [mailto:da...@redhat.com]
>> Sent: Monday, June 11, 2018 6:44 PM
>> To: 浙大邮箱 
>> Cc: Paolo Bonzini ; Gonglei (Arei)
>> ; Igor Mammedov ;
>> xuyandong ; Zhanghailiang
>> ; wangxin (U)
>> ; lidonglin ;
>> k...@vger.kernel.org; qemu-devel@nongnu.org; Huangweidong (C)
>> 
>> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately 
>> after the
>> VM start
>>
>> On 07.06.2018 18:03, 浙大邮箱 wrote:
>>> Hi,all
>>> I still have a question after reading your discussion: Will seabios detect 
>>> the
>> change of address space even if we add_region and del_region automatically? I
>> guess that seabios may not take this change into consideration.
>>
>> Hi,
>>
>> We would just change how KVM memory slots are updated. This is
>> not atomic right now, but would be later on. It should not have any
>> other effect.
>>
> Yes. Do you have any plans to do that? 

Well, I have plans to work on atomically resizable memory regions
(atomic del + add), and what Paolo described could also work for that
use case. However, I won't have time to look into that in the near
future. So if somebody else wants to jump in, perfect. If not, it will
unfortunately have to wait.

> 
> Thanks,
> -Gonglei
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-11 Thread Gonglei (Arei)

Hi David and Paolo,

> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: Monday, June 11, 2018 6:44 PM
> To: 浙大邮箱 
> Cc: Paolo Bonzini ; Gonglei (Arei)
> ; Igor Mammedov ;
> xuyandong ; Zhanghailiang
> ; wangxin (U)
> ; lidonglin ;
> k...@vger.kernel.org; qemu-devel@nongnu.org; Huangweidong (C)
> 
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after 
> the
> VM start
> 
> On 07.06.2018 18:03, 浙大邮箱 wrote:
> > Hi,all
> > I still have a question after reading your discussion: Will seabios detect 
> > the
> change of address space even if we add_region and del_region automatically? I
> guess that seabios may not take this change into consideration.
> 
> Hi,
> 
> We would just change how KVM memory slots are updated. This is
> not atomic right now, but would be later on. It should not have any
> other effect.
> 
Yes. Do you have any plans to do that? 

Thanks,
-Gonglei


Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-11 Thread David Hildenbrand
On 07.06.2018 18:03, 浙大邮箱 wrote:
> Hi,all
> I still have a question after reading your discussion: Will seabios detect 
> the change of address space even if we add_region and del_region 
> automatically? I guess that seabios may not take this change into 
> consideration.

Hi,

We would just change how KVM memory slots are updated. This is
not atomic right now, but would be later on. It should not have any
other effect.

Thanks

> 
> Looking forward to your replies,
> Thanks
> 
>>> On Jun 7, 2018, at 20:55, David Hildenbrand  wrote:
>>>
>>> On 07.06.2018 14:36, Paolo Bonzini wrote:
>>> On 07/06/2018 13:36, David Hildenbrand wrote:
> The dirty bitmap would be synced in kvm_region_del (so it's not true
> that kvm_region_del would disappear, but almost :)).
 I was rather concerned when doing a KVM_SET_USER_MEMORY_REGIONS while
 some (already present) memory region is performing dirty tracking and
 therefore has a dirty_bitmap pointer assigned.

 As we have to expect that all different kinds of parameters can change
 (e.g. the size of a slot as I pointed out), the old bitmap cannot be
 reused. At least not atomically - we could create a new one and then
 simply OR in the old content.

 Well, we could make that dirty tracking a special case ("all dirty
 tracking data will be lost in case of doing a KVM_SET_USER_MEMORY_REGIONS"
 - but this again could lead to races (if the bitmap sync happens before
 KVM_SET_USER_MEMORY_REGIONS)). It's tricky to get it right :)
>>>
>>> At the point where QEMU calls region_del, the guest is not supposed to
>>> access the region anymore, so it's okay to do the final sync there.  The
>>> same race exists now already.
>>
>> The point is that KVM_SET_USER_MEMORY_REGIONS is the big hammer, not the
>> fine grained region_add/region_del. All regions are suddenly involved.
>> So we have to handle dirty_bitmaps somehow.
>>
>> But I agree that those that would actually change
>> (split/resized/whatever) should not be currently dirty tracked (or if
>> they are, they should be able to live with the possible race).
>>
>> Devil is in the detail :)
>>
>>>
>>> Paolo
>>>
> The rmap is more interesting.  Perhaps it can be just rebuilt on every
> KVM_SET_USER_MEMORY_REGIONS call.
>>
>>
>> -- 
>>
>> Thanks,
>>
>> David / dhildenb
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread 浙大邮箱
Hi,all
I still have a question after reading your discussion: Will seabios detect the 
change of address space even if we add_region and del_region automatically? I 
guess that seabios may not take this change into consideration.

Looking forward to your replies,
Thanks

>> On Jun 7, 2018, at 20:55, David Hildenbrand  wrote:
>> 
>> On 07.06.2018 14:36, Paolo Bonzini wrote:
>> On 07/06/2018 13:36, David Hildenbrand wrote:
 The dirty bitmap would be synced in kvm_region_del (so it's not true
 that kvm_region_del would disappear, but almost :)).
>>> I was rather concerned when doing a KVM_SET_USER_MEMORY_REGIONS while
>>> some (already present) memory region is performing dirty tracking and
>>> therefore has a dirty_bitmap pointer assigned.
>>> 
>>> As we have to expect that all different kinds of parameters can change
>>> (e.g. the size of a slot as I pointed out), the old bitmap cannot be
>>> reused. At least not atomically - we could create a new one and then
>>> simply OR in the old content.
>>>
>>> Well, we could make that dirty tracking a special case ("all dirty
>>> tracking data will be lost in case of doing a KVM_SET_USER_MEMORY_REGIONS"
>>> - but this again could lead to races (if the bitmap sync happens before
>>> KVM_SET_USER_MEMORY_REGIONS)). It's tricky to get it right :)
>> 
>> At the point where QEMU calls region_del, the guest is not supposed to
>> access the region anymore, so it's okay to do the final sync there.  The
>> same race exists now already.
> 
> The point is that KVM_SET_USER_MEMORY_REGIONS is the big hammer, not the
> fine grained region_add/region_del. All regions are suddenly involved.
> So we have to handle dirty_bitmaps somehow.
> 
> But I agree that those that would actually change
> (split/resized/whatever) should not be currently dirty tracked (or if
> they are, they should be able to live with the possible race).
> 
> Devil is in the detail :)
> 
>> 
>> Paolo
>> 
 The rmap is more interesting.  Perhaps it can be just rebuilt on every
 KVM_SET_USER_MEMORY_REGIONS call.
> 
> 
> -- 
> 
> Thanks,
> 
> David / dhildenb




Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread David Hildenbrand
On 07.06.2018 14:36, Paolo Bonzini wrote:
> On 07/06/2018 13:36, David Hildenbrand wrote:
>>> The dirty bitmap would be synced in kvm_region_del (so it's not true
>>> that kvm_region_del would disappear, but almost :)).
>> I was rather concerned when doing a KVM_SET_USER_MEMORY_REGIONS while
>> some (already present) memory region is performing dirty tracking and
>> therefore has a dirty_bitmap pointer assigned.
>>
>> As we have to expect that all different kinds of parameters can change
>> (e.g. the size of a slot as I pointed out), the old bitmap cannot be
>> reused. At least not atomically - we could create a new one and then
>> simply OR in the old content.
>>
>> Well, we could make that dirty tracking a special case ("all dirty
>> tracking data will be lost in case of doing a KVM_SET_USER_MEMORY_REGIONS"
>> - but this again could lead to races (if the bitmap sync happens before
>> KVM_SET_USER_MEMORY_REGIONS)). It's tricky to get it right :)
> 
> At the point where QEMU calls region_del, the guest is not supposed to
> access the region anymore, so it's okay to do the final sync there.  The
> same race exists now already.

The point is that KVM_SET_USER_MEMORY_REGIONS is the big hammer, not the
fine grained region_add/region_del. All regions are suddenly involved.
So we have to handle dirty_bitmaps somehow.

But I agree that those that would actually change
(split/resized/whatever) should not be currently dirty tracked (or if
they are, they should be able to live with the possible race).

Devil is in the detail :)

> 
> Paolo
> 
>>> The rmap is more interesting.  Perhaps it can be just rebuilt on every
>>> KVM_SET_USER_MEMORY_REGIONS call.
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread Paolo Bonzini
On 07/06/2018 13:36, David Hildenbrand wrote:
>> The dirty bitmap would be synced in kvm_region_del (so it's not true
>> that kvm_region_del would disappear, but almost :)).
> I was rather concerned when doing a KVM_SET_USER_MEMORY_REGIONS while
> some (already present) memory region is performing dirty tracking and
> therefore has a dirty_bitmap pointer assigned.
> 
> As we have to expect that all different kinds of parameters can change
> (e.g. the size of a slot as I pointed out), the old bitmap cannot be
> reused. At least not atomically - we could create a new one and then
> simply OR in the old content.
> 
> Well, we could make that dirty tracking a special case ("all dirty
> tracking data will be lost in case of doing a KVM_SET_USER_MEMORY_REGIONS"
> - but this again could lead to races (if the bitmap sync happens before
> KVM_SET_USER_MEMORY_REGIONS)). It's tricky to get it right :)

At the point where QEMU calls region_del, the guest is not supposed to
access the region anymore, so it's okay to do the final sync there.  The
same race exists now already.

Paolo

>> The rmap is more interesting.  Perhaps it can be just rebuilt on every
>> KVM_SET_USER_MEMORY_REGIONS call.




Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread Gonglei (Arei)

> -Original Message-
> From: David Hildenbrand [mailto:da...@redhat.com]
> Sent: Thursday, June 07, 2018 6:40 PM
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after 
> the
> VM start
> 
> On 06.06.2018 15:57, Paolo Bonzini wrote:
> > On 06/06/2018 15:28, Gonglei (Arei) wrote:
> >> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
> >> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0
> >> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
> >> mem.userspace_addr=0x7fc343ec, mem.flags=0,
> memory_size=0x9000
> >>
> >> When the memory region is cleared, KVM marks the slot as invalid
> >> (it is set to KVM_MEMSLOT_INVALID).
> >>
> >> If SeaBIOS accesses this memory and causes a page fault, KVM will find
> >> an invalid value for the gfn (via __gfn_to_pfn_memslot) and will
> >> ultimately return a failure.
> >>
> >> So, My questions are:
> >>
> >> 1) Why don't we hold kvm->slots_lock during page fault processing?
> >
> > Because it's protected by SRCU.  We don't need kvm->slots_lock on the
> > read side.
> >
> >> 2) How do we ensure that vcpus will not access the corresponding
> >> region when deleting a memory slot?
> >
> > We don't.  It's generally a guest bug if they do, but the problem here
> > is that QEMU is splitting a memory region in two parts and that is not
> > atomic.
> 
> BTW, one ugly (but QEMU-only) fix would be to temporarily pause all
> VCPUs, do the change and then unpause all VCPUs.
> 

The memory region update is triggered by a vcpu thread, not by the
main thread, though.

Thanks,
-Gonglei

> >
> > One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
> > replaces the entire memory map atomically.
> >
> > Paolo
> >
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb


Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread David Hildenbrand
On 07.06.2018 13:13, Gonglei (Arei) wrote:
> 
>> -Original Message-
>> From: David Hildenbrand [mailto:da...@redhat.com]
>> Sent: Thursday, June 07, 2018 6:40 PM
>> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately 
>> after the
>> VM start
>>
>> On 06.06.2018 15:57, Paolo Bonzini wrote:
>>> On 06/06/2018 15:28, Gonglei (Arei) wrote:
 gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
 mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0
 gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
 mem.userspace_addr=0x7fc343ec, mem.flags=0,
>> memory_size=0x9000

 When the memory region is cleared, KVM marks the slot as invalid
 (it is set to KVM_MEMSLOT_INVALID).

 If SeaBIOS accesses this memory and causes a page fault, KVM will find
 an invalid value for the gfn (via __gfn_to_pfn_memslot) and will
 ultimately return a failure.

 So, My questions are:

 1) Why don't we hold kvm->slots_lock during page fault processing?
>>>
>>> Because it's protected by SRCU.  We don't need kvm->slots_lock on the
>>> read side.
>>>
 2) How do we ensure that vcpus will not access the corresponding
 region when deleting a memory slot?
>>>
>>> We don't.  It's generally a guest bug if they do, but the problem here
>>> is that QEMU is splitting a memory region in two parts and that is not
>>> atomic.
>>
>> BTW, one ugly (but QEMU-only) fix would be to temporarily pause all
>> VCPUs, do the change and then unpause all VCPUs.
>>
> 
> The memory region update is triggered by a vcpu thread, not by the
> main thread, though.

Yes, I already ran into this problem, because it involves calling
pause_all_vcpus() from a VCPU thread. I sent a patch for that already,
but we were able to solve the s390x problem differently.

https://patchwork.kernel.org/patch/10331305/

The major problem with pause_all_vcpus() is that it temporarily drops
the iothread mutex, which can result in "funny" side effects :) Handling
parallel calls to pause_all_vcpus() is the smaller issue.

So right now, it can only be used from the main thread.
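
For illustration only, a minimal sketch of that workaround as it could look
if the slot update were driven from the main thread (the wrapper name is made
up; pause_all_vcpus()/resume_all_vcpus() are the existing QEMU helpers
discussed above):

    #include "qemu/osdep.h"
    #include "sysemu/cpus.h"

    /* Hypothetical wrapper: only safe from the main thread, because
     * pause_all_vcpus() temporarily drops the iothread mutex. */
    static void kvm_update_memslots_paused(void (*update)(void *), void *opaque)
    {
        pause_all_vcpus();    /* all VCPUs leave KVM_RUN; no guest accesses */
        update(opaque);       /* delete + re-add the affected slots here */
        resume_all_vcpus();   /* let the guest continue on the new map */
    }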

> 
> Thanks,
> -Gonglei
> 
>>>
>>> One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
>>> replaces the entire memory map atomically.
>>>
>>> Paolo
>>>
>>
>>
>> --
>>
>> Thanks,
>>
>> David / dhildenb


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread David Hildenbrand
On 07.06.2018 13:02, Paolo Bonzini wrote:
> On 07/06/2018 12:37, David Hildenbrand wrote:
>>
>> I have a related requirement, which would be to atomically grow a
>> memory regions. So instead of region_del(old)+region_add(new), I would
>> have to do it in one shot (atomically).
>>
>> AFAICS an atomic replace of the memory map would work for this, too.
>> However I am not sure how we want to handle all kinds of tracking data
>> that is connected to e.g. x86 memory slots (e.g. rmap, dirty bitmap ...).
> 
> The dirty bitmap would be synced in kvm_region_del (so it's not true
> that kvm_region_del would disappear, but almost :)).

I was rather concerned when doing a KVM_SET_USER_MEMORY_REGIONS while
some (already present) memory region is performing dirty tracking and
therefore has a dirty_bitmap pointer assigned.

As we have to expect that all different kinds of parameters can change
(e.g. the size of a slot as I pointed out), the old bitmap cannot be
reused. At least not atomically - we could create a new one and then
simply OR in the old content.

Well, we could make that dirty tracking a special case ("all dirty
tracking data will be lost in case of doing a KVM_SET_USER_MEMORY_REGIONS"
- but this again could lead to races (if the bitmap sync happens before
KVM_SET_USER_MEMORY_REGIONS)). It's tricky to get it right :)
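
For what it's worth, a rough sketch of the "create a new one and OR in the old
content" idea, assuming we only need to preserve the pages that the old and the
new slot have in common (the helper name and signature are made up; bitmap_new()
and bitmap_or() are QEMU's bitmap utilities):

    #include "qemu/osdep.h"
    #include "qemu/bitmap.h"

    /* Carry already-collected dirty bits over to a resized slot's new bitmap. */
    static unsigned long *carry_over_dirty_bitmap(const unsigned long *old_bm,
                                                  long old_npages,
                                                  long new_npages)
    {
        unsigned long *new_bm = bitmap_new(new_npages);    /* zero-initialized */
        long overlap = MIN(old_npages, new_npages);

        /* Preserve dirty state for the pages the two slots have in common. */
        bitmap_or(new_bm, new_bm, old_bm, overlap);
        return new_bm;
    }

This only works when the overlapping part starts at the same page offset in
both slots; anything else needs a shifted copy, which is part of why doing it
atomically is awkward.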

> 
> The rmap is more interesting.  Perhaps it can be just rebuilt on every
> KVM_SET_USER_MEMORY_REGIONS call.
> 
> Paolo
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread Paolo Bonzini
On 07/06/2018 12:37, David Hildenbrand wrote:
> 
> I have a related requirement, which would be to atomically grow a
> memory regions. So instead of region_del(old)+region_add(new), I would
> have to do it in one shot (atomically).
> 
> AFAICS an atomic replace of the memory map would work for this, too.
> However I am not sure how we want to handle all kinds of tracking data
> that is connected to e.g. x86 memory slots (e.g. rmap, dirty bitmap ...).

The dirty bitmap would be synced in kvm_region_del (so it's not true
that kvm_region_del would disappear, but almost :)).

The rmap is more interesting.  Perhaps it can be just rebuilt on every
KVM_SET_USER_MEMORY_REGIONS call.

Paolo
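
For context, a condensed sketch of what that final sync in the delete path
looks like in kvm-all.c around the time of this thread (the wrapper function
below is made up; the helpers are the existing ones, simplified from
kvm_set_phys_mem()):

    /* Before a slot is removed, its remaining dirty bits are pulled out of KVM. */
    static void kvm_delete_slot(KVMMemoryListener *kml,
                                MemoryRegionSection *section, KVMSlot *mem)
    {
        if (mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
            kvm_physical_sync_dirty_bitmap(kml, section);   /* final sync */
        }
        mem->memory_size = 0;                   /* size 0 means "delete" to KVM */
        kvm_set_user_memory_region(kml, mem);   /* KVM_SET_USER_MEMORY_REGION */
    }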



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread David Hildenbrand
On 06.06.2018 15:57, Paolo Bonzini wrote:
> On 06/06/2018 15:28, Gonglei (Arei) wrote:
>> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
>> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0 
>> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
>> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x9000
>>
>> When the memory region is cleared, KVM marks the slot as invalid
>> (it is set to KVM_MEMSLOT_INVALID).
>>
>> If SeaBIOS accesses this memory and causes a page fault, KVM will find
>> an invalid value for the gfn (via __gfn_to_pfn_memslot) and will
>> ultimately return a failure.
>>
>> So, My questions are:
>>
>> 1) Why don't we hold kvm->slots_lock during page fault processing?
> 
> Because it's protected by SRCU.  We don't need kvm->slots_lock on the
> read side.
> 
>> 2) How do we ensure that vcpus will not access the corresponding
>> region when deleting a memory slot?
> 
> We don't.  It's generally a guest bug if they do, but the problem here
> is that QEMU is splitting a memory region in two parts and that is not
> atomic.

BTW, one ugly (but QEMU-only) fix would be to temporarily pause all
VCPUs, do the change and then unpause all VCPUs.

> 
> One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
> replaces the entire memory map atomically.
> 
> Paolo
> 


-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-07 Thread David Hildenbrand
On 06.06.2018 15:57, Paolo Bonzini wrote:
> On 06/06/2018 15:28, Gonglei (Arei) wrote:
>> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
>> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0 
>> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
>> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x9000
>>
>> When the memory region is cleared, KVM marks the slot as invalid
>> (it is set to KVM_MEMSLOT_INVALID).
>>
>> If SeaBIOS accesses this memory and causes a page fault, KVM will find
>> an invalid value for the gfn (via __gfn_to_pfn_memslot) and will
>> ultimately return a failure.
>>
>> So, My questions are:
>>
>> 1) Why don't we hold kvm->slots_lock during page fault processing?
> 
> Because it's protected by SRCU.  We don't need kvm->slots_lock on the
> read side.
> 
>> 2) How do we ensure that vcpus will not access the corresponding
>> region when deleting a memory slot?
> 
> We don't.  It's generally a guest bug if they do, but the problem here
> is that QEMU is splitting a memory region in two parts and that is not
> atomic.
> 
> One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
> replaces the entire memory map atomically.
> 
> Paolo
> 

Hi Paolo,

I have a related requirement, which would be to atomically grow a
memory regions. So instead of region_del(old)+region_add(new), I would
have to do it in one shot (atomically).

AFAICS an atomic replace of the memory map would work for this, too.
However I am not sure how we want to handle all kinds of tracking data
that is connected to e.g. x86 memory slots (e.g. rmap, dirty bitmap ...).

And for a generic KVM_SET_USER_MEMORY_REGIONS, we would have to handle
this somehow.

-- 

Thanks,

David / dhildenb



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-06 Thread Paolo Bonzini
On 06/06/2018 16:18, xuyandong wrote:
>> We don't.  It's generally a guest bug if they do, but the problem here is 
>> that
>> QEMU is splitting a memory region in two parts and that is not atomic.
>>  
>> One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
>> replaces the entire memory map atomically.
>>
>> Paolo
> After we add a KVM_SET_USER_MEMORY_REGIONS ioctl that replaces the entire
> memory map atomically, how do we use it in address_space_update_topology?
> Shall we check the split memory region before
> " address_space_update_topology_pass(as, old_view, new_view, false); 
> address_space_update_topology_pass(as, old_view, new_view, true);

You would add the regions to an array in kvm_region_add, and send the
ioctl in the .commit callback of MemoryListener.

kvm_region_del would disappear.  The .commit callback would also look at
the array from the previous execution, and call memory_region_unref on
the regions in there.

Paolo
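
To make that shape concrete, a rough sketch follows. KVM_SET_USER_MEMORY_REGIONS
and its payload do not exist today; they, the helper functions and the global
arrays are made up here purely to illustrate the collect-in-region_add /
flush-in-.commit structure described above:

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "sysemu/kvm.h"
    #include <linux/kvm.h>

    /* Hypothetical payload for an atomic "replace the whole map" ioctl. */
    struct kvm_userspace_memory_region_list {
        __u32 nregions;
        __u32 padding;
        struct kvm_userspace_memory_region regions[];
    };

    static GArray *pending;    /* slots collected during the current transaction */
    static GArray *previous;   /* slots installed by the previous .commit */

    static void kvm_region_add(MemoryListener *l, MemoryRegionSection *s)
    {
        struct kvm_userspace_memory_region r = {
            .slot            = kvm_next_free_slot(),           /* made up */
            .guest_phys_addr = s->offset_within_address_space,
            .memory_size     = int128_get64(s->size),
            .userspace_addr  = (uintptr_t)memory_region_get_ram_ptr(s->mr)
                               + s->offset_within_region,
        };

        memory_region_ref(s->mr);       /* keep the RAM alive until the next commit */
        g_array_append_val(pending, r); /* array initialization omitted */
    }

    static void kvm_commit(MemoryListener *l)
    {
        struct kvm_userspace_memory_region_list *list =
            build_region_list(pending);                        /* made up */

        /* One ioctl swaps in the whole map; no window with a missing slot. */
        kvm_vm_ioctl(kvm_state, KVM_SET_USER_MEMORY_REGIONS, list);

        unref_regions(previous);        /* memory_region_unref() the old map */
        previous = pending;
        pending = g_array_new(FALSE, TRUE,
                              sizeof(struct kvm_userspace_memory_region));
    }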



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-06 Thread xuyandong


> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Wednesday, June 06, 2018 9:58 PM
> To: Gonglei (Arei) ; Igor Mammedov
> ; xuyandong 
> Cc: Zhanghailiang ; wangxin (U)
> ; lidonglin ;
> k...@vger.kernel.org; qemu-devel@nongnu.org; Huangweidong (C)
> 
> Subject: Re: An emulation failure occurs,if I hotplug vcpus immediately after
> the VM start
> 
> On 06/06/2018 15:28, Gonglei (Arei) wrote:
> > gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
> > mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0
> > gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
> > mem.userspace_addr=0x7fc343ec, mem.flags=0,
> memory_size=0x9000
> >
> > When the memory region is cleared, KVM marks the slot as invalid
> > (it is set to KVM_MEMSLOT_INVALID).
> >
> > If SeaBIOS accesses this memory and causes a page fault, KVM will find
> > an invalid value for the gfn (via __gfn_to_pfn_memslot) and will
> > ultimately return a failure.
> >
> > So, My questions are:
> >
> > 1) Why don't we hold kvm->slots_lock during page fault processing?
> 
> Because it's protected by SRCU.  We don't need kvm->slots_lock on the read
> side.
> 
> > 2) How do we ensure that vcpus will not access the corresponding
> > region when deleting a memory slot?
> 
> We don't.  It's generally a guest bug if they do, but the problem here is that
> QEMU is splitting a memory region in two parts and that is not atomic.
>   
> One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
> replaces the entire memory map atomically.
> 
> Paolo

After we add a KVM_SET_USER_MEMORY_REGIONS ioctl that replaces the entire
memory map atomically, how do we use it in address_space_update_topology?
Shall we check the split memory region before
" address_space_update_topology_pass(as, old_view, new_view, false); 
address_space_update_topology_pass(as, old_view, new_view, true);
".




Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-06 Thread Paolo Bonzini
On 06/06/2018 15:28, Gonglei (Arei) wrote:
> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0 
> gonglei: mem.slot: 3, mem.guest_phys_addr=0xc,
> mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x9000
> 
> When the memory region is cleared, KVM marks the slot as invalid
> (it is set to KVM_MEMSLOT_INVALID).
> 
> If SeaBIOS accesses this memory and causes a page fault, KVM will find
> an invalid value for the gfn (via __gfn_to_pfn_memslot) and will
> ultimately return a failure.
> 
> So, My questions are:
> 
> 1) Why don't we hold kvm->slots_lock during page fault processing?

Because it's protected by SRCU.  We don't need kvm->slots_lock on the
read side.

> 2) How do we ensure that vcpus will not access the corresponding
> region when deleting a memory slot?

We don't.  It's generally a guest bug if they do, but the problem here
is that QEMU is splitting a memory region in two parts and that is not
atomic.

One fix could be to add a KVM_SET_USER_MEMORY_REGIONS ioctl that
replaces the entire memory map atomically.

Paolo



Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-06 Thread Gonglei (Arei)
Hi Igor,

First of all, thanks for your response. :)

> -Original Message-
> From: Igor Mammedov [mailto:imamm...@redhat.com]
> Sent: Friday, June 01, 2018 6:23 PM
> 
> On Fri, 1 Jun 2018 08:17:12 +
> xuyandong  wrote:
> 
> > Hi there,
> >
> > I am doing some tests on qemu vcpu hotplug and I have run into some trouble.
> > An emulation failure occurs and qemu prints the following msg:
> >
> > KVM internal error. Suberror: 1
> > emulation failure
> > EAX= EBX= ECX= EDX=0600
> > ESI= EDI= EBP= ESP=fff8
> > EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
> > ES =   9300
> > CS =f000 000f  9b00
> > SS =   9300
> > DS =   9300
> > FS =   9300
> > GS =   9300
> > LDT=   8200
> > TR =   8b00if
> > GDT=  
> > IDT=  
> > CR0=6010 CR2= CR3= CR4=
> > DR0= DR1= DR2=
> DR3=
> > DR6=0ff0 DR7=0400
> > EFER=
> > Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3  66 68
> 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 
> c6
> >
> > I notice that the guest is still running SeaBIOS in real mode when the vcpu
> > has just been plugged.
> > This emulation failure can be steadily reproduced if I do vcpu hotplug
> > during the VM launch process.
> > After some digging, I find this KVM internal error shows up because KVM
> > cannot emulate some MMIO (gpa 0xfff53).
> >
> > So I am confused,
> > (1) does qemu support vcpu hotplug even if the guest is running SeaBIOS?
> There is no code that forbids it, and I would expect it not to trigger an
> error and to be a NOP.
> 
> > (2) the gpa (0xfff53) is an address in the BIOS ROM section; why does kvm
> > incorrectly treat it as an MMIO address?
> A KVM trace and the BIOS debug log might give more information about where
> to look, or even better would be to debug SeaBIOS and find out what exactly
> goes wrong, if you could do it.

This issue can't be reproduced when we enable the SeaBIOS debug log or KVM tracing. :(

After a few days of debugging, we found that this problem occurs every time
the memory region is cleared (memory_size is 0) and the VFIO device is
hot-plugged.

The key function is kvm_set_user_memory_region(); I added some logging to it.

gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc751e0, mem.flags=0, memory_size=0x2
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc751e0, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x1
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0xbff4
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0xbff4
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x0
gonglei: mem.slot: 3, mem.guest_phys_addr=0xc, 
mem.userspace_addr=0x7fc343ec, mem.flags=0, memory_size=0x9000

When the memory region is cleared, KVM marks the slot as invalid
(it is set to KVM_MEMSLOT_INVALID).

If SeaBIOS accesses this memory and causes a page fault, KVM will find an
invalid value for the gfn (via __gfn_to_pfn_memslot) and will ultimately
return a failure.

The call chain in KVM is as follows:

kvm_mmu_page_fault
tdp_page_fault
try_async_pf
__gfn_to_pfn_memslot
__direct_map // return true;
x86_emulate_instruction
handle_emulation_failure

The call chain in QEMU is as follows:

Breakpoint 1, kvm_set_user_memory_region (kml=0x564aa1e2c890, 
slot=0x564aa1e2d230) at /mnt/sdb/gonglei/qemu/kvm-all.c:261
(gdb) bt
#0  kvm_set_user_memory_region (kml=0x564aa1e2c890, slot=0x564aa1e2d230) at 
/mnt/sdb/gonglei/qemu/kvm-all.c:261
#1  0x564a9e7e3096 in kvm_set_phys_mem (kml=0x564aa1e2c890, 
section=0x7febeb296500, add=false) at /mnt/sdb/gonglei/qemu/kvm-all.c:887
#2  0x564a9e7e34c7 in kvm_region_del (listener=0x564aa1e2c890, 
section=0x7febeb296500) at /mnt/sdb/gonglei/qemu/kvm-all.c:999
#3  0x564a9e7ea884 in address_space_update_topology_pass (as=0x564a9f2b2640 
, old_view=0x7febdc3449c0, 

Re: [Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-01 Thread Igor Mammedov
On Fri, 1 Jun 2018 08:17:12 +
xuyandong  wrote:

> Hi there,
> 
> I am doing some tests on qemu vcpu hotplug and I have run into some trouble.
> An emulation failure occurs and qemu prints the following msg:
> 
> KVM internal error. Suberror: 1
> emulation failure
> EAX= EBX= ECX= EDX=0600
> ESI= EDI= EBP= ESP=fff8
> EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =   9300
> CS =f000 000f  9b00
> SS =   9300
> DS =   9300
> FS =   9300
> GS =   9300
> LDT=   8200
> TR =   8b00if
> GDT=  
> IDT=  
> CR0=6010 CR2= CR3= CR4=
> DR0= DR1= DR2= 
> DR3=
> DR6=0ff0 DR7=0400
> EFER=
> Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3  66 68 
> 21 8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 
> c6
> 
> I notice that the guest is still running SeaBIOS in real mode when the vcpu
> has just been plugged.
> This emulation failure can be steadily reproduced if I do vcpu hotplug
> during the VM launch process.
> After some digging, I find this KVM internal error shows up because KVM
> cannot emulate some MMIO (gpa 0xfff53).
> 
> So I am confused,
> (1) does qemu support vcpu hotplug even if the guest is running SeaBIOS?
There is no code that forbids it, and I would expect it not to trigger an
error and to be a NOP.

> (2) the gpa (0xfff53) is an address in the BIOS ROM section; why does kvm
> incorrectly treat it as an MMIO address?
A KVM trace and the BIOS debug log might give more information about where to
look, or even better would be to debug SeaBIOS and find out what exactly
goes wrong, if you could do it.




[Qemu-devel] An emulation failure occurs, if I hotplug vcpus immediately after the VM start

2018-06-01 Thread xuyandong
Hi there,

I am doing some tests on qemu vcpu hotplug and I have run into some trouble.
An emulation failure occurs and qemu prints the following msg:

KVM internal error. Suberror: 1
emulation failure
EAX= EBX= ECX= EDX=0600
ESI= EDI= EBP= ESP=fff8
EIP=ff53 EFL=00010082 [--S] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =   9300
CS =f000 000f  9b00
SS =   9300
DS =   9300
FS =   9300
GS =   9300
LDT=   8200
TR =   8b00if
GDT=  
IDT=  
CR0=6010 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=31 d2 eb 04 66 83 ca ff 66 89 d0 66 5b 66 c3 66 89 d0 66 c3  66 68 21 
8a 00 00 e9 08 d7 66 56 66 53 66 83 ec 0c 66 89 c3 66 e8 ce 7b ff ff 66 89 c6

I notice that the guest is still running SeaBIOS in real mode when the vcpu has
just been plugged.
This emulation failure can be steadily reproduced if I do vcpu hotplug
during the VM launch process.
After some digging, I find this KVM internal error shows up because KVM cannot
emulate some MMIO (gpa 0xfff53).

So I am confused,
(1) does qemu support vcpu hotplug even if the guest is running SeaBIOS?
(2) the gpa (0xfff53) is an address in the BIOS ROM section; why does kvm
incorrectly treat it as an MMIO address?