Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-11 Thread David Hildenbrand

On 11.08.23 18:54, Peter Xu wrote:

On Fri, Aug 11, 2023 at 06:25:14PM +0200, David Hildenbrand wrote:

On 11.08.23 18:22, Peter Xu wrote:

On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:

We wouldn't touch "-mem-path".


But still the same issue when someone uses -object memory-backend-file for
hugetlb, mapping privately, expecting ram discard to work?

Basically I see that example as, "hugetlb" in general made the private
mapping over RW file usable, so forbidden that anywhere may take a risk.


These users can be directed to using hugetlb

a) using MAP_SHARED
b) using memory-backend-memfd, if MAP_PRIVATE is desired

Am I missing any important use case? Are we being a bit too careful about
virtio-balloon and postcopy simply not being available for these corner
cases?


The current immediate issue is not really mem=rw + fd=rw + private case
(which was a known issue), but how to make mem=rw + fd=ro + private work
for ThinnerBloger, iiuc.

I'd just think it safer to expose that cap to solve problem A (vm
templating) without affecting problem B (fallocate-over-private not working
right), when B is uncertain.


Right, and I'm thinking about if B is worth the effort.



I'm also copy Daniel & libvirt list in case there's quick comment from
there. Say, maybe libvirt never use private mapping on hugetlb files over
memory-backend-file at all, then it's probably fine.


libvirt certainly allows setting <memoryBacking> with <source type="file"/>.


Could be that they also end up mapping such a file-backed <memoryBacking> to 
memory-backend-file instead of memory-backend-memfd (e.g., for compatibility 
with older kernels?).
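For context, the libvirt configuration under discussion is the domain XML
memoryBacking stanza; a rough sketch (element names per libvirt's domain XML
format, the exact translation to a QEMU memory backend depends on the libvirt
version):

```xml
<memoryBacking>
  <!-- file-backed guest RAM; libvirt may translate this to a
       memory-backend-file object on the QEMU command line -->
  <source type="file"/>
  <!-- mode="private" requests a MAP_PRIVATE mapping -->
  <access mode="private"/>
</memoryBacking>
```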




In all cases, you and Igor should have the final grasp; no stand on a
strong opinion from my side.


I do value your opinion, so I'm still trying to figure out if there are 
sane use cases that really need a new parameter. Let's recap:


When opening the file R/O, resulting in fallocate() refusing to work:
* virtio-balloon will fail to discard RAM but continue to "be alive"
* virtio-mem will discard any private pages, but cannot free up disk
  blocks using fallocate.
* postcopy would fail early

Postcopy:
* Works on shmem (MAP_SHARED / MAP_PRIVATE)
* Works on hugetlb (MAP_SHARED / MAP_PRIVATE)
* Does not work on file-backed memory (including MAP_PRIVATE)

We can ignore virtio-mem for now. What remains is postcopy and 
virtio-balloon.


memory-backend-file with MAP_PRIVATE on shmem/tmpfs results in a double 
memory consumption, so we can mostly cross that out as "sane use case". 
Rather make such users aware of that :D


memory-backend-file with MAP_PRIVATE on hugetlb works. virtio-balloon is 
not really compatible with hugetlb, free-page-reporting might work 
(although quite nonsensical). So postcopy remains as the most important 
use case.


memory-backend-file with MAP_PRIVATE on file-backed memory works. 
postcopy does not apply. virtio-balloon should work I guess.



So the two use cases that are left are:
* postcopy with hugetlb would fail
* virtio-balloon with file-backed memory cannot free up disk blocks

Am I missing a case?

--
Cheers,

David / dhildenb



Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-21 Thread David Hildenbrand
>>> From what I understand:
>>>
>>> QEMU
>>>  - add a "deprecated-features" feature group (more-or-less David's code)
>>>
>>> libvirt
>>>  - recognize a new model name "host-recommended"
>>>  - query QEMU for host-model + deprecated-features and cache it in caps
>>> file (something like )
>>>  - when guest is defined with "host-recommended", pull  from
>>> caps when guest is started (similar to how host-model works today)
>>>
>>> If this is sufficient, then I can then get to work on this.
>>>
>>> My question is what would be the best way to include the deprecated
>>> features when calculating a baseline or comparison. Both work with the
>>> host-model and may no longer present an accurate result. Say, for
>>> example, we baseline a z15 with a gen17 (which will outright not support
>>> CSSKE). With today's implementation, this might result in a ridiculously
>>> old CPU model which also does not support CSSKE. The ideal response
>>> would be a z15 - deprecated features (i.e. host-recommended on a z15),
>>> but we'd need a way to flag to QEMU that we want to exclude the
>>> deprecated features. Or am I totally wrong about this?
>>
>> For baselining, it would be reasonable to always disable deprecated
>> features, and to ignore them during the model selection. Should be
>> fairly easy to implement, let me know if you need any pointers.
>>
> 
> Thanks David. I'll take a look when I can. I may not be very active this
> week due to personal items, but intend to knock this out as soon as
> things settle down on my end.

No need to rush :)

> 
>> I *assume* that for comparison there is nothing to do.
>>
> 
> I think you're right, at least on QEMU's end.
> 
> For libvirt, IIRC, comparison will compare the CPU model cached under
> the hostCPU tag to whatever is in the XML. If comparing, say, a gen17
> host (no csske support) with a gen15 XML, the result should come up as
> "incompatible". To a user, they may think "what the heck, shouldn't old
> gen run on new gen?"

I assume you mean an expanded host model on a z15 that still shows
"csske=true". And it would be correct: the deprecated feature still
around on the older machine (indicated in the host model) is not around
on the newer machine (not indicated in the host model). So starting a VM
with the "host-model" on the old machine cannot be migrated to the new
machine. You'd need to start the VM with the new host-TOBENAMED CPU
model. Comparing with that would work as expected, as the deprecated
features would not be included.

> 
> Doesn't the comparison QAPI report which features cause the result of
> "incompatible"? Would it make sense to amend the libvirt API to report
> features causing this issue? I believe this is what the --error flag is
> meant to do, but as far as I know, nothing useful is currently reported.

Most probably it was never implemented on s390x. Makes sense to me.

> 
> Something like this (assume we're a gen17 host, and cpu.xml contains a
> gen15 host-model)
> 
> # virsh hypervisor-cpu-compare cpu.xml --error
> error: Failed to compare hypervisor CPU with cpu.xml
> error: the CPU is incompatible with host CPU
> error: host CPU does not support: csske

I guess instead of "host CPU" you'd want to indicate one of the two CPU
models provided. Not sure how to differentiate them from the XML.


-- 
Thanks,

David / dhildenb



Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-21 Thread David Hildenbrand
On 21.03.22 10:25, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 01:23:03PM -0400, Collin Walling wrote:
>> On 3/15/22 15:08, David Hildenbrand wrote:
>>> On 15.03.22 18:40, Boris Fiuczynski wrote:
>>>> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>>>>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>>>>
>>>>>>
>>>>>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>>>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>>>>> The s390x architecture has a growing list of features that will no 
>>>>>>>> longer
>>>>>>>> be supported on future hardware releases. This introduces an issue with
>>>>>>>> migration such that guests, running on models with these features 
>>>>>>>> enabled,
>>>>>>>> will be rejected outright by machines that do not support these 
>>>>>>>> features.
>>>>>>>>
>>>>>>>> A current example is the CSSKE feature that has been deprecated for 
>>>>>>>> some time.
>>>>>>>> It has been publicly announced that gen15 will be the last release to
>>>>>>>> support this feature, however we have postponed this to gen16a. A 
>>>>>>>> possible
>>>>>>>> solution to remedy this would be to create a new QEMU QMP Response 
>>>>>>>> that allows
>>>>>>>> users to query for deprecated/unsupported features.
>>>>>>>>
>>>>>>>> This presents two parts of the puzzle: how to report deprecated 
>>>>>>>> features to
>>>>>>>> a user (libvirt) and how should libvirt handle this information.
>>>>>>>>
>>>>>>>> First, let's discuss the latter. The patch presented alongside this 
>>>>>>>> cover letter
>>>>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature 
>>>>>>>> to be
>>>>>>>> disabled for all s390x CPU models. This is done by simply appending 
>>>>>>>> the CSSKE
>>>>>>>> feature with the disabled policy to the host-model.
>>>>>>>>
>>>>>>>> libvirt pseudo:
>>>>>>>>
>>>>>>>> if arch is s390x
>>>>>>>>   set CSSKE to disabled for host-model
>>>>>>>
>>>>>>> That violates host-model semantics and possibly the user intend. There
>>>>>>> would have to be some toggle to manually specify this, for example, a
>>>>>>> new model type or a some magical flag.
>>>>>>
>>>>>> What we actually want to do is to disable csske completely from QEMU and
>>>>>> thus from the host-model. Then it would not violate the spec.
>>>>>> But this has all kind of issues (you cannot migrate from older versions
>>>>>> of software and machines) although the hardware still can provide the 
>>>>>> feature.
>>>>>>
>>>>>> The hardware guys promised me to deprecate things two generations earlier
>>>>>> and we usually deprecate things that are not used or where software has a
>>>>>> runtime switch.
>>>>>>
>>>>>>   From what I hear from you is that you do not want to modify the 
>>>>>> host-model
>>>>>> semantics to something more useful but rather define a new thing (e.g. 
>>>>>> "host-sane") ?
>>>>>
>>>>> My take would be, to keep the host model consistent, meaning, the
>>>>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>>>>> maximum CPU model that's runnable under KVM. If a feature is not
>>>>> included (e.g., csske) that feature cannot be enabled in any way.
>>>>>
>>>>> The "host model" has the semantics of resembling the actual host CPU.
>>>>> This is only partially true, because we support some features the host
>>>>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>>>>> features in QEMU.
>>>>>
>>>>> So instead of playing games on the libvirt side with the host model, I
>>>>> see the following alternatives:

Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-18 Thread David Hildenbrand
On 18.03.22 18:23, Collin Walling wrote:
> On 3/15/22 15:08, David Hildenbrand wrote:
>> On 15.03.22 18:40, Boris Fiuczynski wrote:
>>> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>>>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>>>
>>>>>
>>>>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>>>> The s390x architecture has a growing list of features that will no 
>>>>>>> longer
>>>>>>> be supported on future hardware releases. This introduces an issue with
>>>>>>> migration such that guests, running on models with these features 
>>>>>>> enabled,
>>>>>>> will be rejected outright by machines that do not support these 
>>>>>>> features.
>>>>>>>
>>>>>>> A current example is the CSSKE feature that has been deprecated for 
>>>>>>> some time.
>>>>>>> It has been publicly announced that gen15 will be the last release to
>>>>>>> support this feature, however we have postponed this to gen16a. A 
>>>>>>> possible
>>>>>>> solution to remedy this would be to create a new QEMU QMP Response that 
>>>>>>> allows
>>>>>>> users to query for deprecated/unsupported features.
>>>>>>>
>>>>>>> This presents two parts of the puzzle: how to report deprecated 
>>>>>>> features to
>>>>>>> a user (libvirt) and how should libvirt handle this information.
>>>>>>>
>>>>>>> First, let's discuss the latter. The patch presented alongside this 
>>>>>>> cover letter
>>>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature 
>>>>>>> to be
>>>>>>> disabled for all s390x CPU models. This is done by simply appending the 
>>>>>>> CSSKE
>>>>>>> feature with the disabled policy to the host-model.
>>>>>>>
>>>>>>> libvirt pseudo:
>>>>>>>
>>>>>>> if arch is s390x
>>>>>>>   set CSSKE to disabled for host-model
>>>>>>
>>>>>> That violates host-model semantics and possibly the user intend. There
>>>>>> would have to be some toggle to manually specify this, for example, a
>>>>>> new model type or a some magical flag.
>>>>>
>>>>> What we actually want to do is to disable csske completely from QEMU and
>>>>> thus from the host-model. Then it would not violate the spec.
>>>>> But this has all kind of issues (you cannot migrate from older versions
>>>>> of software and machines) although the hardware still can provide the 
>>>>> feature.
>>>>>
>>>>> The hardware guys promised me to deprecate things two generations earlier
>>>>> and we usually deprecate things that are not used or where software has a
>>>>> runtime switch.
>>>>>
>>>>>   From what I hear from you is that you do not want to modify the 
>>>>> host-model
>>>>> semantics to something more useful but rather define a new thing (e.g. 
>>>>> "host-sane") ?
>>>>
>>>> My take would be, to keep the host model consistent, meaning, the
>>>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>>>> maximum CPU model that's runnable under KVM. If a feature is not
>>>> included (e.g., csske) that feature cannot be enabled in any way.
>>>>
>>>> The "host model" has the semantics of resembling the actual host CPU.
>>>> This is only partially true, because we support some features the host
>>>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>>>> features in QEMU.
>>>>
>>>> So instead of playing games on the libvirt side with the host model, I
>>>> see the following alternatives:
>>>>
>>>> 1. Remove the problematic features from the host model in QEMU, like "we
>>>> just don't support this feature". Consequently, any migration of a VM
>>>> with csske=on to a new QEMU version will fail, similar to having an
>>>> older QEMU version without support for a certain feature.

Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-15 Thread David Hildenbrand
On 15.03.22 18:40, Boris Fiuczynski wrote:
> On 3/15/22 4:58 PM, David Hildenbrand wrote:
>> On 11.03.22 13:44, Christian Borntraeger wrote:
>>>
>>>
>>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>>> On 11.03.22 05:17, Collin Walling wrote:
>>>>> The s390x architecture has a growing list of features that will no longer
>>>>> be supported on future hardware releases. This introduces an issue with
>>>>> migration such that guests, running on models with these features enabled,
>>>>> will be rejected outright by machines that do not support these features.
>>>>>
>>>>> A current example is the CSSKE feature that has been deprecated for some 
>>>>> time.
>>>>> It has been publicly announced that gen15 will be the last release to
>>>>> support this feature, however we have postponed this to gen16a. A possible
>>>>> solution to remedy this would be to create a new QEMU QMP Response that 
>>>>> allows
>>>>> users to query for deprecated/unsupported features.
>>>>>
>>>>> This presents two parts of the puzzle: how to report deprecated features 
>>>>> to
>>>>> a user (libvirt) and how should libvirt handle this information.
>>>>>
>>>>> First, let's discuss the latter. The patch presented alongside this cover 
>>>>> letter
>>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to 
>>>>> be
>>>>> disabled for all s390x CPU models. This is done by simply appending the 
>>>>> CSSKE
>>>>> feature with the disabled policy to the host-model.
>>>>>
>>>>> libvirt pseudo:
>>>>>
>>>>> if arch is s390x
>>>>>   set CSSKE to disabled for host-model
>>>>
>>>> That violates host-model semantics and possibly the user intend. There
>>>> would have to be some toggle to manually specify this, for example, a
>>>> new model type or a some magical flag.
>>>
>>> What we actually want to do is to disable csske completely from QEMU and
>>> thus from the host-model. Then it would not violate the spec.
>>> But this has all kind of issues (you cannot migrate from older versions
>>> of software and machines) although the hardware still can provide the 
>>> feature.
>>>
>>> The hardware guys promised me to deprecate things two generations earlier
>>> and we usually deprecate things that are not used or where software has a
>>> runtime switch.
>>>
>>>   From what I hear from you is that you do not want to modify the host-model
>>> semantics to something more useful but rather define a new thing (e.g. 
>>> "host-sane") ?
>>
>> My take would be, to keep the host model consistent, meaning, the
>> semantics in QEMU exactly match the semantics in Libvirt. It defines the
>> maximum CPU model that's runnable under KVM. If a feature is not
>> included (e.g., csske) that feature cannot be enabled in any way.
>>
>> The "host model" has the semantics of resembling the actual host CPU.
>> This is only partially true, because we support some features the host
>> might not support (e.g., zPCI IIRC) and obviously don't support all host
>> features in QEMU.
>>
>> So instead of playing games on the libvirt side with the host model, I
>> see the following alternatives:
>>
>> 1. Remove the problematic features from the host model in QEMU, like "we
>> just don't support this feature". Consequently, any migration of a VM
>> with csske=on to a new QEMU version will fail, similar to having an
>> older QEMU version without support for a certain feature.
>>
>> "host-passthrough" would change between QEMU versions ... which I see as
>> problematic.
>>
>> 2. Introduce a new CPU model that has these new semantics: "host model"
>> - deprecated features. Migration of older VMs with csske=on to a new
>> QEMU version will work. Make libvirt use/expand that new CPU model
>>
>> It doesn't necessarily have to be an actual new cpu model. We can use a
>> feature group, like "-cpu host,deprectated-features=false". What's
>> inside "deprecated-features" will actually change between QEMU versions,
>> but we don't really care, as the expanded CPU model won't change.
>>
>> "host-passthrough" won't change between QEMU versions ...
>>

Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-15 Thread David Hildenbrand
On 15.03.22 16:58, David Hildenbrand wrote:
> On 11.03.22 13:44, Christian Borntraeger wrote:
>>
>>
>> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>>> On 11.03.22 05:17, Collin Walling wrote:
>>>> The s390x architecture has a growing list of features that will no longer
>>>> be supported on future hardware releases. This introduces an issue with
>>>> migration such that guests, running on models with these features enabled,
>>>> will be rejected outright by machines that do not support these features.
>>>>
>>>> A current example is the CSSKE feature that has been deprecated for some 
>>>> time.
>>>> It has been publicly announced that gen15 will be the last release to
>>>> support this feature, however we have postponed this to gen16a. A possible
>>>> solution to remedy this would be to create a new QEMU QMP Response that 
>>>> allows
>>>> users to query for deprecated/unsupported features.
>>>>
>>>> This presents two parts of the puzzle: how to report deprecated features to
>>>> a user (libvirt) and how should libvirt handle this information.
>>>>
>>>> First, let's discuss the latter. The patch presented alongside this cover 
>>>> letter
>>>> attempts to solve the migration issue by hard-coding the CSSKE feature to 
>>>> be
>>>> disabled for all s390x CPU models. This is done by simply appending the 
>>>> CSSKE
>>>> feature with the disabled policy to the host-model.
>>>>
>>>> libvirt pseudo:
>>>>
>>>> if arch is s390x
>>>>  set CSSKE to disabled for host-model
>>>
>>> That violates host-model semantics and possibly the user intend. There
>>> would have to be some toggle to manually specify this, for example, a
>>> new model type or a some magical flag.
>>
>> What we actually want to do is to disable csske completely from QEMU and
>> thus from the host-model. Then it would not violate the spec.
>> But this has all kind of issues (you cannot migrate from older versions
>> of software and machines) although the hardware still can provide the 
>> feature.
>>
>> The hardware guys promised me to deprecate things two generations earlier
>> and we usually deprecate things that are not used or where software has a
>> runtime switch.
>>
>>  From what I hear from you is that you do not want to modify the host-model
>> semantics to something more useful but rather define a new thing (e.g. 
>> "host-sane") ?
> 
> My take would be, to keep the host model consistent, meaning, the
> semantics in QEMU exactly match the semantics in Libvirt. It defines the
> maximum CPU model that's runnable under KVM. If a feature is not
> included (e.g., csske) that feature cannot be enabled in any way.
> 
> The "host model" has the semantics of resembling the actual host CPU.
> This is only partially true, because we support some features the host
> might not support (e.g., zPCI IIRC) and obviously don't support all host
> features in QEMU.
> 
> So instead of playing games on the libvirt side with the host model, I
> see the following alternatives:
> 
> 1. Remove the problematic features from the host model in QEMU, like "we
> just don't support this feature". Consequently, any migration of a VM
> with csske=on to a new QEMU version will fail, similar to having an
> older QEMU version without support for a certain feature.
> 
> "host-passthrough" would change between QEMU versions ... which I see as
> problematic.
> 
> 2. Introduce a new CPU model that has these new semantics: "host model"
> - deprecated features. Migration of older VMs with csske=on to a new
> QEMU version will work. Make libvirt use/expand that new CPU model
> 
> It doesn't necessarily have to be an actual new cpu model. We can use a
> feature group, like "-cpu host,deprectated-features=false". What's
> inside "deprecated-features" will actually change between QEMU versions,
> but we don't really care, as the expanded CPU model won't change.
> 
> "host-passthrough" won't change between QEMU versions ...
> 
> 3. As Daniel suggested, don't use the host model, but a CPU model
> indicated as "suggested".
> 
> The real issue is that in reality, we don't simply always use a model
> like "gen15a", but usually want optional features, if they are around.
> Prime examples are "sie" and friends.
> 
> 
> 
> I tend to prefer 2. With 3. I see issues with optional features like
> "sie" and friends.

Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-15 Thread David Hildenbrand
On 11.03.22 13:44, Christian Borntraeger wrote:
> 
> 
> Am 11.03.22 um 10:30 schrieb David Hildenbrand:
>> On 11.03.22 05:17, Collin Walling wrote:
>>> The s390x architecture has a growing list of features that will no longer
>>> be supported on future hardware releases. This introduces an issue with
>>> migration such that guests, running on models with these features enabled,
>>> will be rejected outright by machines that do not support these features.
>>>
>>> A current example is the CSSKE feature that has been deprecated for some 
>>> time.
>>> It has been publicly announced that gen15 will be the last release to
>>> support this feature, however we have postponed this to gen16a. A possible
>>> solution to remedy this would be to create a new QEMU QMP Response that 
>>> allows
>>> users to query for deprecated/unsupported features.
>>>
>>> This presents two parts of the puzzle: how to report deprecated features to
>>> a user (libvirt) and how should libvirt handle this information.
>>>
>>> First, let's discuss the latter. The patch presented alongside this cover 
>>> letter
>>> attempts to solve the migration issue by hard-coding the CSSKE feature to be
>>> disabled for all s390x CPU models. This is done by simply appending the 
>>> CSSKE
>>> feature with the disabled policy to the host-model.
>>>
>>> libvirt pseudo:
>>>
>>> if arch is s390x
>>>  set CSSKE to disabled for host-model
>>
>> That violates host-model semantics and possibly the user intend. There
>> would have to be some toggle to manually specify this, for example, a
>> new model type or a some magical flag.
> 
> What we actually want to do is to disable csske completely from QEMU and
> thus from the host-model. Then it would not violate the spec.
> But this has all kind of issues (you cannot migrate from older versions
> of software and machines) although the hardware still can provide the feature.
> 
> The hardware guys promised me to deprecate things two generations earlier
> and we usually deprecate things that are not used or where software has a
> runtime switch.
> 
>  From what I hear from you is that you do not want to modify the host-model
> semantics to something more useful but rather define a new thing (e.g. 
> "host-sane") ?

My take would be, to keep the host model consistent, meaning, the
semantics in QEMU exactly match the semantics in Libvirt. It defines the
maximum CPU model that's runnable under KVM. If a feature is not
included (e.g., csske) that feature cannot be enabled in any way.

The "host model" has the semantics of resembling the actual host CPU.
This is only partially true, because we support some features the host
might not support (e.g., zPCI IIRC) and obviously don't support all host
features in QEMU.

So instead of playing games on the libvirt side with the host model, I
see the following alternatives:

1. Remove the problematic features from the host model in QEMU, like "we
just don't support this feature". Consequently, any migration of a VM
with csske=on to a new QEMU version will fail, similar to having an
older QEMU version without support for a certain feature.

"host-passthrough" would change between QEMU versions ... which I see as
problematic.

2. Introduce a new CPU model that has these new semantics: "host model"
- deprecated features. Migration of older VMs with csske=on to a new
QEMU version will work. Make libvirt use/expand that new CPU model

It doesn't necessarily have to be an actual new cpu model. We can use a
feature group, like "-cpu host,deprecated-features=false". What's
inside "deprecated-features" will actually change between QEMU versions,
but we don't really care, as the expanded CPU model won't change.

"host-passthrough" won't change between QEMU versions ...

3. As Daniel suggested, don't use the host model, but a CPU model
indicated as "suggested".

The real issue is that in reality, we don't simply always use a model
like "gen15a", but usually want optional features, if they are around.
Prime examples are "sie" and friends.



I tend to prefer 2. With 3. I see issues with optional features like
"sie" and friends. Often, you really want "give me all you got, but
disable deprecated features that might cause problems in the future".

-- 
Thanks,

David / dhildenb



Re: [PATCH RFC 1/1] qemu: capabilities: disable csske for host cpu

2022-03-11 Thread David Hildenbrand
On 11.03.22 13:54, Christian Borntraeger wrote:
> Am 11.03.22 um 13:27 schrieb David Hildenbrand:
>> On 11.03.22 13:12, Christian Borntraeger wrote:
>>>
>>>
>>> Am 11.03.22 um 10:23 schrieb David Hildenbrand:
>>>> On 11.03.22 10:17, Daniel P. Berrangé wrote:
>>>>> On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
>>>>>> CPU models past gen16a will no longer support the csske feature. In
>>>>>> order to secure migration of guests running on machines that still
>>>>>> support this feature to machines that do not, let's disable csske
>>>>>> in the host-model.
>>>>
>>>> Sorry to say, removing CPU features is a no-go when wanting to guarantee
>>>> forward migration without taking care about CPU model details manually
>>>> and simply using the host model. Self-made HW vendor problem.
>>>
>>> And this simply does not reflect reality. Intel and Power have removed TX
>>> for example. We can now sit back and please ourselves how we live in our
>>> world of dreams. Or we can try to define an interface that deals with
>>> reality and actually solves problems.
>>>
>>
>> Ehm, so, I spell out the obvious and get such a reaction? Okay, thank you.
> 
> Sorry, reading my writing again shows that I clearly miscommunicated in a
> very bad style. My point was rather trying to solve a problem instead
> I wrote something up in a hurry which resulted in something offensive.
> 
> Please accept my apologies.
> 

No hard feelings, I understand that this is an important thing to sort
out for IBM.

-- 
Thanks,

David / dhildenb



Re: [PATCH RFC 1/1] qemu: capabilities: disable csske for host cpu

2022-03-11 Thread David Hildenbrand
On 11.03.22 13:12, Christian Borntraeger wrote:
> 
> 
> Am 11.03.22 um 10:23 schrieb David Hildenbrand:
>> On 11.03.22 10:17, Daniel P. Berrangé wrote:
>>> On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
>>>> CPU models past gen16a will no longer support the csske feature. In
>>>> order to secure migration of guests running on machines that still
>>>> support this feature to machines that do not, let's disable csske
>>>> in the host-model.
>>
>> Sorry to say, removing CPU features is a no-go when wanting to guarantee
>> forward migration without taking care about CPU model details manually
>> and simply using the host model. Self-made HW vendor problem.
> 
> And this simply does not reflect reality. Intel and Power have removed TX
> for example. We can now sit back and please ourselves how we live in our
> world of dreams. Or we can try to define an interface that deals with
> reality and actually solves problems.
> 

Ehm, so, I spell out the obvious and get such a reaction? Okay, thank you.

See my other reply, maybe we want a different kind of "host-model" from
QEMU.

It's Friday and I'm not particularly motivated to participate further in
this discussion today. So I'm going to step away for today, please
myself and live in a world of dreams.

-- 
Thanks,

David / dhildenb



Re: [PATCH RFC 0/1] s390x CPU Model Feature Deprecation

2022-03-11 Thread David Hildenbrand
On 11.03.22 05:17, Collin Walling wrote:
> The s390x architecture has a growing list of features that will no longer
> be supported on future hardware releases. This introduces an issue with
> migration such that guests, running on models with these features enabled,
> will be rejected outright by machines that do not support these features.
> 
> A current example is the CSSKE feature that has been deprecated for some 
> time. 
> It has been publicly announced that gen15 will be the last release to
> support this feature, however we have postponed this to gen16a. A possible
> solution to remedy this would be to create a new QEMU QMP Response that allows
> users to query for deprecated/unsupported features.
> 
> This presents two parts of the puzzle: how to report deprecated features to
> a user (libvirt) and how should libvirt handle this information.
> 
> First, let's discuss the latter. The patch presented alongside this cover 
> letter
> attempts to solve the migration issue by hard-coding the CSSKE feature to be
> disabled for all s390x CPU models. This is done by simply appending the CSSKE
> feature with the disabled policy to the host-model.
> 
> libvirt pseudo:
> 
> if arch is s390x
> set CSSKE to disabled for host-model

That violates host-model semantics and possibly the user intent. There
would have to be some toggle to manually specify this, for example, a
new model type or some magical flag.

Gluing this to the "host-model" feels wrong.

The other concern I have is that deprecated features are a moving
target, and with a new QEMU version you could suddenly have more
deprecated features. Hm.


Maybe you'd want some kind of a host-based-model from QEMU that does
this automatically? I need more coffee to get creative on a name.

-- 
Thanks,

David / dhildenb



Re: [PATCH RFC 1/1] qemu: capabilities: disable csske for host cpu

2022-03-11 Thread David Hildenbrand
On 11.03.22 10:17, Daniel P. Berrangé wrote:
> On Thu, Mar 10, 2022 at 11:17:38PM -0500, Collin Walling wrote:
>> CPU models past gen16a will no longer support the csske feature. In 
>> order to secure migration of guests running on machines that still
>> support this feature to machines that do not, let's disable csske
>> in the host-model.

Sorry to say, removing CPU features is a no-go when wanting to guarantee
forward migration without taking care about CPU model details manually
and simply using the host model. Self-made HW vendor problem.

> 
> The problem scenario you describe is the intended semantics of
> host-model though. It enables all features available in the host
> that you launched on. It lets you live migrate to a target host
> with the same, or a greater number of features. If the target has
> a greater number of features, it should restrict the VM to the
> subset of features that were present on the original source CPU.
> If the target has fewer features, then you simply can't live
> migrate a VM using host-model.
> 
> To get live migration in both directions across CPUs with differing
> featuresets, then the VM needs to be configured with a named CPU
> model that is a subset of both, rather than host-model.

Right, and cpu-model-baseline does that job for you if you're too lazy to
look up the proper model.
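
[Editor's note: a hedged sketch of such a baseline query over QMP; the model
names are assumed for illustration and are not taken from this thread.]

{ "execute": "query-cpu-model-baseline",
  "arguments": {
    "modela": { "name": "host" },
    "modelb": { "name": "z14-base" }
  }
}

The returned baseline model is a named model (plus delta features) that both
inputs support, which can then be used as a migration-safe CPU definition.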

-- 
Thanks,

David / dhildenb



Re: [private] Re: [PATCH v5 00/16] Introduce virtio-mem model

2021-09-21 Thread David Hildenbrand

On 21.09.21 11:33, David Hildenbrand wrote:

On 13.09.21 16:52, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html

diff to v4:
- Rebased onto current master
- Worked in David's suggestions, e.g. rename from  to
, implemented offline memory update, implemented --node
argument to virsh update-memory-device, prealloc is OFF and reserve is
ON for virtio-mem

Some suggestions are left as future work. For instance:
- Don't require memory slots because virtio-mem lives on PCI bus anyway
- Allow path backed backend for virtio-mem
- support .prealloc for virtio-mem object (not memory-backend-* !)


I keep occasionally rebased version on my gitlab:

https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/



Hi Michal,

I noticed one minor thing:

If I start a VM with

  ...
  <devices>
    ...
    <memory model='virtio-mem'>
      <target>
        <size unit='KiB'>16777216</size>
        <node>0</node>
        <block unit='KiB'>2048</block>
        <requested unit='KiB'>2097152</requested>
      </target>
    </memory>
    <memory model='virtio-mem'>
      <target>
        <size unit='KiB'>16777216</size>
        <node>1</node>
        <block unit='KiB'>2048</block>
        <requested unit='KiB'>2097152</requested>
      </target>
    </memory>
  </devices>

I get after it booted up

  <maxMemory unit='KiB'>41943040</maxMemory>
  <memory unit='KiB'>37748736</memory>
  <currentMemory unit='KiB'>9437184</currentMemory>
  ...
  <memory model='virtio-mem'>
    <target>
      <size unit='KiB'>16777216</size>
      <node>0</node>
      <block unit='KiB'>2048</block>
      <requested unit='KiB'>2097152</requested>
      <current unit='KiB'>131072</current>
    </target>
  </memory>
  <memory model='virtio-mem'>
    <target>
      <size unit='KiB'>16777216</size>
      <node>1</node>
      <block unit='KiB'>2048</block>
      <requested unit='KiB'>2097152</requested>
      <current unit='KiB'>2097152</current>
    </target>
  </memory>

Note the "<current unit='KiB'>131072</current>". Inside of the guest, I can see
that it really is 2G.

If I trigger a new "virsh update-memory-device Fedora34", it gets updated
properly.

I assume the devices get initialized in the guest in parallel. Could it be
that libvirt gets confused when there are concurrent notifications about two
devices, or could it be QEMU accidentally swallows some events? I'll
investigate the latter regarding rate limiting the output in QEMU, maybe
something is messed up.


QEMU BUG, will send a fix :)


--
Thanks,

David / dhildenb



[private] Re: [PATCH v5 00/16] Introduce virtio-mem model

2021-09-21 Thread David Hildenbrand

On 13.09.21 16:52, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html

diff to v4:
- Rebased onto current master
- Worked in David's suggestions, e.g. rename from  to
   , implemented offline memory update, implemented --node
   argument to virsh update-memory-device, prealloc is OFF and reserve is
   ON for virtio-mem

Some suggestions are left as future work. For instance:
- Don't require memory slots because virtio-mem lives on PCI bus anyway
- Allow path backed backend for virtio-mem
- support .prealloc for virtio-mem object (not memory-backend-* !)


I keep occasionally rebased version on my gitlab:

https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/



Hi Michal,

I noticed one minor thing:

If I start a VM with

  ...
  <devices>
    ...
    <memory model='virtio-mem'>
      <target>
        <size unit='KiB'>16777216</size>
        <node>0</node>
        <block unit='KiB'>2048</block>
        <requested unit='KiB'>2097152</requested>
      </target>
    </memory>
    <memory model='virtio-mem'>
      <target>
        <size unit='KiB'>16777216</size>
        <node>1</node>
        <block unit='KiB'>2048</block>
        <requested unit='KiB'>2097152</requested>
      </target>
    </memory>
  </devices>

I get after it booted up

  <maxMemory unit='KiB'>41943040</maxMemory>
  <memory unit='KiB'>37748736</memory>
  <currentMemory unit='KiB'>9437184</currentMemory>
  ...
  <memory model='virtio-mem'>
    <target>
      <size unit='KiB'>16777216</size>
      <node>0</node>
      <block unit='KiB'>2048</block>
      <requested unit='KiB'>2097152</requested>
      <current unit='KiB'>131072</current>
    </target>
  </memory>
  <memory model='virtio-mem'>
    <target>
      <size unit='KiB'>16777216</size>
      <node>1</node>
      <block unit='KiB'>2048</block>
      <requested unit='KiB'>2097152</requested>
      <current unit='KiB'>2097152</current>
    </target>
  </memory>

Note the "<current unit='KiB'>131072</current>". Inside of the guest, I can see
that it really is 2G.

If I trigger a new "virsh update-memory-device Fedora34", it gets updated
properly.

I assume the devices get initialized in the guest in parallel. Could it be
that libvirt gets confused when there are concurrent notifications about two
devices, or could it be QEMU accidentally swallows some events? I'll
investigate the latter regarding rate limiting the output in QEMU, maybe
something is messed up.


--
Thanks,

David / dhildenb



Re: [PATCH v5 05/16] qemu: Build command line for virtio-mem

2021-09-21 Thread David Hildenbrand

On 21.09.21 10:56, Michal Prívozník wrote:

On 9/17/21 2:54 PM, David Hildenbrand wrote:

On 15.09.21 13:07, Michal Prívozník wrote:

On 9/14/21 3:05 PM, David Hildenbrand wrote:

On 13.09.21 16:52, Michal Privoznik wrote:

Nothing special is happening here. All important changes were
done for 'virtio-pmem' (adjusting the code to put virtio
memory on PCI bus, generating alias using
qemuDomainDeviceAliasIndex()). The only bit that might look
suspicious is no prealloc for virtio-mem. But if you think about
it, the whole purpose of this device is to change amount of
memory exposed to guest on the fly. There is no point in locking
the whole backend in memory.


Do I understand correctly that we'll always try setting "reserve=off",
and "prealloc=off"? That would be awesome :)


prealloc=off would be set (almost) unconditionally for all
memory-backend-* objects that are associated with virtio-mem device. And
if QEMU is new enough to have .reserve attribute then reserve=off is set
too.



Ah, perfect.



I do wonder if we want to warn or bail out if "priv->memPrealloc" is set
but we are temporarily not able to support it -- as discussed, because
virtio-mem in QEMU has yet to be taught about it.


priv->memPrealloc might not be what you think it is. Whether
immediate allocation was requested is stored in def->mem.allocation;
priv->memPrealloc is just there to track whether -mem-prealloc was
already put onto the (partially) generated cmd line, and thus that the rest
of the generators must refrain from setting prealloc=on.

Codewise speaking, if you'd look at qemuBuildCommandLine() (this is
where cmd line generation happens) then you'd see
qemuBuildMemCommandLine() being called first and only after that
qemuBuildMemoryDeviceCommandLine() being called second.

The former is responsible for generating generic memory for guest (e.g.
-m XX -mem-prealloc -mem-path ...  which is the old style, nowadays a
combination of -machine memory-backend= + -object memory-backend-*  is
generated).

Long story short, if priv->memPrealloc isn't true at this point, then
libvirt doesn't have to set prealloc=off explicitly, because off is the
default.


That makes sense. Yet I wonder if we should warn that preallocation is
currently not supported for virtio-mem and will be ignored. Your
decision :)


We could do that, yes. Let me respin the series with that change - these
patches don't apply cleanly anymore anyway. Should a similar warning be
produced when hugepages are configured?


I think it might make sense to print a warning if we cannot support 
prealloc and it was requested. We can then skip the warning once 
virtio-mem supports the "prealloc" property that we can just enable.


Using hugepages without prealloc is in general unsafe, so I don't think 
we need a virtio-mem specific warning. :)


Thanks!

--
Thanks,

David / dhildenb



Re: [PATCH v5 05/16] qemu: Build command line for virtio-mem

2021-09-17 Thread David Hildenbrand

On 15.09.21 13:07, Michal Prívozník wrote:

On 9/14/21 3:05 PM, David Hildenbrand wrote:

On 13.09.21 16:52, Michal Privoznik wrote:

Nothing special is happening here. All important changes were
done for 'virtio-pmem' (adjusting the code to put virtio
memory on PCI bus, generating alias using
qemuDomainDeviceAliasIndex()). The only bit that might look
suspicious is no prealloc for virtio-mem. But if you think about
it, the whole purpose of this device is to change amount of
memory exposed to guest on the fly. There is no point in locking
the whole backend in memory.


Do I understand correctly that we'll always try setting "reserve=off",
and "prealloc=off"? That would be awesome :)


prealloc=off would be set (almost) unconditionally for all
memory-backend-* objects that are associated with virtio-mem device. And
if QEMU is new enough to have .reserve attribute then reserve=off is set
too.



Ah, perfect.



I do wonder if we want to warn or bail out if "priv->memPrealloc" is set
but we are temporarily not able to support it -- as discussed, because
virtio-mem in QEMU has yet to be taught about it.


priv->memPrealloc might not be what you think it is. Whether
immediate allocation was requested is stored in def->mem.allocation;
priv->memPrealloc is just there to track whether -mem-prealloc was
already put onto the (partially) generated cmd line, and thus that the rest
of the generators must refrain from setting prealloc=on.

Codewise speaking, if you'd look at qemuBuildCommandLine() (this is
where cmd line generation happens) then you'd see
qemuBuildMemCommandLine() being called first and only after that
qemuBuildMemoryDeviceCommandLine() being called second.

The former is responsible for generating generic memory for guest (e.g.
-m XX -mem-prealloc -mem-path ...  which is the old style, nowadays a
combination of -machine memory-backend= + -object memory-backend-*  is
generated).

Long story short, if priv->memPrealloc isn't true at this point, then
libvirt doesn't have to set prealloc=off explicitly, because off is the
default.


That makes sense. Yet I wonder if we should warn that preallocation is 
currently not supported for virtio-mem and will be ignored. Your decision :)


Thanks a bunch!


--
Thanks,

David / dhildenb



Re: [PATCH v5 00/16] Introduce virtio-mem model

2021-09-14 Thread David Hildenbrand

On 14.09.21 14:34, David Hildenbrand wrote:

On 13.09.21 16:52, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html

diff to v4:
- Rebased onto current master
- Worked in David's suggestions, e.g. rename from  to
, implemented offline memory update, implemented --node
argument to virsh update-memory-device, prealloc is OFF and reserve is
ON for virtio-mem

Some suggestions are left as future work. For instance:
- Don't require memory slots because virtio-mem lives on PCI bus anyway
- Allow path backed backend for virtio-mem


Just a note that

<memoryBacking>
  <source type='file'/>
</memoryBacking>

is doing what it's supposed to do. So only explicit file paths are not
supported yet.


- support .prealloc for virtio-mem object (not memory-backend-* !)


I keep occasionally rebased version on my gitlab:

https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/


I just played with it and "virsh update-memory-device" is working like a
charm now:

a) with "--node"
b) with "--alias", including manually specified alias like "<alias name='ua-virtiomem1'/>"
c) with --config, --live, --current
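
[Editor's note: for illustration, such invocations look like the following;
the domain name and size are assumed, not taken from this message.]

virsh update-memory-device Fedora34 --node 1 --requested-size 16G --live
virsh update-memory-device Fedora34 --alias ua-virtiomem1 --requested-size 16G --config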

I see that "aliases" prefixed with "ua-" are an existing concept. Maybe
we want to cross-reference that in the virtio-mem documentation?

Nothing unusual found during my testing. I did not play with huge pages,
as it's initially not supported.


... and I just played with huge pages, and due to the added 
"reserve=off" it works just as expected, nice. (prealloc support to be 
added to make it actually safe to use)



--
Thanks,

David / dhildenb



Re: [PATCH v5 05/16] qemu: Build command line for virtio-mem

2021-09-14 Thread David Hildenbrand

On 13.09.21 16:52, Michal Privoznik wrote:

Nothing special is happening here. All important changes were
done for 'virtio-pmem' (adjusting the code to put virtio
memory on PCI bus, generating alias using
qemuDomainDeviceAliasIndex()). The only bit that might look
suspicious is no prealloc for virtio-mem. But if you think about
it, the whole purpose of this device is to change amount of
memory exposed to guest on the fly. There is no point in locking
the whole backend in memory.


Do I understand correctly that we'll always try setting "reserve=off", 
and "prealloc=off"? That would be awesome :)


I do wonder if we want to warn or bail out if "priv->memPrealloc" is set 
but we are temporarily not able to support it -- as discussed, because 
virtio-mem in QEMU has yet to be taught about it.


FYI: QEMU will bail out if "reserve=off,prealloc=on" is set in combination.
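
[Editor's note: a sketch of the rejected combination; path and size are
assumed for illustration.]

-object memory-backend-file,id=mem0,mem-path=/dev/hugepages,size=16G,reserve=off,prealloc=on

Preallocating pages that were deliberately not reserved makes no sense, so
QEMU refuses the combination at backend creation time.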



Signed-off-by: Michal Privoznik 
---
  src/qemu/qemu_alias.c |  9 +++-
  src/qemu/qemu_command.c   | 24 +--
  ...mory-hotplug-virtio-mem.x86_64-latest.args | 41 +++
  tests/qemuxml2argvtest.c  |  1 +
  4 files changed, 70 insertions(+), 5 deletions(-)
  create mode 100644 
tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args

diff --git a/src/qemu/qemu_alias.c b/src/qemu/qemu_alias.c
index 79e8953b2f..81a1e7eeed 100644
--- a/src/qemu/qemu_alias.c
+++ b/src/qemu/qemu_alias.c
@@ -475,8 +475,11 @@ qemuDeviceMemoryGetAliasID(virDomainDef *def,
  size_t i;
  int maxidx = 0;
  
-/* virtio-pmem goes onto PCI bus and thus DIMM address is not valid */

-if (!oldAlias && mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM)
+/* virtio-pmem and virtio-mem go onto PCI bus and thus DIMM address is not
+ * valid */
+if (!oldAlias &&
+mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_PMEM &&
+mem->model != VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM)
  return mem->info.addr.dimm.slot;
  
  for (i = 0; i < def->nmems; i++) {

@@ -523,6 +526,8 @@ qemuAssignDeviceMemoryAlias(virDomainDef *def,
  prefix = "virtiopmem";
  break;
  case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:
+prefix = "virtiomem";
+break;
  case VIR_DOMAIN_MEMORY_MODEL_NONE:
  case VIR_DOMAIN_MEMORY_MODEL_LAST:
  default:
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 594e5643b1..91f3094ac8 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3136,9 +3136,19 @@ qemuBuildMemoryBackendProps(virJSONValue **backendProps,
  virJSONValueObjectAdd(props, "b:x-use-canonical-path-for-ramblock-id", 
false, NULL) < 0)
  return -1;
  
-if (!priv->memPrealloc &&

-virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0)
-return -1;
+if (mem->model == VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM) {
+/* Explicitly disable prealloc for virtio-mem */
+if (priv->memPrealloc &&
+virJSONValueObjectAppendBoolean(props, "prealloc", 0) < 0)
+return -1;
+if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MEMORY_BACKEND_RESERVE) &&
+virJSONValueObjectAppendBoolean(props, "reserve", 0) < 0)
+return -1;
+} else {
+if (!priv->memPrealloc &&
+virJSONValueObjectAdd(props, "B:prealloc", prealloc, NULL) < 0)
+return -1;
+}
  
  if (virJSONValueObjectAdd(props, "U:size", mem->size * 1024, NULL) < 0)

  return -1;
@@ -3318,6 +3328,9 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def,
  break;
  
  case VIR_DOMAIN_MEMORY_MODEL_VIRTIO_MEM:

+device = "virtio-mem-pci";
+break;
+
  case VIR_DOMAIN_MEMORY_MODEL_NONE:
  case VIR_DOMAIN_MEMORY_MODEL_LAST:
  default:
@@ -3334,6 +3347,11 @@ qemuBuildMemoryDeviceStr(const virDomainDef *def,
     if (mem->labelsize)
         virBufferAsprintf(&buf, "label-size=%llu,", mem->labelsize * 1024);
 
+    if (mem->blocksize) {
+        virBufferAsprintf(&buf, "block-size=%llu,", mem->blocksize * 1024);
+        virBufferAsprintf(&buf, "requested-size=%llu,", mem->requestedsize * 1024);
+    }
+
  if (mem->uuid) {
  char uuidstr[VIR_UUID_STRING_BUFLEN];
  
diff --git a/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args

new file mode 100644
index 00..22ee1bc459
--- /dev/null
+++ b/tests/qemuxml2argvdata/memory-hotplug-virtio-mem.x86_64-latest.args
@@ -0,0 +1,41 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+/usr/bin/qemu-system-i386 \
+-name guest=QEMUGuest1,debug-threads=on \
+-S \
+-object 

Re: [PATCH v5 00/16] Introduce virtio-mem model

2021-09-14 Thread David Hildenbrand

On 13.09.21 16:52, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-June/msg00679.html

diff to v4:
- Rebased onto current master
- Worked in David's suggestions, e.g. rename from  to
   , implemented offline memory update, implemented --node
   argument to virsh update-memory-device, prealloc is OFF and reserve is
   ON for virtio-mem

Some suggestions are left as future work. For instance:
- Don't require memory slots because virtio-mem lives on PCI bus anyway
- Allow path backed backend for virtio-mem


Just a note that

<memoryBacking>
  <source type='file'/>
</memoryBacking>

is doing what it's supposed to do. So only explicit file paths are not 
supported yet.



- support .prealloc for virtio-mem object (not memory-backend-* !)


I keep occasionally rebased version on my gitlab:

https://gitlab.com/MichalPrivoznik/libvirt/-/commits/virtio_mem_v5/


I just played with it and "virsh update-memory-device" is working like a 
charm now:


a) with "--node"
b) with "--alias", including manually specified alias like "<alias name='ua-virtiomem1'/>"
c) with --config, --live, --current

I see that "aliases" prefixed with "ua-" are an existing concept. Maybe 
we want to cross-reference that in the virtio-mem documentation?


Nothing unusual found during my testing. I did not play with huge pages, 
as it's initially not supported.


Thanks a bunch!

--
Thanks,

David / dhildenb



Re: [PATCH v4 for v7.6.0 00/14] Introduce virtio-mem model

2021-09-13 Thread David Hildenbrand

On 13.09.21 08:53, Michal Prívozník wrote:

On 7/7/21 12:30 PM, David Hildenbrand wrote:

On 23.06.21 12:12, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html

diff to v3:
- Rebased code on the top of master
- Tried to work in all Peter's review suggestions
- Fixed a bug where adjusting  was viewed as hotplug of new
     by XML validator and thus if  was close
enough to
     the validator reported an error (this was reported by
    David).



Hi Michal,





6. 4k source results in an error

     <source>
       <pagesize unit='KiB'>4096</pagesize>
       <nodemask>0-1</nodemask>
     </source>

"error: internal error: Unable to find any usable hugetlbfs mount for
4096 KiB"

This example is taken from https://libvirt.org/formatdomain.html for
DIMMs. Not sure what the expected behavior is.


Just realized that this IS expected behavior. 4096KiB pages (=4MiB) are
not regular system pages (4KiB). Thus libvirt is trying to find
hugetlbfs mount point that's serving 4MiB pages, unsuccessfully. I'll
post a patch that fixes the example though.


Ah, very right. I blindly copied the example ... make sense to me and 
the error message is correct.
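
[Editor's note: a working variant of the example with regular 2 MiB huge
pages would presumably look like the following; values assumed.]

<source>
  <pagesize unit='KiB'>2048</pagesize>
  <nodemask>0-1</nodemask>
</source>

With pagesize 2048 KiB, libvirt can match an existing hugetlbfs mount for
2 MiB pages instead of looking for a (non-existent) 4 MiB one.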


--
Thanks,

David / dhildenb



Re: [PATCH v4 for v7.6.0 00/14] Introduce virtio-mem model

2021-09-10 Thread David Hildenbrand


Hi,

sorry for replying this late.


Thanks for looking into this. It's a fairly long list, so it's 
understandable that it took a while. :)





5. Slot handling.

As already discussed, virtio-mem and virtio-pmem don't need slots. Yet,
the "slots" definition is required and libvirt reserves one slot for
each such device ("error: unsupported configuration: memory device count
'2' exceeds slots count '1'"). This is certainly future work, if we ever
want to change that.


I can look into this.


Yeah, but it can certainly be considered as future work as well.



7. File source gets silently dropped

     <source>
       <path>/dev/shmem/vm0</path>
     </source>

The statement gets silently dropped, which is somewhat surprising.
However, I did not test what happens with DIMMs, maybe it's the same.


Yeah, this is somewhat expected. I mean, the part that's expected is
that libvirt drops parts it doesn't parse. Sometimes it is pretty
obvious and sometimes it's not
so (like in your example, when <source/> makes sense for other memory
models like virtio-pmem). But just to be sure - since virtio-mem can be
backed by any memory-backing-* backend, does it make sense to have
<source/> there? So far my code would use memory-backend-file for
hugepages only.


So it could be backed by a file residing on a filesystem that supports 
sparse files (shmem, hugetlbfs, ext4, ...) -- IOW anything modern :)


It's supposed to work, but if it makes your life easier, we can consider 
supporting other file sources future work.







8. Global preallocation of memory

With

<memoryBacking>
  <allocation mode='immediate'/>
</memoryBacking>

we also get "prealloc=on" set for the memory backends of the virtio-mem
devices, which is sub-optimal, because we end up preallocating all
memory of the memory backend (which is unexpected for a virtio-mem
device) and virtio-mem will then discard all memory immediately again.
So it's essentially a dangerous NOP -- dangerous because we temporarily
consume a lot of memory.

In an ideal world, we would not set this for the memory backend used for
the virtio-mem devices, but for the virtio-mem devices themselves, such
that preallocation happens when new memory blocks are actually exposed
to the VM.

As virtio-mem does not support "prealloc=on" for virtio-mem devices yet,
this is future work. We might want to error out, though, if <allocation mode="immediate"/> is used along with virtio-mem devices for now. I'm
planning on implementing this in QEMU soon. Until then, it might also be
good enough to simply document that this setup should be avoided.


Right. Meanwhile this was implemented in QEMU and thus I can drop
prealloc=on. But then my question is what happens when user wants to
expose additional memory to the guest but doesn't have enough free
hugepages in the pool? Libvirt's using prealloc=on so that QEMU doesn't
get killed later, after the guest booted up.


So "prealloc=on" support for virtio-mem is unfortunately not part of 
QEMU yet (only "reserve=off" for memory backends). As you correctly 
state, until that is in place, huge pages cannot be used in a safe way 
with virtio-mem, which is why they are not officially supported yet by 
virtio-mem.


The idea is to specify "prealloc=on" on the virtio-mem device level once 
supported, instead of on the memory backend level. So virtio-mem will 
preallocate the relevant memory before actually giving new blocks to the 
guest via virtio-mem, not when creating the memory backend. If 
preallocation fails at that point, no new blocks are given to the guest 
and we won't get killed.


Think of it like this: you defer preallocation to the point where you 
actually use the memory and handle preallocation errors still in a safe way.


More details are below.






9. Memfd and huge pages

<memoryBacking>
  <source type='memfd'/>
</memoryBacking>

and

<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
  ...
</memoryBacking>


I get on the QEMU cmdline

"-object
{"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}"


Dropping the "memfd" source I get on the QEMU cmdline:

-object {"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,"size":17179869184}


"prealloc":true should not have been added for virtio-mem in case of
memfd. !memfd does what's expected.




Okay, I will fix this. But can you shed more light here? I mean, why the
difference?


Assume you want a 1TB virtio-mem device backed by huge pages but 
initially only expose 1GB to the VM.


When setting prealloc=on on the memory backend, we will preallocate 1TB 
of huge pages when starting QEMU to discard the memory immediately again 
within virtio-mem startup code (first thing it does is make sure there 
is no memory backing at all, meaning the memory backend is completely 
"empty"). We end up with no preallocated memory and temporarily having 
allocated 1 TB.


When setting "prealloc=on" (once supported) on the virtio-mem device 
instead, we'll preallocate memory dynamically as we hand it to the VM -- 
so initially only 1GB.
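
[Editor's note: sketched as QEMU command-line fragments; sizes are assumed,
and the device-level "prealloc" property was still future work at the time of
writing.]

backend-level prealloc (problematic): all 1T of huge pages are touched at
startup and then discarded again by virtio-mem:

-object memory-backend-file,id=mem0,mem-path=/dev/hugepages,size=1T,prealloc=on \
-device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G

device-level prealloc (once supported): pages are touched only as blocks are
actually plugged, so a preallocation failure simply stops further plugging:

-object memory-backend-file,id=mem0,mem-path=/dev/hugepages,size=1T,reserve=off \
-device virtio-mem-pci,id=vmem0,memdev=mem0,requested-size=1G,prealloc=on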


Assume we want to give the VM an 

Re: [PATCH v4 for v7.6.0 00/14] Introduce virtio-mem model

2021-07-07 Thread David Hildenbrand

On 23.06.21 12:12, Michal Privoznik wrote:

v4 of:

https://listman.redhat.com/archives/libvir-list/2021-April/msg01138.html

diff to v3:
- Rebased code on the top of master
- Tried to work in all Peter's review suggestions
- Fixed a bug where adjusting  was viewed as hotplug of new
by XML validator and thus if  was close enough to
the validator reported an error (this was reported by
   David).



Hi Michal,

I just retested with this version and it mostly works as expected. I 
tested quite some memory configurations and have some comments / reports :)


I tested successfully:
- 1 node with one device
- 2 nodes with one device on each node
- 2 nodes with two devices on one node
- "virsh update-memory-device" on live domains -- works great
- huge pages and anonymous memory with access=private and access=shared.
  There is only one issue with hugepages and memfd (prealloc=on gets
  set).
- shared memory on memfd and anonymous memory (-> shared file) with
  access=shared

I only tested on a single host NUMA node so far, but don't expect 
surprises with host numa policies.



1. "virsh update-memory-device" and stopped domains

Once I have more than one virtio-mem device defined for a VM, "virsh 
update-memory-device" cannot be used anymore as aliases don't seem to be 
available on stopped VMs. If I manually define an alias on a stopped VM, 
the alias silently gets dropped. Is there any way to identify a 
virtio-mem device on a stopped domain?



2. "virsh update-memory-device" with --config on a running domain

# virsh update-memory-device "Fedora34" --config --alias "virtiomem1" 
--requested-size 16G

error: no memory device found

I guess the issue is again, that aliases don't apply to the "!live" XML. 
So the "--config" option doesn't really work when having more than one 
virtio-mem device defined for a VM.



3. "virsh update-memory-device" and nodes

In addition to "--alias", something like "--node" would also be nice to 
have -- assuming there is only a single virtio-mem device per NUMA node, 
which is usually the case. For example:


"virsh update-memory-device "Fedora34" --node 1 --requested-size 16G" 
could come in handy. This would also work on "!live" domains.



4. "actual" vs. "current"

"<actual unit='KiB'>16777216</actual>" I wonder if "current" instead of
"actual" would be more in line with "currentMemory". But no strong opinion.



5. Slot handling.

As already discussed, virtio-mem and virtio-pmem don't need slots. Yet, 
the "slots" definition is required and libvirt reserves one slot for 
each such device ("error: unsupported configuration: memory device count 
'2' exceeds slots count '1'"). This is certainly future work, if we ever 
want to change that.



6. 4k source results in an error

<source>
  <pagesize unit='KiB'>4096</pagesize>
  <nodemask>0-1</nodemask>
</source>

"error: internal error: Unable to find any usable hugetlbfs mount for 
4096 KiB"


This example is taken from https://libvirt.org/formatdomain.html for 
DIMMs. Not sure what the expected behavior is.



7. File source gets silently dropped

<source>
  <path>/dev/shmem/vm0</path>
</source>

The statement gets silently dropped, which is somewhat surprising. 
However, I did not test what happens with DIMMs, maybe it's the same.



8. Global preallocation of memory

With

<memoryBacking>
  <allocation mode='immediate'/>
</memoryBacking>

we also get "prealloc=on" set for the memory backends of the virtio-mem 
devices, which is sub-optimal, because we end up preallocating all 
memory of the memory backend (which is unexpected for a virtio-mem 
device) and virtio-mem will then discard all memory immediately again. 
So it's essentially a dangerous NOP -- dangerous because we temporarily 
consume a lot of memory.


In an ideal world, we would not set this for the memory backend used for 
the virtio-mem devices, but for the virtio-mem devices themselves, such 
that preallocation happens when new memory blocks are actually exposed 
to the VM.


As virtio-mem does not support "prealloc=on" for virtio-mem devices yet, 
this is future work. We might want to error out, though, if <allocation mode="immediate"/> is used along with virtio-mem devices for now. I'm 
planning on implementing this in QEMU soon. Until then, it might also be 
good enough to simply document that this setup should be avoided.



9. Memfd and huge pages

<memoryBacking>
  <source type='memfd'/>
</memoryBacking>

and

<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB'/>
  </hugepages>
  ...
</memoryBacking>


I get on the QEMU cmdline

"-object 
{"qom-type":"memory-backend-memfd","id":"memvirtiomem0","hugetlb":true,"hugetlbsize":2097152,"share":true,"prealloc":true,"size":17179869184}"


Dropping the "memfd" source I get on the QEMU cmdline:

-object {"qom-type":"memory-backend-file","id":"memvirtiomem0","mem-path":"/dev/hugepages/libvirt/qemu/2-Fedora34-2","share":true,"size":17179869184}

"prealloc":true should not have been added for virtio-mem in case of 
memfd. !memfd does what's expected.



10. Memory locking

With

<memoryBacking>
  <locked/>
</memoryBacking>

virtio-mem fails with

"qemu-system-x86_64: -device 
virtio-mem-pci,node=0,block-size=2097152,requested-size=0,memdev=memvirtiomem0,id=virtiomem0,bus=pci.0,addr=0x2: 
Incompatible with mlock"



Re: [PATCH v3 00/15] Introduce virtio-mem model

2021-06-17 Thread David Hildenbrand

On 17.06.21 14:42, Peter Krempa wrote:

On Thu, Jun 17, 2021 at 14:34:21 +0200, David Hildenbrand wrote:

On 17.06.21 14:17, Peter Krempa wrote:

On Thu, Jun 17, 2021 at 14:03:44 +0200, David Hildenbrand wrote:


[...]



4. QEMU no longer requires a "slots" specification when maxmem is set
(because virtio-based memory devices don't require ACPI memory module
slots).

Specifying "<maxMemory unit='KiB'>20971520</maxMemory>" results in an (IMHO
confusing) error:

   "error: XML document failed to validate against schema: Unable to
validate doc against /usr/local/share/libvirt/schemas/domain.rng
Extra element maxMemory in interleave
Invalid sequence in interleave
Element domain failed to validate content"


"Extra element maxMemory in interleave
Invalid sequence in interleave
Element domain failed to validate content"


This is because the XML schema for maxMemory requires the "slots" attribute.
Unfortunately the XML schema validator from libxml2 has extremely
user-unfriendly errors.


Specifying "<maxMemory slots='0' unit='KiB'>20971520</maxMemory>" results in
   "error: XML error: both maximum memory size and memory slot count must
be specified"


This is the error from the XML parser which has human-crafter errors.


However, older QEMU versions have that requirement. Supported since 3.0 I
think:

$ git tag --contains 951f2269af2
...
v3.0.0
...


Is there a way to detect that it's not needed? E.g. via the QMP
schema or such?

In this instance we could perhaps drop it and let qemu report the error
... well if it's reasonable though.



I guess only implicitly, for example, if virtio-mem-pci or 
virtio-pmem-pci are possible (I assume supported devices can be queried 
via QMP). And without these devices, it doesn't make sense to have 
"slots=0".


virtio-pmem-pci was introduced with v4.1.0, virtio-mem-pci was 
introduced with v5.1.0.


--
Thanks,

David / dhildenb



Re: [PATCH v3 00/15] Introduce virtio-mem model

2021-06-17 Thread David Hildenbrand

On 17.06.21 14:17, Peter Krempa wrote:

On Thu, Jun 17, 2021 at 14:03:44 +0200, David Hildenbrand wrote:

On 17.06.21 13:18, Michal Prívozník wrote:

On 6/17/21 11:44 AM, David Hildenbrand wrote:

On 17.03.21 12:57, Michal Privoznik wrote:

v3 of:

https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html


diff to v2:
- Dropped code that forbade use of virtio-mem and memballoon at the same
     time;
- This meant that I had to adjust memory accounting,
     qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are
new.
- Fixed small nits raised by Peter in his review of v2




Hi Michal, do you have a branch somewhere that I can easily checkout to
play with it/test it?



Yes:

https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4

There were some comments in Peter's review and I really should fix my
code according to them and merge/send v4.

Michal



Thanks! I started with a single virtio-mem device.

1. NUMA requirement

Right now, one can really only configure "maxMemory" with NUMA specified,
otherwise there will be "error: unsupported configuration: At least
one numa node has to be configured when enabling memory hotplug".

I recall this is a limitation of older QEMU which would not create ACPI SRAT
tables otherwise. In QEMU, this is no longer the case. As soon as "maxmem"
is specified on the QEMU cmdline, we fake a single NUMA node:

hw/core/numa.c:numa_complete_configuration()

"Enable NUMA implicitly by adding a new NUMA node automatically"
-> m->auto_enable_numa_with_memdev / mc->auto_enable_numa_with_memhp

m->auto_enable_numa_with_memdev (slots=0) is set since 5.1 on x86-64 and arm64
m->auto_enable_numa_with_memhp (slots>0) is set since 2.11 on x86-64 and 4.1 on 
arm64

So in theory, with newer QEMU on x86-64 and arm64 we could drop that
limitation in libvirt (might require some changes eventually
regarding the "node" specification handling). ppc64 shouldn't care as there is 
no ACPI.
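The auto-enable decision described above can be sketched roughly as follows. This is a simplified stand-in for the logic in hw/core/numa.c:numa_complete_configuration(); the struct and function names here are invented for illustration and are not QEMU's real structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Invented compat-flag container mirroring the two machine-class
 * flags mentioned above. */
typedef struct {
    bool auto_enable_numa_with_memdev; /* slots == 0; since 5.1 on x86-64/arm64 */
    bool auto_enable_numa_with_memhp;  /* slots  > 0; since 2.11 (x86-64), 4.1 (arm64) */
} MachineCompat;

/* Returns true when QEMU would fake a single NUMA node. */
bool fakes_single_numa_node(const MachineCompat *mc, int num_nodes,
                            unsigned ram_slots, uint64_t ram_size,
                            uint64_t maxram_size)
{
    if (num_nodes > 0) {
        return false; /* NUMA configured explicitly, nothing to fake */
    }
    if (ram_slots > 0) {
        return mc->auto_enable_numa_with_memhp;
    }
    /* maxmem given without slots: the virtio-based memory device case */
    return maxram_size > ram_size && mc->auto_enable_numa_with_memdev;
}
```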


The main reason for the check to be present is actually exactly what you
are describing. qemu fakes the numa node in case none are configured,
this means that in a case where libvirt did not enforce that, you'd get a
discrepancy between the config and what qemu exposes.



Right, but it fakes a single NUMA node just for the purpose of creating 
the ACPI SRAT. (the same thing Linux will do implicitly if there are no 
NUMA nodes, because no NUMA == single NUMA node)


Anyhow, just something I found awkward to run into, because QEMU doesn't 
have this limitation anymore and that handling is properly glued to 
compatibility machines so there is no change when migrating etc. But 
it's certainly more a "nice to have for smaller XML files" :)


I guess auto_enable_numa handling might result in a different QEMU 
behavior (like the result for HMP "info numa" etc.), so if we'd ever 
want to go down that path, we'd have to double check what the effects are.




[...]


3. "memory" item handling

Whenever I edit the XML and set e.g. "<memory unit='GiB'>4</memory>", it's
silently converted back to "20 GiB".
Maybe that's just always implicitly calculated from the NUMA spec and the 
defined devices.


In cases where you've got numa configured, the <memory> element is
re-calculated back from the size of the numa nodes, as when you change
the value there isn't any obvious algorithm on picking NUMA nodes where
to pull from.


Makes sense. Another minor thing that is similar to 1. (suboptimal but 
certainly no show stopper)



4. QEMU does no longer require a "slots" specification when maxmem is 
set (because virtio-based memory devices don't require ACPI memory 
module slots).


Specifying "20971520" results in (IMHO 
confusing) error:


  "error: XML document failed to validate against schema: Unable to
   validate doc against /usr/local/share/libvirt/schemas/domain.rng
   Extra element maxMemory in interleave
   Invalid sequence in interleave
   Element domain failed to validate content"


"Extra element maxMemory in interleave
Invalid sequence in interleave
Element domain failed to validate content"

Specifying "20971520" results in
  "error: XML error: both maximum memory size and memory slot count must
   be specified"

However, older QEMU versions have that requirement. Supported since 3.0, I 
think:


$ git tag --contains 951f2269af2
...
v3.0.0
...

--
Thanks,

David / dhildenb



Re: [PATCH v3 00/15] Introduce virtio-mem model

2021-06-17 Thread David Hildenbrand

On 17.06.21 13:18, Michal Prívozník wrote:

On 6/17/21 11:44 AM, David Hildenbrand wrote:

On 17.03.21 12:57, Michal Privoznik wrote:

v3 of:

https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html


diff to v2:
- Dropped code that forbade use of virtio-mem and memballoon at the same
    time;
- This meant that I had to adjust memory accounting,
    qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are
new.
- Fixed small nits raised by Peter in his review of v2




Hi Michal, do you have a branch somewhere that I can easily checkout to
play with it/test it?



Yes:

https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v4

There were some comments in Peter's review and I really should fix my
code according to them and merge/send v4.

Michal



Thanks! I started with a single virtio-mem device.

1. NUMA requirement

Right now, one can really only configure "maxMemory" with NUMA specified,
otherwise there will be "error: unsupported configuration: At least
one numa node has to be configured when enabling memory hotplug".

I recall this is a limitation of older QEMU which would not create ACPI SRAT
tables otherwise. In QEMU, this is no longer the case. As soon as "maxmem"
is specified on the QEMU cmdline, we fake a single NUMA node:

hw/core/numa.c:numa_complete_configuration()

"Enable NUMA implicitly by adding a new NUMA node automatically"
-> m->auto_enable_numa_with_memdev / mc->auto_enable_numa_with_memhp

m->auto_enable_numa_with_memdev (slots=0) is set since 5.1 on x86-64 and arm64
m->auto_enable_numa_with_memhp (slots>0) is set since 2.11 on x86-64 and 4.1 on 
arm64

So in theory, with newer QEMU on x86-64 and arm64 we could drop that
limitation in libvirt (might require some changes eventually
regarding the "node" specification handling). ppc64 shouldn't care as there is 
no ACPI.


2. "virsh update-memory-device"

My domain looks something like:


  <name>Fedora 34</name>
...
  <maxMemory>20971520</maxMemory>
  <memory>20971520</memory>
  <currentMemory>4194304</currentMemory>
  <vcpu>16</vcpu>
...
  <devices>
...
    <memory model='virtio-mem'>
      <target>
        <size>16777216</size>
        <node>0</node>
        <block>2048</block>
        <requested>524288</requested>
        <current>0</current>
      </target>
    </memory>
...
  </devices>
...


It starts just fine and I can query the current properties

# virsh dumpxml "Fedora 34"
...

  
    <memory model='virtio-mem'>
      <target>
        <size>16777216</size>
        <node>0</node>
        <block>2048</block>
        <requested>524288</requested>
        <current>524288</current>
      </target>
    </memory>


Trying to update the requested size, however results in

[root@virtlab704 ~]# virsh update-memory-device "Fedora 34" --requested-size 0
error: unsupported configuration: Attaching memory device with size '16777216' 
would exceed domain's maxMemory config size '20971520'

Tried with "--alias" with the same result.

Is it maybe an issue regarding handling of "maxMemory" vs "memory" ?
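For what it's worth, the symptom is consistent with an attach-style accounting check being applied to an update: the device's 16 GiB are already part of the 20 GiB <memory> total, so re-validating them as if attaching a new device counts them twice. Purely a hypothetical sketch with invented names, not libvirt's actual code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attach-style check (invented names, not libvirt code):
 * total_kib already includes all plugged memory devices. */
bool attach_would_fit(uint64_t total_kib, uint64_t max_memory_kib,
                      uint64_t device_kib)
{
    return total_kib + device_kib <= max_memory_kib;
}
```

With the values from this mail, validating the already-plugged device as a fresh attach fails even though nothing new is being added.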


3. "memory" item handling

Whenever I edit the XML and set e.g. "<memory unit='GiB'>4</memory>", it's 
silently converted back to "20 GiB".
Maybe that's just always implicitly calculated from the NUMA spec and the 
defined devices.


Thanks!

--
Thanks,

David / dhildenb



Re: [PATCH v3 00/15] Introduce virtio-mem model

2021-06-17 Thread David Hildenbrand

On 17.03.21 12:57, Michal Privoznik wrote:

v3 of:

https://listman.redhat.com/archives/libvir-list/2021-February/msg00961.html

diff to v2:
- Dropped code that forbade use of virtio-mem and memballoon at the same
   time;
- This meant that I had to adjust memory accounting,
   qemuDomainSetMemoryFlags() - see patches 11/15 and 12/15 which are new.
- Fixed small nits raised by Peter in his review of v2




Hi Michal, do you have a branch somewhere that I can easily checkout to 
play with it/test it?


I can spot that at least patch #2 and #3 are already upstream. The 
remaining patches don't seem to apply cleanly on current upstream/master.


Thanks!

--
Thanks,

David / dhildenb



Re: [PATCH v2 05/13] conf: Introduce virtio-mem model

2021-02-22 Thread David Hildenbrand

On 22.02.21 17:28, Peter Krempa wrote:

On Mon, Feb 22, 2021 at 16:57:11 +0100, David Hildenbrand wrote:

On 22.02.21 16:46, Michal Privoznik wrote:

On 2/18/21 4:00 PM, David Hildenbrand wrote:

On 18.02.21 14:31, Michal Privoznik wrote:


[...]


In QEMU, we could make "info balloon" etc. also include virtio-mem provided
memory.

The main reason I did not do so initially is that
1) It's racy when reading/writing the balloon size. The QEMU interface
is broken as we don't get/set the size of the balloon size but
instead the logical VM size. While someone sets the logical VM size
via virtio-balloon, the logical VM size might change due to virtio-
mem guest activity.
2) I don't want people to be tempted to use both at the same
time.

Maybe the right thing to do is make "info balloon" report the current
logical VM size. If there are races, bad luck - better not use both things
at the same time.


We specifically don't call info balloon or any equivalent nowadays if
qemu supports the balloon event and we should keep it like that at least
in terms of the active XML.

The bulk stats functions we have already call the monitor so it's okay
to do it there.

Both virtio-mem and balloon do have events so we should be able to
update any required definition bits without an explicit call to the
balloon info apart from the stats code and the refresh on reconnect.



Just to clarify, any BALLOON communication with the outer world 
(set/get/balloon_change) are in terms of "logical VM size", not "balloon 
size".


logical_vm_size = vm_size - balloon_size

vm_size does currently not include virtio-mem provided memory.
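As a sketch of why get/set is racy (an illustrative helper, not QEMU code): the balloon target is expressed as a logical VM size, so the balloon size it implies shifts whenever virtio-mem changes vm_size underneath it.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: derive the balloon size implied by a logical
 * VM size target, per the formula above. */
uint64_t implied_balloon_size(uint64_t vm_size, uint64_t logical_target)
{
    return vm_size > logical_target ? vm_size - logical_target : 0;
}
```

The same target applied before and after a virtio-mem resize yields different balloon sizes, which is the race described above.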

--
Thanks,

David / dhildenb



Re: [PATCH v2 05/13] conf: Introduce virtio-mem model

2021-02-22 Thread David Hildenbrand

On 22.02.21 16:46, Michal Privoznik wrote:

On 2/18/21 4:00 PM, David Hildenbrand wrote:

On 18.02.21 14:31, Michal Privoznik wrote:




Since now we have (possibly) two or more devices that allow
memory inflation/deflation and accounting for all of them (and
thus keeping <currentMemory/> updated) might be hard. Therefore,
I'm deliberately forbidding memballoon. It's okay - virtio-mem is
superior to memballoon anyway. We can always reevaluate later.


That's a bad idea. It'll still be used for getting memory stats, free
page hinting and free page reporting.

Very weird use cases might even want to mix balloon inflation/deflation
with virtio-mem ...



I understand that, but then accounting for memory at libvirt level
becomes hard. What I mean, currently we have three elements for
reporting guest memory:


  <maxMemory>1524288</maxMemory>
  <memory>524288</memory>
  <currentMemory>524288</currentMemory>
  ...


<maxMemory/> is for memory hotplug (corresponds to -m maxmem=X,slots=Y)

<memory/> is the initial amount guest has PLUS sum of sizes of all
memory devices. For instance if I want to give 1GiB initial amount and
have 1 pc-dimm of 2GiB size this element will report 3GiB.

<currentMemory/> then reflects actual balloon size (gets updated on
BALLOON_CHANGE event and when libvirt finds fit via query-balloon).

In my experiments, balloon does not account for virtio-mem actual size.
Therefore, I can see following options:

1) fix <currentMemory/> so that it does balloon size +
sum(virtio-mem.actual), or

2) don't touch <currentMemory/> at all and let mgmt apps compute the sum
themselves (should they need it- which I believe they will),

3) come up with a new, fourth element (awful).

The benefit of 1) is that the element will report the same number as
'free' ran within guest. Problem is that I'd have to change all places
where we take the size of the balloon device directly and add sum of
actual sizes of virtio-mem devices (e.g. all stats APIs, ballon changed
event that we fire when we see qemu's event). And okay, if I do that -
then the only way to learn true value of balloon would be to take
current memory and subtract the sum of virtio-mem actual sizes. Which is
like option 2) but reversed. And since I could not decide which way to
go, I made it problem of future me by forbidding balloon and virtio-mem
coexistence.
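The two directions Michal describes are simple inverses of each other; a minimal sketch with invented names, just to make the arithmetic explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Option 1: report balloon size plus virtio-mem actual sizes ... */
uint64_t current_memory_kib(uint64_t balloon_kib, uint64_t virtio_mem_actual_kib)
{
    return balloon_kib + virtio_mem_actual_kib;
}

/* ... and option 2 reversed: recover the true balloon value from it. */
uint64_t true_balloon_kib(uint64_t current_kib, uint64_t virtio_mem_actual_kib)
{
    return current_kib - virtio_mem_actual_kib;
}
```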


In QEMU, we could make "info balloon" etc. also include virtio-mem 
provided memory.


The main reason I did not do so initially is that
1) It's racy when reading/writing the balloon size. The QEMU interface
   is broken as we don't get/set the size of the balloon size but
   instead the logical VM size. While someone sets the logical VM size
   via virtio-balloon, the logical VM size might change due to virtio-
   mem guest activity.
2) I don't want people to be tempted to use both at the same
   time.

Maybe the right thing to do is make "info balloon" report the current 
logical VM size. If there are races, bad luck - better not use both 
things at the same time.


--
Thanks,

David / dhildenb



Re: [PATCH v2 05/13] conf: Introduce virtio-mem model

2021-02-18 Thread David Hildenbrand

On 18.02.21 14:31, Michal Privoznik wrote:

virtio-mem is a paravirtualized mechanism for adding/removing
memory to/from a VM. A virtio-mem-pci device is split into blocks
of equal size which are then exposed (all or only a requested
portion of them) to the guest kernel to use as regular memory.
Therefore, the device has two important attributes:

   1) block-size, which defines the size of a block
   2) requested-size, which defines how much memory (in bytes)
  is the device requested to expose to the guest.

The 'block-size' is configured on command line and immutable
throughout device's lifetime. The 'requested-size' can be set on
the command line too, but also is adjustable via monitor. In
fact, that is how management software places its requests to
change the memory allocation. If it wants to give more memory to
the guest it changes 'requested-size' to a bigger value, and if it
wants to shrink guest memory it changes the 'requested-size' to a
smaller value. Note, value of zero means that guest should
release all memory offered by the device. Of course, guest has to
cooperate. Therefore, there is a third attribute 'size' which is
read only and reflects how much memory the guest still has. This
can be different from 'requested-size', obviously. Because of name
clash, I've named it 'actualsize' and it is dealt with in future
commits (it is runtime information anyway).

In the backend, memory for virtio-mem is backed by usual objects:
memory-backend-{ram,file,memfd} and their size puts the cap on
the amount of memory that a virtio-mem device can offer to a
guest. But we are already able to express this info using <size/>
under <memory/>.

Therefore, we need only two more elements to cover 'block-size'
and 'requested-size' attributes. This is the XML I've come up
with:

   
 
   <memory model='virtio-mem'>
     <source>
       <nodemask>1-3</nodemask>
       <pagesize unit='KiB'>2048</pagesize>
     </source>
     <target>
       <size unit='KiB'>2097152</size>
       <node>0</node>
       <block unit='KiB'>2048</block>
       <requested unit='KiB'>1048576</requested>
     </target>
   </memory>
 
 
   

I hope by now it is obvious that:

   1) 'requested-size' must be an integer multiple of
  'block-size', and
   2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
  address.

Then there is a limitation that the minimal 'block-size' is
transparent huge page size (I'll leave this without explanation).

Since now we have (possibly) two or more devices that allow
memory inflation/deflation and accounting for all of them (and
thus keeping <currentMemory/> updated) might be hard. Therefore,
I'm deliberately forbidding memballoon. It's okay - virtio-mem is
superior to memballoon anyway. We can always reevaluate later.


That's a bad idea. It'll still be used for getting memory stats, free 
page hinting and free page reporting.


Very weird use cases might even want to mix balloon inflation/deflation 
with virtio-mem ...


--
Thanks,

David / dhildenb



Re: [PATCH 3/4] hw/virtio/virtio-balloon: Remove the "class" property

2021-02-03 Thread David Hildenbrand

On 03.02.21 18:18, Thomas Huth wrote:

This property was only required for compatibility reasons in the
pc-1.0 machine type and earlier. Now that these machine types have
been removed, the property is not useful anymore.

Signed-off-by: Thomas Huth 
---
  hw/virtio/virtio-balloon-pci.c | 11 +--
  1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/hw/virtio/virtio-balloon-pci.c b/hw/virtio/virtio-balloon-pci.c
index a2c5cc7207..79a3ba979a 100644
--- a/hw/virtio/virtio-balloon-pci.c
+++ b/hw/virtio/virtio-balloon-pci.c
@@ -34,21 +34,13 @@ struct VirtIOBalloonPCI {
  VirtIOPCIProxy parent_obj;
  VirtIOBalloon vdev;
  };
-static Property virtio_balloon_pci_properties[] = {
-DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
-DEFINE_PROP_END_OF_LIST(),
-};
  
  static void virtio_balloon_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)

  {
  VirtIOBalloonPCI *dev = VIRTIO_BALLOON_PCI(vpci_dev);
  DeviceState *vdev = DEVICE(&dev->vdev);
  
-if (vpci_dev->class_code != PCI_CLASS_OTHERS &&

-vpci_dev->class_code != PCI_CLASS_MEMORY_RAM) { /* qemu < 1.1 */
-vpci_dev->class_code = PCI_CLASS_OTHERS;
-}
-
+vpci_dev->class_code = PCI_CLASS_OTHERS;
  qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
  }
  
@@ -59,7 +51,6 @@ static void virtio_balloon_pci_class_init(ObjectClass *klass, void *data)

  PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
  k->realize = virtio_balloon_pci_realize;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
-device_class_set_props(dc, virtio_balloon_pci_properties);
  pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
  pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_BALLOON;
  pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;



Acked-by: David Hildenbrand 

--
Thanks,

David / dhildenb



Re: [PATCH 06/10] qemu: Wire up MEMORY_DEVICE_SIZE_CHANGE event

2021-02-02 Thread David Hildenbrand

[...]



A side note, do we get this event e.g. on VM reset? If no we need to
wire up the reset of 'actual' size in such case as it would wrongly
suggest that the VM is using it and it may not even get to loading the
driver.



QEMU fires the event whenever the value changes - including during 
system resets.


--
Thanks,

David / dhildenb



Re: [PATCH 03/10] conf: Introduce virtio-mem model

2021-01-25 Thread David Hildenbrand
On 22.01.21 19:16, Daniel Henrique Barboza wrote:
> 
> 
> On 1/22/21 9:50 AM, Michal Privoznik wrote:
>> virtio-mem is a paravirtualized mechanism for adding/removing
>> memory to/from a VM. A virtio-mem-pci device is split into blocks
>> of equal size which are then exposed (all or only a requested
>> portion of them) to the guest kernel to use as regular memory.
>> Therefore, the device has two important attributes:
>>
>>1) block-size, which defines the size of a block
>>2) requested-size, which defines how much memory (in bytes)
>>   is the device requested to expose to the guest.
>>
>> The 'block-size' is configured on command line and immutable
>> throughout device's lifetime. The 'requested-size' can be set on
>> the command line too, but also is adjustable via monitor. In
>> fact, that is how management software places its requests to
>> change the memory allocation. If it wants to give more memory to
>> the guest it changes 'requested-size' to a bigger value, and if it
>> wants to shrink guest memory it changes the 'requested-size' to a
>> smaller value. Note, value of zero means that guest should
>> release all memory offered by the device. Of course, guest has to
>> cooperate. Therefore, there is a third attribute 'size' which is
>> read only and reflects how much memory the guest still has. This
>> can be different to 'requested-size', obviously. Because of name
>> clash, I've named it 'actualsize' and it is dealt with in future
>> commits (it is a runtime information anyway).
>>
>> In the backend, memory for virtio-mem is backed by usual objects:
>> memory-backend-{ram,file,memfd} and their size puts the cap on
>> the amount of memory that a virtio-mem device can offer to a
>> guest. But we are already able to express this info using <size/>
>> under <memory/>.
>>
>> Therefore, we need only two more elements to cover 'block-size'
>> and 'requested-size' attributes. This is the XML I've came up
>> with:
>>
>> <memory model='virtio-mem'>
>>   <source>
>>     <nodemask>1-3</nodemask>
>>     <pagesize unit='KiB'>2048</pagesize>
>>   </source>
>>   <target>
>>     <size unit='KiB'>2097152</size>
>>     <node>0</node>
>>     <block unit='KiB'>2048</block>
>>     <requested unit='KiB'>1048576</requested>
>>   </target>
>>   <address ... function='0x0'/>
>> </memory>
>>
>> I hope by now it is obvious that:
>>
>>1) 'requested-size' must be an integer multiple of
>>   'block-size', and
>>2) virtio-mem-pci device goes onto PCI bus and thus needs PCI
>>   address.
>>
>> Then there is a limitation that the minimal 'block-size' is
>> transparent huge page size (I'll leave this without explanation).
>>
>> Signed-off-by: Michal Privoznik 
>> ---
>>   docs/formatdomain.rst | 35 --
>>   docs/schemas/domaincommon.rng | 11 
>>   src/conf/domain_conf.c| 53 ++-
>>   src/conf/domain_conf.h|  3 +
>>   src/conf/domain_validate.c| 39 +++
>>   src/qemu/qemu_alias.c |  1 +
>>   src/qemu/qemu_command.c   |  1 +
>>   src/qemu/qemu_domain.c| 10 +++
>>   src/qemu/qemu_domain_address.c| 37 ---
>>   src/qemu/qemu_validate.c  |  8 +++
>>   src/security/security_apparmor.c  |  1 +
>>   src/security/security_dac.c   |  2 +
>>   src/security/security_selinux.c   |  2 +
>>   tests/domaincapsmock.c|  9 +++
>>   .../memory-hotplug-virtio-mem.xml | 66 +++
>>   ...emory-hotplug-virtio-mem.x86_64-latest.xml |  1 +
>>   tests/qemuxml2xmltest.c   |  1 +
>>   17 files changed, 264 insertions(+), 16 deletions(-)
>>   create mode 100644 tests/qemuxml2argvdata/memory-hotplug-virtio-mem.xml
>>   create mode 120000 
>> tests/qemuxml2xmloutdata/memory-hotplug-virtio-mem.x86_64-latest.xml
>>
>> diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst
>> index af540391db..2938758ec2 100644
>> --- a/docs/formatdomain.rst
>> +++ b/docs/formatdomain.rst
>> @@ -7267,6 +7267,18 @@ Example: usage of the memory devices
>>524288
>>  
>>
>> + <memory model='virtio-mem'>
>> +   <source>
>> +     <nodemask>1-3</nodemask>
>> +     <pagesize unit='KiB'>2048</pagesize>
>> +   </source>
>> +   <target>
>> +     <size unit='KiB'>2097152</size>
>> +     <node>0</node>
>> +     <block unit='KiB'>2048</block>
>> +     <requested unit='KiB'>1048576</requested>
>> +   </target>
>> + </memory>
>>  
>>  ...
>>   
>> @@ -7274,7 +7286,8 @@ Example: usage of the memory devices
>>  Provide ``dimm`` to add a virtual DIMM module to the guest. 
>> :since:`Since
>>  1.2.14` Provide ``nvdimm`` model that adds a Non-Volatile DIMM module.
>>  :since:`Since 3.2.0` Provide ``virtio-pmem`` model to add a 
>> paravirtualized
>> -   persistent memory device. :since:`Since 7.1.0`
>> +   persistent memory device. :since:`Since 7.1.0` Provide ``virtio-mem`` 
>> model
>> +   to add paravirtualized memory device. :since: `Since 7.1.0`
>>   
>>   ``access``
>>  An optional attribute ``access`` ( :since:`since 3.2.0` ) that provides
>> @@ -7297,10 +7310,11 @@ Example: usage of the memory devices
>>  allowed only for 

Re: [PATCH 00/10] Introduce virtio-mem model

2021-01-25 Thread David Hildenbrand
On 22.01.21 23:03, Daniel Henrique Barboza wrote:
> 
> 
> On 1/22/21 6:19 PM, David Hildenbrand wrote:
>>
>>> Out of curiosity: are you aware of anyone working in enabling virtio-mem
>>> for pseries/ppc64? I'm wondering if there's some kind of architecture
>>> limitation in Power or if it's just a lack of interest.
>>
>> I remember there is interest, however:
>>
>> - arm64 and x86-64 is used more frequently in applicable (cloud?) setups, so 
>> it has high prio
>> - s390x doesn't have any proper memory hot(un)plug, and as I have a strong 
>> s390x background, it's rather easy for me to implement
>> - ppc64 at least supports hot(un)plug of DIMMs
>>
>> There is nothing fundamental speaking against ppc64 support AFAIR.
> 
> That's good to hear.
> 
>> A block size of 16MB should be possible. I‘m planning on looking into it, 
>> however, there are a lot of other things on my todo list for virtio-mem.
> 
> I'm not familiar with the 'block size' concept of the virtio-mem device that 
> would
> allow for 16MB increments. My knowledge of the pseries kernel/QEMU is that the
> guest visible memory must always be 256MiB aligned due to PAPR mechanics that
> forces a memory block to be at least this size. Albeit I believe that there are
> no constraints on the memory this device provides being counted as
> non-hotplugable, then in this case the alignment shouldn't be needed.

In Linux guests, virtio-mem adds whole memory blocks (e.g., aligned
256MB), but is able to expose only parts of a memory block dynamically
to Linux mm - essentially in 16MB on ppc64 IIRC.

E.g., on x86-64 (and soon arm64), we mostly add 128MB memory blocks, but
can operate on (currently) 4MB blocks (MAX_ORDER - 1) inside these
blocks. A little like memory ballooning ... but also quite different :)

So far the theory on ppc64. I have no prototype on ppc64, so we'll have
to see what's actually possible.
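The block vs. sub-block arithmetic above, using the example sizes from this mail (illustrative numbers only, not values queried from any kernel):

```c
#include <assert.h>
#include <stdint.h>

/* A Linux guest plugs whole memory blocks, but virtio-mem can expose
 * them in finer sub-blocks; this just computes sub-blocks per block. */
uint64_t subblocks_per_block(uint64_t block_mib, uint64_t subblock_mib)
{
    return block_mib / subblock_mib;
}
```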

> 
> But I digress. Thanks for the insights. I'll ping some people inside IBM and
> see if we have a more immediate use case for virtio-mem in Power. Perhaps
> we can do some sort of collaboration with your work.

Sure, I'll be happy to assist.

-- 
Thanks,

David / dhildenb



Re: [PATCH 00/10] Introduce virtio-mem model

2021-01-22 Thread David Hildenbrand


> Out of curiosity: are you aware of anyone working in enabling virtio-mem
> for pseries/ppc64? I'm wondering if there's some kind of architecture
> limitation in Power or if it's just a lack of interest.

I remember there is interest, however:

- arm64 and x86-64 is used more frequently in applicable (cloud?) setups, so it 
has high prio
- s390x doesn't have any proper memory hot(un)plug, and as I have a strong 
s390x background, it's rather easy for me to implement
- ppc64 at least supports hot(un)plug of DIMMs

There is nothing fundamental speaking against ppc64 support AFAIR. A block size 
of 16MB should be possible. I‘m planning on looking into it, however, there are 
a lot of other things on my todo list for virtio-mem.


> 
> 
>> The QEMU code has an advanced block-size auto-detection code - e.g., 
>> querying from the kernel but limiting it to sane values (e.g., 512 MB on 
>> some arm64 configurations). Maybe we can borrow some of that or even sense 
>> the block size via QEMU? Borrowing might be easier. :)
> 
> I guess it's a good candidate for a fancy QMP API.
> 

One can at least query the block-size via "qom-get", but that requires spinning 
up a QEMU instance with a virtio-mem device.

> 
>> On x86-64 we are good to go with a 2MB default.
>>> 
>>> 
>>> - in patch 03 it is mentioned that:
>>> 
>>> "If it wants to give more memory to the guest it changes 'requested-size' to
>>> a bigger value, and if it wants to shrink guest memory it changes the
>>> 'requested-size' to a smaller value. Note, value of zero means that guest
>>> should release all memory offered by the device."
>>> 
>>> Does size zero implicates the virtio-mem device unplug? Will the device 
>>> still
>>> exist in the guest even with zeroed memory, acting as a sort of 'deflated
>>> virtio-balloon'?
>> Yes, the device will still exist, to be grown again later. Hotunplugging the 
>> device itself is not supported (yet, and also not in the near future).
> 
> 
> Assuming that virtio-mem has low overhead in the guest when it's 'deflated',
> I don't see any urgency into implementing hotunplug for this device TBH.

There are still things to be optimized in QEMU regarding virtual memory 
consumption, but that‘s more general work to be tackled within the next months. 
After that, not too much speaks against just letting the device stick around to 
provide more memory later on demand.

Thanks!




Re: [PATCH 00/10] Introduce virtio-mem model

2021-01-22 Thread David Hildenbrand


> Am 22.01.2021 um 19:53 schrieb Daniel Henrique Barboza 
> :
> 
> 
> 
>> On 1/22/21 9:50 AM, Michal Privoznik wrote:
>> Technically, this is another version of:
>> https://www.redhat.com/archives/libvir-list/2020-December/msg00199.html
>> But since virtio-pmem part is pushed now, I've reworked virtio-mem a bit
>> and sending it as a new series.
>> For curious ones, David summarized behaviour well when implementing
>> virtio-mem support in kernel:
>> https://lwn.net/Articles/755423/
>> For less curious ones:
>>   # virsh update-memory $dom --requested-size 4G
>> adds additional 4GiB of RAM to guest;
>>   # virsh update-memory $dom --requested-size 0
>> removes those 4GiB added earlier.
>> Patches are also available on my GitLab:
>> https://gitlab.com/MichalPrivoznik/libvirt/-/tree/virtio_mem_v3
> 
> Code LGTM:
> 
> Reviewed-by: Daniel Henrique Barboza 
> 

Hi,

Let me answer your questions.


> 
> A few questions about the overall design:
> 
> - it is mentioned that 'requested-size' should respect the granularity
> of the block unit, but later on the 'actual' attribute is added to track
> the size that the device was expanded/shrunk. What happens if we forfeit
> the granularity check of the memory increments? Will QEMU error out because
> we're requesting an invalid value or it will silently size the device to a
> plausible size?

QEMU will error out, stating that the request-size has to be properly aligned 
to the block-size.

> 
> 
> - Reading the lwn article I understood that David implemented this support
> for s390x as well. If that's the case, then I believe you should double
> check later on what's the THP size that Z uses to be sure that it's the
> same 2MiB value you're considering in patch 03.

In the near future we might see arm64 and s390x support. The latter might 
probably take a bit longer. Both are not supported yet in QEMU/kernel.

The QEMU code has an advanced block-size auto-detection code - e.g., querying 
from the kernel but limiting it to sane values (e.g., 512 MB on some arm64 
configurations). Maybe we can borrow some of that or even sense the block size 
via QEMU? Borrowing might be easier. :)

On x86-64 we are good to go with a 2MB default.

> 
> 
> - in patch 03 it is mentioned that:
> 
> "If it wants to give more memory to the guest it changes 'requested-size' to
> a bigger value, and if it wants to shrink guest memory it changes the
> 'requested-size' to a smaller value. Note, value of zero means that guest
> should release all memory offered by the device."
> 
> Does size zero implicates the virtio-mem device unplug? Will the device still
> exist in the guest even with zeroed memory, acting as a sort of 'deflated
> virtio-balloon'?

Yes, the device will still exist, to be grown again later. Hotunplugging the 
device itself is not supported (yet, and also not in the near future).

Thanks!




Re: [PATCH 00/10] Introduce virtio-mem model

2021-01-22 Thread David Hildenbrand
On 22.01.21 13:50, Michal Privoznik wrote:
> Technically, this is another version of:
> 
> https://www.redhat.com/archives/libvir-list/2020-December/msg00199.html
> 
> But since virtio-pmem part is pushed now, I've reworked virtio-mem a bit
> and sending it as a new series.
> 
> For curious ones, David summarized behaviour well when implementing
> virtio-mem support in kernel:
> 
> https://lwn.net/Articles/755423/

... and for the really curious ones (because some details in that patch
are a little outdated), there is plenty more at:

https://virtio-mem.gitlab.io/

Thanks for all your effort Michal!

-- 
Thanks,

David / dhildenb



Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-14 Thread David Hildenbrand
On 14.10.20 16:15, Michal Privoznik wrote:
> On 10/14/20 4:07 PM, David Hildenbrand wrote:
> 
>>
>>
>> Note: prealloc=yes is a bad choice in this environment. It
>> contradicts memory overcommit - what we want to optimize with
>> free page reporting. You allocate all VM memory to throw it away
>> once the guest is up. Is there an option to turn this of with
>> hugetlbfs? I hope so.
> 
> Is this something that should be addressed on qemu side? Since I'm 
> writing follow-up patches for libvirt I can forbid this combination in 
> libvirt.

You mean, bailing out in QEMU if "prealloc=yes" is used along free page
reporting? Hmm, not sure if that's the right approach ... we would also
have to check for any DIMMs we hotplug and things like that. It's
definitely a setup issue to me (where we could try to warn the user, but
QEMU warnings tend to get ignored by everybody ...).

It's also going to be "not what you want" once we have virtio-mem
support in libvirt. (https://virtio-mem.gitlab.io/) But there, it's
easier to check+warn. You only want "prealloc=no" for the memory device
assigned to a virtio-mem device, not for all system RAM.

Thanks!

-- 
Thanks,

David / dhildenb



Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-14 Thread David Hildenbrand
On 14.10.20 15:46, Michal Privoznik wrote:
> On 10/14/20 2:06 PM, David Hildenbrand wrote:
>> On 14.10.20 13:53, Michal Privoznik wrote:
>>> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>>>> On 14.10.20 08:30, Michal Privoznik wrote:
>>>
>> 
>>
>> No, not at all. Thanks for reporting!
>>
>> And the "bad" thing is, that QEMU doesn't do anything too fancy. All it
>> does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
>> zap reported pages. The same mechanism is also used for postcopy live
>> migration and virtio-mem with hugetlbfs.
>>
>> Which kernel are you running?
>>
>> 1. Is it an upstream kernel, lkml + -mm lists are the right place
>> (please cc me, or I can try to reproduce and report it).
>>
>> 2. Is it a distro kernel? Then create a BUG there.
>>
>> I was just recently testing virtio-mem with hugetlbfs and it worked on
>> decent upstream Fedora. But maybe I was not able to trigger it.
> 
> Okay, I've upgraded to 5.9.0-gentoo, but the problem persists. Gentoo 
> puts only a very few patches on top of vanilla kernel neither of which 
> touches that area of the code:
> 
> https://dev.gentoo.org/~mpagano/genpatches/trunk/5.9/
> 
> So I think this is reproducible on vanilla too.
> 
> BTW: Have you tried placing the qemu inside v1 cgroups? Libvirt does 
> that so maybe that's the problem. Anyway, here's the cmd line:
> 
> /home/zippy/work/qemu/qemu.git/build/qemu-system-x86_64 \
> -name guest=fedora,debug-threads=on \
> -S \
> -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-fedora/master-key.aes
>  
> \
> -machine 
> pc-i440fx-4.0,accel=kvm,usb=off,dump-guest-core=off,memory-backend=pc.ram \
> -cpu host,migratable=on \
> -m 4096 \
> -object 
> memory-backend-memfd,id=pc.ram,hugetlb=yes,hugetlbsize=2097152,prealloc=yes,size=4294967296,host-nodes=0,policy=bind
>  
> \
> -overcommit mem-lock=off \
> -smp 4,sockets=1,dies=1,cores=2,threads=2 \
> -object iothread,id=iothread1 \
> -object iothread,id=iothread2 \
> -object iothread,id=iothread3 \
> -object iothread,id=iothread4 \
> -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
> -no-user-config \
> -nodefaults \
> -device sga \
> -chardev socket,id=charmonitor,fd=33,server,nowait \
> -mon chardev=charmonitor,id=monitor,mode=control \
> -rtc base=utc \
> -no-shutdown \
> -global PIIX4_PM.disable_s3=0 \
> -global PIIX4_PM.disable_s4=0 \
> -boot menu=on,strict=on \
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 \
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 \
> -blockdev 
> '{"driver":"file","filename":"/var/lib/libvirt/images/fedora.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
>  
> \
> -blockdev 
> '{"node-name":"libvirt-1-format","read-only":false,"discard":"unmap","driver":"qcow2","file":"libvirt-1-storage","backing":null}'
>  
> \
> -device 
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1
>  
> \
> -netdev tap,fd=35,id=hostnet0 \
> -device 
> virtio-net-pci,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:a4:6f:91,bus=pci.0,addr=0x3
>  
> \
> -chardev pty,id=charserial0 \
> -device isa-serial,chardev=charserial0,id=serial0 \
> -chardev socket,id=charchannel0,fd=36,server,nowait \
> -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>  
> \
> -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on \
> -device virtio-vga,id=video0,virgl=on,max_outputs=1,bus=pci.0,addr=0x2 \
> -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7,free-page-reporting=on \
> -sandbox 
> on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
> -msg timestamp=on
> 

Thanks!

Reproduced easily on Fedora 32 (5.7.16-200.fc32.x86_64).

[   70.641802] CPU: 3 PID: 2178 Comm: qemu-system-x86 Not tainted 
5.7.16-200.fc32.x86_64 #1
[   70.641802] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS PRO/X570 
AORUS PRO, BIOS F21 07/31/2020
[   70.641803] RIP: 0010:page_counter_uncharge+0x4b/0x50
[   70.641804] Code: 0f c1 45 00 4c 29 e0 48 89 ef 48 89 c3 48 89 c6 e8 2a fe 
ff ff 48 85 db 78 10 48 8b 6d 20 48 85 ed 75 d8 5b 5d 41 5c 41 5d c3 <0f> 0b eb 
ec 90 0f 1f 44 00 00 48 8b 17 48 39 d6 72 41 41 54 49 89
[   70.641804] RSP: 0018:b4044139bb1

Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-14 Thread David Hildenbrand
On 14.10.20 13:53, Michal Privoznik wrote:
> On 10/14/20 10:26 AM, David Hildenbrand wrote:
>> On 14.10.20 08:30, Michal Privoznik wrote:
> 
> Sorry for hijacking this thread, but I need to report it somewhere (what 
> is the best place?)
> 
> When I want to start a guest with this feature turned on + memfd + 
> hugepages, the host kernel prints this warning into dmesg and hugepages 
> stop working from then on (meaning, even if the pool of allocated HPs is 
> large enough I can't start any guest with HPs):
> 
> 
> 
> 
> [  139.434748] [ cut here ]
> [  139.434754] WARNING: CPU: 2 PID: 6280 at mm/page_counter.c:57 
> page_counter_uncharge+0x33/0x40
> [  139.434754] Modules linked in: kvm_amd amdgpu kvm btusb btrtl btbcm 
> btintel sp5100_tco watchdog k10temp mfd_core gpu_sched ttm
> [  139.434759] CPU: 2 PID: 6280 Comm: CPU 1/KVM Not tainted 
> 5.8.13-gentoo-x86_64 #2
> [  139.434759] Hardware name: System manufacturer System Product 
> Name/PRIME X570-PRO, BIOS 1005 08/01/2019
> [  139.434760] RIP: 0010:page_counter_uncharge+0x33/0x40
> [  139.434762] Code: 48 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0 48 
> 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0a 48 8b 7f 28 48 85 ff 75 dc 
> c3 <0f> 0b eb f2 66 0f 1f 84 00 00 00 00 00 48 8b 17 48 39 d6 72 41 41
> [  139.434762] RSP: 0018:c9000355fb38 EFLAGS: 00010286
> [  139.434763] RAX: fffb4000 RBX: 888fc267e900 RCX: 
> fffb4000
> [  139.434763] RDX: 0402 RSI: fffb4000 RDI: 
> 888fd8411dd0
> [  139.434764] RBP: 888fcba983c0 R08: 00080400 R09: 
> fff7fc00
> [  139.434764] R10: c9000355fb40 R11: 000a R12: 
> 0001
> [  139.434765] R13: 888fc3d89140 R14: 01b2 R15: 
> 01b1
> [  139.434765] FS:  7fc9d4c35700() GS:888fde88() 
> knlGS:
> [  139.434766] CS:  0010 DS:  ES:  CR0: 80050033
> [  139.434766] CR2: 7f09a4003000 CR3: 000fc06fe000 CR4: 
> 00340ee0
> [  139.434767] Call Trace:
> [  139.434769]  hugetlb_cgroup_uncharge_file_region+0x46/0x70
> [  139.434772]  region_del+0x1a0/0x260
> [  139.434773]  hugetlb_unreserve_pages+0x32/0xa0
> [  139.434775]  remove_inode_hugepages+0x19d/0x3a0
> [  139.434776]  hugetlbfs_fallocate+0x3f2/0x4a0
> [  139.434778]  ? __seccomp_filter+0x75/0x6a0
> [  139.434779]  vfs_fallocate+0x124/0x260
> [  139.434780]  __x64_sys_fallocate+0x39/0x60
> [  139.434783]  do_syscall_64+0x38/0x60
> [  139.434784]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  139.434785] RIP: 0033:0x7fc9e0994de7
> [  139.434786] Code: 89 7c 24 08 48 89 4c 24 18 e8 45 fc f8 ff 41 89 c0 
> 4c 8b 54 24 18 48 8b 54 24 10 b8 1d 01 00 00 8b 74 24 0c 8b 7c 24 08 0f 
> 05 <48> 3d 00 f0 ff ff 77 41 44 89 c7 89 44 24 08 e8 75 fc f8 ff 8b 44
> [  139.434787] RSP: 002b:7fc9d4c337a0 EFLAGS: 0293 ORIG_RAX: 
> 011d
> [  139.434787] RAX: ffda RBX: 3640 RCX: 
> 7fc9e0994de7
> [  139.434788] RDX: 3620 RSI: 0003 RDI: 
> 001d
> [  139.434788] RBP: 7fc9d4c33800 R08:  R09: 
> 
> [  139.434789] R10: 0020 R11: 0293 R12: 
> 7fff9a75c3fe
> [  139.434789] R13: 7fff9a75c3ff R14: 7fc9d4c35700 R15: 
> 7fc9d4c33dc0
> [  139.434790] ---[ end trace fb9808303959fc01 ]---
> 
> 
> Is this known problem?

No, not at all. Thanks for reporting!

And the "bad" thing is that QEMU doesn't do anything too fancy. All it
does is "fallocate(FALLOC_FL_PUNCH_HOLE)" on hugetlbfs when trying to
zap reported pages. The same mechanism is also used for postcopy live
migration and virtio-mem with hugetlbfs.

Which kernel are you running?

1. Is it an upstream kernel, lkml + -mm lists are the right place
(please cc me, or I can try to reproduce and report it).

2. Is it a distro kernel? Then create a BUG there.

I was just recently testing virtio-mem with hugetlbfs and it worked on
decent upstream Fedora. But maybe I was not able to trigger it.

> 
> Michal
> 


-- 
Thanks,

David / dhildenb



Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-14 Thread David Hildenbrand
On 14.10.20 08:30, Michal Privoznik wrote:
> On 10/14/20 8:07 AM, Nico Pache wrote:
>> IMO "return-pages" sounds the best out of those and stays relatively 
>> consistent with the kernel and qemu terminology for this feature.
>>
>> I personally don't see a huge problem with the current name, but I've 
>> also been staring at the words "free page reporting" for too long.
> 
> Right, I can see both reasons. But now that Peter raised it (again, 
> sorry): at the libvirt level, reporting usually means "to report something 
> to the user". QEMU <-> KVM communication is too low level and since this is not 
> being reported to user directly (even though it affects 
> "stat-free-memory" attribute of the balloon) reporting does sound a bit 
> weird. Let's wait for Peter's opinion.

We originally wanted to use "free page hinting", but as that name is
already taken for another virtio-balloon feature to improve live
migration speed (one time hint of free pages just before migration), we
used "reporting" instead.

On a QEMU level, this feature name makes perfect sense. The guest
(continuously) reports free pages to the hypervisor. The hypervisor
tries to use the reports to free up some memory (which might not always
be possible).


Now, all I can say is that

1. "free page reporting" is the official name of the feature. Using a
different name in libvirt will create more confusion than it might
actually help. Just saying.

2. The proposed alternatives ""free-pages" or "return-pages" or even
"discard-pages" don't match what's actually going on.

-- 
Thanks,

David / dhildenb



Re: [PATCH v2 0/4] qemu: Add support for free-page-reporting

2020-10-13 Thread David Hildenbrand
On 13.10.20 19:00, Nico Pache wrote:
> I don't believe so, but David might have some more insight into that
> question. 
> 
> Once the feature is enabled the guest will report pages on its freelist
> to the hypervisor, and the host/hypervisor will then release them. 

We report (some, not all) free memory of our guest to the hypervisor. So
whatever the guest reported shows up in "stat-free-memory" in the stats
already.

The hypervisor will "discard" backing storage of reported free memory,
resulting in your hypervisor having more free memory available (and the
QEMU process consuming less memory).

> 
> I'm not exactly sure what information we can or would want to 'stat' out
> of this process... A counter for number of pages returned?

As the guest can reuse memory any time without coordination with the
hypervisor, that wouldn't be of too much help. A counter would only tell
you if free page reporting was once performed successfully.

You would really have to compare the QEMU process memory footprint with
the guest stats to figure out if free page reporting is doing what it's
supposed to do.
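
For the host side of that comparison, the QEMU process footprint can be read
from procfs. A tiny helper along these lines (illustrative and Linux-specific,
not libvirt/QEMU code) does the trick:

```python
import os

def rss_kib(pid):
    """Resident set size of a process in KiB, read from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])   # field is reported in kB
    raise ValueError(f"no VmRSS line for pid {pid}")

# e.g. watch rss_kib(<qemu pid>) shrink while the guest's reported
# free memory ("stat-free-memory") stays roughly constant
print(rss_kib(os.getpid()))
```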

-- 
Thanks,

David / dhildenb



Re: [PATCH 1/1] Optional free-page-reporting can be enabled/disabled for memballon device of model 'virtio'.

2020-10-05 Thread David Hildenbrand
On 05.10.20 10:06, Peter Krempa wrote:
> On Fri, Oct 02, 2020 at 11:59:43 -0400, Nico Pache wrote:
>> xml:
>> 
>>   
> 
> according to what you state in the cover-letter this is a
> 'virtio-balloon-pci' feature. We usually put stuff which depends on a
> specific model of the device into the  subelement of the device
> element.

IIRC, this should work for virtio-balloon-*, including virtio-balloon-ccw.


-- 
Thanks,

David / dhildenb



Re: [libvirt] s390: change default cpu model to host-model?

2019-11-04 Thread David Hildenbrand

On 02.11.19 11:32, Daniel P. Berrangé wrote:

On Fri, Nov 01, 2019 at 06:43:16PM +0100, Christian Borntraeger wrote:

On the KVM forum I have discussed the default cpu model mode on s390.
Right now if the xml does not specify anything, libvirt defaults to
not specifying anything on the qemu command line (no -cpu statement)
which is the equivalent of -cpu host for s390 which is equivalent to
host-passthrough. While this enables all features it does not provide
any migration safety by default.

So in fact we are kind of "broken" right now when it comes to safety.

So we discussed that it would make sense that an empty xml should actually
be defaulted to host-model, which results in - as of today - the same guest
features but in a migration safe way.

There is another change planned right now to actually make the cpu model
present in an xml if none was specified. So we could actually do this change
before, together  or after te other. Jiri and I think it probably makes most
sense to have both changes at the same time (in terms of libvirt version).

Does anyone see an issue with changing the default model mode to "host-model"
if the xml does not specify anything else?


Changing from "host-passthrough" to "host-model" is not a huge difference,
but it is none the less a guest ABI change. "host-passthrough" doesn't
provide migration safety in the face of differing hardware, it should still
be valid for people with homogeneous hardware. So changing the model will
potentially break some existing usage.


I guess on s390x this is not the case ("-cpu host", no "-cpu", and 
passing the expanded "host" model will result in the same guest ABI, in 
contrast to x86 AFAIK). There is this special case, though, where we 
have old QEMUs without CPU model support. Not sure how to deal with 
that, then.




As the top priority we should definitely make sure that the guest XML
gets updated to list "host-passthrough" when no CPU is specified, so
that it reflects the current reality. If libvirt gets this information


+1 to that


from QEMU, then at some point down the line we can potentially change
the default by tying a new default to a versioned machine type. We
have to wait a little while for old libvirts to become irrelevant
wrt new QEMU though. This is the same issue as with changing the default
CPU on x86, though that's possibly harder there as the scope of any
change is more significant.

--

Thanks,

David / dhildenb

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH 0/7] i386: Add `machine` parameter to query-cpu-definitions

2019-10-25 Thread David Hildenbrand

On 25.10.19 15:38, Eduardo Habkost wrote:

CCing danpb, libvir-list, mskrivanek.

On Fri, Oct 25, 2019 at 09:17:46AM +0200, David Hildenbrand wrote:

On 25.10.19 04:25, Eduardo Habkost wrote:

We had introduced versioned CPU models in QEMU 4.1, including a
method for querying for CPU model versions using


Interesting, I was not aware of that.

On s390x, we somewhat have versioned CPU models, but we decided against
giving them explicit names (e.g., z13-v1 or z13-4.1.0), because it didn't
really seem to be necessary. (and we often implement/add features for older
CPU models, there is a lot of fluctuation) Actually, you would have had to
add "z13-z/VM-x.x.x" or "z13-LPAR-x.x.x" or "z13-KVM-x.x.x" to model the
features you actually see in all the different virtual environments ("what a
CPU looks like"). Not to talk about QEMU versions in addition to that. So we
decided to always model what you would see under LPAR and are able to
specify for a KVM guest. So you can use "z13" in an up-to-date LPAR
environment, but not in a z/VM environment (you would have to disable
features).

Each (!base) CPU model has a specific feature set per machine. Libvirt uses
query-cpu-model-expansion() to convert this model+machine to a
machine-independent representation. That representation is sufficient for
all use cases we were aware of (esp. "virsh domcapabilities", the host CPU
model, migration).

While s390x has versioned CPU models, we have no explicit way of specifying
them for older machines, besides going over query-cpu-model-expansion and
using expanded "base model + features".

I can see that this might make sense on x86-64, where you only have maybe 3
versions of a CPU (e.g., the one Intel messed up first - Haswell, the one
Intel messed up next - Haswell-noTSX, and the one that Intel eventually did
right - Haswell-noTSX-IBRS) and you might want to specify "Haswell" vs.
"Haswell-IBRS" vs. "Haswell-noTSX-IBRS". But actually, you will always want
to go for "Haswell-noTSX-IBRS", because you can expect to run in updated
environments if I am not wrong; everything else is a corner case.

Of course, versioned CPU model are neat to avoid "base model + list of
features", but at least for expanding the host model on s390x, it is not
really helpful. When migrating, the model expansion does the trick.

I haven't looked into details of "how to specify or model versions". Maybe
IBM wants to look into creating versions for all the old models we had. But
again, not sure if that is of any help for s390x. CCing IBM.


I'm not sure yet if there are x86-specific goals and
constraints that would make it difficult to follow the same
approach followed by s390x.  I have the impression we do,
but I need to think more carefully about it.

Let's discuss that during KVM Forum?


I won't be attending KVM Forum this year, but Christian should be around.



The two main goals of versioned CPU models in x86 are:
1) Decoupling CPU model updates from machine-types (users should be
able to update a CPU model definition without changing machine
type).
2) Letting management software automate CPU model updates.
Normally this is necessary when bare metal microcode or
software updates add/remove features from CPUs.  Sometimes the
Virtual CPU model update needs to be performed before the host
updates are applied (if features are being removed)


We have 2) on s390x after a QEMU machine update. You need a QEMU update 
usually either way to support the new CPU feature. But I can see how 
decoupling the CPU model definition from the machine makes this easier. 
It does introduce complexity. It applies only when running older QEMU 
machines, or if we have to update a CPU model before we get a new QEMU 
machine.


But we do have versioned CPU models already, so the discussion is 
already over :)




The main constraint that makes it difficult to address the above
without a new API is:
A) Every CPU model in x86 except "host" is already expected to
be migration-safe (I don't know how this compares to s390x).


I would describe that as not a bug but a feature :) It does come with the 
price that only using the newest QEMU machine results in the newest CPU 
model, that part I understand.


--

Thanks,

David / dhildenb




Re: [libvirt] [PATCH v5 00/15] CPU Model Baseline and Comparison for s390x

2019-09-20 Thread David Hildenbrand
On 19.09.19 22:24, Collin Walling wrote:
> Note: since I've made some changes to a lot of these patches / split
> up some patches, I've decided to hold off on adding any r-b's in case
> there is a specific change that someone does not agree with.
> 
> Changelog:
> 
> - Properly refactored code from CPU model expansion function
> - Introduced a cleanup patch for CPU model expansion function
> - Introduced patches that modifies the refactored code to suit
> needs for baseline/comparison
> - CPU expansion function now accepts a virCPUDefPtr
> - Removed props parsing from CPU model comparison (they weren't
> used)
> - Cleaner error reporting when baselining/comparing with erroneous
> CPU models / features
> - Various cleanups based on feedback
> ___
> 
> To run these patches, execute the virsh hypervisor-cpu-compare or 
> hypervisor-cpu-baseline commands and pass an XML file describing one or 
> more CPU definitions. You can use the definition from virsh domcapabilities 
> or from a guest XML. There is no need to extract it from the file and place 
> it in a new one, as the XML parser will look specifically for the CPU tags.
> ___
> 
> These patches hook up the virsh hypervisor-cpu-compare/baseline commands 
> for the s390x architecture. They take an XML file describing some CPU 
> definitions and passes the data to QEMU, where the actual CPU model 
> comparison / baseline calculation is handled (available since QEMU 2.8.5).
> 
> When baselining CPU models with the --features argument, s390x will report
> a full CPU model expansion.
> 
> Thanks.

Nice to see this make progress after all these years since I integrated
support into QEMU :)

-- 

Thanks,

David / dhildenb



Re: [libvirt] [PATCH v4 0/8] CPU Model Baseline and Comparison for s390x

2019-09-10 Thread David Hildenbrand
On 10.09.19 17:22, Jiri Denemark wrote:
> On Tue, Sep 03, 2019 at 15:32:34 -0400, Collin Walling wrote:
>> On 8/20/19 10:06 AM, Jiri Denemark wrote:
>>> First, let me apologize for such a late review. I'll try my best to
>>> review your series earlier next time.
>>>
>>
>> Your review is greatly appreciated! I haven't replied to your other
>> posts on this series as my comments were mostly acknowledgements rather 
>> than discussion pieces. I'm working through them.
>>
>>> On Wed, Jul 17, 2019 at 10:03:21 -0400, Collin Walling wrote:
 When baselining CPU models and the user appends the --features argument 
 to the command, s390x will only report back features that supersede the 
 base model definition.

 *NOTE* if the --features flag is intended to expand /ALL/ features
 available to a CPU model (such as the huge list of features reported
 by a full CPU model expansion), please let me know and I can resolve 
 this.
>>>
>>> I'm not sure how well this fits s390 world, but the semantics of
>>> VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES flag is to report all CPU
>>> features which are hidden behind the CPU model. That is, all feature
>>> which you'd get when starting QEMU with just the CPU model name and no
>>> additional features. The extra features should not be touched at all.
>>> Specifically, removing them when the flag is not used is wrong.
>>>
>>> To me this looks like the flag should really result in running
>>> query-cpu-model-expansion (but likely the "static" one rather than
>>> "full" expansion) on the baseline CPU model and reporting the enabled
>>> features along with those already included in the baseline feature set.
>>>
>>
>> Actually, query-cpu-model-baseline will return a CPU model along with a
>> feature set. The features returned are the same as those produced from a 
>> static expansion on the model.
>>
>> Correct me if I am wrong here: virsh should report features *only* if the
>> --features flag is present. Otherwise, we only report the model name (which
>> can be accomplished by stripping the result of all reported features).
> 
> No, this is wrong in general (although it maybe correct for s390, I
> don't know).

David here: Not correct for s390x

> 
> Let me explain with some hypothetical CPU models.
> 
> Model1 (when used as -cpu Model1) will enable features f1, f2, f3.
> Model2 will enable f1, f2, f3, f4, f5.
> Model3 will enable f1, f2, f3, f4, f6.
> 
> Running baseline command for [Model1, Model2, Model3] should return
> Model1 with no additional features as [f1, f2, f3] are in only ones
> common to all three models and they are all enabled by Model1.
> 
> However, baseline of [Model2, Model3] should return Model1 + f4. The
> common set of features is [f1, f2, f3, f4], but Model1 would enable [f1,
> f2, f3] only, which means we need to add f4 explicitly.
> 
> With --features, even the features enabled implicitly by a specific CPU
> model should be returned. That is, the first result would be Model1 +
> [f1, f2, f3] and the second result would be Model1 + [f1, f2, f3, f4].
> 
> In other words, stripping all features if no --features option is used
> is wrong. Only features implicitly enabled by the CPU model should be
> stripped.
> 
> Specifically, baseline without --features on the following CPU
> definition
> 
> 
>   Model1
>   
> 
> 
> should be the same definition (i.e., Model1 + f4).
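
The baseline rules spelled out in the Model1/Model2/Model3 example can be
condensed into a few lines of set arithmetic (a sketch with those hypothetical
models and features; not libvirt code, and the model/feature names are made up):

```python
# Hypothetical CPU models and the features they implicitly enable
MODELS = {
    "Model1": {"f1", "f2", "f3"},
    "Model2": {"f1", "f2", "f3", "f4", "f5"},
    "Model3": {"f1", "f2", "f3", "f4", "f6"},
}

def baseline(names, expand_features=False):
    """Intersect the inputs, pick the richest known model fully contained
    in the intersection; leftovers become explicit features. With
    expand_features (--features), report the full common set instead."""
    common = set.intersection(*(MODELS[n] for n in names))
    base = max((n for n, feats in MODELS.items() if feats <= common),
               key=lambda n: len(MODELS[n]))
    feats = common if expand_features else common - MODELS[base]
    return base, sorted(feats)
```

Baselining all three models yields plain Model1, while [Model2, Model3]
yields Model1 plus an explicit f4, matching the semantics above.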

s390x will fall back to base models when baselining/expanding ("minimum
feature set we expect to be around for a certain cpu generation", which
is independent of the QEMU version and will never change). But if you're
already using base models, it works as you would expect it.

E.g.

Model1 is [f1, f2, f3]
Model1-base is [f1, f2]

Model2 is [f1, f2, f3, f4]
Model2-base is [f1, f2, f3]

Baselining Model1 with Model2 would result in
Model1-base + f3

For this, QMP "query-cpu-model-baseline" can be used.

Note: When expanding CPU models (e.g. host), one would already always
get base models. So there, it would work completely as you expect it.

Now, to achieve the "--features" part, simply throw the result of
"query-cpu-model-baseline" into "query-cpu-model-expansion" with
"CpuModelExpansionType" == "full"

> 
> The question is whether you can enabled extra features on s390 or you
> can only use plain CPU models with no additional features. If additional
> features are allowed, I'd guess baseline without features should return
> the result of the QMP command minus the features a static expansion of
> the CPU model itself would return (this is assuming f4 will be included
> in the result we get from QEMU for Model1 + f4). When --features is
> used, the result from QEMU should be returned.

We do have plenty of extra cpu features on s390x. *Plenty* :)

> 
> I hope it's clear now. However, I'm not sure how CPU models work on
> s390 and I'd be happy if you could explain it to me. Especially, whether
> something similar to what I described can happen there.
> 
> 

Re: [libvirt] [PATCH v1 2/7] pcihp: overwrite hotplug handler recursively from the start

2018-11-02 Thread David Hildenbrand
On 02.11.18 14:00, Igor Mammedov wrote:
> On Fri, 2 Nov 2018 12:43:10 +0100
> David Hildenbrand  wrote:
> 
>> On 01.11.18 15:10, Igor Mammedov wrote:
>>> On Wed, 24 Oct 2018 12:19:25 +0200
>>> David Hildenbrand  wrote:
>>>   
>>>> For now, the hotplug handler is not called for devices that are
>>>> being cold plugged. The hotplug handler is setup when the machine
>>>> initialization is fully done. Only bridges that were cold plugged are
>>>> considered.
>>>>
>>>> Set the hotplug handler for the root piix bus directly when realizing.
>>>> Overwrite the hotplug handler of bridges when hotplugging/coldplugging
>>>> them.
>>>>
>>>> This will now make sure that the ACPI PCI hotplug handler is also called
>>>> for cold-plugged devices (also on bridges) and for bridges that were
>>>> hotplugged.
>>>>
>>>> When trying to hotplug a device to a hotplugged bridge, we now correctly
>>>> get the error message
>>>>  "Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set"
>>>> Instead of going via the standard PCI hotplug handler.  
>>> Erroring out is probably not ok, since it can break existing setups
>>> where SHPC hotplugging to hotplugged bridge was working just fine before.  
>>
>> The question is if it actually was supposed to work (and eventually did).
> I think it works now; it's a limitation of QEMU's 'ACPI hotplug hack'
> (which exists for the sake of Windows). We weren't able to dynamically add
> ACPI description for hotplugged bridges, so it was using native hotplug.
> Now theoretically we can load tables dynamically but that, would add
> maintenance nightmare (versioned tables) and would be harder to debug.
> I'd rather not go that direction and keep the current limited version,
> suggesting users use native hotplug if the guest is capable.

Alright I'll keep current behavior (checking if the bridge is hotplugged
or coldplugged). Thanks!


-- 

Thanks,

David / dhildenb



Re: [libvirt] [PATCH v1 2/7] pcihp: overwrite hotplug handler recursively from the start

2018-11-02 Thread David Hildenbrand
On 01.11.18 15:10, Igor Mammedov wrote:
> On Wed, 24 Oct 2018 12:19:25 +0200
> David Hildenbrand  wrote:
> 
>> For now, the hotplug handler is not called for devices that are
>> being cold plugged. The hotplug handler is setup when the machine
>> initialization is fully done. Only bridges that were cold plugged are
>> considered.
>>
>> Set the hotplug handler for the root piix bus directly when realizing.
>> Overwrite the hotplug handler of bridges when hotplugging/coldplugging
>> them.
>>
>> This will now make sure that the ACPI PCI hotplug handler is also called
>> for cold-plugged devices (also on bridges) and for bridges that were
>> hotplugged.
>>
>> When trying to hotplug a device to a hotplugged bridge, we now correctly
>> get the error message
>>  "Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set"
>> Instead of going via the standard PCI hotplug handler.
> Erroring out is probably not ok, since it can break existing setups
> where SHPC hotplugging to hotplugged bridge was working just fine before.

The question is if it actually was supposed to work (and eventually did).

If this was the expected behavior (mixing hotplug types), then the
necessary change to this patch would boil down to checking if the bridge
is hot or coldplugged.

> 
> Marcel/Michael what's your take on this change in behaviour?
> CCing libvirt in case they are doing this stuff
> 

Indeed, it would be nice to know if this was actually supposed to work
like this (coldplugged bridges using ACPI hotplug and hotplugged bridges
using SHPC hotplug).


-- 

Thanks,

David / dhildenb



Re: [libvirt] [Qemu-devel] [PATCH] Remove deprecated -balloon option

2018-08-22 Thread David Hildenbrand
On 21.08.2018 12:05, Thomas Huth wrote:
> The "-balloon" option has been replaced by "-device virtio-balloon".
> It's been marked as deprecated for two releases, and nobody
> complained, so let's remove it now.
> 
> Signed-off-by: Thomas Huth 
> ---
>  docs/virtio-balloon-stats.txt |  6 +++---
>  qemu-deprecated.texi  |  5 -
>  qemu-options.hx   | 10 --
>  vl.c  | 36 
>  4 files changed, 3 insertions(+), 54 deletions(-)
> 
> diff --git a/docs/virtio-balloon-stats.txt b/docs/virtio-balloon-stats.txt
> index 9985e1d..1732cc8 100644
> --- a/docs/virtio-balloon-stats.txt
> +++ b/docs/virtio-balloon-stats.txt
> @@ -61,9 +61,9 @@ It's also important to note the following:
> respond to the request the timer will never be re-armed, which has
> the same effect as disabling polling
>  
> -Here are a few examples. QEMU is started with '-balloon virtio', which
> -generates '/machine/peripheral-anon/device[1]' as the QOM path for the
> -balloon device.
> +Here are a few examples. QEMU is started with '-device virtio-balloon',
> +which generates '/machine/peripheral-anon/device[1]' as the QOM path for
> +the balloon device.
>  
>  Enable polling with 2 seconds interval:
>  
> diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
> index 67b7211..0714017 100644
> --- a/qemu-deprecated.texi
> +++ b/qemu-deprecated.texi
> @@ -106,11 +106,6 @@ enabled via the ``-machine usb=on'' argument.
>  
>  The ``-nodefconfig`` argument is a synonym for ``-no-user-config``.
>  
> -@subsection -balloon (since 2.12.0)
> -
> -The @option{--balloon virtio} argument has been superseded by
> -@option{--device virtio-balloon}.
> -
>  @subsection -fsdev handle (since 2.12.0)
>  
>  The ``handle'' fsdev backend does not support symlinks and causes the 9p
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 4efdedf..47c6b92 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -454,16 +454,6 @@ modprobe i810_audio clocking=48000
>  @end example
>  ETEXI
>  
> -DEF("balloon", HAS_ARG, QEMU_OPTION_balloon,
> -"-balloon virtio[,addr=str]\n"
> -"enable virtio balloon device (deprecated)\n", 
> QEMU_ARCH_ALL)
> -STEXI
> -@item -balloon virtio[,addr=@var{addr}]
> -@findex -balloon
> -Enable virtio balloon device, optionally with PCI address @var{addr}. This
> -option is deprecated, use @option{-device virtio-balloon} instead.
> -ETEXI
> -
>  DEF("device", HAS_ARG, QEMU_OPTION_device,
>  "-device driver[,prop[=value][,...]]\n"
>  "add device (based on driver)\n"
> diff --git a/vl.c b/vl.c
> index 16b913f..f952f01 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2127,36 +2127,6 @@ static void parse_display(const char *p)
>  }
>  }
>  
> -static int balloon_parse(const char *arg)
> -{
> -QemuOpts *opts;
> -
> -warn_report("This option is deprecated. "
> -"Use '--device virtio-balloon' to enable the balloon 
> device.");
> -
> -if (strcmp(arg, "none") == 0) {
> -return 0;
> -}
> -
> -if (!strncmp(arg, "virtio", 6)) {
> -if (arg[6] == ',') {
> -/* have params -> parse them */
> -opts = qemu_opts_parse_noisily(qemu_find_opts("device"), arg + 7,
> -   false);
> -if (!opts)
> -return  -1;
> -} else {
> -/* create empty opts */
> -opts = qemu_opts_create(qemu_find_opts("device"), NULL, 0,
> -&error_abort);
> -}
> -qemu_opt_set(opts, "driver", "virtio-balloon", &error_abort);
> -return 0;
> -}
> -
> -return -1;
> -}
> -
>  char *qemu_find_file(int type, const char *name)
>  {
>  int i;
> @@ -3659,12 +3629,6 @@ int main(int argc, char **argv, char **envp)
>  case QEMU_OPTION_no_hpet:
>  no_hpet = 1;
>  break;
> -case QEMU_OPTION_balloon:
> -if (balloon_parse(optarg) < 0) {
> -error_report("unknown -balloon argument %s", optarg);
> -exit(1);
> -}
> -break;
>  case QEMU_OPTION_no_reboot:
>  no_reboot = 1;
>  break;
> 

Nice to see this go.

Reviewed-by: David Hildenbrand 

-- 

Thanks,

David / dhildenb

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH] qemu: caps: use model "qemu" for s390 tcg cpu-model-expansion

2018-07-25 Thread David Hildenbrand
On 25.07.2018 01:02, Collin Walling wrote:
> On 07/24/2018 12:59 PM, Daniel P. Berrangé wrote:
>> On Tue, Jul 24, 2018 at 06:53:54PM +0200, Cornelia Huck wrote:
>>> On Tue, 24 Jul 2018 18:08:21 +0200
>>> Jiri Denemark  wrote:
>>>
 On Mon, Jul 23, 2018 at 17:45:33 -0400, Collin Walling wrote:
> Use model name "qemu" instead of "max" when calling
> query-cpu-model-expansion for s390 on tcg.
>
> Signed-off-by: Collin Walling 
> ---
>  src/qemu/qemu_capabilities.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
> index 23b4833..e9b44cc 100644
> --- a/src/qemu/qemu_capabilities.c
> +++ b/src/qemu/qemu_capabilities.c
> @@ -2356,7 +2356,10 @@ virQEMUCapsProbeQMPHostCPU(virQEMUCapsPtr qemuCaps,
>  
>  if (tcg || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_KVM)) {
>  virtType = VIR_DOMAIN_VIRT_QEMU;
> -model = "max";
> +if (ARCH_IS_S390(qemuCaps->arch))
> +model = "qemu";
> +else
> +model = "max";  

 I think we should also check if "max" is a supported model
 (qemuCaps->tcgCPUModels is already populated at this point) and only use
 "qemu" on s390 if "max" is not supported. And please, report the issue
 to QEMU developers since one of the reasons behind "max" is its
 universal availability everywhere CPU model expansion is supported.
>>>
>>> Hm, can you point me to that discussion? A quick search through the
>>> QEMU log gives me the addition of the "max" model on i386 as a
>>> replacement to the "host" model for !kvm, but nothing about it being
>>> universal...
>>
>> I don't recall the link but "max" is supposed to be the standard shorthand
>> for "enable all the features supported by the virt type". IOW, 'max' should
>> work for both KVM and TCG ("host" was only well defined for KVM), and across
>> all architectures.
>>
>> So having to use a different name on s390 is a bug in QEMU imho.
>>
>> Regards,
>> Daniel
>>
> 
> Thanks for expanding on what the "max" model name is supposed to be. I wonder
> if a s/"qemu"/"max"/ in QEMU would suffice (I'm taking a shot in the dark here.)

Nope, it dynamically has to map to qemu/host depending on the
accelerator. But it also has to be a valid QOM object ("max-s390x-cpu").

> 
> @Connie, @David, you both are far more knowledgeable in this area than I am.
> What do either of you suggest for moving forward with this? Should we forward
> this discussion on qemu-devel?
> 

I can have a look if nobody else wants to tackle it.


-- 

Thanks,

David / dhildenb


Re: [libvirt] [PATCH] qemu: caps: use model "qemu" for s390 tcg cpu-model-expansion

2018-07-25 Thread David Hildenbrand
On 24.07.2018 18:59, Daniel P. Berrangé wrote:
> On Tue, Jul 24, 2018 at 06:53:54PM +0200, Cornelia Huck wrote:
>> On Tue, 24 Jul 2018 18:08:21 +0200
>> Jiri Denemark  wrote:
>>
>>> On Mon, Jul 23, 2018 at 17:45:33 -0400, Collin Walling wrote:
 Use model name "qemu" instead of "max" when calling
 query-cpu-model-expansion for s390 on tcg.

 Signed-off-by: Collin Walling 
 ---
  src/qemu/qemu_capabilities.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
 index 23b4833..e9b44cc 100644
 --- a/src/qemu/qemu_capabilities.c
 +++ b/src/qemu/qemu_capabilities.c
 @@ -2356,7 +2356,10 @@ virQEMUCapsProbeQMPHostCPU(virQEMUCapsPtr qemuCaps,
  
  if (tcg || !virQEMUCapsGet(qemuCaps, QEMU_CAPS_KVM)) {
  virtType = VIR_DOMAIN_VIRT_QEMU;
 -model = "max";
 +if (ARCH_IS_S390(qemuCaps->arch))
 +model = "qemu";
 +else
 +model = "max";  
>>>
>>> I think we should also check if "max" is a supported model
>>> (qemuCaps->tcgCPUModels is already populated at this point) and only use
>>> "qemu" on s390 if "max" is not supported. And please, report the issue
>>> to QEMU developers since one of the reasons behind "max" is its
>>> universal availability everywhere CPU model expansion is supported.
>>
>> Hm, can you point me to that discussion? A quick search through the
>> QEMU log gives me the addition of the "max" model on i386 as a
>> replacement to the "host" model for !kvm, but nothing about it being
>> universal...
> 
> I don't recall the link but "max" is supposed to be the standard shorthand
> for "enable all the features supported by the virt type". IOW, 'max' should
> work for both KVM and TCG ("host" was only well defined for KVM), and across
> all architectures.

I remember the discussions and I agreed back then that this was the
right thing to do.

Looks like x86, arm and ppc have it. We should have it, too.

> 
> So having to use a different name on s390 is a bug in QEMU imho.
> 

While this should be fixed, it is compat handling and not really a BUG.

> Regards,
> Daniel
> 


-- 

Thanks,

David / dhildenb


Re: [libvirt] [PATCHv2 11/11] qemu_driver: BaselineHypervisorCPU supports S390 using QEMU/QMP

2018-07-24 Thread David Hildenbrand
On 23.07.2018 22:16, Collin Walling wrote:
> On 07/21/2018 12:05 AM, Chris Venteicher wrote:
>> Quoting David Hildenbrand (2018-07-18 02:26:24)
>>> On 18.07.2018 00:39, Collin Walling wrote:
>>>> On 07/17/2018 05:01 PM, David Hildenbrand wrote:
>>>>> On 13.07.2018 18:00, Jiri Denemark wrote:
>>>>>> On Mon, Jul 09, 2018 at 22:56:55 -0500, Chris Venteicher wrote:
>>>>>>> Transient S390 configurations require using QEMU to compute CPU Model
>>>>>>> Baseline and to do CPU Feature Expansion.
>>>>>>>
>>>>>>> Start and use a single QEMU instance to do both the baseline and
>>>>>>> expansion transactions required by BaselineHypervisorCPU.
>>>>>>>
>>>>>>> CPU Feature Expansion uses true / false to indicate if property is/isn't
>>>>>>> included in model. Baseline only returns property list where all
>>>>>>> enumerated properties are included.
>>>>>>
>>>>>> So are you saying on s390 there's no chance there would be a CPU model
>>>>>> with some feature which is included in the CPU model disabled for some
>>>>>> reason? Sounds too good to be true :-) (This is the question I referred
>>>>>> to in one of my replies to the other patches.)
>>>>>
>>>>> Giving some background information: When we expand/baseline CPU models,
>>>>> we always expand them to the "-base" variants of our CPU models, which
>>>>> contain some set of features we expect to be around in all sane
>>>>> configurations ("minimal feature set").
>>>>>
>>>>> It is very unlikely that we ever have something like
>>>>> "z14-base,featx=off", but it could happen
>>>>>  - When using an emulator (TCG)
>>>>>  - When running nested and the guest hypervisor is started with such a
>>>>>strange CPU model
>>>>>  - When something in the HW is very wrong or eventually removed in the
>>>>>future (unlikely but possible)
>>>>>
>>>>> On some very weird inputs to a baseline request, such a strange model
>>>>> can also be the result. But it is very unusual.
>>>>>
>>>>> I assume something like "baseline z14-base,featx=off with z14-base" will
>>>>> result in "z14-base,featx=off", too.
>>>>>
>>>>>
>>>>
>>>> That's correct. A CPU model with a feature disabled that is baselined
>>>> with a CPU model with that same feature enabled will omit that feature
>>>> in the QMP response.
>>>>
>>>> e.g. if z14-base has features x, y, z then
>>>>
>>>> "baseline z14-base,featx=off with z14-base" will result in 
>>>> "z14-base,featy=on,featz=on"
>>
>> I am running tests on both S390 and X86 (hacked QEMU to enable baseline).
>>
>> I don't see a "false" property in the baseline response in any of the logs.
> 
> Right... baseline should not be returning any properties paired with false. It
> constructs a third CPU model with properties that can run on both CPUs.
> 

Let me rephrase: We don't return "false" for any property when
baselining as of now, but this might change in the future. It is
undocumented behavior.
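As a toy illustration of the semantics described above — baselining intersects the enabled feature sets, and a feature disabled on either side is simply omitted from the response rather than reported as false — consider this sketch. It is a deliberate simplification: the real query-cpu-model-baseline also resolves a suitable base model name, which is ignored here:

```python
# Simplified model: a CPU model is reduced to its set of enabled
# features; query-cpu-model-baseline's base-model resolution is ignored.

def baseline(features_a, features_b):
    """Return the feature set runnable on both CPUs."""
    return features_a & features_b

z14_base = {"featx", "featy", "featz"}

# "baseline z14-base,featx=off with z14-base":
result = baseline(z14_base - {"featx"}, z14_base)
print(sorted(result))   # -> ['featy', 'featz']; featx is omitted, not "false"
```

Should QEMU's behavior change as hinted above, disabled features would show up as explicit false properties and this intersection picture would no longer suffice.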


-- 

Thanks,

David / dhildenb



Re: [libvirt] [PATCHv2 11/11] qemu_driver: BaselineHypervisorCPU supports S390 using QEMU/QMP

2018-07-18 Thread David Hildenbrand
On 18.07.2018 00:39, Collin Walling wrote:
> On 07/17/2018 05:01 PM, David Hildenbrand wrote:
>> On 13.07.2018 18:00, Jiri Denemark wrote:
>>> On Mon, Jul 09, 2018 at 22:56:55 -0500, Chris Venteicher wrote:
>>>> Transient S390 configurations require using QEMU to compute CPU Model
>>>> Baseline and to do CPU Feature Expansion.
>>>>
>>>> Start and use a single QEMU instance to do both the baseline and
>>>> expansion transactions required by BaselineHypervisorCPU.
>>>>
>>>> CPU Feature Expansion uses true / false to indicate if property is/isn't
>>>> included in model. Baseline only returns property list where all
>>>> enumerated properties are included.
>>>
>>> So are you saying on s390 there's no chance there would be a CPU model
>>> with some feature which is included in the CPU model disabled for some
>>> reason? Sounds too good to be true :-) (This is the question I referred
>>> to in one of my replies to the other patches.)
>>
>> Giving some background information: When we expand/baseline CPU models,
>> we always expand them to the "-base" variants of our CPU models, which
>> contain some set of features we expect to be around in all sane
>> configurations ("minimal feature set").
>>
>> It is very unlikely that we ever have something like
>> "z14-base,featx=off", but it could happen
>>  - When using an emulator (TCG)
>>  - When running nested and the guest hypervisor is started with such a
>>strange CPU model
>>  - When something in the HW is very wrong or eventually removed in the
>>future (unlikely but possible)
>>
>> On some very weird inputs to a baseline request, such a strange model
>> can also be the result. But it is very unusual.
>>
>> I assume something like "baseline z14-base,featx=off with z14-base" will
>> result in "z14-base,featx=off", too.
>>
>>
> 
> That's correct. A CPU model with a feature disabled that is baselined with
> a CPU model with that same feature enabled will omit that feature in the
> QMP response.
> 
> e.g. if z14-base has features x, y, z then
> 
> "baseline z14-base,featx=off with z14-base" will result in 
> "z14-base,featy=on,featz=on"

Usually we try to not chose a model with stripped off base features ("we
try to produce a model that looks sane"), but instead fallback to some
very ancient CPU model. E.g.

{ "execute": "query-cpu-model-baseline", "arguments" : { "modela": {
"name": "z14-base", "props": {"msa" : false}}, "modelb": { "name": "z14"}} }

-> {"return": {"model": {"name": "z800-base", "props": {"etf2": true,
"ldisp": true

We might want to change that behavior in the future however (or maybe it
already is like this for some corner cases) - assume some base feature
gets dropped by HW in a new CPU generation. We don't always want to
fallback to a z900 or so when baselining. So one should assume that we
can have disabled features here.

Especially as there is a BUG in QEMU I'll have to fix:

{ "execute": "query-cpu-model-baseline", "arguments" : { "modela": {
"name": "z14-base", "props": {"esan3" : false}}, "modelb": { "name":
"z14"}} }

-> Segmentation fault

This would have to produce a model with esan3 disabled (very very
unlikely to ever happen in real life :) )

The result should be something like {"model": {"name": "z900-base",
"props": {"esan3": false}}} or an error that they cannot be baselined.



> 
> Since baseline will just report a base cpu and its minimal feature set... I
> wonder if it makes sense for libvirt to just report the features resulting
> from the baseline command instead of later calling expansion?

Yes it does and the docs say:

"Baseline two CPU models, creating a compatible third model. The created
model will always be a static, migration-safe CPU model (see "static"
CPU model expansion for details)"

> 
>>>
>>> Most of the code you added in this patch is indented by three spaces
>>> while we use four spaces in libvirt.
>>>
>>>> ---
>>>>  src/qemu/qemu_driver.c | 74 +-
>>>>  1 file changed, 65 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver

Re: [libvirt] [PATCHv2 11/11] qemu_driver: BaselineHypervisorCPU supports S390 using QEMU/QMP

2018-07-17 Thread David Hildenbrand
On 13.07.2018 18:00, Jiri Denemark wrote:
> On Mon, Jul 09, 2018 at 22:56:55 -0500, Chris Venteicher wrote:
>> Transient S390 configurations require using QEMU to compute CPU Model
>> Baseline and to do CPU Feature Expansion.
>>
>> Start and use a single QEMU instance to do both the baseline and
>> expansion transactions required by BaselineHypervisorCPU.
>>
>> CPU Feature Expansion uses true / false to indicate if property is/isn't
>> included in model. Baseline only returns property list where all
>> enumerated properties are included.
> 
> So are you saying on s390 there's no chance there would be a CPU model
> with some feature which is included in the CPU model disabled for some
> reason? Sounds too good to be true :-) (This is the question I referred
> to in one of my replies to the other patches.)

Giving some background information: When we expand/baseline CPU models,
we always expand them to the "-base" variants of our CPU models, which
contain some set of features we expect to be around in all sane
configurations ("minimal feature set").

It is very unlikely that we ever have something like
"z14-base,featx=off", but it could happen
 - When using an emulator (TCG)
 - When running nested and the guest hypervisor is started with such a
   strange CPU model
 - When something in the HW is very wrong or eventually removed in the
   future (unlikely but possible)

On some very weird inputs to a baseline request, such a strange model
can also be the result. But it is very unusual.

I assume something like "baseline z14-base,featx=off with z14-base" will
result in "z14-base,featx=off", too.


> 
> Most of the code you added in this patch is indented by three spaces
> while we use four spaces in libvirt.
> 
>> ---
>>  src/qemu/qemu_driver.c | 74 +-
>>  1 file changed, 65 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
>> index 9a35e04a85..6c6107f077 100644
>> --- a/src/qemu/qemu_driver.c
>> +++ b/src/qemu/qemu_driver.c
>> @@ -13400,10 +13400,13 @@ qemuConnectBaselineHypervisorCPU(virConnectPtr 
>> conn,
>>  virArch arch;
>>  virDomainVirtType virttype;
>>  virDomainCapsCPUModelsPtr cpuModels;
>> -bool migratable;
>> +bool migratable_only;
> 
> Why? The bool says the user specified
> VIR_CONNECT_BASELINE_CPU_MIGRATABLE which means they want a migratable
> CPU back. What does the "_only" part mean? This API does not return
> several CPUs, it only returns a single one and it's either migratable or
> not.
> 
>>  virCPUDefPtr cpu = NULL;
>>  char *cpustr = NULL;
>>  char **features = NULL;
>> +virQEMUCapsInitQMPCommandPtr cmd = NULL;
>> +bool forceTCG = false;
>> +qemuMonitorCPUModelInfoPtr modelInfo = NULL;
>>  
>>  virCheckFlags(VIR_CONNECT_BASELINE_CPU_EXPAND_FEATURES |
>>VIR_CONNECT_BASELINE_CPU_MIGRATABLE, NULL);
>> @@ -13411,8 +13414,6 @@ qemuConnectBaselineHypervisorCPU(virConnectPtr conn,
>>  if (virConnectBaselineHypervisorCPUEnsureACL(conn) < 0)
>>  goto cleanup;
>>  
>> -migratable = !!(flags & VIR_CONNECT_BASELINE_CPU_MIGRATABLE);
>> -
>>  if (!(cpus = virCPUDefListParse(xmlCPUs, ncpus, VIR_CPU_TYPE_AUTO)))
>>  goto cleanup;
>>  
>> @@ -13425,6 +13426,19 @@ qemuConnectBaselineHypervisorCPU(virConnectPtr conn,
>>  if (!qemuCaps)
>>  goto cleanup;
>>  
>> +/* QEMU can enumerate non-migratable cpu model features for some archs 
>> like x86
>> + * migratable_only == true:  ask for and include only migratable 
>> features
>> + * migratable_only == false: ask for and include all features
>> + */
>> +migratable_only = !!(flags & VIR_CONNECT_BASELINE_CPU_MIGRATABLE);
>> +
>> +if (ARCH_IS_S390(arch)) {
>> +   /* QEMU for S390 arch only enumerates migratable features
>> +* No reason to explicitly ask QEMU for or include non-migratable 
>> features
>> +*/
>> +   migratable_only = true;
>> +}
>> +
> 
> And what if they come up with some features which are not migratable in
> the future? I don't think there's any reason for this API to mess with
> the value. The code should just provide the same CPU in both cases for
> s390.

s390x usually only provides features if they are migratable. Could it
happen in the future that we have something that cannot be migrated?
Possible but very very unlikely. We cared about migration (even for
nested support) right from the beginning. As of now, we have no features
that are supported by QEMU that cannot be migrated - in contrast to e.g.
x86 (e.g. VMX/SVM). Every new feature has to be whitelisted in QEMU -
and we only allow to do so if migration is in place for it.
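The flag handling being discussed in the hunk above can be sketched as follows. The flag's bit value is illustrative; the sketch mirrors the intent of the quoted patch (force migratable-only on s390, since QEMU there only enumerates migratable features), not necessarily its final form:

```python
VIR_CONNECT_BASELINE_CPU_MIGRATABLE = 1 << 1  # illustrative bit value

def want_migratable_only(flags, arch_is_s390):
    """Should only migratable features be requested from QEMU?"""
    migratable_only = bool(flags & VIR_CONNECT_BASELINE_CPU_MIGRATABLE)
    if arch_is_s390:
        # s390 QEMU only enumerates migratable features anyway
        migratable_only = True
    return migratable_only

print(want_migratable_only(0, arch_is_s390=True))    # -> True on s390
print(want_migratable_only(0, arch_is_s390=False))   # -> False elsewhere
```

Jiri's objection above is precisely that the s390 branch should not exist: the API should pass the caller's flag through unchanged and let QEMU return the same CPU either way.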

> 
>>  if (!(cpuModels = virQEMUCapsGetCPUDefinitions(qemuCaps, virttype)) ||
>>  cpuModels->nmodels == 0) {
>>  virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
>> @@ -13437,18 +13451,31 @@ qemuConnectBaselineHypervisorCPU(virConnectPtr 
>> conn,
>>  

Re: [libvirt] [PATCH/QEMU] s390x/kvm: use cpu_model_available for guarded storage on compat machines

2017-10-26 Thread David Hildenbrand
On 25.10.2017 18:45, Marc Hartmayer wrote:
> On Wed, Oct 25, 2017 at 05:50 PM +0200, David Hildenbrand <da...@redhat.com> 
> wrote:
>> On 25.10.2017 17:09, Boris Fiuczynski wrote:
>>> On 10/25/2017 12:23 PM, David Hildenbrand wrote:
>>>> On 25.10.2017 12:18, Christian Borntraeger wrote:
>>>>> Ping, I plan to submit the patch below for 2.11. We can then still look into
>>>>> a libvirt<->qemu interface for limiting host-model depending on machine
>>>>> versions (or not).
>>>>
>>>> I think this would be sufficient for now.
>>>>
>>>> Having different host models, depending on the machine type sounds
>>>> wrong. But maybe we'll need it in the future.
>>>>
>>>
>>> David, I disagree if your proposal is to generally tolerate new cpu
>>> features in old machine types. This *might* work for gs but how do you
>>> guarantee that guests do not behave differently/wrong when suddenly new
>>> cpu features are made available to guests when (re-)starting them.
>>> That is my feedback for the qemu side of this matter.
>>
>> Just re-reading this section, so what you mean is:
>>
>> a) guest is started, host model is copied and used. guest works.
>> b) guest is stopped.
>> c) QEMU/KVM/HW is updated.
>> d) guest is started, new host model is copied. guest no longer works.
>>
>> d) could be because there are now some additional features with e.g.
>> broken guest implementation or now missing features.
>>
>>
>> What you propose (if I am not wrong) is to bind features somehow to a
>> QEMU machine. I think that should never be done. You could not catch now
>> missing features.
> 
> What exactly do you mean by the last sentence?

In general, up/downgrading QEMU/KVM/HW can lead to the removal of features.

Another example is the "nested" flag for KVM. Toggling that can lead to
the host model looking different (+/- SIE features).

So if you really want to make sure that a VM XML that once ran perfectly
fine on a host will still run after any QEMU/KVM/HW changes on that host:

a) specify an explicit CPU model, not the host model (e.g. "z13")
b) copy the host model to the XML persistently.

Linking any of that to the machine types is in my opinion the very wrong
approach.
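In libvirt domain XML terms, the two options sketch out roughly as follows. These snippets are illustrative only: the model name and the feature list are host-specific assumptions, not values any particular machine would produce:

```xml
<!-- a) pin an explicit CPU model instead of host-model -->
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>z13</model>
</cpu>

<!-- b) a host-model copy persisted once into the XML; the concrete
     base model and feature set depend on the host at copy time -->
<cpu mode='custom' match='exact'>
  <model fallback='forbid'>z13-base</model>
  <feature policy='require' name='msa'/>
</cpu>
```

Either way, the effective CPU definition is frozen in the XML, so later QEMU/KVM/HW changes surface as explicit start or migration errors instead of silently changing the guest-visible CPU.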

> 
>>
>> What would you think about a persistent host-model copy option? So
>> instead of copying at every start, only copy and replace it once in the XML?
>>
>> Easy to specify by the user and no CPU feature changes will happen
>> "blindly".
>>
>>
>> --
>>
>> Thanks,
>>
>> David
>>
> --
> Beste Grüße / Kind regards
>Marc Hartmayer
> 
> IBM Deutschland Research & Development GmbH
> Vorsitzende des Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294
> 


-- 

Thanks,

David


Re: [libvirt] [PATCH/QEMU] s390x/kvm: use cpu_model_available for guarded storage on compat machines

2017-10-25 Thread David Hildenbrand
On 25.10.2017 17:09, Boris Fiuczynski wrote:
> On 10/25/2017 12:23 PM, David Hildenbrand wrote:
>> On 25.10.2017 12:18, Christian Borntraeger wrote:
>>> Ping, I plan to submit the patch below for 2.11. We can then still look into
>>> a libvirt<->qemu interface for limiting host-model depending on machine
>>> versions (or not).
>>
>> I think this would be sufficient for now.
>>
>> Having different host models, depending on the machine type sounds
>> wrong. But maybe we'll need it in the future.
>>
> 
> David, I disagree if your proposal is to generally tolerate new cpu 
> features in old machine types. This *might* work for gs but how do you 
> guarantee that guests do not behave differently/wrong when suddenly new
> cpu features are made available to guests when (re-)starting them.
> That is my feedback for the qemu side of this matter.

Just re-reading this section, so what you mean is:

a) guest is started, host model is copied and used. guest works.
b) guest is stopped.
c) QEMU/KVM/HW is updated.
d) guest is started, new host model is copied. guest no longer works.

d) could be because there are now some additional features with e.g.
broken guest implementation or now missing features.


What you propose (if I am not wrong) is to bind features somehow to a
QEMU machine. I think that should never be done. You could not catch now
missing features.

What would you think about a persistent host-model copy option? So
instead of copying at every start, only copy and replace it once in the XML?

Easy to specify by the user and no CPU feature changes will happen
"blindly".


-- 

Thanks,

David



Re: [libvirt] [PATCH/QEMU] s390x/kvm: use cpu_model_available for guarded storage on compat machines

2017-10-25 Thread David Hildenbrand
On 25.10.2017 17:09, Boris Fiuczynski wrote:
> On 10/25/2017 12:23 PM, David Hildenbrand wrote:
>> On 25.10.2017 12:18, Christian Borntraeger wrote:
>>> Ping, I plan to submit the patch below for 2.11. We can then still look into
>>> a libvirt<->qemu interface for limiting host-model depending on machine
>>> versions (or not).
>>
>> I think this would be sufficient for now.
>>
>> Having different host models, depending on the machine type sounds
>> wrong. But maybe we'll need it in the future.
>>
> 
> David, I disagree if your proposal is to generally tolerate new cpu 
> features in old machine types. This *might* work for gs but how do you 
> guarantee that guests do not behave differently/wrong when suddenly new
> cpu features are made available to guests when (re-)starting them.
> That is my feedback for the qemu side of this matter.

My point would be that it seems to work for all existing architectures
(so far as I am aware) and this one problem is easily fixed (and stems
from old CPU feature compatibility handling). So my question would be,
are there any potential CPU features that make such handling necessary
right now or in the near future?

> 
> Regarding the libvirt side of this:
> When looking at https://libvirt.org/formatdomain.html#elementsCPU I 
> found the following sentence:
> Since the CPU definition is copied just before starting a domain, 
> exactly the same XML can be used on different hosts while still 
> providing the best guest CPU each host supports.
> 
> My interpretation of "the best guest CPU each host supports" is that 
> besides limiting factors like hardware, kernel and qemu capabilities the 
> requested machine type for the guest is a limiting factor as well.

I understand "what the host supports" as a combination of hw+kernel+qemu.
But the definition can be interpreted differently. I don't think that
the requested machine has to be taken into account at this point.
(Again, do you have any real examples where this would be applicable?)

> 
> Nevertheless if my interpretation is found to be incorrect than we 
> should think about another new cpu mode that includes the machine type 
> into the "best guest CPU" detection.

Which use case? I just want to understand how the current solution could
be problematic (besides the problem we had, which is easily fixed).

> My assumption is that we must not require the users to know which cpu 
> model they manually need to define to match a specific machine type
> AND we want to guarantee that guests run without risking any side 
> effects by tolerating any additional cpu features.

That's why I think CPU models should be independent of the used QEMU
machine. It just overcomplicates things as we have seen.

Especially suddenly having multiple "host" CPU models depending on the
machine type would be confusing. If we can, we should keep it simple.

-- 

Thanks,

David



Re: [libvirt] [PATCH/QEMU] s390x/kvm: use cpu_model_available for guarded storage on compat machines

2017-10-25 Thread David Hildenbrand
On 25.10.2017 12:18, Christian Borntraeger wrote:
> Ping, I plan to submit belows patch for 2.11. We can then still look into
> a libvirt<->qemu interface for limiting host-model depending on machine 
> versions
> (or not).

I think this would be sufficient for now.

Having different host models, depending on the machine type sounds
wrong. But maybe we'll need it in the future.


-- 

Thanks,

David



Re: [libvirt] [Qemu-devel] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand

> 
> I intend to put some brain-power in this too. Probably next week.
> 
> My general impression is that, in places, I have a different understanding
> of how things should work compared to David. Especially when it comes
> to this concept of persistent copying, and also an end-user-digestible
> definition of what host-model does. (I think this with copying capabilities
> from whatever xml which is subject to convoluted caching is a bit too
> hard to digest for an end user not involved in the development of qemu
> and libvirt).

When reading the doc I was assuming that it would be a persistent copy.
But it would only fix part of the issue.

In the end, it specifies quite clearly what is copied. And if that is
not runnable with the selected machine, bad luck. Therefore ...

> 
> I had a conversation with Boris a couple of hours ago, and it seems, things
> are pretty convoluted.
> 
> If I understand the train of thought here (David) it can be summarized like 
> this:
> For a certain QEMU binary no aspect of the cpu-models may depend on the
> machine type. In particular, compat properties and compat handling are
> not allowed to alter cpu-models (neither the available cpu models nor the
> capabilities of each of these).

... I always recommended avoiding such compatibility settings in the
machine. But before we had CPU models it was really all we had. Now we
should let go of it.

We might be able to fix this one (gs) and take care of it in the future,
but ...

> 
> This we would have to make a part of the external interface. That way
> one could be sure that the 'cpu capabilities' are machine-type independent
> (that is, the same for all the machine types).

... the problem is once nasty stuff like zPCI comes into play. If we ever
have (other?) machine-dependent stuff that toggles CPU features, we can
only try to limit the damage.

> 
> Or did I get this completely wrong? (Based on the answer branches my
> think).

Not at all.

> 
> Halil
> 


-- 

Thanks,

David



Re: [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 16:02, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 03:51 PM, David Hildenbrand wrote:
> [...]
>>> The problem goes much further.
>>> A fresh guest with
>>>
>>>   <os>
>>>     <type>hvm</type>
>>>   </os>
>>>   <cpu mode='host-model'/>
>>>
>>> does not start. No migration from an older system is necessary.
>>>
>>
Yes, as stated in the documentation ("copying host CPU definition from
capabilities XML"), this cannot work. And it works just as documented.
Not saying this is a nice thing :)
>>
>> I think we should try to fix gs_allowed (if possible) and avoid
>> something like that in the future. This would avoid other complexity
>> involved when suddenly having X host models.
> 
> Maybe this one is really a proper fix. It will allow the guest to start
> and on migration the cpu model will complain if the target cannot provide gs.
> Similar things can happen if - for example - the host kernel lacks some 
> features.

Right, just what I had in mind.

> 
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 77169bb..97a08fa 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -430,7 +430,6 @@ static void ccw_machine_class_init(ObjectClass *oc, void 
> *data)
>  s390mc->ri_allowed = true;
>  s390mc->cpu_model_allowed = true;
>  s390mc->css_migration_enabled = true;
> -s390mc->gs_allowed = true;
>  mc->init = ccw_init;
>  mc->reset = s390_machine_reset;
>  mc->hot_add_cpu = s390_hot_add_cpu;
> @@ -509,12 +508,6 @@ bool cpu_model_allowed(void)
>  return get_machine_class()->cpu_model_allowed;
>  }
>  
> -bool gs_allowed(void)
> -{
> -/* for "none" machine this results in true */
> -return get_machine_class()->gs_allowed;
> -}
> -
>  static char *machine_get_loadparm(Object *obj, Error **errp)
>  {
>  S390CcwMachineState *ms = S390_CCW_MACHINE(obj);
> @@ -757,7 +750,6 @@ static void ccw_machine_2_9_class_options(MachineClass 
> *mc)
>  {
>  S390CcwMachineClass *s390mc = S390_MACHINE_CLASS(mc);
>  
> -s390mc->gs_allowed = false;
>  ccw_machine_2_10_class_options(mc);
>  SET_MACHINE_COMPAT(mc, CCW_COMPAT_2_9);
>  s390mc->css_migration_enabled = false;
> diff --git a/include/hw/s390x/s390-virtio-ccw.h 
> b/include/hw/s390x/s390-virtio-ccw.h
> index a9a90c2..1de53b0 100644
> --- a/include/hw/s390x/s390-virtio-ccw.h
> +++ b/include/hw/s390x/s390-virtio-ccw.h
> @@ -40,7 +40,6 @@ typedef struct S390CcwMachineClass {
>  bool ri_allowed;
>  bool cpu_model_allowed;
>  bool css_migration_enabled;
> -bool gs_allowed;
>  } S390CcwMachineClass;
>  
>  /* runtime-instrumentation allowed by the machine */
> diff --git a/target/s390x/kvm.c b/target/s390x/kvm.c
> index a0d5052..3f13fc2 100644
> --- a/target/s390x/kvm.c
> +++ b/target/s390x/kvm.c
> @@ -362,7 +362,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>  cap_ri = 1;
>  }
>  }
> -if (gs_allowed()) {
> +if (cpu_model_allowed()) {
>  if (kvm_vm_enable_cap(s, KVM_CAP_S390_GS, 0) == 0) {
>  cap_gs = 1;
>  }
> 

And the last hunk makes sure we keep the same handling for machines without
CPU model support and therefore no way to mask support. For all recent
machines, we expect CPU model checks to be in place.

Will have to think about this a bit more. Will you send this as a proper
patch?

-- 

Thanks,

David

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 15:49, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 03:43 PM, David Hildenbrand wrote:
>> On 20.10.2017 15:36, Christian Borntraeger wrote:
>>>
>>>
>>> On 10/20/2017 03:16 PM, David Hildenbrand wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> we recently encountered the problem that the 'host-model' [1] has to be
>>>>> related to the machine type of a domain. We have following problem:
>>>>>
>>>>>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>>>>>domain using the default s390-virtio-ccw machine together with the
>>>>>host-model CPU mode [1]. The definition will have the machine
>>>>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>>>>>in the domain definition. In a next step we upgrade to QEMU 2.10
>>>>>(first version to recognize z14). Everything is still fine, even
>>>>>though the machine runs in 2.9 compatibility mode. Finally we
>>>>>upgrade to a z14. As a consequence it is not possible to start the
>>>>>domain anymore as the machine type doesn't support our CPU host
>>>>>model (which is expanded at start time of the domain).
>>>>
>>>> Actually, what is the cause of that problem? I assume it is the gs
>>>> feature (gs_allowed)?
>>>>
>>>> We should really avoid such things (..._allowed) for CPU model features
>>>> in the future and glue all new such stuff to cpumodel_allowed.
>>>
>>> Yes, starting a guest with
>>>
>>> hvm
>>>   
>>>   
>>>
>>> results in
>>>
>>> qemu-system-s390x: Some features requested in the CPU model are not 
>>> available in the configuration: gs 
>>>
>>> Tying it to cpumodel_allowed would not help, migration-wise though.
>>> libvirt would still transform
>>>
>>>
>>> hvm
>>>   
>>>   
>>
>> My point was that the host model would have to be copied and _remain_
>> there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9.
>>
>> So really replacing  by the model z13-base
>> would at least fix this issue. Just like s390-ccw-virtio gets
>> replaced and remains that way.
>>
>> But this might for sure have other problems.
> 
> The problem goes much further.
> A fresh guest with
> 
> 
>  hvm
>
>
> 
> does not start. No migration from an older system is necessary.
> 

Yes, as stated in the documentation "copying host CPU definition from
capabilities XML", this cannot work. And it works just as documented.
Not saying this is a nice thing :)

I think we should try to fix gs_allowed (if possible) and avoid
something like that in the future. This would avoid other complexity
involved when suddenly having X host models.

-- 

Thanks,

David



Re: [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 15:36, Christian Borntraeger wrote:
> 
> 
> On 10/20/2017 03:16 PM, David Hildenbrand wrote:
>>
>>> Hi all,
>>>
>>> we recently encountered the problem that the 'host-model' [1] has to be
>>> related to the machine type of a domain. We have following problem:
>>>
>>>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>>>domain using the default s390-virtio-ccw machine together with the
>>>host-model CPU mode [1]. The definition will have the machine
>>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>>>in the domain definition. In a next step we upgrade to QEMU 2.10
>>>(first version to recognize z14). Everything is still fine, even
>>>though the machine runs in 2.9 compatibility mode. Finally we
>>>upgrade to a z14. As a consequence it is not possible to start the
>>>domain anymore as the machine type doesn't support our CPU host
>>>model (which is expanded at start time of the domain).
>>
>> Actually, what is the cause of that problem? I assume it is the gs
>> feature (gs_allowed)?
>>
>> We should really avoid such things (..._allowed) for CPU model features
>> in the future and glue all new such stuff to cpumodel_allowed.
> 
> Yes, starting a guest with
>
> hvm
>   
>   
> 
> results in
> 
> qemu-system-s390x: Some features requested in the CPU model are not available 
> in the configuration: gs 
> 
> Tying it to cpumodel_allowed would not help, migration-wise though.
> libvirt would still transform
> 
>
> hvm
>   
>   

My point was that the host model would have to be copied and _remain_
there when s390-ccw-virtio was expanded to s390-ccw-virtio-2.9.

So really replacing  by the model z13-base
would at least fix this issue. Just like s390-ccw-virtio gets
replaced and remains that way.

But this might for sure have other problems.

> 
> 
> into 
> -cpu 
> z14-base,aen=on,cmmnt=on,aefsi=on,mepoch=on,msa8=on,msa7=on,msa6=on,msa5=on,msa4=on,
>  \
> msa3=on,msa2=on,msa1=on,sthyi=on,edat=on,ri=on,edat2=on,vx=on,ipter=on,vxeh=on,vxpd=on,
>  \
> esop=on,iep=on,cte=on,ais=on,gs=on,zpci=on,sea_esop2=on,te=on,cmm=on
>  ^
> because cpu model is certainly there. Now the guest would start but migration 
> would
> later fail because what we create now would never have been possible with 2.9.

Migration is a totally different story, as tooling has to make sure to
find a CPU model that is compatible over all host models. So
cpumodel_allowed would indeed work. (at least in my theory ;) )
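Finding a CPU model compatible over all host models is what the s390x query-cpu-model-baseline QMP command is for. A minimal client-side sketch of folding it over the expanded host models of a cluster — the `send` callable and the fake reply in the test are illustrative assumptions, not libvirt's actual code:

```python
def baseline_request(model_a, model_b):
    """Build a query-cpu-model-baseline QMP request (s390x).

    The command computes a model runnable on both inputs; chaining it
    over every host's expanded model yields a cluster-wide model.
    """
    return {
        "execute": "query-cpu-model-baseline",
        "arguments": {
            "modela": model_a,
            "modelb": model_b,
        },
    }


def cluster_baseline(send, host_models):
    """Fold baseline over a list of expanded host models.

    `send` is any callable performing one QMP round trip and returning
    the parsed reply dict (one connection per host, in practice).
    """
    common = host_models[0]
    for other in host_models[1:]:
        reply = send(baseline_request(common, other))
        common = reply["return"]["model"]
    return common
```

Tooling would then pin the resulting model in the domain definition instead of re-expanding "host" on every start.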

> 
> If libvirt could get information from QEMU depending on the machine version, 
> this would
> make it work. But of course this has other issues.

I am not sure if that is the right thing to do.

The documentation states clearly that the host model is copied. If that
is not runnable, fix the setup.

> 
> Christian
> 


-- 

Thanks,

David



Re: [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand

> Hi all,
> 
> we recently encountered the problem that the 'host-model' [1] has to be
> related to the machine type of a domain. We have following problem:
> 
>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>domain using the default s390-virtio-ccw machine together with the
>host-model CPU mode [1]. The definition will have the machine
>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>in the domain definition. In a next step we upgrade to QEMU 2.10
>(first version to recognize z14). Everything is still fine, even
>though the machine runs in 2.9 compatibility mode. Finally we
>upgrade to a z14. As a consequence it is not possible to start the
>domain anymore as the machine type doesn't support our CPU host
>model (which is expanded at start time of the domain).

Actually, what is the cause of that problem? I assume it is the gs
feature (gs_allowed)?

We should really avoid such things (..._allowed) for CPU model features
in the future and glue all new such stuff to cpumodel_allowed.

-- 

Thanks,

David



Re: [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 14:50, Jiri Denemark wrote:
> On Fri, Oct 20, 2017 at 13:37:42 +0200, David Hildenbrand wrote:
>> On 20.10.2017 13:09, Marc Hartmayer wrote:
>>> we recently encountered the problem that the 'host-model' [1] has to be
>>> related to the machine type of a domain. We have following problem:
>>>
>>>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>>>domain using the default s390-virtio-ccw machine together with the
>>>host-model CPU mode [1]. The definition will have the machine
>>>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>>>in the domain definition. In a next step we upgrade to QEMU 2.10
>>>(first version to recognize z14). Everything is still fine, even
>>>though the machine runs in 2.9 compatibility mode. Finally we
>>>upgrade to a z14. As a consequence it is not possible to start the
>>>domain anymore as the machine type doesn't support our CPU host
>>>model (which is expanded at start time of the domain).
>>
>> Yes, the host model may vary depending on QEMU version and to some
>> degree also on compatibility machines (although I always try to push
>> people to not introduce any new such stuff).
>>
>> Quoting from the libvirt documentation: https://libvirt.org/formatdomain.html
>>
>> "host-model
>> The host-model mode is essentially a shortcut to copying host CPU
>> definition from capabilities XML into domain XML. Since the CPU
>> definition is copied just before starting a domain, exactly the same XML
>> can be used on different hosts while still providing the best guest CPU
>> each host supports."
>>
>> You're telling me that that copying does not happen, which seems to be
>> the problematic part about this in my opinion.
>>
>> So I am not sure if probing anything else (as you noted below) is the
>> correct thing to do. Copy it and you have a migration_safe and even
>> static version of the _current_ host CPU.
> 
> The thing is libvirt calls query-cpu-model-expansion to check what the
> host CPU is. This 'host-model' CPU is replaced with the probed CPU model
> when a domain starts. The problem described by Marc is that the probed
> CPU model cannot be used universally with all machine types. So starting
> a domain on such host with machine type s390-virtio-ccw-2.10 works, but
> a domain with machine type s390-virtio-ccw-2.9 fails to start with the
> same probed CPU model.
> 

My assumption would be that the CPU model is copied into the XML when
the domain first starts. This is what the documentation describes.

So when upgrading QEMU, the CPU model in the XML is still the same
(z13), even though something different is now reported in the host info
after upgrading QEMU (z14).

In this case it would continue to work.

The problem is that the CPU model copied into the XML doesn't remain
there, right? It is suddenly replaced with a z14 host model.

> Jirka
> 


-- 

Thanks,

David



Re: [libvirt] Question about the host-model CPU mode

2017-10-20 Thread David Hildenbrand
On 20.10.2017 13:09, Marc Hartmayer wrote:
> On Thu, Oct 12, 2017 at 02:07 PM +0200, Jiri Denemark <jdene...@redhat.com> 
> wrote:
>> On Mon, Oct 09, 2017 at 10:16:48 +0200, Marc Hartmayer wrote:
>>> On Thu, Oct 05, 2017 at 02:11 PM +0200, Jiri Denemark <jdene...@redhat.com> 
>>> wrote:
>>>> But it's going to be a bit complicated because we ask QEMU what the
>>>> host CPU is and the interface we used would need to be enhanced to
>>>> give us different results for all supported machine types so that we
>>>> can select the right one when a domain is started.
>>>
>>> So how do we deal with this?
>>
>> I think we need to start discussing this on qemu-devel list. If I
>> remember correctly, David Hildenbrand is the original author of the
>> query-cpu-model-expansion QMP command.
>>
>> Please, Cc me so that the thread is more visible to me.
>>
>> Thanks,
>>
>> Jirka
>>
> 
> Hi all,
> 
> we recently encountered the problem that the 'host-model' [1] has to be
> related to the machine type of a domain. We have following problem:
> 
>Let's assume we've a z13 system with a QEMU 2.9 and we define a
>domain using the default s390-virtio-ccw machine together with the
>host-model CPU mode [1]. The definition will have the machine
>expanded to s390-virtio-ccw-2.9 but retain the host-model CPU mode
>in the domain definition. In a next step we upgrade to QEMU 2.10
>(first version to recognize z14). Everything is still fine, even
>though the machine runs in 2.9 compatibility mode. Finally we
>upgrade to a z14. As a consequence it is not possible to start the
>domain anymore as the machine type doesn't support our CPU host
>model (which is expanded at start time of the domain).

Yes, the host model may vary depending on QEMU version and to some
degree also on compatibility machines (although I always try to push
people to not introduce any new such stuff).

Quoting from the libvirt documentation: https://libvirt.org/formatdomain.html

"host-model
The host-model mode is essentially a shortcut to copying host CPU
definition from capabilities XML into domain XML. Since the CPU
definition is copied just before starting a domain, exactly the same XML
can be used on different hosts while still providing the best guest CPU
each host supports."

You're telling me that that copying does not happen, which seems to be
the problematic part about this in my opinion.

So I am not sure if probing anything else (as you noted below) is the
correct thing to do. Copy it and you have a migration_safe and even
static version of the _current_ host CPU.

> 
> For determining the actual host-model the QMP command
> 'query-cpu-model-expansion' is used. This is done once per QEMU binary
> and the result of it is cached by libvirt. The problem with that is
> that libvirt queries with the newest machine type of the QEMU binary
> for the host CPU model. But now we realized that we have to consider
> the machine type of each domain for the determination of the host CPU
> model of a domain.
> 
> We could now either probe the host CPU model for each QEMU binary +
> machine type combination and for this we've to start a new QEMU
> process each time. Or we can add a QMP command/parameter that allows
> us to determine the host CPU model(s) with respect to the passed
> machine type and/or all supported machine types.
> 
> The QMP command query-cpu-model-expansion is currently described in
> qapi-schema.json as follows:
> 
> 
> ##
> # @query-cpu-model-expansion:
> #
> # Expands a given CPU model (or a combination of CPU model + additional 
> options)
> # to different granularities, allowing tooling to get an understanding what a
> # specific CPU model looks like in QEMU under a certain configuration.
> #
> # This interface can be used to query the "host" CPU model.
> #
> # The data returned by this command may be affected by:
> #
> # * QEMU version: CPU models may look different depending on the QEMU version.
> # (Except for CPU models reported as "static" in query-cpu-definitions.)
> # * machine-type: CPU model may look different depending on the machine-type.
> # (Except for CPU models reported as "static" in query-cpu-definitions.)
> # * machine options (including accelerator): in some architectures, CPU models
> # may look different depending on machine and accelerator options. (Except for
> # CPU models reported as "static" in query-cpu-definitions.)
> # * "-cpu" arguments and global properties: arguments to the -cpu option and
> # global properties may affect expansion of CPU models. Using
> # query-cpu-model-ex
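The QAPI documentation quoted above translates directly into a QMP call. A minimal Python sketch of building the request and doing one round trip over a QMP UNIX socket — the socket path and helper names are illustrative assumptions, not part of QEMU or libvirt:

```python
import json
import socket


def expansion_request(model_name, expansion_type="static"):
    """Build a query-cpu-model-expansion request per the QAPI schema."""
    return {
        "execute": "query-cpu-model-expansion",
        "arguments": {
            "type": expansion_type,
            "model": {"name": model_name},
        },
    }


def qmp_round_trip(socket_path, request):
    """Send one request over a QMP UNIX socket and return the parsed reply.

    Assumes QEMU was started with something like
    -qmp unix:/tmp/qmp.sock,server,nowait (example path, not from the thread).
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        stream = sock.makefile("rw")
        stream.readline()  # QMP greeting banner
        stream.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        stream.flush()
        stream.readline()  # capabilities ack
        stream.write(json.dumps(request) + "\n")
        stream.flush()
        return json.loads(stream.readline())
```

As the documentation notes, the reply for the same request can differ by QEMU version, machine type, accelerator, and -cpu arguments, so a caller must not cache it across those dimensions.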

Re: [libvirt] [Qemu-devel] [PATCH 0/9] i386: query-cpu-model-expansion test script

2017-01-20 Thread David Hildenbrand
Am 19.01.2017 um 18:45 schrieb Daniel P. Berrange:
> On Thu, Jan 19, 2017 at 06:21:22PM +0100, David Hildenbrand wrote:
>>
>>>> Also think about "query-cpu-model-expansion model=host type=static",
>>>> which will primarily be used by libvirt on s390x. There is no way to
>>>> expand this into a static cpu model. Faking anything will just hide errors.
>>>
>>> Yes, static expansion of host model must always return an error
>>> if it's not possible to expand.
>>>
>>>>
>>>> If "host" can't be expanded, QEMU has to be treated like there is no CPU
>>>> model support (as for older QEMU versions).
>>>
>>> OK. I will propose a patch updating the query-cpu-model-expansion
>>> documentation to be more explicit about it.
>>
>> The only real alternative I see would be disabling the query-cpu-model-*
>> interface completely if KVM support is not available.
>>
>> This would however mean that the same QEMU binary would have the
>> interface when running under TCG, but not when running under KVM on an
>> old KVM version.
>>
>> That also doesn't really feel right, or what do you think?
> 
> Yeah that really isn't good.  query-cpu-model-* needs to work on TCG
> and *not* have a dependancy on KVM in that case, since you can be
> running TCG s390 on an x86_64 host, so the host CPU is totally irrelevant
> for TCG
> 

Actually what I meant was:

TCG: query-cpu-model-* interface always provided
KVM (with cpu model support): query-cpu-model-* interface provided
KVM (without cpu model support): no query-cpu-model-* interface provided

This would avoid having to report an error when expanding "host" in the
third case (KVM without cpu model support) but would lead to one QEMU
binary having a different set of supported qmp calls when called from
TCG and KVM.

> Regards,
> Daniel
> 


-- 

David



Re: [libvirt] [Qemu-devel] [PATCH 0/9] i386: query-cpu-model-expansion test script

2017-01-19 Thread David Hildenbrand

>> Also think about "query-cpu-model-expansion model=host type=static",
>> which will primarily be used by libvirt on s390x. There is no way to
>> expand this into a static cpu model. Faking anything will just hide errors.
> 
> Yes, static expansion of host model must always return an error
> if it's not possible to expand.
> 
>>
>> If "host" can't be expanded, QEMU has to be treated like there is no CPU
>> model support (as for older QEMU versions).
> 
> OK. I will propose a patch updating the query-cpu-model-expansion
> documentation to be more explicit about it.

The only real alternative I see would be disabling the query-cpu-model-*
interface completely if KVM support is not available.

This would however mean that the same QEMU binary would have the
interface when running under TCG, but not when running under KVM on an
old KVM version.

That also doesn't really feel right, or what do you think?

-- 

David



Re: [libvirt] [Qemu-devel] [PATCH 0/9] i386: query-cpu-model-expansion test script

2017-01-18 Thread David Hildenbrand
Am 18.01.2017 um 18:34 schrieb Eduardo Habkost:
> On Wed, Jan 18, 2017 at 12:09:28PM -0500, Jason J. Herne wrote:
>> On 01/18/2017 12:00 PM, Eduardo Habkost wrote:
>>> On Tue, Jan 17, 2017 at 10:22:10AM -0500, Jason J. Herne wrote:
>>>> On 01/16/2017 08:01 PM, Eduardo Habkost wrote:
>>>>> This is a follow-up to the series that implements
>>>>> query-cpu-model-expansion. Before including the test script, the
>>>>> series has some fixes to allow the results of
>>>>> query-cpu-model-expansion to be used in the QEMU command-line.
>>>>>
>>>>> The script probably will work on s390x too, but I couldn't test
>>>>> it yet.
>>>>>
>>>> Eduardo,
>>>>
>>>> This test seems to mostly work on s390. The only issue I ran into is
>>>> querying host model using tcg only. s390 requires kvm to query the host
>>>> model. Perhaps we could just skip the tcg host test case on s390?
>>>
>>> We could still try to test "host", but add it to a greylist where
>>> errors returned by query-cpu-model-expansion can be non-fatal.
>>> query-cpu-model-expansion model="host" can also fail with KVM if
>>> the host doesn't support CPU models.
>>>
>>
>> David had the idea to just support -cpu host for tcg. We could do that.
>> In the meantime, I'm ok with your greylist idea too. This would allow the
>> script to work properly on s390 right from the start, and we can remove the
>> greylist when s390 supports specifying -cpu host for tcg.
> 
> I believe we will still need to ignore query-cpu-model-expansion
> errors on some cases, otherwise the test script will fail on
> hosts where KVM doesn't support CPU models in KVM.

That is indeed true. For "host" + KVM there would have to be an extra
check. A non-fatal error sounds right for this case (e.g. a warning).

> 
> But we probably don't need a hardcoded greylist, anyway: we could
> just make the error non-fatal in case the CPU model is not
> reported as migration-safe in query-cpu-definitions.
> 
> But I was wondering:
> 
> 1) Isn't "-cpu host" the default CPU model on s390x on KVM,
>even if the host doesn't support CPU models?

Yes, it has an inbuilt compatibility mode when specified. If KVM support
for cpu models is missing, using "-cpu host" will work (as it worked on
QEMU versions without cpu model support), doing something like "-cpu
host,vx=on" will not work, as modifying features is not possible (as the
interface for query/config is missing).

Using "host" for all query-cpu-model-* commands is forbidden, as we can't tell what
this model looks like.

> 
> 2) Is it really correct to return an error on
>"query-cpu-model-expansion model=host type=full" if the host
>doesn't support CPU models?
> 

Yes it is, because there is no way to tell which features there are.
Returning { name: "host", props: {} } would be misleading, as it
would mean that there are no features. Which is wrong. We just can't
tell. It is up to the caller to handle this. E.g. ignoring and
continuing, doing compatibility stuff or simply reporting an error.

Also think about "query-cpu-model-expansion model=host type=static",
which will primarily be used by libvirt on s390x. There is no way to
expand this into a static cpu model. Faking anything will just hide errors.

If "host" can't be expanded, QEMU has to be treated like there is no CPU
model support (as for older QEMU versions).
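Treating a failed expansion of "host" as "no CPU model support" could look like this on the client side — a hedged sketch, where `send` stands for any QMP round-trip helper rather than a real libvirt API:

```python
def expand_host_model(send):
    """Statically expand "host"; treat an error as missing CPU model support.

    Per the discussion above, a client receiving an error here must
    behave exactly as it would with an older QEMU that has no CPU
    model support at all, rather than faking an empty feature set.
    """
    reply = send({
        "execute": "query-cpu-model-expansion",
        "arguments": {"type": "static", "model": {"name": "host"}},
    })
    if "error" in reply:
        return None  # fall back: no CPU model support available
    return reply["return"]["model"]
```

Returning None (instead of a fake `{"name": "host", "props": {}}`) keeps the "no features reported" and "features unknown" cases distinguishable for the caller.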

>What if it just returned { name: "host", props: {} }
>on those cases, meaning that the CPU model is valid and
>usable, but QEMU is unable to provide extra information about
>it.
> 


-- 

David



Re: [libvirt] [Qemu-devel] [PATCH for-2.9 15/17] target-i386: Define static "base" CPU model

2016-12-20 Thread David Hildenbrand

Am 16.12.2016 um 17:02 schrieb Jiri Denemark:

On Mon, Dec 05, 2016 at 21:57:45 -0200, Eduardo Habkost wrote:

On Mon, Dec 05, 2016 at 07:18:47PM +0100, David Hildenbrand wrote:

Am 02.12.2016 um 22:18 schrieb Eduardo Habkost:

The query-cpu-model-expand QMP command needs at least one static
model, to allow the "static" expansion mode to be implemented.
Instead of defining static versions of every CPU model, define a
"base" CPU model that has absolutely no feature flag enabled.



Introducing separate ones makes feature lists presented to the user much
shorter (and therefore easier to maintain). But I don't know how libvirt
wants to deal with models on x86 in the future.


I understand that having a larger set of static models would make
expansions shorter. But I worry that by defining a complete set
of static models on x86 would require extra maintenance work on
the QEMU side with no visible benefit for libvirt.

I would like to hear from libvirt developers what they think. I
still don't know what they plan to use the type=static expansion
results for.


Currently we are mostly interested in the expansion of the "host" CPU
model. We're fine with the expansion based on the "basic" static model
with no features. Returning some real model instead of "basic" would be
OK as long as it would be one of the existing CPU models. Adding special
static models, such as Broadwell-base would actually be a complication


I agree, mixing names would be confusing. So if we wanted to
introduce static CPU models for x86 in QEMU, they would have to be named
exactly like the libvirt models and contain the exact same feature set.


since we would need to provide some translation to the existing models
for backward compatibility. I'd appreciate if we could avoid doing this.


Right and translation would only confuse people, especially if the CPU
models in libvirt and QEMU behave differently.



Jirka




--

David



Re: [libvirt] [Qemu-devel] [PATCH for-2.9 15/17] target-i386: Define static "base" CPU model

2016-12-06 Thread David Hildenbrand

Am 06.12.2016 um 00:57 schrieb Eduardo Habkost:

On Mon, Dec 05, 2016 at 07:18:47PM +0100, David Hildenbrand wrote:

Am 02.12.2016 um 22:18 schrieb Eduardo Habkost:

The query-cpu-model-expand QMP command needs at least one static
model, to allow the "static" expansion mode to be implemented.
Instead of defining static versions of every CPU model, define a
"base" CPU model that has absolutely no feature flag enabled.



Introducing separate ones makes feature lists presented to the user much
shorter (and therefore easier to maintain). But I don't know how libvirt
wants to deal with models on x86 in the future.


I understand that having a larger set of static models would make
expansions shorter. But I worry that by defining a complete set
of static models on x86 would require extra maintenance work on
the QEMU side with no visible benefit for libvirt.


As static models will never change (in theory), the maintenance work should
be pretty much down to zero. But the initial implementation and coming
up with the models requires work (my experience ;) ).


I am not against the "base" model (actually it is really pretty nice to 
have). Using only that somehow smells like the "user" cpu model 
discussion. Which might be ok for x86.




I would like to hear from libvirt developers what they think. I
still don't know what they plan to use the type=static expansion
results for.



How long is the static expansion on a recent intel CPU?


CPU model "Broadwell" returns 165 entries on return.model.props:

(QEMU) query-cpu-model-expansion type=static model={"name":"Broadwell"}



{"return": {"migration-safe": true, "model": {"name": "base", "props":
{"pfthreshold": false, "pku": false, "rtm": true, "tsc-deadline": true,
"xstore-en": false, "tsc-scale": false, "abm": true, "ia64": false,
"kvm-mmu": false, "xsaveopt": true, "tce": false, "smep": true, "fpu": true,
"xcrypt": false, "clflush": true, "flushbyasid": false,
"kvm-steal-time": false, "lm": true, "tsc": true, "adx": true, "fxsr": true,
"tm": false, "xgetbv1": false, "xstore": false, "vme": false,
"vendor": "GenuineIntel", "arat": true, "de": true, "aes": true, "pse": true,
"ds-cpl": false, "tbm": false, "sse": true, "phe-en": false, "f16c": true,
"ds": false, "mpx": false, "tsc-adjust": false, "avx512f": false,
"avx2": true, "pbe": false, "cx16": true, "avx512pf": false, "movbe": true,
"perfctr-nb": false, "ospke": false, "avx512ifma": false, "stepping": 2,
"sep": true, "sse4a": false, "avx512dq": false, "avx512-4vnniw": false,
"xsave": true, "pmm": false, "hle": true, "est": false, "xop": false,
"smx": false, "monitor": false, "avx512er": false, "apic": true,
"sse4.1": true, "sse4.2": true, "pause-filter": false, "lahf-lm": true,
"kvm-nopiodelay": false, "acpi": false, "mmx": true, "osxsave": false,
"pcommit": false, "mtrr": true, "clwb": false, "dca": false, "pdcm": false,
"xcrypt-en": false, "3dnow": false, "invtsc": false, "tm2": false,
"hypervisor": true, "kvmclock-stable-bit": false, "fxsr-opt": false,
"pcid": true, "lbrv": false, "avx512-4fmaps": false, "svm-lock": false,
"popcnt": true, "nrip-save": false, "avx512vl": false, "x2apic": true,
"kvmclock": false, "smap": true, "family": 6, "min-level": 13,
"dtes64": false, "ace2": false, "fma4": false, "xtpr": false,
"avx512bw": false, "nx": true, "lwp": false, "msr": true, "ace2-en": false,
"decodeassists": false, "perfctr-core": false, "pge": true, "pn": false,
"fma": true, "nodeid-msr": false, "cx8": true, "mce": true,
"avx512cd": false, "cr8legacy": false, "mca": true, "pni": true,
"rdseed": true, "osvw": false, "fsgsbase": true,
"model-id": "Intel Core Processor (Broadwell)",
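A client consuming such a full props dictionary usually only needs the names of the enabled boolean features; a small sketch (the function name is mine, not part of any QEMU or libvirt API):

```python
def enabled_features(expansion_reply):
    """Return the sorted names of boolean properties that are enabled.

    Non-boolean props (e.g. "stepping", "vendor", "model-id") are
    skipped by the `is True` check, which only matches real booleans.
    """
    props = expansion_reply["return"]["model"]["props"]
    return sorted(name for name, value in props.items() if value is True)


# Tiny excerpt of the Broadwell expansion above, for illustration.
sample = {"return": {"migration-safe": True, "model": {"name": "base",
    "props": {"rtm": True, "pku": False, "abm": True,
              "stepping": 2, "vendor": "GenuineIntel"}}}}
print(enabled_features(sample))  # -> ['abm', 'rtm']
```

This kind of filtering is what makes the 165-entry reply manageable when comparing host models across machines.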

Re: [libvirt] [Qemu-devel] [PATCH for-2.9 00/17] target-i386: Implement query-cpu-model-expansion

2016-12-05 Thread David Hildenbrand



Is static really needed? I can understand why migration-safe might be
of interest, but can't see how "static" could help (I mean we have
static expansion for this purpose). Do you have anything special in
mind regarding exposing "static"?


I didn't have any specific use case in mind. My main worry is to
avoid letting management software make any assumptions about the
returned data. If we don't add an explicit "static" field, a
client might make some wrong assumptions just because some
conditions are known to be always true on existing
implementations. e.g.: a client could incorrectly assume that
"query-cpu-model-expansion type=full" of a migration-safe model
would always be static.


I think "migration-safe" makes sense, as the doc currently states for
"@full": "The produced model is not guaranteed to be migration-safe".

For static expansion, this information is somewhat superfluous, as
"static CPU models are migration-safe", but it doesn't hurt. (and this
is also what you properly documented :) )

But I don't think that "static" will be of any use. If you want a
static model, go for "static" expansion. (I don't really see any use
case here, but if you do, keep it)



Something like that would be cool. Unfortunately, e.g. on s390x some
CPUID-like data (e.g. STFL(E) and SCP info) is only available from
kernel space. So we can't simply run a user space script. However,
something like a kernel module would be possible (or even something like the
s390x pc-bios to simply read and dump the data).


I meant a kernel binary, above. On x86, there are existing test
cases on the autotest repository that run a very small kernel[1]
that simply dumps CPUID data to the serial console. We could
reuse it, or we could add a CPUID command to the qtest protocol.
I'm not sure what's the best solution.

We can try to use a similar approach on s390x to reuse code in
the test script, but we can add arch-specific code to it if
necessary.


Ah, alright, so I wasn't aware of that because there is no autotest for
s390x (at least not that I know of ;) ). Thanks for the hint.



[1] https://github.com/autotest/tp-qemu/tree/master/qemu/deps/cpuid/src




--

David



Re: [libvirt] [Qemu-devel] [PATCH for-2.9 00/17] target-i386: Implement query-cpu-model-expansion

2016-12-05 Thread David Hildenbrand

Am 02.12.2016 um 22:17 schrieb Eduardo Habkost:

This series implements query-cpu-model-expansion on target-i386.

QAPI / interface changes


When implementing this, I have noticed that the "host" CPU model
in i386 includes some migration-unsafe features that can't be
translated to any migration-safe representation: "pmu", and
"host-cache-info".

To be able to handle the migration-unsafe features, I have
extended the query-cpu-model-expansion definition to be clear
about what happens to those features when the CPU model is
expanded (in short: static expansion removes them, full expansion
keeps them).


I think this makes sense. The important part is to really keep static 
expansion static :)




I also added "static" and "migration-safe" fields to the return
value of query-cpu-model-expansion, so callers can know if the
the expanded representation is static and migration-safe.


Is static really needed? I can understand why migration-safe might be
of interest, but can't see how "static" could help (I mean we have
static expansion for this purpose). Do you have anything special in
mind regarding exposing "static"?



Test code
-

I have added a Python test script for the feature, that will try
multiple combinations of the expansion operation, and see if the
returned data matches some constraints.

The test script works with the s390x query-cpu-model-expansion
command, except that: 1) I couldn't test it with KVM; 2) qtest.py
error handling when QEMU refuses to run is unreliable (so the
script needs runnability information to be available in TCG mode,
too, to skip not-runnable CPU models and avoid problems).


Everything except "host" should behave completely the same on s390x
with or without KVM being active. So with !KVM tests we can already
cover most of the interesting parts. Thanks for taking care of s390x.



Future versions of the test script could run an arch-specific
CPUID-dump guest binary, and validate data seen by the guest
directly. While we don't do that, the script validates all QOM
properties on the CPU objects looking for unexpected changes. At
least in the case of x86, the QOM properties will include lots of
the CPUID data seen by the guest, giving us decent coverage.


Something like that would be cool. Unfortunately, e.g. on s390x some 
CPUID-like data (e.g. STFL(E) and SCP info) is only available from
kernel space. So we can't simply run a user space script. However,
something like a kernel module would be possible (or even something like 
the s390x pc-bios to simply read and dump the data).
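The QOM-property cross-check described in the quoted paragraph amounts to a snapshot diff: record all properties, perform the operation under test, and flag any value that changed. A minimal sketch (property names are placeholders):

```python
def unexpected_changes(before, after):
    """Return QOM properties whose value differs between two snapshots
    of the same CPU object (only keys present in both are compared)."""
    return {k: (before[k], after[k])
            for k in before.keys() & after.keys()
            if before[k] != after[k]}

snap1 = {"aes": True, "pmu": False}   # properties before the operation
snap2 = {"aes": True, "pmu": True}    # properties after the operation
assert unexpected_changes(snap1, snap2) == {"pmu": (False, True)}
```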



--

David



Re: [libvirt] [Qemu-devel] [RFC 00/15] qmp: Report supported device types on 'query-machines'

2016-11-22 Thread David Hildenbrand




Considered alternatives
===

Indirect mapping (machine => bus => device)
---

This RFC implements a mechanism to implement a
  machine-type => supported-device-types
mapping. An alternative solution I considered was to expose an
indirect mapping:
  machine-type => default-bus-types
followed by
  bus-type => supported-device-types.

But exposing only the resulting supported device-types list
imposes fewer restrictions on how the device and bus type
hierarchy is implemented inside QEMU. There's still a
  machine-type => bus-type => device-type
mapping implemented internally, but it is an implementation
detail on the current version, and not part of the
externally-visible interface.

The Implementation
==

This adds a new field to MachineClass: default_bus_types, and a
new field to BusClass: supported_device_type.


Is it possible to modify a machine (setting some properties e.g. on the
command line) so that suddenly more devices are supported? Something like
enabling an additional bus? (I assume so, because it is called "default
bus types" :) )

If so, the indirect mapping could be of more benefit in the long run.

Thinking about my machine at home:

I just care about the available buses. If my machine doesn't have USB,
but PCI, I can buy a USB PCI card and make it support USB. Then I can
happily plug in USB devices. So the "default" state might at least
no longer be sufficient when wanting to plug in a USB fan on a hot
summer day ;) .

But, with the indirect mapping, I guess we would need yet another qmp
command ...
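The indirect mapping discussed above can be sketched as two small tables that get flattened into the direct machine => device-types list (the machine, bus, and device names here are illustrative, not taken from QEMU's actual type hierarchy):

```python
# Indirect mapping: machine -> default bus types, bus -> device types.
# Names are placeholders for illustration only.
machine_buses = {"pc-i440fx": ["PCI", "ISA"]}
bus_devices = {"PCI": ["virtio-net-pci"], "ISA": ["isa-serial"]}

def supported_devices(machine):
    """Flatten the indirect mapping into the direct
    machine-type => supported-device-types list."""
    return sorted(dev
                  for bus in machine_buses[machine]
                  for dev in bus_devices[bus])

assert supported_devices("pc-i440fx") == ["isa-serial", "virtio-net-pci"]
```

Exposing only the flattened result keeps the bus level an internal detail; exposing both tables would answer the "what if I add a bus?" question without a new command, which is the trade-off debated above.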

--

David



Re: [libvirt] Cpu Modeling

2016-09-27 Thread David Hildenbrand
CCing Christian

> On Thu, Sep 22, 2016 at 14:47:36 -0400, Jason J. Herne wrote:
> > Testing for runability:
> > - Simply try to start QEMU with KVM, compat machine, CPU model  
> 
> Yes, if the domain XML explicitly requests a specific CPU model.
> Additionally, we need to make sure a CPU model chosen by libvirt
> (host-model) is runnable, but that should be handled by
> query-cpu-model-expansion.
> 
> > Identifying host model, e.g. "virsh capabilities"
> > - query-cpu-model-expansion on "host" with "-M none --enable-kvm"  
> 
> Right, except that it doesn't seem to work on x86_64 now.
> 
> It's not critical for query-cpu-model-expansion, but do we have an
> interface we can use to check whether the new commands are supported by
> a QEMU binary? Trying and checking for
> 
> {"error": {"class": "GenericError", "desc": "this feature or command
> is not currently supported"}}
> 
> is not really possible. At least we'd need a specific error class.
> 
> > :
> > - simply copy the identified host model  
> 
> Yes.
> 
> > :
> > - "-cpu host"  
> 
> Exactly.
> 
> ...
> > 1. We will invoke qemu to gather the host cpu data used for virsh
> > capabilities. Today this data seems to be collected directly from the
> > host hardware for most (all?) architectures.  
> 
> Not really, virsh capabilities will still contain data gathered directly
> from the host hardware. But, we have virsh domcapabilities for
> displaying data tied to a specific QEMU binary. Since yesterday
> afternoon we have support for displaying CPU related data in the domain
> capabilities XML; see
> http://libvirt.org/formatdomaincaps.html#elementsCPU
> 
> The host-model part of the XML will show the result of
> query-cpu-model-expansion on "host" model, or the result of querying the
> hardware directly if we can't ask QEMU for that (which is the current
> state).
> 
> > 2. virsh cpu-models {arch} will also use a Qemu invocation to gather
> > cpu model data.  
> 
> No, virsh cpu-models will just list CPU models supported by libvirt (or
> an empty list if libvirt supports all models supported by QEMU). The CPU
> model data from QEMU can be found in domain capabilities XML.
> 
> > 3. Most architectures seem to use a "map" (xml file containing cpu
> > model data that ships with Libvirt) to satisfy #1 and #2 above. Our
> > new method does not use this map as it gets all of the data directly
> > from Qemu.  
> 
> Even if we switch some CPU operations to work through QEMU, we may still
> need to use the cpu_map.xml file for some other operations, or other
> hypervisor drivers. It all depends on what we need to do for a given
> architecture. For example, ARM does not use cpu_map.xml at all even now.
> 
> Slightly related, I don't think we have a way to list CPU features
> supported by QEMU over QMP, do we? "-cpu ?" will show them all, but I
> couldn't find a QMP command that would give me the same list.
> 
> > 4. virsh cpu-baseline and cpu-compare will now invoke qemu directly as
> > well.  
> 
> Perhaps, but before we can do that, we'll probably need to come up with
> new APIs. I don't think it's critical, though.
> 
> > Note: I'm not sure exactly how much all of this will apply just to
> > s390 with other architectures moving over to the new interface if/when
> > they want to. Or if we will want to change all architectures to this
> > new interface at the same time.  
> 
> Well, IIUC the new commands are not supported on any other architecture
> right now, are they? Anyway, I don't think we need to switch everything
> at the same time. If we have a way to detect what commands are supported
> for a specific QEMU binary, we can implement the code in libvirt and use
> it when we can, falling back to the current way otherwise.
> 
> Jirka
> 
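The "detect what commands are supported" step mentioned above can be sketched as classifying the QMP error reply from a probe. As the earlier mail in this thread notes, a plain GenericError is ambiguous; the sketch below assumes QEMU's distinct CommandNotFound error class as the reliable signal:

```python
def command_supported(error_reply):
    """Classify the QMP reply from probing a command.

    error_reply is None on success, or the parsed {"error": {...}}
    object. A distinct error class (CommandNotFound is assumed here)
    signals an unknown command; GenericError alone proves nothing,
    which is exactly the ambiguity complained about in the thread.
    """
    if error_reply is None:  # probe succeeded
        return True
    return error_reply["error"]["class"] != "CommandNotFound"

assert command_supported(None)
assert not command_supported(
    {"error": {"class": "CommandNotFound", "desc": "unknown command"}})
```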



David



Re: [libvirt] [Qemu-devel] [RFC 00/28] s390x CPU models: exposing features

2016-07-11 Thread David Hildenbrand
> Are there any further comments, especially on patches 23-25, introducing new
> QOM interfaces?

Ping, can somebody please have a look, especially on patches 23-25?
We really want to know if we can proceed with this CPU model approach.

David

> 
> Also, if anybody is planning to look into this, please speak up :)
> 
> Otherwise it might make sense to put this onto the next KVM call agenda.
> 
> David



Re: [libvirt] [Qemu-devel] [RFC 00/28] s390x CPU models: exposing features

2016-06-30 Thread David Hildenbrand
> ... taking a requested level of stability into account.
> 
> As we are aware that e.g. x86 has their own idea of a CPU model and their
> existing implementation in place, but are also looking into ways to e.g.
> expand the "host" CPU model to a detailed representation, we designed the
> "expansion" interface to also allow that.
> 
> Comments are very welcome, but please always keep the restrictions and
> specialties in mind when suggesting some major design changes.
> 
> The header update will be replaced by a kvm-next header update as soon as
> the VSIE patches are upstream. The major KVM interface changes are already
> part of kvm-next.
> 
> The current state is available on git://github.com/cohuck/qemu on branch
> "cpumodel-s390x".
> 
> David Hildenbrand (26):
>   s390x/cpumodel: "host" and "qemu" as CPU subclasses
>   s390x/cpumodel: expose CPU class properties
>   s390x/cpumodel: generate CPU feature group lists
>   s390x/cpumodel: introduce CPU feature group definitions
>   s390x/cpumodel: register defined CPU models as subclasses
>   s390x/cpumodel: store the CPU model in the CPU instance
>   s390x/cpumodel: expose features and feature groups as properties
>   s390x/cpumodel: let the CPU model handle feature checks
>   s390x/cpumodel: check and apply the CPU model
>   s390x/sclp: factor out preparation of cpu entries
>   s390x/sclp: introduce sclp feature blocks
>   s390x/sclp: indicate sclp features
>   s390x/sclp: propagate the ibc val(lowest and unblocked ibc)
>   s390x/sclp: propagate the mha via sclp
>   s390x/sclp: propagate hmfai
>   update linux headers (CPU model)
>   s390x/kvm: allow runtime-instrumentation for "none" machine
>   s390x/kvm: implement CPU model support
>   s390x/kvm: disable host model for existing compat machines
>   s390x/kvm: let the CPU model control CMM(A)
>   qmp: add QMP interface "query-cpu-model-expansion"
>   qmp: add QMP interface "query-cpu-model-comparison"
>   qmp: add QMP interface "query-cpu-model-baseline"
>   s390x/cpumodel: implement QMP interface "query-cpu-model-expansion"
>   s390x/cpumodel: implement QMP interface "query-cpu-model-comparison"
>   s390x/cpumodel: implement QMP interface "query-cpu-model-baseline"
> 
> Michael Mueller (2):
>   s390x/cpumodel: introduce CPU features
>   s390x/cpumodel: generate CPU feature lists for CPU models
> 
>  Makefile.target |2 +-
>  hw/s390x/s390-virtio-ccw.c  |5 +
>  hw/s390x/s390-virtio.c  |6 +-
>  hw/s390x/sclp.c |   35 +-
>  include/hw/s390x/sclp.h |   17 +-
>  include/sysemu/arch_init.h  |   10 +
>  linux-headers/asm-s390/kvm.h|   40 ++
>  qapi-schema.json|  184 ++
>  qmp-commands.hx |   18 +
>  qmp.c   |   22 +
>  rules.mak   |1 +
>  stubs/Makefile.objs |3 +
>  stubs/arch-query-cpu-model-baseline.c   |   13 +
>  stubs/arch-query-cpu-model-comparison.c |   12 +
>  stubs/arch-query-cpu-model-expansion.c  |   12 +
>  target-s390x/Makefile.objs  |   22 +-
>  target-s390x/cpu-qom.h  |5 +
>  target-s390x/cpu.c  |   35 +-
>  target-s390x/cpu.h  |5 +
>  target-s390x/cpu_features.c |  376 +++
>  target-s390x/cpu_features.h |  302 +
>  target-s390x/cpu_models.c   | 1055 
> +++
>  target-s390x/cpu_models.h   |  113 
>  target-s390x/gen-features.c |  587 +
>  target-s390x/helper.c   |   29 +-
>  target-s390x/kvm.c  |  346 +-
>  target-s390x/machine.c  |   14 +-
>  27 files changed, 3203 insertions(+), 66 deletions(-)
>  create mode 100644 stubs/arch-query-cpu-model-baseline.c
>  create mode 100644 stubs/arch-query-cpu-model-comparison.c
>  create mode 100644 stubs/arch-query-cpu-model-expansion.c
>  create mode 100644 target-s390x/cpu_features.c
>  create mode 100644 target-s390x/cpu_features.h
>  create mode 100644 target-s390x/cpu_models.c
>  create mode 100644 target-s390x/cpu_models.h
>  create mode 100644 target-s390x/gen-features.c
> 



Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-23 Thread David Hildenbrand
> On Wed, Jun 22, 2016 at 09:54:51 +0200, David Hildenbrand wrote:
> > > On Wed, Jun 22, 2016 at 09:34:49 +0200, David Hildenbrand wrote:  
> > > > I think the coffee didn't do its work already :) . I wanted to write 
> > > > that we can live
> > > > _with_ this additional query. Meaning the involved overhead would be ok 
> > > > - in my
> > > > opinion for s390x.
> > > > 
> > > > What we could do to avoid one compare operation would be:
> > > > 
> > > > a) Expand the host model
> > > > b) Expand the target model (because on s390x we could have migration 
> > > > unsafe
> > > > model)
> > > > c) Work with the runnability information returned via 
> > > > query-cpu-definitions
> > > > 
> > > > But as we have to do b) either way on s390x, we can directly do a 
> > > > compare
> > > > operation. (which makes implementation a lot simpler, because libvirt 
> > > > then
> > > > doesn't have to deal with any feature/model names).
> > > 
> > > But why do you even need to do any comparison? Isn't it possible to let
> > > QEMU do it when a domain starts? The thing is we should avoid doing
> > > completely different things on each architecture.
> > >   
> > 
> > Sure, QEMU will of course double check when starting the guest! So trying to
> > start and failing is of course an option! So no check is needed if that is
> > acceptable.  
> 
> Yeah, I think it's the safest and easiest option now.
> 
> Jirka
> 

Alright then, this RFC already handles that properly, so that seems to be
solved. The question now is if you guys see a fundamental problem in the way we
want to handle CPU models.

Especially
a) Having flexible, not migration safe CPU models that can be expanded to
migration safe models (using the expansion interface).

b) Letting QEMU carry out the task of comparing and baselining to be used
for e.g. "virsh cpu-baseline" or "virsh cpu-compare".

c) Indicating the host model as the expansion of "-cpu host", e.g. for "virsh
capabilities" (which says "host" for now for us).

Also, it will be good to know if the "expansion" interface with parameters
"full" or "migratable" is really helpful to you or if I should drop that and
you will come up with your own "query-host-cpu".

We are also planning to implement the "query-cpu-definitions" runnability
information in the future, because as you said, it might be a good way to
quickly indicate runnable CPU models. But we are most likely not going to use it
for e.g. comparing or baselining or detection of runnability of a more complex
cpu model (having a lot of feature changes).

Thanks!

David



Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-23 Thread David Hildenbrand

> Question: KVM supports x2apic (and enables it by default) on any
> host CPU, because it is all emulated by KVM. Should "host-model"
> include x2apic on all hosts, or only when the host CPU has it?
> ("-cpu host" does include it).
> 
> Including x2apic sounds more useful, but it doesn't match the
> "get as close to the host CPU as possible" part of your
> description.
> 

From a s390x point of view, our "host" model is about indicating the maximum
possible configuration we are able to virtualize, not strictly limiting it only
to the real host features.

We also have a feature called "sthyi", emulated by KVM, which might not be
around for the real "host". We will include that in "host", because otherwise
things seem to get really complicated with no obvious benefit (to me :) ).

Not sure how you want to handle that.

David




Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-22 Thread David Hildenbrand
> On Wed, Jun 22, 2016 at 09:34:49 +0200, David Hildenbrand wrote:
> > I think the coffee didn't do its work already :) . I wanted to write that 
> > we can live
> > _with_ this additional query. Meaning the involved overhead would be ok - 
> > in my
> > opinion for s390x.
> > 
> > What we could do to avoid one compare operation would be:
> > 
> > a) Expand the host model
> > b) Expand the target model (because on s390x we could have migration unsafe
> > model)
> > c) Work with the runnability information returned via query-cpu-definitions
> > 
> > But as we have to do b) either way on s390x, we can directly do a compare
> > operation. (which makes implementation a lot simpler, because libvirt then
> > doesn't have to deal with any feature/model names).  
> 
> But why do you even need to do any comparison? Isn't it possible to let
> QEMU do it when a domain starts? The thing is we should avoid doing
> completely different things on each architecture.
> 

Sure, QEMU will of course double check when starting the guest! So trying to
start and failing is of course an option! So no check is needed if that is
acceptable.

David



Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-22 Thread David Hildenbrand

> > > > 2) Requiring a running QEMU instance to run
> > > >query-cpu-model-comparison
> > > > 
> > > > With my previous query-host-cpu proposal, the task of comparing
> > > > the configuration requested by the user with host capabilities
> > > > can be done directly by libvirt. This way, no extra QEMU instance
> > > > needs to be run before starting a VM.
> > > 
> > > I think we can just easily get around this by not comparing a guest CPU
> > > to host (except for the explicit virConnectCompareCPU, which is not very
> > > useful in its current form anyway).  
> > 
> > If there is some flexible way around that, great. But I think we (s390x) 
> > could
> > live without this additional query.
> 
> So if I understand correctly, you say you don't need the API to compare
> guest CPU to host CPU, right? If so, that's exactly what I said too.
> 

I think the coffee didn't do its work already :) . I wanted to write that we can live
_with_ this additional query. Meaning the involved overhead would be ok - in my
opinion for s390x.

What we could do to avoid one compare operation would be:

a) Expand the host model
b) Expand the target model (because on s390x we could have migration unsafe
model)
c) Work with the runnability information returned via query-cpu-definitions

But as we have to do b) either way on s390x, we can directly do a compare
operation. (which makes implementation a lot simpler, because libvirt then
doesn't have to deal with any feature/model names).

David



Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-22 Thread David Hildenbrand
> On Tue, Jun 21, 2016 at 18:22:30 -0300, Eduardo Habkost wrote:
> > On Tue, Jun 21, 2016 at 11:09:49PM +0200, Jiri Denemark wrote:
> > [...]  
> > > > 1) "query-cpu-model-expansion model=host" vs "query-host-cpu":
> > > > 
> > > > I still don't think we want to set in stone that "the result the
> > > > guest sees when using -cpu host" is always the same as "what the
> > > > host supports running".
> > > > 
> > > > For example: let's assume a given architecture have two features
> > > > (A and B) that are both supported by the host but can never be
> > > > enabled together. For actual "-cpu host" usage, QEMU would have
> > > > to choose between enabling A and B. For querying host
> > > > capabilities, we still want to let management software know that
> > > > either A or B are supported.  
> > > 
> > > What libvirt is really interested in is the guest CPU which would be
> > > used with -cpu host. This is actually what I thought query-host-cpu was
> > > all about. Perhaps because there's no difference for x86.  
> > 
> > In that case, I think it makes sense to just extend
> > query-cpu-definitions or use "query-cpu-model-expansion
> > model=host" instead of a query-host-cpu command.
> > 
> > Probably query-cpu-model-expansion is better than just extending
> > query-cpu-definitions, because it would allow the expansion of
> > extra CPU options, like "host,migratable=off".  
> 
> Yeah, this would be even better.
> 
> Jirka
> 

Please be aware that we don't have anything like that on s390x, but
I prepared for that requirement by being able to tell
query-cpu-model-expansion how to expand (full, migratable, stable).
Actually, full and migratable look the same on s390x.

The plan for us is to rely on "stable" most of the time; full and migratable
might be what you're looking for.

David



Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-22 Thread David Hildenbrand
> On Tue, Jun 21, 2016 at 17:33:09 -0300, Eduardo Habkost wrote:
> > On Tue, Jun 21, 2016 at 07:01:44PM +0200, David Hildenbrand wrote:  
> > > > (CCing libvirt people)
> > > > 
> > > > On Tue, Jun 21, 2016 at 03:02:05PM +0200, David Hildenbrand wrote:  
> > > > > This is our second attempt to implement CPU models for s390x. We 
> > > > > realized
> > > > > that we also want to have features exposed via the CPU model. While 
> > > > > doing
> > > > > that we realized that we want to have a better interface for libvirt. 
> > > > >
> > > > 
> > > > Before getting into the details, I would like to clarify how the
> > > > following could be accomplished using the new commands:
> > > > 
> > > > Example:
> > > > 
> > > > 1) User configures libvirt with:
> > > >
> > > >Westmere
> > > >
> > > >
> > > > 2) libvirt will translate that to:
> > > >"-cpu Westmere,+aes" or "-cpu Westmere,aes=on"
> > > > 3) libvirt wants to know if "-cpu Westmere,aes=on" is usable in
> > > >the current host, before trying to start the VM.
> > > > 
> > > > How exactly would this be done using the new commands?  
> > > 
> > > Hi Eduardo,
> > > 
> > > thanks for having a look - highly appreciated that you actually map this
> > > to libvirt requirements!
> > > 
> > > That would map to a compare operation between "host" and 
> > > "Westmere,aes=on".
> > > 
> > > Host could at that point already be expanded by libvirt. Doesn't matter 
> > > at that
> > > point.
> > > 
> > > If the result is "identical" or "superset", it is runnable. If the result
> > > is
> > > "subset" or "incompatible", details about the responsible properties are
> > > indicated. (I actually took that idea from your patch for indicating
> > > runnability).  
> > 
> > So, I have two worries about the proposal:
> > 
> > 
> > 1) "query-cpu-model-expansion model=host" vs "query-host-cpu":
> > 
> > I still don't think we want to set in stone that "the result the
> > guest sees when using -cpu host" is always the same as "what the
> > host supports running".
> > 
> > For example: let's assume a given architecture have two features
> > (A and B) that are both supported by the host but can never be
> > enabled together. For actual "-cpu host" usage, QEMU would have
> > to choose between enabling A and B. For querying host
> > capabilities, we still want to let management software know that
> > either A or B are supported.  
> 
> What libvirt is really interested in is the guest CPU which would be
> used with -cpu host. This is actually what I thought query-host-cpu was
> all about. Perhaps because there's no difference for x86.

That's also what I had in mind first. Let me explain why they are not the same
on s390x and why it is problematic (actually both types are not the same!):

1. Certain CPU features are only valid for LPAR guests, they can't be
virtualized for KVM guests (remember that we always have at least one level of
virtualization on s390x). So we will never be able to provide these features to
a KVM guest, therefore we will never be able to completely mimic the real host
model.

2. We have a whole lot of unpublished features. We would have to include them in
the "real host model", but we can't. For the "host" model, this is not a
problem, because we simply don't virtualize them. But the "real host model"
would then vary between, say, QEMU versions, which looks really strange, because
in essence, QEMU/KVM looks like the wrong interface to query for a "real host
model".

3. We don't have the kernel interfaces in place to query the "real host model".
We can only query what we are able to virtualize. And that is indeed different
compared to what I said previously.

And as I can't see any use case for s390x, we most probably will never be able
to support the interface you are suggesting here. Which is fine, if it makes
sense for x86.

> 
> > 2) Requiring a running QEMU instance to run
> >query-cpu-model-comparison
> > 
> > With my previous query-host-cpu proposal, the task of comparing
> > the configuration requested by the user with host capabilities
> > can be done directly by libvirt. This way, no extra QEMU instance
> > needs to be run before starting a VM.  
> 
> I think we can just easily get around this by not comparing a guest CPU
> to host (except for the explicit virConnectCompareCPU, which is not very
> useful in its current form anyway).

If there is some flexible way around that, great. But I think we (s390x) could
live without this additional query.


So your first concern is valid, and if you really need this information, you
should probably create a new interface. Your second concern should not
be critical for now. We don't want to optimize for total speed at that point.

David



Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-22 Thread David Hildenbrand
> On Tue, Jun 21, 2016 at 13:44:31 -0300, Eduardo Habkost wrote:
> > (CCing libvirt people)
> > 
> > On Tue, Jun 21, 2016 at 03:02:05PM +0200, David Hildenbrand wrote:
> > > This is our second attempt to implement CPU models for s390x. We realized
> > > that we also want to have features exposed via the CPU model. While doing
> > > that we realized that we want to have a better interface for libvirt.
> > 
> > Before getting into the details, I would like to clarify how the
> > following could be accomplished using the new commands:
> > 
> > Example:
> > 
> > 1) User configures libvirt with:
> >
> >Westmere
> >
> >
> > 2) libvirt will translate that to:
> >"-cpu Westmere,+aes" or "-cpu Westmere,aes=on"
> > 3) libvirt wants to know if "-cpu Westmere,aes=on" is usable in
> >the current host, before trying to start the VM.
> 
> While this is what libvirt currently does, I don't think it's necessary
> to keep doing that... see below.
> 
> > > a) "query-cpu-model-expansion" - tell us what the "host" or a migration
> > >unsafe model looks like. Either falling back to a stable model or
> > >completely exposing all properties. We are interested in stable models.
> > > b) "query-cpu-model-comparison" - tell us how two CPU models compare,
> > > indicating which properties were responsible for the decision.
> > > c) "query-cpu-model-baseline" - create a new model out of two models,
> > > taking a requested level of stability into account.
> 
> This looks like it copies current libvirt APIs, which I think is not a
> very good idea. Both CPU compare and baseline APIs in libvirt will need
> to be changed (or rather new APIs with similar functionality added) to
> become useful. I think we should first focus on making guest CPU
> configuration work the way it should work. For that I think we need some
> higher level commands.

Hi Jiri,

thanks for your reply!

Please be aware that we (s390x) have no libvirt cpu model support in place. We
are coming up with a solution that solves our problems, but instead of tagging
all interfaces with "s390x" we want interfaces that everybody might use.

I understand that x86 has some very advanced libvirt CPU model magic in place.
You can still stick to what you have right now. It doesn't work for us.

So we provide 3 basic functions (baseline, compare, expand) that can easily be
added for s390x into the CPU model. So I don't see a problem introducing
these operations in libvirt - it just works without any magic and knowledge
about features and models in libvirt.

In the end it keeps the CPU model where it belongs (QEMU) and allows us to
just fit into the libvirt CPU model world that was heavily inspired
by x86 (and I can tell you, that is a very challenging task).

But - of course - we have to do some additional queries on a QEMU instance. I
don't think that this is a problem (at least not for s390x).

> 
> Let me sum up what libvirt is doing (or will be doing) with guest
> CPUs... Libvirt supports three guest CPU configuration modes:

One important thing to mention here is that all of this was inspired by x86.
Some stuff doesn't even make sense on s390x ("mimic the real host"), as
it would in practice never be possible. See my reply to Eduardo's patch set,
which I'll post later.

We want to focus on the real use cases and I think they are
- Making sure models (the "guest API") don't change during migration
- Allowing the user to use a certain CPU model
- Baselining/Comparing/Using the -cpu host model


> 
> - host-passthrough -- this is easy, it's just asking for "-cpu host" and
>   no fancy commands are required.

This should map to expanding the host model in our case.

> 
> - host-model -- for this we need to know what CPU configuration we need
>   to ask for to get as close to the host CPU as possible while being
>   able to ask for the exact same CPU on a different host (after
>   migration). And we need to be able to ask for this at probing time
>   (when libvirtd starts rather than when starting a new domain) using
>   "-machine none,accel=kvm:tcg", that is without actually creating any
>   guest CPU. This is what Eduardo's query-host-cpu QMP command is useful
>   for. In x86 world we could just query the guest CPU after running QEMU
>   with "-cpu host", but that's unusable with "-machine none", which is
>   why another way of getting this info is needed.

Again, we will be relying on "-cpu host" here, because the real host model
will never be possible to forward on s390x. And what's the 

Re: [libvirt] [RFC 00/28] s390x CPU models: exposing features

2016-06-21 Thread David Hildenbrand
> (CCing libvirt people)
> 
> On Tue, Jun 21, 2016 at 03:02:05PM +0200, David Hildenbrand wrote:
> > This is our second attempt to implement CPU models for s390x. We realized
> > that we also want to have features exposed via the CPU model. While doing
> > that we realized that we want to have a better interface for libvirt.  
> 
> Before getting into the details, I would like to clarify how the
> following could be accomplished using the new commands:
> 
> Example:
> 
> 1) User configures libvirt with:
>
>Westmere
>
>
> 2) libvirt will translate that to:
>"-cpu Westmere,+aes" or "-cpu Westmere,aes=on"
> 3) libvirt wants to know if "-cpu Westmere,aes=on" is usable in
>the current host, before trying to start the VM.
> 
> How exactly would this be done using the new commands?

Hi Eduardo,

thanks for having a look - highly appreciated that you actually map this
to libvirt requirements!

That would map to a compare operation between "host" and "Westmere,aes=on".

Host could at that point already be expanded by libvirt. Doesn't matter at that
point.

If the result is "identical" or "superset", it is runnable. If the result is
"subset" or "incompatible", details about the responsible properties are
indicated. (I actually took that idea from your patch for indicating
runnability).
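The runnability rule stated above reduces to a one-line predicate over the comparison result (the four result names are the ones used in this thread; the function itself is just a sketch of the decision, not the QMP interface):

```python
def runnable(compare_result):
    """Decide runnability from a query-cpu-model-comparison result,
    per the rule in this mail: the host being identical to or a
    superset of the requested model means the model can run."""
    return compare_result in ("identical", "superset")

assert runnable("identical") and runnable("superset")
assert not runnable("subset") and not runnable("incompatible")
```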

Thanks!

David



Re: [libvirt] [PATCH 0/3] qmp: query-host-cpu command

2016-06-21 Thread David Hildenbrand
> On Tue, Jun 21, 2016 at 02:52:54PM +0200, David Hildenbrand wrote:
> > > On Tue, Jun 21, 2016 at 08:20:40AM +0200, David Hildenbrand wrote:  
> > > > > Add QMP command to allow management software to query for
> > > > > CPU information for the running host.
> > > > > 
> > > > > The data returned by the command is in the form of a dictionary
> > > > > of QOM properties.
> > > > > 
> > > > > This series depends on the "Add runnability info to
> > > > > query-cpu-definitions" series I sent 2 weeks ago.
> > > > > 
> > > > > Git tree:
> > > > >   https://github.com/ehabkost/qemu-hacks.git work/query-host-cpu
> > > > > 
> > > > 
> > > > I like that interface, I'm going to post (maybe today? :) ) a similar 
> > > > interface
> > > > that allows to also expand other cpu models, not just the host model.   
> > > >  
> > > 
> > > In x86 I want to avoid exposing the details of other CPU models
> > > to libvirt because the details depend on machine-type.
> > > 
> > > But if it is useful for you, I believe the same "qom-properties"
> > > dict could be returned in query-cpu-definitions.
> > >   
> > > > 
> > > > Maybe we can then decide which one makes sense for all of us. But in 
> > > > general,
> > > > this interface is much better compared to what we had before. 
> > > 
> > > Maybe both? I think it's better to have a separate interface for
> > > querying "what exactly this host supports" and another one for
> > > querying for "what happens if I use -cpu host". In the case of
> > > x86, both are equivalent, but we can't guarantee this on all
> > > architectures.
> > >   
> > 
> > I'll post my patches in a couple of minutes, let's discuss it
> > then.
> > 
> > We might want to avoid having multiple interfaces carrying out
> > the same task.  
> 
> OK, I will wait for the patches before discussing it. My
> assumption is that both look similar, but are actually different
> tasks.
> 

For us, "-cpu host" and querying the host model are also identical. I'm not
sure there is currently a requirement for any other architecture to handle
this differently. Do you already have anything specific in mind?

But let's see if it indeed makes sense to split this up after looking at
our interface proposal.

David



Re: [libvirt] [PATCH 0/3] qmp: query-host-cpu command

2016-06-21 Thread David Hildenbrand
> On Tue, Jun 21, 2016 at 08:20:40AM +0200, David Hildenbrand wrote:
> > > Add QMP command to allow management software to query for
> > > CPU information for the running host.
> > > 
> > > The data returned by the command is in the form of a dictionary
> > > of QOM properties.
> > > 
> > > This series depends on the "Add runnability info to
> > > query-cpu-definitions" series I sent 2 weeks ago.
> > > 
> > > Git tree:
> > >   https://github.com/ehabkost/qemu-hacks.git work/query-host-cpu
> > >   
> > 
> > I like that interface, I'm going to post (maybe today? :) ) a similar 
> > interface
> > that allows to also expand other cpu models, not just the host model.  
> 
> In x86 I want to avoid exposing the details of other CPU models
> to libvirt because the details depend on machine-type.
> 
> But if it is useful for you, I believe the same "qom-properties"
> dict could be returned in query-cpu-definitions.
> 
> > 
> > Maybe we can then decide which one makes sense for all of us. But in 
> > general,
> > this interface is much better compared to what we had before.   
> 
> Maybe both? I think it's better to have a separate interface for
> querying "what exactly this host supports" and another one for
> querying for "what happens if I use -cpu host". In the case of
> x86, both are equivalent, but we can't guarantee this on all
> architectures.
> 

I'll post my patches in a couple of minutes, let's discuss it then.

We might want to avoid having multiple interfaces carrying out the same task.

David



Re: [libvirt] [PATCH 0/3] qmp: query-host-cpu command

2016-06-21 Thread David Hildenbrand
> Add QMP command to allow management software to query for
> CPU information for the running host.
> 
> The data returned by the command is in the form of a dictionary
> of QOM properties.
> 
> This series depends on the "Add runnability info to
> query-cpu-definitions" series I sent 2 weeks ago.
> 
> Git tree:
>   https://github.com/ehabkost/qemu-hacks.git work/query-host-cpu
> 

I like that interface. I'm going to post (maybe today? :) ) a similar interface
that also allows expanding other CPU models, not just the host model.

Maybe we can then decide which one makes sense for all of us. But in general,
this interface is much better than what we had before.

David



Re: [libvirt] [Qemu-devel] [PATCH 7/9] qmp: Add runnability information to query-cpu-definitions

2016-06-03 Thread David Hildenbrand

> It is not exactly a special case (it is a read-only property like
> any other), but it's worth mentioning. I will change it to:
> 
> # If the QOM property is read-only, that means there's no known
> # way to make the CPU model run in the current host. If
> # absolutely no extra information will be returned to explain why
> # the CPU model is not runnable, implementations may simply
> # return "type" as the property name.
> 
> >   
> > > +# If @unavailable-features is an empty list, the CPU model is
> > > +# runnable using the current host and machine-type.
> > > +# If @unavailable-features is not present, runnability
> > > +# information for the CPU model is not available.
> > >  #
> > >  # Since: 1.2.0
> > >  ##  
> > 
> > I'm happy with this interface.  Thanks!  
> 
> Thanks!
> 

Yes, sounds also good to me. For "hw generation too new" and similar errors
we will simply return "type". Thanks.

David



Re: [libvirt] [Qemu-devel] [PATCH 7/9] qmp: Add runnability information to query-cpu-definitions

2016-05-12 Thread David Hildenbrand

> Updated to:
> 
> ##
> # @CpuDefinitionInfo:
> #
> # Virtual CPU definition.
> #
> # @name: the name of the CPU definition
> # @runnable: #optional. whether the CPU model is usable with the
> #current machine and accelerator. Omitted if we don't
> #know the answer. (since 2.7)
> # @unavailable-features: List of attributes that prevent the CPU

"List of properties" ?

> #model from running in the current host.
> #(since 2.7)
> #
> # @unavailable-features is a list of QOM property names that
> # represent CPU model attributes that prevent the CPU from running.
> # If the QOM property is read-only, that means the CPU model can
> # never run in the current host. If the property is read-write, it
> # means that it MAY be possible to run the CPU model in the current
> # host if that property is changed. Management software can use it
> # as hints to suggest or choose an alternative for the user, or
> # just to generate meaningful error messages explaining why the CPU
> # model can't be used.
> #
> # Since: 1.2.0
> ##
> 

What about changing "unavailable-features" to

problematic-properties
responsible-properties

... or anything else that makes clear we are dealing with properties, not
only features?

Apart from that, sounds good to me.

David



Re: [libvirt] [PATCH 7/9] qmp: Add runnability information to query-cpu-definitions

2016-05-10 Thread David Hildenbrand

> > Yes, I think so. However to really make good hints, upper layers would most
> > likely need more information about the exact problem with a property -
> > maybe something like an enum value per problematic property.
> > (UNAVAILABLE_FEATURE, VALUE_TOO_BIG, VALUE_TOO_SMALL, UNSUPPORTED_VALUE) 
> > ...  
> 
> If a more powerful interface is needed, we can add extra fields,
> yes. Although I'm not sure we really start providing that level
> of detail in a generic way (I guess it will depend on how much
> arch-independent code libvirt will use to handle runnability
> information).

And I would actually (later) prefer to have something like
bool too_new; (name tbd)

to cover the cpu-generation problem instead of having to expose read-only
properties just for the sake of communicating that a CPU model is simply too
new and cannot be made runnable by toggling features.

> 
> >   
> > > > > 
> > > > > We could replace this with something more generic, like:
> > > > > 
> > > > > @runnability-blockers: List of attributes that prevent the CPU
> > > > >   model from running in the current host.
> > > > >   
> > > > >   A list of QOM property names that represent CPU model
> > > > >   attributes that prevent the CPU from running. If the QOM
> > > > >   property is read-only, that means the CPU model can never run
> > > > >   in the current host. If the property is read-write, it means
> > > > >   that it MAY be possible to run the CPU model in the current
> > > > >   host if that property is changed.
> > > > >   
> > > > >   Management software can use it as hints to suggest or choose an
> > > > >   alternative for the user, or just to generate meaningful error
> > > > >   messages explaining why the CPU model can't be used.
> > > > > 
> > > > > (I am looking for a better name than "runnability-blockers").
> > > > > 
> > 
> > Not sure which approach would be better.  
> 
> Note that this is only a change in documentation of the new
> field, to allow it to cover extra cases without any changes in
> the interface.
> 
> I also don't like the "runnability-blockers" name, but I prefer
> the new documentation I wrote above because it is more explicit
> about what exactly the field means. I plan to keep the
> "unavailable-features" name but use the above documentation text
> in the next version. Does that sound OK?
> 

I like the read-only part of that, but you should still somehow clarify that
we are dealing with boolean properties, a.k.a. features. At least that's my
opinion.

Apart from that, this is fine with me!

David



Re: [libvirt] [PATCH 7/9] qmp: Add runnability information to query-cpu-definitions

2016-05-09 Thread David Hildenbrand

> > > > 
> > > > Just FYI, on other architectures (e.g. s390x), other conditions (e.g. 
> > > > cpu
> > > > generation) also define if a CPU model is runnable, so the pure 
> > > > availability of
> > > > features does not mean that a cpu model is runnable.
> > > > 
> > > > We could have runnable=false and unavailable-features being an empty 
> > > > list.
> > > 
> > > Even on those cases, can't the root cause be mapped to a QOM
> > > property name (e.g. "cpu-generation"), even if it's property that
> > > can't be changed by the user?  
> > 
> > In the current state, no.  
> 
> But it could be implemented by s390x in the future, if it's
> considered useful, right?

Yes, we could fit that into read-only properties if we ever need it
(like the cpu-generation you mentioned and cpu-ga-level - both will always be
read-only).

However, we could come up with more optional fields for that in the future
(like unsupported-values or something similar). I actually prefer
unavailable-features over runnability-blockers.

> 
> > 
> > So I think for runnable=false:
> > a) unavailable-features set -> can be made runnable
> > b) unavailable-features not set -> cannot be made runnable
> > 
> > would be enough.  
> 
> I understand it would be enough, but I would like to at least
> define semantics that would make sense for all architectures in
> case it gets implemented in the future. Would the proposal below
> make sense?
> 

Yes, I think so. However, to really give good hints, upper layers would most
likely need more information about the exact problem with a property -
maybe something like an enum value per problematic property.
(UNAVAILABLE_FEATURE, VALUE_TOO_BIG, VALUE_TOO_SMALL, UNSUPPORTED_VALUE) ...

> > > 
> > > We could replace this with something more generic, like:
> > > 
> > > @runnability-blockers: List of attributes that prevent the CPU
> > >   model from running in the current host.
> > >   
> > >   A list of QOM property names that represent CPU model
> > >   attributes that prevent the CPU from running. If the QOM
> > >   property is read-only, that means the CPU model can never run
> > >   in the current host. If the property is read-write, it means
> > >   that it MAY be possible to run the CPU model in the current
> > >   host if that property is changed.
> > >   
> > >   Management software can use it as hints to suggest or choose an
> > >   alternative for the user, or just to generate meaningful error
> > >   messages explaining why the CPU model can't be used.
> > > 
> > > (I am looking for a better name than "runnability-blockers").
> > >   

Not sure which approach would be better.

David



Re: [libvirt] [PATCH 7/9] qmp: Add runnability information to query-cpu-definitions

2016-05-09 Thread David Hildenbrand
> On Mon, May 09, 2016 at 10:54:53AM +0200, David Hildenbrand wrote:
> > > Extend query-cpu-definitions schema to allow it to return two new
> > > optional fields: "runnable" and "unavailable-features".
> > > "runnable" will tell if the CPU model can be run in the current
> > > host. "unavailable-features" will contain a list of CPU
> > > properties that are preventing the CPU model from running in the
> > > current host.
> > > 
> > > Cc: David Hildenbrand <d...@linux.vnet.ibm.com>
> > > Cc: Michael Mueller <m...@linux.vnet.ibm.com>
> > > Cc: Christian Borntraeger <borntrae...@de.ibm.com>
> > > Cc: Cornelia Huck <cornelia.h...@de.ibm.com>
> > > Cc: Jiri Denemark <jdene...@redhat.com>
> > > Cc: libvir-list@redhat.com
> > > Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
> > > ---
> > >  qapi-schema.json | 10 +-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > index 54634c4..450e6e7 100644
> > > --- a/qapi-schema.json
> > > +++ b/qapi-schema.json
> > > @@ -2948,11 +2948,19 @@
> > >  # Virtual CPU definition.
> > >  #
> > >  # @name: the name of the CPU definition
> > > +# @runnable: true if the CPU model is runnable using the current
> > > +#machine and accelerator. Optional. Since 2.6.
> > > +# @unavailable-features: List of properties that prevent the CPU
> > > +#model from running in the current host,
> > > +#if @runnable is false. Optional.
> > > +#Since 2.6.  
> > 
> > Just FYI, on other architectures (e.g. s390x), other conditions (e.g. cpu
> > generation) also define if a CPU model is runnable, so the pure 
> > availability of
> > features does not mean that a cpu model is runnable.
> > 
> > We could have runnable=false and unavailable-features being an empty list.  
> 
> Even on those cases, can't the root cause be mapped to a QOM
> property name (e.g. "cpu-generation"), even if it's property that
> can't be changed by the user?

In the current state, no.

So I think for runnable=false:
a) unavailable-features set -> can be made runnable
b) unavailable-features not set -> cannot be made runnable

would be enough.
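A minimal sketch of how management software could apply the a)/b) rule above
to a query-cpu-definitions entry (the entry shapes here are hypothetical
samples, not real query output):

```python
def classify(entry):
    """Classify a query-cpu-definitions entry per the a)/b) rule:
    runnable, fixable (toggling features may help), or unusable."""
    if "runnable" not in entry:
        return "unknown"   # no runnability information available
    if entry["runnable"]:
        return "runnable"
    if entry.get("unavailable-features"):
        return "fixable"   # a) features set -> can be made runnable
    return "unusable"      # b) features not set -> cannot be made runnable

assert classify({"name": "Westmere", "runnable": True}) == "runnable"
assert classify({"name": "Haswell", "runnable": False,
                 "unavailable-features": ["aes"]}) == "fixable"
assert classify({"name": "z14", "runnable": False}) == "unusable"
assert classify({"name": "qemu64"}) == "unknown"
```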

> 
> We could replace this with something more generic, like:
> 
> @runnability-blockers: List of attributes that prevent the CPU
>   model from running in the current host.
>   
>   A list of QOM property names that represent CPU model
>   attributes that prevent the CPU from running. If the QOM
>   property is read-only, that means the CPU model can never run
>   in the current host. If the property is read-write, it means
>   that it MAY be possible to run the CPU model in the current
>   host if that property is changed.
>   
>   Management software can use it as hints to suggest or choose an
>   alternative for the user, or just to generate meaningful error
>   messages explaining why the CPU model can't be used.
> 
> (I am looking for a better name than "runnability-blockers").
> 



David



Re: [libvirt] [PATCH 0/9] Add runnability info to query-cpu-definitions

2016-05-09 Thread David Hildenbrand
> This series extends query-cpu-definitions to include two extra
> fields: "runnable", and "unavailable-features".
> 
> This will return information based on the current machine and
> accelerator only. In the future we may extend these mechanisms to
> allow querying other machines and other accelerators without
> restarting QEMU, but it will require some reorganization of
> QEMU's main code.

Hi Eduardo,

just as discussed offline during/after the last KVM Forum, we are currently
working on a new set of QMP functions to keep all the CPU model details in
QEMU (so they don't have to be replicated in libvirt) and to provide a nice
way to query the host CPU model.

We already have a prototype running; we just need to clarify some s390x
details (the feature set of the CPU generations) before we can send out an
RFC. I could, however, send out just the interface changes + details on the
concept upfront, if anybody is interested.

In the current implementation:
- s390x cpu models will be independent of the QEMU machine, so for s390x the
  runnability information will only depend on the accelerator (host cpu model).
  However, for compatibility machines, CPU models will be disabled, and we
  therefore won't have any runnability information.
- we will not touch query-cpu-definitions
- the new interfaces will also make use of the current accelerator and the
  current machine, so the context of these functions are the same.
  
Adding functionality to query for other machines is not necessary for s390x.
Querying other accelerators doesn't really make sense to me; let's keep it simple here.

So this change is fine from my point of view. It doesn't contradict our
concept.

David



Re: [libvirt] [PATCH 7/9] qmp: Add runnability information to query-cpu-definitions

2016-05-09 Thread David Hildenbrand
> Extend query-cpu-definitions schema to allow it to return two new
> optional fields: "runnable" and "unavailable-features".
> "runnable" will tell if the CPU model can be run in the current
> host. "unavailable-features" will contain a list of CPU
> properties that are preventing the CPU model from running in the
> current host.
> 
> Cc: David Hildenbrand <d...@linux.vnet.ibm.com>
> Cc: Michael Mueller <m...@linux.vnet.ibm.com>
> Cc: Christian Borntraeger <borntrae...@de.ibm.com>
> Cc: Cornelia Huck <cornelia.h...@de.ibm.com>
> Cc: Jiri Denemark <jdene...@redhat.com>
> Cc: libvir-list@redhat.com
> Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
> ---
>  qapi-schema.json | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi-schema.json b/qapi-schema.json
> index 54634c4..450e6e7 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -2948,11 +2948,19 @@
>  # Virtual CPU definition.
>  #
>  # @name: the name of the CPU definition
> +# @runnable: true if the CPU model is runnable using the current
> +#machine and accelerator. Optional. Since 2.6.
> +# @unavailable-features: List of properties that prevent the CPU
> +#model from running in the current host,
> +#if @runnable is false. Optional.
> +#Since 2.6.

Just FYI, on other architectures (e.g. s390x), other conditions (e.g. cpu
generation) also define if a CPU model is runnable, so the pure availability of
features does not mean that a cpu model is runnable.

We could have runnable=false and unavailable-features being an empty list.


>  #
>  # Since: 1.2.0
>  ##
>  { 'struct': 'CpuDefinitionInfo',
> -  'data': { 'name': 'str' } }
> +  'data': { 'name': 'str',
> +'*runnable': 'bool',
> +'*unavailable-features': [ 'str' ] } }
> 
>  ##
>  # @query-cpu-definitions:



David
