Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-11 Thread Peter Xu
On Fri, Aug 11, 2023 at 07:39:37PM +0200, David Hildenbrand wrote:
> On 11.08.23 18:54, Peter Xu wrote:
> > On Fri, Aug 11, 2023 at 06:25:14PM +0200, David Hildenbrand wrote:
> > > On 11.08.23 18:22, Peter Xu wrote:
> > > > On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
> > > > > We wouldn't touch "-mem-path".
> > > > 
> > > > But still the same issue when someone uses -object memory-backend-file 
> > > > for
> > > > hugetlb, mapping privately, expecting ram discard to work?
> > > > 
> > > > Basically I see that example as, "hugetlb" in general made the private
> > > > mapping over RW file usable, so forbidden that anywhere may take a risk.
> > > 
> > > These users can be directed to using hugetlb
> > > 
> > > a) using MAP_SHARED
> > > b) using memory-backend-memfd, if MAP_PRIVATE is desired
> > > 
> > > Am I missing any important use case? Are we being a bit to careful about
> > > virtio-balloon and postcopy simply not being available for these corner
> > > cases?
> > 
> > The current immediate issue is not really mem=rw + fd=rw + private case
> > (which was a known issue), but how to make mem=rw + fd=ro + private work
> > for ThinnerBloger, iiuc.
> > 
> > I'd just think it safer to expose that cap to solve problem A (vm
> > templating) without affecting problem B (fallcate-over-private not working
> > right), when B is uncertain.
> 
> Right, and I'm thinking about if B is worth the effort.
> 
> > 
> > I'm also copy Daniel & libvirt list in case there's quick comment from
> > there. Say, maybe libvirt never use private mapping on hugetlb files over
> > memory-backend-file at all, then it's probably fine.
> 
> libvirt certainly allows setting  with  type="file">.
> 
> Could be that they also end up mapping "" to memory-backend-file
> instead of memory-backend-memfd (e.g., compatibility with older kernels?).
> 
> > 
> > In all cases, you and Igor should have the final grasp; no stand on a
> > strong opinon from my side.
> 
> I do value your opinion, so I'm still trying to figure out if there are sane
> use cases that really need a new parameter. Let's recap:
> 
> When opening the file R/O, resulting in fallocate() refusing to work:
> * virtio-balloon will fail to discard RAM but continue to "be alive"
> * virtio-mem will discard any private pages, but cannot free up disk
>   blocks using fallocate.
> * postcopy would fail early
> 
> Postcopy:
> * Works on shmem (MAP_SHARED / MAP_PRIVATE)
> * Works on hugetlb (MAP_SHARED / MAP_PRIVATE)
> * Does not work on file-backed memory (including MAP_PRIVATE)
> 
> We can ignore virtio-mem for now. What remains is postcopy and
> virtio-balloon.
> 
> memory-backend-file with MAP_PRIVATE on shmem/tmpfs results in a double
> memory consumption, so we can mostly cross that out as "sane use case".
> Rather make such users aware of that :D
> 
> memory-backend-file with MAP_PRIVATE on hugetlb works. virtio-balloon is not
> really compatible with hugetlb, free-page-reporting might work (although
> quite non-nonsensical). So postcopy as the most important use case remains.
> 
> memory-backend-file with MAP_PRIVATE on file-backed memory works. postcopy
> does not apply. virtio-balloon should work I guess.
> 
> 
> So the two use cases that are left are:
> * postcopy with hugetlb would fail
> * virtio-balloon with file-backed memory cannot free up disk blocks
> 
> Am I missing a case?

Looks complete.  I don't want to say so, but afaik postcopy should be
"corner case" in most cases I'd say; people do still rely mostly on
precopy.  It's probably a matter of whether we'd like take the risk.

-- 
Peter Xu



Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-11 Thread Peter Xu
On Fri, Aug 11, 2023 at 06:25:14PM +0200, David Hildenbrand wrote:
> On 11.08.23 18:22, Peter Xu wrote:
> > On Fri, Aug 11, 2023 at 06:17:05PM +0200, David Hildenbrand wrote:
> > > We wouldn't touch "-mem-path".
> > 
> > But still the same issue when someone uses -object memory-backend-file for
> > hugetlb, mapping privately, expecting ram discard to work?
> > 
> > Basically I see that example as, "hugetlb" in general made the private
> > mapping over RW file usable, so forbidden that anywhere may take a risk.
> 
> These users can be directed to using hugetlb
> 
> a) using MAP_SHARED
> b) using memory-backend-memfd, if MAP_PRIVATE is desired
> 
> Am I missing any important use case? Are we being a bit to careful about
> virtio-balloon and postcopy simply not being available for these corner
> cases?

The current immediate issue is not really mem=rw + fd=rw + private case
(which was a known issue), but how to make mem=rw + fd=ro + private work
for ThinnerBloger, iiuc.

I'd just think it safer to expose that cap to solve problem A (vm
templating) without affecting problem B (fallcate-over-private not working
right), when B is uncertain.

I'm also copy Daniel & libvirt list in case there's quick comment from
there. Say, maybe libvirt never use private mapping on hugetlb files over
memory-backend-file at all, then it's probably fine.

In all cases, you and Igor should have the final grasp; no stand on a
strong opinon from my side.

-- 
Peter Xu



Re: [PATCH v2 5/5] migration: Deprecate old compression method

2023-06-27 Thread Peter Xu
On Thu, Jun 22, 2023 at 09:50:19PM +0200, Juan Quintela wrote:
> Signed-off-by: Juan Quintela 

Acked-by: Peter Xu 

-- 
Peter Xu



Re: [PATCH v2 1/5] migration: Use proper indentation for migration.json

2023-06-27 Thread Peter Xu
On Thu, Jun 22, 2023 at 09:50:15PM +0200, Juan Quintela wrote:
> We broke it with dirtyrate limit patches.
> 
> Signed-off-by: Juan Quintela 

Acked-by: Peter Xu 

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-23 Thread Peter Xu
On Fri, Jun 23, 2023 at 09:23:18AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> > On Thu, Jun 22, 2023 at 10:59:58AM +0100, Daniel P. Berrangé wrote:
> > > I've mentioned several times before that the user should never need to
> > > set this multifd-channels parameter (nor many other parameters) on the
> > > destination in the first place.
> > > 
> > > The QEMU migration stream should be changed to add a full
> > > bi-directional handshake, with negotiation of most parameters.
> > > IOW, the src QEMU should be configured with 16 channels, and
> > > it should connect the primary control channel, and then directly
> > > tell the dest that it wants to use 16 multifd channels.
> > > 
> > > If we're expecting the user to pass this info across to the dest
> > > manually we've already spectacularly failed wrt user friendliness.
> > 
> > I can try to move the todo even higher.  Trying to list the initial goals
> > here:
> > 
> > - One extra phase of handshake between src/dst (maybe the time to boost
> >   QEMU_VM_FILE_VERSION) before anything else happens.
> > 
> > - Dest shouldn't need to apply any cap/param, it should get all from src.
> >   Dest still need to be setup with an URI and that should be all it needs.
> 
> There are a few that the dest will still need set explicitly. Specifically
> the TLS parameters - tls-authz and tls-creds, because those are both
> related to --object parameters configured on the dst QEMU. Potentially
> there's an argument to be made for the TLS parameters to be part fo the
> initial 'migrate' and 'migrate-incoming' command data, as they're
> specifically related to the connection establishment, while (most) of
> the other params are related to the migration protocol running inside
> the connection.

Ideally we can even make tls options to be after the main connection is
established, IOW the gnutls handshake can be part of the generic handshake.
But yeah I agree that may contain much more work, so we may start with
assuming the v2 handshake just happen on the tls channel built for now.

I think the new protocol should allow extension so when we want to move the
tls handshake into it v2 protocol should be able to first detect src/dst
binary support of that, and switch to that if we want - then we can even
got a src qemu migration failure which tells "dest qemu forget to setup tls
credentials in cmdlines", or anything wrong on dest during tls setup.

> 
> A few other parameters are also related to the connection establishment,
> most notably the enablement multifd, postcopy and postcopy-pre-empt.

As I mentioned in the list, I plan to make this part of the default v2
where v2 handshake will take care of managing the connections rather than
relying on the old code.  I'm not sure how complicated it'll be, but the v2
protocol just sounds a good fit for having such a major change on how we
setup the channels, and chance we get all things alright from the start.

> 
> I think with those ones we don't need to set them on the src either.
> With the new migration handshake we should probably use multifd
> codepaths unconditionally, with a single channel.

The v2 handshake will be beneficial to !multifd as well.  Right now I tend
to make it also work for !multifd, e.g., it always makes sense to do a
device tree comparision before migration, even if someone used special
tunneling so multifd may not be able to be enabled for whatever reason, but
as long as a return path is available so they can talk.

> By matching with the introduction of new protocol, we have a nice point
> against which to deprecate the old non-multifd codepaths. We'll need to
> keep the non-multifd code around *alot* longer than the normal
> deprecation cycle though, as we need mig to/from very old QEMUs.

I actually had a feeling that we should always keep it..  I'm not sure
whether we must combine a new handshake to "making multifd the default".  I
do think we can make the !multifd path very simple though, e.g., I'd think
we should start considering deprecate things like !multifd+compressions etc
earlier than that.

> 
> The enablement of postcopy could be automatic too - src & dst can
> both detect if their host OS supports it. That would make all
> migrations post-copy capable. The mgmt app just needs to trigger
> the switch to post-copy mode *if* they want to use it.

Sounds doable.

> 
> Likewise we can just always assume postcopy-pre-empt is available.
> 
> I think 'return-path' becomes another one we can just assume too.

Right, handshake cap (or with the new QAPI of URI replacement) should imply
return path already.

Thanks,

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-23 Thread Peter Xu
On Fri, Jun 23, 2023 at 08:17:46AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 03:20:01PM -0400, Peter Xu wrote:
> > On Thu, Jun 22, 2023 at 05:33:29PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> > > > I can try to move the todo even higher.  Trying to list the initial 
> > > > goals
> > > > here:
> > > > 
> > > > - One extra phase of handshake between src/dst (maybe the time to boost
> > > >   QEMU_VM_FILE_VERSION) before anything else happens.
> > > > 
> > > > - Dest shouldn't need to apply any cap/param, it should get all from 
> > > > src.
> > > >   Dest still need to be setup with an URI and that should be all it 
> > > > needs.
> > > > 
> > > > - Src shouldn't need to worry on the binary version of dst anymore as 
> > > > long
> > > >   as dest qemu supports handshake, because src can fetch it from dest.
> > > 
> > > I'm not sure that works in general. Even if we have a handshake and
> > > bi-directional comms for live migration, we still haave the save/restore
> > > to file codepath to deal with. The dst QEMU doesn't exist at the time
> > > the save process is done, so we can't add logic to VMSate handling that
> > > assumes knowledge of the dst version at time of serialization.
> > 
> > My current thought was still based on a new cap or anything the user would
> > need to specify first on both sides (but hopefully the last cap to set on
> > dest).
> > 
> > E.g. if with a new handshake cap we shouldn't set it on a exec: or file:
> > protocol migration, and it should just fail on qmp_migrate() telling that
> > the URI is not supported if the cap is set.  Return path is definitely
> > required here.
> 
> exec can support bi-directional migration - we have both stdin + stdout
> for the command. For exec it is mostly a documentation problem - you
> can't merely use 'cat' for example, but if you used 'socat' that could
> be made to work bi-directionally.

Okay.  Just an example that the handshake just cannot work for all the
cases, and it should always be able to fail.

So when exec doesn't properly provide return path, I think with
handshake=on we should get a timeout of not getting response properly and
fail the migration after the timeout, then.

There're a bunch of implications and details that need to be investigated
around such a handshake if it'll be proposed for real, so I'm not yet sure
whether there's something that may be surprising.  For channeltypes it
seems all fine for now.  Hopefully nothing obvious overlooked.

> 
> > > I don't think its possible for QEMU to validate that it has a fully
> > > bi-directional channel, without adding timeouts to its detection which I
> > > think we should strive to avoid.
> > > 
> > > I don't think we actually need self-bootstrapping anyway.
> > > 
> > > I think the mgmt app can just indicate the new v2 bi-directional
> > > protocol when issuing the 'migrate' and 'migrate-incoming'
> > > commands.  This becomes trivial when Het's refactoring of the
> > > migrate address QAPI is accepted:
> > > 
> > >   https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04851.html
> > > 
> > > eg:
> > > 
> > > { "execute": "migrate",
> > >   "arguments": {
> > >   "channels": [ { "channeltype": "main",
> > >   "addr": { "transport": "socket", "type": "inet",
> > >"host": "10.12.34.9",
> > > "port": "1050" } } ] } }
> > > 
> > > note the 'channeltype' parameter here. If we declare the 'main'
> > > refers to the existing migration protocol, then we merely need
> > > to define a new 'channeltype' to use as an indicator for the
> > > v2 migration handshake protocol.
> > 
> > Using a new channeltype would also work at least on src qemu, but I'm not
> > sure on how dest qemu would know that it needs a handshake in that case,
> > because it knows nothing until the connection is established.
> 
> In Het's series the 'migrate_incoming' command similarly has a chaneltype
> parameter.

Oh, yeah then that'll just work.  Thanks.

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Peter Xu
On Thu, Jun 22, 2023 at 05:33:29PM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 11:54:43AM -0400, Peter Xu wrote:
> > I can try to move the todo even higher.  Trying to list the initial goals
> > here:
> > 
> > - One extra phase of handshake between src/dst (maybe the time to boost
> >   QEMU_VM_FILE_VERSION) before anything else happens.
> > 
> > - Dest shouldn't need to apply any cap/param, it should get all from src.
> >   Dest still need to be setup with an URI and that should be all it needs.
> > 
> > - Src shouldn't need to worry on the binary version of dst anymore as long
> >   as dest qemu supports handshake, because src can fetch it from dest.
> 
> I'm not sure that works in general. Even if we have a handshake and
> bi-directional comms for live migration, we still haave the save/restore
> to file codepath to deal with. The dst QEMU doesn't exist at the time
> the save process is done, so we can't add logic to VMSate handling that
> assumes knowledge of the dst version at time of serialization.

My current thought was still based on a new cap or anything the user would
need to specify first on both sides (but hopefully the last cap to set on
dest).

E.g. if with a new handshake cap we shouldn't set it on a exec: or file:
protocol migration, and it should just fail on qmp_migrate() telling that
the URI is not supported if the cap is set.  Return path is definitely
required here.

> 
> > - Handshake can always fail gracefully if anything wrong happened, it
> >   normally should mean dest qemu is not compatible with src's setup (either
> >   machine, device, or migration configs) for whatever reason.  Src should
> >   be able to get a solid error from dest if so.
> > 
> > - Handshake protocol should always be self-bootstrap-able, it means when we
> >   change the handshake protocol it should always works with old binaries.
> > 
> >   - When src is newer it should be able to know what's missing on dest and
> > skip the new bits.
> > 
> >   - When dst is newer it should all rely on src (which is older) and it
> > should always understand src's language.
> 
> I'm not convinced it can reliably self-bootstrap in a backwards
> compatible manner, precisely because the current migration stream
> has no handshake and only requires a unidirectional channel.

Yes, please see above.  I meant when we grow the handshake protocol we
should make sure we don't need anything new to be setup either on src/dst
of qemu.  It won't apply to before-handshake binaries.

> I don't think its possible for QEMU to validate that it has a fully
> bi-directional channel, without adding timeouts to its detection which I
> think we should strive to avoid.
> 
> I don't think we actually need self-bootstrapping anyway.
> 
> I think the mgmt app can just indicate the new v2 bi-directional
> protocol when issuing the 'migrate' and 'migrate-incoming'
> commands.  This becomes trivial when Het's refactoring of the
> migrate address QAPI is accepted:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2023-05/msg04851.html
> 
> eg:
> 
> { "execute": "migrate",
>   "arguments": {
>   "channels": [ { "channeltype": "main",
>   "addr": { "transport": "socket", "type": "inet",
>"host": "10.12.34.9",
> "port": "1050" } } ] } }
> 
> note the 'channeltype' parameter here. If we declare the 'main'
> refers to the existing migration protocol, then we merely need
> to define a new 'channeltype' to use as an indicator for the
> v2 migration handshake protocol.

Using a new channeltype would also work at least on src qemu, but I'm not
sure on how dest qemu would know that it needs a handshake in that case,
because it knows nothing until the connection is established.

Maybe we still need QEMU_VM_FILE_VERSION to be boosted at least in this
case, so dest can read this at the very beginning, old binaries will fail
immediately, new binaries will start to talk with v2 language.

> 
> > - All !main channels need to be established later than the handshake - if
> >   we're going to do this anyway we probably should do it altogether to make
> >   channels named, so each channel used in migration needs to have a common
> >   header.  Prepare to deprecate the old tricks of channel orderings.
> 
> Once the primary channel involves a bi-directional handshake,
> we'll trivially ensure ordering - similar to how the existing
> code worked fnie in TLS mode which had a bi-directional TLS

Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Peter Xu
On Thu, Jun 22, 2023 at 10:59:58AM +0100, Daniel P. Berrangé wrote:
> On Thu, Jun 22, 2023 at 10:52:12AM +0200, Juan Quintela wrote:
> > Paolo Bonzini  wrote:
> > > On 6/12/23 22:51, Juan Quintela wrote:
> > >>> Shall we just leave it there?  Or is deprecating it helps us in any 
> > >>> form?
> > >> See the patches two weeks ago when people complained that lisen(.., num)
> > >> was too low.  And there are other parameters that work the same way
> > >> (that I convenientely had forgotten).  So the easiest way to get things
> > >> right is to use "defer" always.  Using -incoming "uri" should only be
> > >> for people that "know what they are doing", so we had to ways to do it:
> > >> - review all migration options and see which ones work without defer
> > >>and document it
> > >> - deprecate everything that is not defer.
> > >
> > > "-incoming " is literally the same as running "migrate-incoming"
> > > as the first thing on the monitor:
> > >
> > > if (incoming) {
> > > Error *local_err = NULL;
> > > if (strcmp(incoming, "defer") != 0) {
> > > qmp_migrate_incoming(incoming, _err);
> > > if (local_err) {
> > > error_reportf_err(local_err, "-incoming %s: ", incoming);
> > > exit(1);
> > > }
> > > }
> > > } else if (autostart) {
> > > qmp_cont(NULL);
> > > }
> > >
> > > It's the only piece of code which distinguishes "-incoming defer" from
> > > "-incoming ".
> > >
> > > So I'm not sure what the problem would be with keeping it?
> > 
> > User friendliness.
> > 
> > First of all, I use it all the time.  And I know that it is useful for
> > developers.  I was the one asking peter to implement -global
> > migration.foo to be able to test multifd with it.
> > 
> > The problem is that if you use more than two channels with multifd, on
> > the incoming side, you need to do:
> > 
> > - migrate_set_parameter multifd-channels 16
> > - migrate_incoming 
> > 
> > And people continue to do:
> > 
> > - qemu -incoming 
> > - migrate_set_parameter multifd-channels 16 (on the command line)
> > 
> > And they complain that it fails, because we are calling listen with the
> > wrong value.
> 
> IMHO if we want to improve user friendliness then arguing about use
> of the CLI vs QMP for migration is completely missing the bigger
> picture IMHO.
> 
> I've mentioned several times before that the user should never need to
> set this multifd-channels parameter (nor many other parameters) on the
> destination in the first place.
> 
> The QEMU migration stream should be changed to add a full
> bi-directional handshake, with negotiation of most parameters.
> IOW, the src QEMU should be configured with 16 channels, and
> it should connect the primary control channel, and then directly
> tell the dest that it wants to use 16 multifd channels.
> 
> If we're expecting the user to pass this info across to the dest
> manually we've already spectacularly failed wrt user friendliness.

I can try to move the todo even higher.  Trying to list the initial goals
here:

- One extra phase of handshake between src/dst (maybe the time to boost
  QEMU_VM_FILE_VERSION) before anything else happens.

- Dest shouldn't need to apply any cap/param, it should get all from src.
  Dest still need to be setup with an URI and that should be all it needs.

- Src shouldn't need to worry on the binary version of dst anymore as long
  as dest qemu supports handshake, because src can fetch it from dest.

- Handshake can always fail gracefully if anything wrong happened, it
  normally should mean dest qemu is not compatible with src's setup (either
  machine, device, or migration configs) for whatever reason.  Src should
  be able to get a solid error from dest if so.

- Handshake protocol should always be self-bootstrap-able, it means when we
  change the handshake protocol it should always works with old binaries.

  - When src is newer it should be able to know what's missing on dest and
skip the new bits.

  - When dst is newer it should all rely on src (which is older) and it
should always understand src's language.

- All !main channels need to be established later than the handshake - if
  we're going to do this anyway we probably should do it altogether to make
  channels named, so each channel used in migration needs to have a common
  header.  Prepare to deprecate the old tricks of channel orderings.

Does it look reasonable?  Anything to add/remove?

Thanks,

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Peter Xu
On Thu, Jun 22, 2023 at 11:22:56AM +0200, Thomas Huth wrote:
> Then simply forbid "migrate_set_parameter multifd-channels ..." if the uri
> has been specified on the command line?

Yeah, actually already in a pull (even though the pr may need a new one..):

https://lore.kernel.org/r/20230622021320.66124-23-quint...@redhat.com

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-22 Thread Peter Xu
On Thu, Jun 22, 2023 at 12:01:55PM +0200, Juan Quintela wrote:
> Paolo Bonzini  wrote:
> > On 6/22/23 10:52, Juan Quintela wrote:
> >> User friendliness.
> >> The problem is that if you use more than two channels with multifd, on
> >> the incoming side, you need to do:
> >
> > You're sacrificing user-friendliness for the 99.99% that don't use
> > multifd, for an error (i.e. it's not even fixing the issue) for the
> > 0.01% that use multifd.  That's not user-friendly.
> 
> You are forgeting of the 0.01% that uses postocopy preempt (that is easy
> just changing the default value to 2), and the 0.001% that uses
> compression (they have exactly the same problem with compress_threads,
> compress_zlib, etc).
> 
> >> - migrate_set_parameter multifd-channels 16
> >> - migrate_incoming 
> >> 
> >>> The issue is not how many features the command line has, but how
> >>> they're implemented.
> >> Or if they are confusing for the user?
> >
> > Anyone using multifd is not a typical user anyway.
> 
> >>> If they're just QMP wrappers and as such they're self-contained in
> >>> softmmu/vl.c, that's fine.
> >>>
> >>> In fact, even for parameters, we could use keyval to parse "-incoming"
> >> What is keyval?
> >
> > util/keyval.c and include/qemu/keyval.h.  It parses a list of
> > key=value pairs into a QDict.  Once you have removed the "source" key
> > from the QDict you can use a visitor to parse the rest into a
> > MigrateSetParameters.  See the handling of QEMU_OPTION_audio, it could
> > be something like
> >
> >
> > case QEMU_OPTION_incoing: {
> > Visitor *v;
> > MigrateSetParameters *incoming_params = NULL;
> > QDict *dict = keyval_parse(optarg, "source", NULL,
> > _fatal);
> >
> > if (incoming) {
> > if (qdict_haskey(dict, "source")) {
> > error_setg(_fatal, "Parameter 'source'
> > is duplicate");
> > }
> > } else {
> > if (!qdict_haskey(dict, "source")) {
> > error_setg(_fatal, "Parameter 'source'
> > is missing");
> > }
> > runstate_set(RUN_STATE_INMIGRATE);
> > incoming = g_strdup(qdict_get_str(dict, "source"));
> > qdict_del(dict, "source");
> > }
> >
> > v = qobject_input_visitor_new_keyval(QOBJECT(dict));
> > qobject_unref(dict);
> > visit_type_MigrateSetParameters(v, NULL,
> > _params, _fatal);
> > visit_free(v);
> > qmp_migration_set_parameters(incoming_params,

PS: we may want to postpone this to be later than migration_object_init(),
when/if there's a real patch.

> > _fatal);
> > qapi_free_MigrateSetParameters(incoming_params);
> > }
> >
> >
> > For example "-incoming [source=]tcp:foo,multifd-channels=16" would
> > desugar to
> >
> >   migrate_set_parameter multifd-channels 16
> >   migrate_incoming tcp:foo
> >
> > The only incompatibility is for people who are using "," in an URI,
> > which is rare and only an issue for the "exec" protocol.

If we worry on breaking anyone, we can apply the keyval parsing only when
!exec.  Not sure whether it matters a huge lot..

> 
> Aha, that makes sense.  And will allow us to deprecate/remove the
> --global migration.* stuff.

We may still need a way to set the caps/params for src qemu?..

Thanks,

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-20 Thread Peter Xu
On Tue, Jun 20, 2023 at 01:10:55PM +0100, Daniel P. Berrangé wrote:
> In some cases it is worth having a convenience option for user friendliness.
> 
> In this case, however, users are already needing to use QMP/HMP on the
> source side to set migration parameters. I think it is reasonable to say
> that doing *exactly* the same with QMP/HMP on the destination side is
> actually more friendly than requiring use of -global on the dest side
> which is different syntax.

The -global parameters also work for the src side for my scripts, so I only
need to attach "-incoming tcp:XXX" for dst cmdline here.

> 
> We don't document the way to use -global with migration parameters so
> when people see '-incoming' I think we are steering them into a trap,
> making it look like -incoming is sufficient on its own. Hence that user's
> mistake recently about not setting migration parameters.
> 
> Overall I agree with deprecating  '-incoming' != 'defer', as I think it
> will better guide users to following the correct steps, as is not a
> usability burden as they're already using QMP/HMP on the source side.

I'd say -global is definitely not for users but for developers.  Just like
we keep around HMP - I'm not sure whether it's used in productions, I
thought it was only for developers and we don't deprecate it eagerly.

No strong opinions here if everyone wants to get rid of that, but before
that I hope we can have some kind of cmdline tool that can easily tunnel
qmp commands so whoever developers using pure cmdlines can switch to the
new way easily.

Thanks,

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-12 Thread Peter Xu
On Mon, Jun 12, 2023 at 10:51:08PM +0200, Juan Quintela wrote:
> Peter Xu  wrote:
> > On Mon, Jun 12, 2023 at 09:33:42PM +0200, Juan Quintela wrote:
> >> Only "defer" is recommended.  After setting all migation parameters,
> >> start incoming migration with "migrate-incoming uri" command.
> >> 
> >> Signed-off-by: Juan Quintela 
> >> ---
> >>  docs/about/deprecated.rst | 7 +++
> >>  softmmu/vl.c  | 2 ++
> >>  2 files changed, 9 insertions(+)
> >> 
> >> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> >> index 47e98dc95e..518672722d 100644
> >> --- a/docs/about/deprecated.rst
> >> +++ b/docs/about/deprecated.rst
> >> @@ -447,3 +447,10 @@ The new way to modify migration is using migration 
> >> parameters.
> >>  ``blk`` functionality can be acchieved using
> >>  ``migrate_set_parameter block-incremental true``.
> >>  
> >> +``-incoming uri`` (since 8.1)
> >> +'
> >> +
> >> +Everything except ``-incoming defer`` are deprecated.  This allows to
> >> +setup parameters before launching the proper migration with
> >> +``migrate-incoming uri``.
> >> +
> >> diff --git a/softmmu/vl.c b/softmmu/vl.c
> >> index b0b96f67fa..7fe865ab59 100644
> >> --- a/softmmu/vl.c
> >> +++ b/softmmu/vl.c
> >> @@ -2651,6 +2651,8 @@ void qmp_x_exit_preconfig(Error **errp)
> >>  if (incoming) {
> >>  Error *local_err = NULL;
> >>  if (strcmp(incoming, "defer") != 0) {
> >> +warn_report("-incoming %s is deprecated, use -incoming defer 
> >> and "
> >> +" set the uri with migrate-incoming.", incoming);
> >
> > I still use uri for all my scripts, alongside with "-global migration.xxx"
> > and it works.
> 
> You know what you are doing (TM).
> And remember that we don't support -gobal migration.x-foo.
> Yes, I know, we should drop the "x-" prefixes.

I hope they'll always be there. :) They're pretty handy for tests, when we
want to boot a VM without the need to script the sequences of qmp cmds.

Yes, we probably should just always drop the x-.  We can always declare
debugging purpose for all -global migration.* fields.

> 
> > Shall we just leave it there?  Or is deprecating it helps us in any form?
> 
> See the patches two weeks ago when people complained that lisen(.., num)
> was too low.  And there are other parameters that work the same way
> (that I convenientely had forgotten).  So the easiest way to get things
> right is to use "defer" always.  Using -incoming "uri" should only be
> for people that "know what they are doing", so we had to ways to do it:
> - review all migration options and see which ones work without defer
>   and document it
> - deprecate everything that is not defer.
> 
> Anything else is not going to be very user unfriendly.
> What do you think.

IIRC Wei Wang had a series just for that, so after that patchset applied we
should have fixed all issues cleanly?  Is there one more thing that's not
working right there?

> 
> Later, Juan.
> 
> PD.  This series are RFC for multiple reasons O:-)

Happy to know the rest (besides which I know will break my script :).

-- 
Peter Xu



Re: [RFC 4/6] migration: Deprecate -incoming

2023-06-12 Thread Peter Xu
On Mon, Jun 12, 2023 at 09:33:42PM +0200, Juan Quintela wrote:
> Only "defer" is recommended.  After setting all migation parameters,
> start incoming migration with "migrate-incoming uri" command.
> 
> Signed-off-by: Juan Quintela 
> ---
>  docs/about/deprecated.rst | 7 +++
>  softmmu/vl.c  | 2 ++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 47e98dc95e..518672722d 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -447,3 +447,10 @@ The new way to modify migration is using migration 
> parameters.
>  ``blk`` functionality can be acchieved using
>  ``migrate_set_parameter block-incremental true``.
>  
> +``-incoming uri`` (since 8.1)
> +'
> +
> +Everything except ``-incoming defer`` are deprecated.  This allows to
> +setup parameters before launching the proper migration with
> +``migrate-incoming uri``.
> +
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index b0b96f67fa..7fe865ab59 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -2651,6 +2651,8 @@ void qmp_x_exit_preconfig(Error **errp)
>  if (incoming) {
>  Error *local_err = NULL;
>  if (strcmp(incoming, "defer") != 0) {
> +warn_report("-incoming %s is deprecated, use -incoming defer and 
> "
> +" set the uri with migrate-incoming.", incoming);

I still use uri for all my scripts, alongside with "-global migration.xxx"
and it works.

Shall we just leave it there?  Or is deprecating it helps us in any form?

-- 
Peter Xu



Re: [libvirt PATCH v2 81/81] RFC: qemu: Keep vCPUs paused while migration is in postcopy-paused

2022-06-06 Thread Peter Xu
[copy Dave, for real]

On Mon, Jun 06, 2022 at 10:32:03AM -0400, Peter Xu wrote:
> [copy Dave]
> 
> On Mon, Jun 06, 2022 at 12:29:39PM +0100, Daniel P. Berrangé wrote:
> > On Wed, Jun 01, 2022 at 02:50:21PM +0200, Jiri Denemark wrote:
> > > QEMU keeps guest CPUs running even in postcopy-paused migration state so
> > > that processes that already have all memory pages they need migrated to
> > > the destination can keep running. However, this behavior might bring
> > > unexpected delays in interprocess communication as some processes will
> > > be stopped until migration is recover and their memory pages migrated.
> > > So let's make sure all guest CPUs are paused while postcopy migration is
> > > paused.
> > > ---
> > > 
> > > Notes:
> > > Version 2:
> > > - new patch
> > > 
> > > - this patch does not currently work as QEMU cannot handle "stop"
> > >   QMP command while in postcopy-paused state... the monitor just
> > >   hangs (see https://gitlab.com/qemu-project/qemu/-/issues/1052 )
> > > - an ideal solution of the QEMU bug would be if QEMU itself paused
> > >   the CPUs for us and we just got notified about it via QMP events
> > > - but Peter Xu thinks this behavior is actually worse than keeping
> > >   vCPUs running
> > 
> > I'd like to know what the rationale is here ?
> 
> I think the wording here is definitely stronger than what I meant. :-)
> 
> My understanding was stopping the VM may or may not help the guest,
> depending on the guest behavior at the point of migration failure.  And if
> we're not 100% sure of that, doing nothing is the best we have, as
> explicitly stopping the VM is something extra we do, and it's not part of
> the requirements for either postcopy itself or the recovery routine.
> 
> Some examples below.
> 
> 1) If many of the guest threads are doing cpu intensive work, and if the
> needed pageset is already migrated, then stopping the vcpu threads means
> they could have been running during this "downtime" but we forced them not
> to.  Actually if the postcopy didn't pause immediately right after switch,
> we could very possibly migrated the workload pages if the working set is
> not very large.
> 
> 2) If we're reaching the end of the postcopy phase and it paused, most of
> the pages could have been migrated already.  So maybe only a few or even
> none thread will be stopped due to remote page faults.
> 
> 3) Think about kvm async page fault: that's a feature that the guest can do
> to yield the guest thread when there's a page fault.  It means even if some
> of the page faulted threads got stuck for a long time due to postcopy
> pausing, the guest is "smart" to know it'll take a long time (userfaultfd
> is a major fault, and as long as KVM gup won't get the page we put the page
> fault into async pf queue) then the guest vcpu can explicitly schedule()
> the faulted context and run some other threads that may not need to be
> blocked.
> 
> What I wanted to say is I don't know whether assuming "stopping the VM will
> be better than not doing so" will always be true here.  If it's case by
> case I feel like the better way to do is to do nothing special.
> 
> > 
> > We've got a long history knowing the behaviour and impact when
> > pausing a VM as a whole. Of course some apps may have timeouts
> > that are hit if the paused time was too long, but overall this
> > scenario is not that different from a bare metal machine doing
> > suspend-to-ram. Application impact is limited & predictable and
> > genrally well understood.
> 
> My other question is, even if we stopped the VM then right after we resume
> the VM won't many of those timeout()s trigger as well?  I think I asked
> similar question to Jiri and the answer at that time was that we could have
> not called the timeout() function, however I think it's not persuasive
> enough as timeout() is the function that should take the major time so at
> least we're not sure whether we'll be on it already.
> 
> My understanding is that a VM can work properly after a migration because
> the guest timekeeping will gradually sync up with the real world time, so
> if there's a major donwtime triggered we can hardly make it not affecting
> the guest.  What we can do is if we know a software is in VM context we
> should be robust on the timeout (and that's at least what I do on programs
> even on bare metal because I'd assume the program be run on an extremely
> busy host).
> 
> But I could be all wrong on that, because I don't know enough on the whole
> ra

Re: [libvirt PATCH v2 81/81] RFC: qemu: Keep vCPUs paused while migration is in postcopy-paused

2022-06-06 Thread Peter Xu
[copy Dave]

On Mon, Jun 06, 2022 at 12:29:39PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 01, 2022 at 02:50:21PM +0200, Jiri Denemark wrote:
> > QEMU keeps guest CPUs running even in postcopy-paused migration state so
> > that processes that already have all memory pages they need migrated to
> > the destination can keep running. However, this behavior might bring
> > unexpected delays in interprocess communication as some processes will
> > be stopped until migration is recover and their memory pages migrated.
> > So let's make sure all guest CPUs are paused while postcopy migration is
> > paused.
> > ---
> > 
> > Notes:
> > Version 2:
> > - new patch
> > 
> > - this patch does not currently work as QEMU cannot handle "stop"
> >   QMP command while in postcopy-paused state... the monitor just
> >   hangs (see https://gitlab.com/qemu-project/qemu/-/issues/1052 )
> > - an ideal solution of the QEMU bug would be if QEMU itself paused
> >   the CPUs for us and we just got notified about it via QMP events
> > - but Peter Xu thinks this behavior is actually worse than keeping
> >   vCPUs running
> 
> I'd like to know what the rationale is here ?

I think the wording here is definitely stronger than what I meant. :-)

My understanding was stopping the VM may or may not help the guest,
depending on the guest behavior at the point of migration failure.  And if
we're not 100% sure of that, doing nothing is the best we have, as
explicitly stopping the VM is something extra we do, and it's not part of
the requirements for either postcopy itself or the recovery routine.

Some examples below.

1) If many of the guest threads are doing cpu intensive work, and if the
needed pageset is already migrated, then stopping the vcpu threads means
they could have been running during this "downtime" but we forced them not
to.  Actually if the postcopy didn't pause immediately right after switch,
we could very possibly migrated the workload pages if the working set is
not very large.

2) If we're reaching the end of the postcopy phase and it paused, most of
the pages could have been migrated already.  So maybe only a few or even
none thread will be stopped due to remote page faults.

3) Think about kvm async page fault: that's a feature that the guest can do
to yield the guest thread when there's a page fault.  It means even if some
of the page faulted threads got stuck for a long time due to postcopy
pausing, the guest is "smart" to know it'll take a long time (userfaultfd
is a major fault, and as long as KVM gup won't get the page we put the page
fault into async pf queue) then the guest vcpu can explicitly schedule()
the faulted context and run some other threads that may not need to be
blocked.

What I wanted to say is I don't know whether assuming "stopping the VM will
be better than not doing so" will always be true here.  If it's case by
case I feel like the better way to do is to do nothing special.

> 
> We've got a long history knowing the behaviour and impact when
> pausing a VM as a whole. Of course some apps may have timeouts
> that are hit if the paused time was too long, but overall this
> scenario is not that different from a bare metal machine doing
> suspend-to-ram. Application impact is limited & predictable and
> genrally well understood.

My other question is, even if we stopped the VM then right after we resume
the VM won't many of those timeout()s trigger as well?  I think I asked
similar question to Jiri and the answer at that time was that we could have
not called the timeout() function, however I think it's not persuasive
enough as timeout() is the function that should take the major time so at
least we're not sure whether we'll be on it already.

My understanding is that a VM can work properly after a migration because
the guest timekeeping will gradually sync up with the real world time, so
if there's a major donwtime triggered we can hardly make it not affecting
the guest.  What we can do is if we know a software is in VM context we
should be robust on the timeout (and that's at least what I do on programs
even on bare metal because I'd assume the program be run on an extremely
busy host).

But I could be all wrong on that, because I don't know enough on the whole
rational of the importance of stopping the VM in the past.

> 
> I don't think we can say the same about the behaviour & impact
> on the guest OS if we selectively block execution of random
> CPUs.  An OS where a certain physical CPU simply stops executing
> is not a normal scenario that any application or OS is designed
> to expect. I think the chance of the guest OS or application
> breaking in a non-recoverable way is high. IOW, we might perform
> post-copy recovery and all m

Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-08-04 Thread Peter Xu
On Mon, Aug 03, 2020 at 06:00:34PM +0200, Eugenio Pérez wrote:
> On Fri, 2020-07-03 at 15:24 +0800, Jason Wang wrote:
> > On 2020/7/2 下午11:45, Peter Xu wrote:
> > > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote:
> > > > So I think we agree that a new notifier is needed?
> > > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)?
> > 
> > That should work but I wonder something as following is better.
> > 
> > Instead of introducing new flags, how about carry the type of event in 
> > the notifier then the device (vhost) can choose the message it want to 
> > process like:
> > 
> > static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event)
> > 
> > {
> > 
> > switch (event->type) {
> > 
> > case IOMMU_MAP:
> > case IOMMU_UNMAP:
> > case IOMMU_DEV_IOTLB_UNMAP:
> > ...
> > 
> > }
> > 
> > Thanks
> > 
> > 
> 
> Hi!
> 
> Sorry, I thought I had this clear but now it seems not so clear to me. Do you 
> mean to add that switch to the current
> vhost_iommu_unmap_notify, and then the "type" field to the IOMMUTLBEntry? Is 
> that the scope of the changes, or there is
> something I'm missing?
> 
> If that is correct, what is the advantage for vhost or other notifiers? I 
> understand that move the IOMMUTLBEntry (addr,
> len) -> (iova, mask) split/transformation to the different notifiers 
> implementation could pollute them, but this is even a deeper change and vhost 
> is not insterested in other events but IOMMU_UNMAP, isn't?
> 
> On the other hand, who decide what type of event is? If I follow the 
> backtrace of the assert in 
> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01015.html, it seems 
> to me that it should be
> vtd_process_device_iotlb_desc. How do I know if it should be IOMMU_UNMAP or 
> IOMMU_DEV_IOTLB_UNMAP? If I set it in some
> function of memory.c, I should decide the type looking the actual notifier, 
> isn't?

(Since Jason didn't reply yesterday, I'll try to; Jason, feel free to correct
 me...)

IMHO whether to put the type into the IOMMUTLBEntry is not important.  The
important change should be that we introduce IOMMU_DEV_IOTLB_UNMAP (or I'd
rather call it IOMMU_DEV_IOTLB directly which is shorter and cleaner).  With
that information we can make the failing assertion conditional for MAP/UNMAP
only.  We can also allow dev-iotlb messages to take arbitrary addr_mask (so it
becomes a length of address range; imho we can keep using addr_mask for
simplicity, but we can comment for addr_mask that for dev-iotlb it can be not a
real mask).

Thanks,

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-21 Thread Peter Xu
On Tue, Jul 21, 2020 at 02:20:01PM +0800, Jason Wang wrote:
> 
> On 2020/7/20 下午9:03, Peter Xu wrote:
> > On Mon, Jul 20, 2020 at 12:02:06PM +0800, Jason Wang wrote:
> > > Right, so there's no need to deal with unmap in vtd's replay 
> > > implementation
> > > (as what generic one did).
> > We don't even for now; see vtd_page_walk_info.notify_unmap.  Thanks,
> 
> 
> Right, but I meant the vtd_address_space_unmap() in vtd_iomm_replay(). It
> looks to me it will trigger UNMAP notifiers.

Should be pretty safe for now, but I agree it seems optional (as we've
discussed about this previously).  Thanks,

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-20 Thread Peter Xu
On Mon, Jul 20, 2020 at 12:02:06PM +0800, Jason Wang wrote:
> Right, so there's no need to deal with unmap in vtd's replay implementation
> (as what generic one did).

We don't even for now; see vtd_page_walk_info.notify_unmap.  Thanks,

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-17 Thread Peter Xu
On Thu, Jul 16, 2020 at 10:54:31AM +0800, Jason Wang wrote:
> 
> On 2020/7/16 上午9:00, Peter Xu wrote:
> > On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote:
> > > On 2020/7/10 下午9:30, Peter Xu wrote:
> > > > On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:
> > > > > On 2020/7/9 下午10:10, Peter Xu wrote:
> > > > > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
> > > > > > > > > - If we care the performance, it's better to implement the 
> > > > > > > > > MAP event for
> > > > > > > > > vhost, otherwise it could be a lot of IOTLB miss
> > > > > > > > I feel like these are two things.
> > > > > > > > 
> > > > > > > > So far what we are talking about is whether vt-d should have 
> > > > > > > > knowledge about
> > > > > > > > what kind of events one iommu notifier is interested in.  I 
> > > > > > > > still think we
> > > > > > > > should keep this as answered in question 1.
> > > > > > > > 
> > > > > > > > The other question is whether we want to switch vhost from 
> > > > > > > > UNMAP to MAP/UNMAP
> > > > > > > > events even without vDMA, so that vhost can establish the 
> > > > > > > > mapping even before
> > > > > > > > IO starts.  IMHO it's doable, but only if the guest runs DPDK 
> > > > > > > > workloads.  When
> > > > > > > > the guest is using dynamic iommu page mappings, I feel like 
> > > > > > > > that can be even
> > > > > > > > slower, because then the worst case is for each IO we'll need 
> > > > > > > > to vmexit twice:
> > > > > > > > 
> > > > > > > >   - The first vmexit caused by an invalidation to MAP the 
> > > > > > > > page tables, so vhost
> > > > > > > > will setup the page table before IO starts
> > > > > > > > 
> > > > > > > >   - IO/DMA triggers and completes
> > > > > > > > 
> > > > > > > >   - The second vmexit caused by another invalidation to 
> > > > > > > > UNMAP the page tables
> > > > > > > > 
> > > > > > > > So it seems to be worse than when vhost only uses UNMAP like 
> > > > > > > > right now.  At
> > > > > > > > least we only have one vmexit (when UNMAP).  We'll have a vhost 
> > > > > > > > translate()
> > > > > > > > request from kernel to userspace, but IMHO that's cheaper than 
> > > > > > > > the vmexit.
> > > > > > > Right but then I would still prefer to have another notifier.
> > > > > > > 
> > > > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU 
> > > > > > > have a
> > > > > > > dedicated command for flushing device IOTLB. But the check for
> > > > > > > vtd_as_has_map_notifier is used to skip the device which can do 
> > > > > > > demand
> > > > > > > paging via ATS or device specific way. If we have two different 
> > > > > > > notifiers,
> > > > > > > vhost will be on the device iotlb notifier so we don't need it at 
> > > > > > > all?
> > > > > > But we can still have iommu notifier that only registers to UNMAP 
> > > > > > even after we
> > > > > > introduce dev-iotlb notifier?  We don't want to do page walk for 
> > > > > > them as well.
> > > > > > TCG should be the only one so far, but I don't know.. maybe there 
> > > > > > can still be
> > > > > > new ones?
> > > > > I think you're right. But looking at the codes, it looks like the 
> > > > > check of
> > > > > vtd_as_has_map_notifier() was only used in:
> > > > > 
> > > > > 1) vtd_iommu_replay()
> > > > > 2) vtd_iotlb_page_invalidate_notify() (PSI)
> > > > > 
> > > > > For the replay, it's expensive anyhow. For PSI, I think it's just 
> > > > > about one
> > > > > or few mappings, no

Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-15 Thread Peter Xu
On Mon, Jul 13, 2020 at 12:04:16PM +0800, Jason Wang wrote:
> 
> On 2020/7/10 下午9:30, Peter Xu wrote:
> > On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:
> > > On 2020/7/9 下午10:10, Peter Xu wrote:
> > > > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
> > > > > > > - If we care the performance, it's better to implement the MAP 
> > > > > > > event for
> > > > > > > vhost, otherwise it could be a lot of IOTLB miss
> > > > > > I feel like these are two things.
> > > > > > 
> > > > > > So far what we are talking about is whether vt-d should have 
> > > > > > knowledge about
> > > > > > what kind of events one iommu notifier is interested in.  I still 
> > > > > > think we
> > > > > > should keep this as answered in question 1.
> > > > > > 
> > > > > > The other question is whether we want to switch vhost from UNMAP to 
> > > > > > MAP/UNMAP
> > > > > > events even without vDMA, so that vhost can establish the mapping 
> > > > > > even before
> > > > > > IO starts.  IMHO it's doable, but only if the guest runs DPDK 
> > > > > > workloads.  When
> > > > > > the guest is using dynamic iommu page mappings, I feel like that 
> > > > > > can be even
> > > > > > slower, because then the worst case is for each IO we'll need to 
> > > > > > vmexit twice:
> > > > > > 
> > > > > >  - The first vmexit caused by an invalidation to MAP the page 
> > > > > > tables, so vhost
> > > > > >will setup the page table before IO starts
> > > > > > 
> > > > > >  - IO/DMA triggers and completes
> > > > > > 
> > > > > >  - The second vmexit caused by another invalidation to UNMAP 
> > > > > > the page tables
> > > > > > 
> > > > > > So it seems to be worse than when vhost only uses UNMAP like right 
> > > > > > now.  At
> > > > > > least we only have one vmexit (when UNMAP).  We'll have a vhost 
> > > > > > translate()
> > > > > > request from kernel to userspace, but IMHO that's cheaper than the 
> > > > > > vmexit.
> > > > > Right but then I would still prefer to have another notifier.
> > > > > 
> > > > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a
> > > > > dedicated command for flushing device IOTLB. But the check for
> > > > > vtd_as_has_map_notifier is used to skip the device which can do demand
> > > > > paging via ATS or device specific way. If we have two different 
> > > > > notifiers,
> > > > > vhost will be on the device iotlb notifier so we don't need it at all?
> > > > But we can still have iommu notifier that only registers to UNMAP even 
> > > > after we
> > > > introduce dev-iotlb notifier?  We don't want to do page walk for them 
> > > > as well.
> > > > TCG should be the only one so far, but I don't know.. maybe there can 
> > > > still be
> > > > new ones?
> > > 
> > > I think you're right. But looking at the codes, it looks like the check of
> > > vtd_as_has_map_notifier() was only used in:
> > > 
> > > 1) vtd_iommu_replay()
> > > 2) vtd_iotlb_page_invalidate_notify() (PSI)
> > > 
> > > For the replay, it's expensive anyhow. For PSI, I think it's just about 
> > > one
> > > or few mappings, not sure it will have obvious performance impact.
> > > 
> > > And I had two questions:
> > > 
> > > 1) The codes doesn't check map for DSI or GI, does this match what spec
> > > said? (It looks to me the spec is unclear in this part)
> > Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
> > vtd_iotlb_domain_invalidate().
> 
> 
> I meant the code doesn't check whether there's an MAP notifier :)

It's actually checked, because it loops over vtd_as_with_notifiers, and only
MAP notifiers register to that. :)

But I agree with you that it should be cleaner to introduce the dev-iotlb
notifier type.

> 
> 
> > 
> > > 2) for the replay() I don't see other implementations (either spapr or
> > > generic one) that did unmap (actually they skip unmap explicitly),

Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-10 Thread Peter Xu
On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:
> 
> On 2020/7/9 下午10:10, Peter Xu wrote:
> > On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
> > > > > - If we care the performance, it's better to implement the MAP event 
> > > > > for
> > > > > vhost, otherwise it could be a lot of IOTLB miss
> > > > I feel like these are two things.
> > > > 
> > > > So far what we are talking about is whether vt-d should have knowledge 
> > > > about
> > > > what kind of events one iommu notifier is interested in.  I still think 
> > > > we
> > > > should keep this as answered in question 1.
> > > > 
> > > > The other question is whether we want to switch vhost from UNMAP to 
> > > > MAP/UNMAP
> > > > events even without vDMA, so that vhost can establish the mapping even 
> > > > before
> > > > IO starts.  IMHO it's doable, but only if the guest runs DPDK 
> > > > workloads.  When
> > > > the guest is using dynamic iommu page mappings, I feel like that can be 
> > > > even
> > > > slower, because then the worst case is for each IO we'll need to vmexit 
> > > > twice:
> > > > 
> > > > - The first vmexit caused by an invalidation to MAP the page 
> > > > tables, so vhost
> > > >   will setup the page table before IO starts
> > > > 
> > > > - IO/DMA triggers and completes
> > > > 
> > > > - The second vmexit caused by another invalidation to UNMAP the 
> > > > page tables
> > > > 
> > > > So it seems to be worse than when vhost only uses UNMAP like right now. 
> > > >  At
> > > > least we only have one vmexit (when UNMAP).  We'll have a vhost 
> > > > translate()
> > > > request from kernel to userspace, but IMHO that's cheaper than the 
> > > > vmexit.
> > > 
> > > Right but then I would still prefer to have another notifier.
> > > 
> > > Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a
> > > dedicated command for flushing device IOTLB. But the check for
> > > vtd_as_has_map_notifier is used to skip the device which can do demand
> > > paging via ATS or device specific way. If we have two different notifiers,
> > > vhost will be on the device iotlb notifier so we don't need it at all?
> > But we can still have iommu notifier that only registers to UNMAP even 
> > after we
> > introduce dev-iotlb notifier?  We don't want to do page walk for them as 
> > well.
> > TCG should be the only one so far, but I don't know.. maybe there can still 
> > be
> > new ones?
> 
> 
> I think you're right. But looking at the codes, it looks like the check of
> vtd_as_has_map_notifier() was only used in:
> 
> 1) vtd_iommu_replay()
> 2) vtd_iotlb_page_invalidate_notify() (PSI)
> 
> For the replay, it's expensive anyhow. For PSI, I think it's just about one
> or few mappings, not sure it will have obvious performance impact.
> 
> And I had two questions:
> 
> 1) The codes doesn't check map for DSI or GI, does this match what spec
> said? (It looks to me the spec is unclear in this part)

Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
vtd_iotlb_domain_invalidate().

> 2) for the replay() I don't see other implementations (either spapr or
> generic one) that did unmap (actually they skip unmap explicitly), any
> reason for doing this in intel IOMMU?

I could be wrong, but I'd guess it's because vt-d implemented the caching mode
by leveraging the same invalidation strucuture, so it's harder to make all
things right (IOW, we can't clearly identify MAP with UNMAP when we receive an
invalidation request, because MAP/UNMAP requests look the same).

I didn't check others, but I believe spapr is doing it differently by using
some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to
what virtio-iommu is doing.  Anyway, the point is if we have explicit MAP/UNMAP
from the guest, logically the replay indeed does not need to do any unmap
because we don't need to call replay() on an already existing device but only
for e.g. hot plug.  VT-d does not have that clear interface, so VT-d needs to
maintain its own mapping structures, and also vt-d is using the same replay &
page_walk operations to sync all these structures, which complicated the vt-d
replay a bit.  With that, we assume replay() can be called anytime on a device,
and we won't notify duplicated MAPs to lower layer like vfio if it is mapped
before.  At the meantime, since we'll compare the latest mapping with the one
we cached in the iova tree, UNMAP becomes possible too.

Thanks,

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-09 Thread Peter Xu
On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
> > > - If we care the performance, it's better to implement the MAP event for
> > > vhost, otherwise it could be a lot of IOTLB miss
> > I feel like these are two things.
> > 
> > So far what we are talking about is whether vt-d should have knowledge about
> > what kind of events one iommu notifier is interested in.  I still think we
> > should keep this as answered in question 1.
> > 
> > The other question is whether we want to switch vhost from UNMAP to 
> > MAP/UNMAP
> > events even without vDMA, so that vhost can establish the mapping even 
> > before
> > IO starts.  IMHO it's doable, but only if the guest runs DPDK workloads.  
> > When
> > the guest is using dynamic iommu page mappings, I feel like that can be even
> > slower, because then the worst case is for each IO we'll need to vmexit 
> > twice:
> > 
> >- The first vmexit caused by an invalidation to MAP the page tables, so 
> > vhost
> >  will setup the page table before IO starts
> > 
> >- IO/DMA triggers and completes
> > 
> >- The second vmexit caused by another invalidation to UNMAP the page 
> > tables
> > 
> > So it seems to be worse than when vhost only uses UNMAP like right now.  At
> > least we only have one vmexit (when UNMAP).  We'll have a vhost translate()
> > request from kernel to userspace, but IMHO that's cheaper than the vmexit.
> 
> 
> Right but then I would still prefer to have another notifier.
> 
> Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a
> dedicated command for flushing device IOTLB. But the check for
> vtd_as_has_map_notifier is used to skip the device which can do demand
> paging via ATS or device specific way. If we have two different notifiers,
> vhost will be on the device iotlb notifier so we don't need it at all?

But we can still have iommu notifier that only registers to UNMAP even after we
introduce dev-iotlb notifier?  We don't want to do page walk for them as well.
TCG should be the only one so far, but I don't know.. maybe there can still be
new ones?

Thanks,

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-08 Thread Peter Xu
On Wed, Jul 08, 2020 at 01:42:30PM +0800, Jason Wang wrote:
> > > So it should be functional equivalent to vtd_as_has_notifier().
> > For example: in vtd_iommu_replay() we'll skip the replay if vhost has
> > registered the iommu notifier because vtd_as_has_map_notifier() will return
> > false.
> 
> 
> Two questions:
> 
> - Do we care the performance here? If not, vhost may just ignore the MAP
> event?

I think we care, because vtd_page_walk() can be expensive.

> - If we care the performance, it's better to implement the MAP event for
> vhost, otherwise it could be a lot of IOTLB miss

I feel like these are two things.

So far what we are talking about is whether vt-d should have knowledge about
what kind of events one iommu notifier is interested in.  I still think we
should keep this as answered in question 1.

The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP
events even without vDMA, so that vhost can establish the mapping even before
IO starts.  IMHO it's doable, but only if the guest runs DPDK workloads.  When
the guest is using dynamic iommu page mappings, I feel like that can be even
slower, because then the worst case is for each IO we'll need to vmexit twice:

  - The first vmexit caused by an invalidation to MAP the page tables, so vhost
will setup the page table before IO starts

  - IO/DMA triggers and completes

  - The second vmexit caused by another invalidation to UNMAP the page tables

So it seems to be worse than when vhost only uses UNMAP like right now.  At
least we only have one vmexit (when UNMAP).  We'll have a vhost translate()
request from kernel to userspace, but IMHO that's cheaper than the vmexit.

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-07 Thread Peter Xu
On Tue, Jul 07, 2020 at 04:03:10PM +0800, Jason Wang wrote:
> 
> On 2020/7/3 下午9:03, Peter Xu wrote:
> > On Fri, Jul 03, 2020 at 03:24:19PM +0800, Jason Wang wrote:
> > > On 2020/7/2 下午11:45, Peter Xu wrote:
> > > > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote:
> > > > > So I think we agree that a new notifier is needed?
> > > > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)?
> > > 
> > > That should work but I wonder something as following is better.
> > > 
> > > Instead of introducing new flags, how about carry the type of event in the
> > > notifier then the device (vhost) can choose the message it want to process
> > > like:
> > > 
> > > static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event)
> > > 
> > > {
> > > 
> > > switch (event->type) {
> > > 
> > > case IOMMU_MAP:
> > > case IOMMU_UNMAP:
> > > case IOMMU_DEV_IOTLB_UNMAP:
> > > ...
> > > 
> > > }
> > Looks ok to me, though imo we should still keep the registration 
> > information,
> > so VT-d knows which notifiers is interested in which events.  E.g., we can
> > still do something like vtd_as_has_map_notifier().
> 
> 
> Is this for a better performance?
> 
> I wonder whether it's worth since we can't not have both vhost and vtd to be
> registered into the same as.

/me failed to follow this sentence.. :(

> 
> So it should be functional equivalent to vtd_as_has_notifier().

For example: in vtd_iommu_replay() we'll skip the replay if vhost has
registered the iommu notifier because vtd_as_has_map_notifier() will return
false.  It'll only return true if the device is a vfio-pci device.

Without vtd_as_has_map_notifier(), how could we do that?

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-03 Thread Peter Xu
On Fri, Jul 03, 2020 at 03:24:19PM +0800, Jason Wang wrote:
> 
> On 2020/7/2 下午11:45, Peter Xu wrote:
> > On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote:
> > > So I think we agree that a new notifier is needed?
> > Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)?
> 
> 
> That should work but I wonder something as following is better.
> 
> Instead of introducing new flags, how about carry the type of event in the
> notifier then the device (vhost) can choose the message it want to process
> like:
> 
> static vhost_iommu_event(IOMMUNotifier *n, IOMMUTLBEvent *event)
> 
> {
> 
> switch (event->type) {
> 
> case IOMMU_MAP:
> case IOMMU_UNMAP:
> case IOMMU_DEV_IOTLB_UNMAP:
> ...
> 
> }

Looks ok to me, though imo we should still keep the registration information,
so VT-d knows which notifiers is interested in which events.  E.g., we can
still do something like vtd_as_has_map_notifier().

So these are probably two different things: the new IOMMU_NOTIFIER_DEV_IOTLB
flag is one as discussed; the other one is whether we would like to introduce
IOMMUTLBEvent to include the type, so that we can avoid introduce two notifiers
for one device majorly to identify dev-iotlb from unmaps.

Thanks,

-- 
Peter Xu



Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

2020-07-02 Thread Peter Xu
On Thu, Jul 02, 2020 at 11:01:54AM +0800, Jason Wang wrote:
> So I think we agree that a new notifier is needed?

Good to me, or a new flag should be easier (IOMMU_NOTIFIER_DEV_IOTLB)?

-- 
Peter Xu



Re: [libvirt] [PATCH 2/4] q35: set split kernel irqchip as default

2018-12-19 Thread Peter Xu
On Wed, Dec 19, 2018 at 10:45:40PM +0100, Paolo Bonzini wrote:
> On 19/12/18 22:24, Eduardo Habkost wrote:
> > On Wed, Dec 19, 2018 at 09:12:55PM +0100, Paolo Bonzini wrote:
> >> On 19/12/18 09:50, Peter Xu wrote:
> >>> Starting from QEMU 4.0, let's specify "split" as the default value for
> >>> kernel-irqchip.
> >>>
> >>> So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y
> >>>for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N
> >>>(omitting all the "kernel_irqchip_" prefix)
> >>>
> >>> Note that this "split" is optional - we'll first try to enable split
> >>> kernel irqchip, and we'll fall back to complete kernel irqchip if we
> >>> found that the kernel capability is missing.
> >>
> >> Please just fail completely and require a new kernel for the 4.0 machine
> >> type.  There are subtle differences between kernel and QEMU irqchip, I
> >> don't think we want to open that can of worms.
> > 
> > This would make existing VMs that are runnable with pc-q35-3.1.0
> > not runnable by only updating the machine-type.
> > 
> > The good news is that we can make this a non-issue by clearly
> > documenting that QEMU needs a more recent kernel (just like we'll
> > do for RDTSCP[1]).
> 
> Right, RDTSCP is exactly what came to mind.

Ok so I think I'll just make it even simpler by dropping patch 1.
Also I noticed that the documentation on linux kernel version
requirement has not yet reached master but I'll assume it'll be there
some day very soon so I'll ignore that part.

Thanks everyone!  I'll repost soon.

-- 
Peter Xu

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v2 4/4] hw/core/machine: Officially deprecate the enforce-config-section parameter

2018-09-20 Thread Peter Xu
On Thu, Sep 20, 2018 at 10:50:16AM +0200, Thomas Huth wrote:
> Commit 16f7244842b5135543ef068a1adafd94c6965953 added this parameter
> to the documentation, including a note that it is deprecated. But it
> has never been added to the "Deprecated features" appendix, which is
> our official way to deprecate legacy parameters. So let's do this now.
> 
> Signed-off-by: Thomas Huth 

Reviewed-by: Peter Xu 

Regards,

-- 
Peter Xu

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCHv3 5/6] conf: add caching attribute to iommu device

2017-04-26 Thread Peter Xu
rgvdata/qemuxml2argv-intel-iommu-caching.xml
> @@ -0,0 +1,50 @@
> +
> +  QEMUGuest1
> +  c7a5fdbd-edaf-9455-926a-d65c16db1809
> +  219100
> +  219100
> +  1
> +  
> +hvm
> +
> +  
> +  
> +  destroy
> +  restart
> +  destroy
> +  
> +/usr/bin/qemu-system-x86_64
> +
> +
> +  
> +   function='0x0'/>
> +
> +
> +  
> +  
> +   function='0x0'/>
> +
> +
> +  
> +  
> +   function='0x0'/>
> +
> +
> +   function='0x2'/>
> +
> +
> +   function='0x7'/>
> +
> +
> +  
> +  
> +   function='0x0'/>
> +
> +
> +
> +
> +
> +  
> +
> +  
> +
> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-intel-iommu-caching.xml 
> b/tests/qemuxml2xmloutdata/qemuxml2xmlout-intel-iommu-caching.xml
> new file mode 12
> index 000..6935d93
> --- /dev/null
> +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-intel-iommu-caching.xml
> @@ -0,0 +1 @@
> +../qemuxml2argvdata/qemuxml2argv-intel-iommu-caching.xml
> \ No newline at end of file
> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
> index 40425f5..848da4f 100644
> --- a/tests/qemuxml2xmltest.c
> +++ b/tests/qemuxml2xmltest.c
> @@ -1123,6 +1123,7 @@ mymain(void)
>  QEMU_CAPS_MACHINE_OPT,
>  QEMU_CAPS_MACHINE_IOMMU);
>  DO_TEST("intel-iommu-irqchip", NONE);
> +DO_TEST("intel-iommu-caching", NONE);
>  
>  DO_TEST("cpu-check-none", NONE);
>  DO_TEST("cpu-check-partial", NONE);
> -- 
> 2.10.2
> 

-- 
Peter Xu

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCHv3 1/6] conf: add to

2017-04-26 Thread Peter Xu
TCH_ON;
> +}
> +ctxt->node = node;
> +break;
> +
>  /* coverity[dead_error_begin] */
>  case VIR_DOMAIN_FEATURE_LAST:
>  break;
> @@ -24601,6 +24626,13 @@ virDomainDefFormatInternal(virDomainDefPtr def,
>  }
>  break;
>  
> +case VIR_DOMAIN_FEATURE_IRQCHIP:
> +if (def->features[i] == VIR_TRISTATE_SWITCH_ON) {
> +virBufferAsprintf(buf, "\n",
> +  
> virDomainIRQChipTypeToString(def->irqchip));
> +}
> +break;
> +
>  /* coverity[dead_error_begin] */
>  case VIR_DOMAIN_FEATURE_LAST:
>  break;
> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
> index 3b6b174..61af012 100644
> --- a/src/conf/domain_conf.h
> +++ b/src/conf/domain_conf.h
> @@ -1673,6 +1673,7 @@ typedef enum {
>  VIR_DOMAIN_FEATURE_VMPORT,
>  VIR_DOMAIN_FEATURE_GIC,
>  VIR_DOMAIN_FEATURE_SMM,
> +VIR_DOMAIN_FEATURE_IRQCHIP,
>  
>  VIR_DOMAIN_FEATURE_LAST
>  } virDomainFeature;
> @@ -1812,6 +1813,16 @@ struct _virDomainLoaderDef {
>  
>  void virDomainLoaderDefFree(virDomainLoaderDefPtr loader);
>  
> +typedef enum {
> +VIR_DOMAIN_IRQCHIP_OFF = 0,
> +VIR_DOMAIN_IRQCHIP_SPLIT,
> +VIR_DOMAIN_IRQCHIP_ON,
> +
> +VIR_DOMAIN_IRQCHIP_LAST
> +} virDomainIRQChip;
> +
> +VIR_ENUM_DECL(virDomainIRQChip);
> +
>  /* Operating system configuration data & machine / arch */
>  typedef struct _virDomainOSDef virDomainOSDef;
>  typedef virDomainOSDef *virDomainOSDefPtr;
> @@ -2261,6 +2272,7 @@ struct _virDomainDef {
>  unsigned int hyperv_spinlocks;
>  virGICVersion gic_version;
>  char *hyperv_vendor_id;
> +virDomainIRQChip irqchip;
>  
>  /* These options are of type virTristateSwitch: ON = keep, OFF = drop */
>  int caps_features[VIR_DOMAIN_CAPS_FEATURE_LAST];
> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-intel-iommu-irqchip.xml 
> b/tests/qemuxml2argvdata/qemuxml2argv-intel-iommu-irqchip.xml
> new file mode 100644
> index 000..cc895af
> --- /dev/null
> +++ b/tests/qemuxml2argvdata/qemuxml2argv-intel-iommu-irqchip.xml
> @@ -0,0 +1,29 @@
> +
> +  QEMUGuest1
> +  c7a5fdbd-edaf-9455-926a-d65c16db1809
> +  219100
> +  219100
> +  1
> +  
> +hvm
> +
> +  
> +  
> +
> +  
> +  
> +  destroy
> +  restart
> +  destroy
> +  
> +/usr/bin/qemu-system-x86_64
> +
> +
> +   function='0x2'/>
> +
> +
> +
> +
> +    
> +  
> +
> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-intel-iommu-irqchip.xml 
> b/tests/qemuxml2xmloutdata/qemuxml2xmlout-intel-iommu-irqchip.xml
> new file mode 12
> index 000..58a0199
> --- /dev/null
> +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-intel-iommu-irqchip.xml
> @@ -0,0 +1 @@
> +../qemuxml2argvdata/qemuxml2argv-intel-iommu-irqchip.xml
> \ No newline at end of file
> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
> index 2dccde7..40425f5 100644
> --- a/tests/qemuxml2xmltest.c
> +++ b/tests/qemuxml2xmltest.c
> @@ -1122,6 +1122,7 @@ mymain(void)
>  DO_TEST("intel-iommu-machine",
>  QEMU_CAPS_MACHINE_OPT,
>  QEMU_CAPS_MACHINE_IOMMU);
> +DO_TEST("intel-iommu-irqchip", NONE);
>  
>  DO_TEST("cpu-check-none", NONE);
>  DO_TEST("cpu-check-partial", NONE);
> -- 
> 2.10.2
> 

-- 
Peter Xu

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

[libvirt] [PATCH v7 3/4] kvm: add kvm_device_supported() helper function

2016-03-23 Thread Peter Xu
This can be used when probing whether KVM support specific device. Here,
a raw vmfd is used.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 include/sysemu/kvm.h |  9 +
 kvm-all.c| 15 +++
 2 files changed, 24 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6695fa7..0e18f15 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -306,6 +306,15 @@ void kvm_device_access(int fd, int group, uint64_t attr,
  */
 int kvm_create_device(KVMState *s, uint64_t type, bool test);
 
+/**
+ * kvm_device_supported - probe whether KVM supports specific device
+ *
+ * @vmfd: The fd handler for VM
+ * @type: type of device
+ *
+ * @return: true if supported, otherwise false.
+ */
+bool kvm_device_supported(int vmfd, uint64_t type);
 
 /* Arch specific hooks */
 
diff --git a/kvm-all.c b/kvm-all.c
index 44c0464..e7b66df 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2339,6 +2339,21 @@ int kvm_create_device(KVMState *s, uint64_t type, bool 
test)
 return test ? 0 : create_dev.fd;
 }
 
+bool kvm_device_supported(int vmfd, uint64_t type)
+{
+struct kvm_create_device create_dev = {
+.type = type,
+.fd = -1,
+.flags = KVM_CREATE_DEVICE_TEST,
+};
+
+if (ioctl(vmfd, KVM_CHECK_EXTENSION, KVM_CAP_DEVICE_CTRL) <= 0) {
+return false;
+}
+
+return (ioctl(vmfd, KVM_CREATE_DEVICE, _dev) >= 0);
+}
+
 int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source)
 {
 struct kvm_one_reg reg;
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v7 4/4] arm: implement query-gic-capabilities

2016-03-23 Thread Peter Xu
For emulated GIC capabilities, currently only gicv2 is supported. We
need to add gicv3 in when emulated gicv3 ready. For KVM accelerated ARM
VM, we detect the capability bits by creating a scratch VM.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 target-arm/monitor.c | 58 +++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/target-arm/monitor.c b/target-arm/monitor.c
index 4c9bef3..1ee59a2 100644
--- a/target-arm/monitor.c
+++ b/target-arm/monitor.c
@@ -21,8 +21,64 @@
  */
 #include "qemu/osdep.h"
 #include "qmp-commands.h"
+#include "hw/boards.h"
+#include "kvm_arm.h"
+
+static GICCapability *gic_cap_new(int version)
+{
+GICCapability *cap = g_new0(GICCapability, 1);
+cap->version = version;
+/* by default, support none */
+cap->emulated = false;
+cap->kernel = false;
+return cap;
+}
+
+static GICCapabilityList *gic_cap_list_add(GICCapabilityList *head,
+   GICCapability *cap)
+{
+GICCapabilityList *item = g_new0(GICCapabilityList, 1);
+item->value = cap;
+item->next = head;
+return item;
+}
+
+static inline void gic_cap_kvm_probe(GICCapability *v2, GICCapability *v3)
+{
+#ifdef CONFIG_KVM
+int fdarray[3];
+
+if (!kvm_arm_create_scratch_host_vcpu(NULL, fdarray, NULL)) {
+return;
+}
+
+/* Test KVM GICv2 */
+if (kvm_device_supported(fdarray[1], KVM_DEV_TYPE_ARM_VGIC_V2)) {
+v2->kernel = true;
+}
+
+/* Test KVM GICv3 */
+if (kvm_device_supported(fdarray[1], KVM_DEV_TYPE_ARM_VGIC_V3)) {
+v3->kernel = true;
+}
+
+kvm_arm_destroy_scratch_host_vcpu(fdarray);
+#endif
+}
 
 GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
 {
-return NULL;
+GICCapabilityList *head = NULL;
+GICCapability *v2 = gic_cap_new(2), *v3 = gic_cap_new(3);
+
+v2->emulated = true;
+/* TODO: we'd change to true after we get emulated GICv3. */
+v3->emulated = false;
+
+gic_cap_kvm_probe(v2, v3);
+
+head = gic_cap_list_add(head, v2);
+head = gic_cap_list_add(head, v3);
+
+return head;
 }
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v7 0/4] ARM: add query-gic-capabilities QMP command

2016-03-23 Thread Peter Xu
v7 changes:
- patch 1
  - add more to commit log, about how to use the results [Markus]
  - change interface from dict back to array [Markus, Eric]
- patch 2
  - kvm_arm_create_scratch_host_vcpu should raise error when init
non-zero, but failed to find a good CPU model [Sergey]
- patch 3
  - rename function to kvm_device_supported [Sergey]
- patch 4
  - comment line from "FIXME" to "TODO" [Sergey]
  - make kvm-related part another function [Sergey]
  - change result to array to follow patch 1's change

v6 changes:
- patch 1 (squashed into patch 2)
  - explain more about the following in commit message: why we need
this command, and what does the entries mean [Markus]
  - explain what GIC is [Eric]
  - squash this patch into patch 2 [Eric]
- patch 2 (new patch 1)
  - fix "does not implement" english error [Eric]
  - remove useless headers in target-arm/monitor.c [Markus]
  - remove this command from whitelist [Markus, Eric]
  - return dict rather than array, with single key points to the
original array results [Markus, Eric]
- patch 5 (new patch 4)
  - tiny change in implementation to suite the new interface.

v5 changes:
- patch 2: moved to target-arm/monitor.c (from target-arm/machine.c)
   [Peter]
- patch 3: splitted into three patches: [all from Peter's comments]
  - patch 3 (new): leverage kvm_arm_create_scratch_host_vcpu(), tiny
enhancement of old one to suite our need
  - patch 4: introduce kvm_support_device() in kvm-all.c
  - patch 5: do the implementation.

v4 changes:
- all: rename query-gic-capability to query-gic-capabilities [Andrea]
- patch 3: rename helper function to kvm_support_device, make it
  inline and lighter. [Drew]

v3 changes:
- patch 2: remove func declaration, add qmp header [Drew]
- patch 3: being able to detect KVM GIC capabilities even without
  kvm enabled [Andrea]: this is a little bit hacky, need some more
  review on this.

v2 changes:
- result layout change: use array and dict for the capability bits
  rather than a single array of strings [Andrea/Markus]
- spelling out what GIC is in doc [Eric]

This patch is to add ARM-specific command "query-gic-capability".

The new command can report which kind of GIC device the host/QEMU
support. The returned result is in the form of array.

Sample command and output:

{"execute": "query-gic-capability"}
{"return": [{"emulated": false, "version": 3, "kernel": false},
{"emulated": true, "version": 2, "kernel": true}]}

Testing:

Smoke tests on both x86 (emulated) and another moonshot ARM server.

Peter Xu (4):
  arm: qmp: add query-gic-capabilities interface
  arm: enhance kvm_arm_create_scratch_host_vcpu
  kvm: add kvm_device_supported() helper function
  arm: implement query-gic-capabilities

 include/sysemu/kvm.h |  9 ++
 kvm-all.c| 15 +
 monitor.c|  8 +
 qapi-schema.json | 36 +
 qmp-commands.hx  | 27 
 target-arm/Makefile.objs |  2 +-
 target-arm/kvm.c | 14 +++-
 target-arm/kvm_arm.h |  6 ++--
 target-arm/monitor.c | 84 
 9 files changed, 197 insertions(+), 4 deletions(-)
 create mode 100644 target-arm/monitor.c

-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v7 2/4] arm: enhance kvm_arm_create_scratch_host_vcpu

2016-03-23 Thread Peter Xu
Some more lines to make sure we allow NULL for 1st/3rd parameter.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 target-arm/kvm.c | 14 +-
 target-arm/kvm_arm.h |  6 --
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 969ab0b..bb6935e 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -62,13 +62,17 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
*cpus_to_try,
 goto err;
 }
 
+if (!init) {
+goto finish;
+}
+
 ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
 if (ret >= 0) {
 ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
 if (ret < 0) {
 goto err;
 }
-} else {
+} else if (cpus_to_try) {
 /* Old kernel which doesn't know about the
  * PREFERRED_TARGET ioctl: we know it will only support
  * creating one kind of guest CPU which is its preferred
@@ -85,8 +89,16 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
*cpus_to_try,
 if (ret < 0) {
 goto err;
 }
+} else {
+/*
+ * Here, we got init != NULL but cpus_to_retry == NULL,
+ * meanwhile, we failed to probe one preferred target type
+ * (no matter what). So we fail.
+ */
+goto err;
 }
 
+finish:
 fdarray[0] = kvmfd;
 fdarray[1] = vmfd;
 fdarray[2] = cpufd;
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 07f0c72..4f01a99 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -124,9 +124,11 @@ void kvm_arm_reset_vcpu(ARMCPU *cpu);
  * kvm_arm_create_scratch_host_vcpu:
  * @cpus_to_try: array of QEMU_KVM_ARM_TARGET_* values (terminated with
  * QEMU_KVM_ARM_TARGET_NONE) to try as fallback if the kernel does not
- * know the PREFERRED_TARGET ioctl
+ * know the PREFERRED_TARGET ioctl. If NULL is provided, will try
+ * to use perferred target, and fail if no preferred found.
  * @fdarray: filled in with kvmfd, vmfd, cpufd file descriptors in that order
- * @init: filled in with the necessary values for creating a host vcpu
+ * @init: filled in with the necessary values for creating a host
+ * vcpu. If NULL is provided, will not init the vCPU.
  *
  * Create a scratch vcpu in its own VM of the type preferred by the host
  * kernel (as would be used for '-cpu host'), for purposes of probing it
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v7 1/4] arm: qmp: add query-gic-capabilities interface

2016-03-23 Thread Peter Xu
This patch add "query-gic-capabilities" but does not implemnet it. The
command is ARM-only. The command will return a list of GICCapability
struct that describes all GIC versions that current QEMU and system
support.

Libvirt is possibly the first consumer of this new command.

Before this patch, user will successfully configure all kinds of GIC
devices for ARM guests, no matter whether current QEMU/kernel support
it. If the specified GIC version/type is not supported, user will got an
ambiguous "QEMU boot failure" when trying to start the VM. This is not
user-friendly.

With this patch, libvirt should be able to query which type (and which
version) of GIC device that we support. Use this information, libvirt
can warn the user during configuration of guests when specified GIC
device type is not supported. Or better, we can just list those versions
that we support, and filter out those not-supported ones.

For example, if we got the query result:

{"return": [{"emulated": false, "version": 3, "kernel": true},
{"emulated": true, "version": 2, "kernel": false}]}

Then it means that we support emulated GIC version 2 using:

  qemu-system-aarch64 -M virt,accel=tcg,gic-version=2 ...

or kvm-accelerated GIC version 3 using:

  qemu-system-aarch64 -M virt,accel=kvm,gic-version=3 ...

If we specify other explicit GIC version rather than the above, QEMU
will not be able to boot.

Besides, the community is working on a more generic way to query these
kind of information. However, due to the eagerness of this command, we
decided to first implement this ad-hoc one, then when the generic method
is ready, we can move on to that one smoothly.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 monitor.c|  8 
 qapi-schema.json | 36 
 qmp-commands.hx  | 27 +++
 target-arm/Makefile.objs |  2 +-
 target-arm/monitor.c | 28 
 5 files changed, 100 insertions(+), 1 deletion(-)
 create mode 100644 target-arm/monitor.c

diff --git a/monitor.c b/monitor.c
index 4c02f0f..5f2be8b 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4258,3 +4258,11 @@ void qmp_dump_skeys(const char *filename, Error **errp)
 error_setg(errp, QERR_FEATURE_DISABLED, "dump-skeys");
 }
 #endif
+
+#ifndef TARGET_ARM
+GICCapabilityList *qmp_query_gic_capabilities(Error **errp)
+{
+error_setg(errp, QERR_FEATURE_DISABLED, "query-gic-capabilities");
+return NULL;
+}
+#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 88f9b81..19921d5 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4126,3 +4126,39 @@
 ##
 { 'enum': 'ReplayMode',
   'data': [ 'none', 'record', 'play' ] }
+
+##
+# @GICCapability:
+#
+# The struct describes capability for a specific GIC (Generic
+# Interrupt Controller) version. These bits are not only decided by
+# QEMU/KVM software version, but also decided by the hardware that
+# the program is running upon.
+#
+# @version:  version of GIC to be described. Currently, only 2 and 3
+#are supported.
+#
+# @emulated: whether current QEMU/hardware supports emulated GIC
+#device in user space.
+#
+# @kernel:   whether current QEMU/hardware supports hardware
+#accelerated GIC device in kernel.
+#
+# Since: 2.6
+##
+{ 'struct': 'GICCapability',
+  'data': { 'version': 'int',
+'emulated': 'bool',
+'kernel': 'bool' } }
+
+##
+# @query-gic-capabilities:
+#
+# This command is ARM-only. It will return a list of GICCapability
+# objects that describe its capability bits.
+#
+# Returns: a list of GICCapability objects.
+#
+# Since: 2.6
+##
+{ 'command': 'query-gic-capabilities', 'returns': ['GICCapability'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9e05365..de896a5 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4853,3 +4853,30 @@ Example:
  {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
   "pop-vlan": 1, "id": 251658240}
]}
+
+EQMP
+
+#if defined TARGET_ARM
+{
+.name   = "query-gic-capabilities",
+.args_type  = "",
+.mhandler.cmd_new = qmp_marshal_query_gic_capabilities,
+},
+#endif
+
+SQMP
+query-gic-capabilities
+---
+
+Return a list of GICCapability objects, describing supported GIC
+(Generic Interrupt Controller) versions.
+
+Arguments: None
+
+Example:
+
+-> { "execute": "query-gic-capabilities" }
+<- { "return": [{ "version": 2, "emulated": true, "kernel": false },
+{ "version": 3, "emulated": false, "kernel": true } ] }
+
+EQMP
diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index a80eb39..334074c 1

Re: [libvirt] [Qemu-arm] [PATCH v6 4/4] arm: implement query-gic-capabilities

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 03:33:25PM +0300, Sergey Fedorov wrote:
> On 23/03/16 08:32, Peter Xu wrote:
> > diff --git a/target-arm/monitor.c b/target-arm/monitor.c
> > index 254a9c9..4a2db59 100644
> > --- a/target-arm/monitor.c
> > +++ b/target-arm/monitor.c
> > @@ -21,8 +21,66 @@
> (snip)
> >  GICCapabilityResult *qmp_query_gic_capabilities(Error **errp)
> >  {
> > -return NULL;
> > +GICCapabilityResult *result = g_new0(GICCapabilityResult, 1);
> > +GICCapabilityList *head = NULL;
> > +GICCapability *v2 = gic_cap_new(2), *v3 = gic_cap_new(3);
> > +
> > +v2->emulated = true;
> > +/* FIXME: we'd change to true after we get emulated GICv3. */
> 
> Maybewe'd better use'NOTE:' or 'TODO:' instead of 'FIXME:'?

Right.

> 
> > +v3->emulated = false;
> > +
> > +#ifdef CONFIG_KVM
> > +{
> > +int fdarray[3];
> > +
> > +if (!kvm_arm_create_scratch_host_vcpu(NULL, fdarray, NULL)) {
> > +goto out;
> > +}
> > +
> > +/* Test KVM GICv2 */
> > +if (kvm_support_device(fdarray[1], KVM_DEV_TYPE_ARM_VGIC_V2)) {
> > +v2->kernel = true;
> > +}
> > +
> > +/* Test KVM GICv3 */
> > +if (kvm_support_device(fdarray[1], KVM_DEV_TYPE_ARM_VGIC_V3)) {
> > +v3->kernel = true;
> > +}
> > +
> > +kvm_arm_destroy_scratch_host_vcpu(fdarray);
> > +out:
> > +;
> > +}
> > +#endif
> 
> Probably, it would be neater to put KVM part into a separate static
> inline function.

Seems so. Will fix.

Thanks.

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v6 3/4] kvm: add kvm_support_device() helper function

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 06:03:32PM +0300, Sergey Fedorov wrote:
> Yes, but kvm_create_device() returns a file descriptor whereas this
> function is predicative. Personally, I like the convention described in
> chapter 16 of Linux kernel coding style [1]:
> 
>   If the name of a function is an action or an imperative command,
>   the function should return an error-code integer.  If the name
>   is a predicate, the function should return a "succeeded" boolean.

The above is talking about return values. Maybe you were trying to
reference this one as example about add_work() and
pci_dev_present():

For example, "add work" is a command, and the add_work()
function returns 0 for success or -EBUSY for failure.  In the
same way, "PCI device present" is a predicate, and the
pci_dev_present() function returns 1 if it succeeds in finding a
matching device or 0 if it doesn't.

Seems make sense. Will take your advice.  Thanks.

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v6 3/4] kvm: add kvm_support_device() helper function

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 03:28:28PM +0300, Sergey Fedorov wrote:
> On 23/03/16 08:32, Peter Xu wrote:
> > diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> > index 6695fa7..8738fa1 100644
> > --- a/include/sysemu/kvm.h
> > +++ b/include/sysemu/kvm.h
> > @@ -306,6 +306,15 @@ void kvm_device_access(int fd, int group, uint64_t 
> > attr,
> >   */
> >  int kvm_create_device(KVMState *s, uint64_t type, bool test);
> >  
> > +/**
> > + * kvm_support_device - probe whether KVM support specific device
> > + *
> > + * @vmfd: The fd handler for VM
> > + * @type: type of device
> > + *
> > + * @return: true if supported, otherwise false.
> > + */
> > +bool kvm_support_device(int vmfd, uint64_t type);
> 
> Why don't name the function like 'kvm_device_supported' to better express its 
> predicative nature?

Because I am trying to follow existing naming style, like:
"kvm_create_device" (please see above).

Thanks.

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-arm] [PATCH v6 2/4] arm: enhance kvm_arm_create_scratch_host_vcpu

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 03:24:16PM +0300, Sergey Fedorov wrote:
> On 23/03/16 08:32, Peter Xu wrote:
> > diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> > index 969ab0b..0a7f9a6 100644
> > --- a/target-arm/kvm.c
> > +++ b/target-arm/kvm.c
> > @@ -62,13 +62,17 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> > *cpus_to_try,
> >  goto err;
> >  }
> >  
> > +if (!init) {
> > +goto finish;
> > +}
> > +
> >  ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
> >  if (ret >= 0) {
> >  ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
> >  if (ret < 0) {
> >  goto err;
> >  }
> > -} else {
> > +} else if (cpus_to_try) {
> >  /* Old kernel which doesn't know about the
> >   * PREFERRED_TARGET ioctl: we know it will only support
> >   * creating one kind of guest CPU which is its preferred
> > @@ -85,8 +89,12 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
> > *cpus_to_try,
> >  if (ret < 0) {
> >  goto err;
> >  }
> > +} else {
> > +/* Not providing cpus_to_try, do nothing. */
> > +;
> 
> I think it's probably not the best idea to skip CPU initialization here.
> I'd rather raise an error in such case. If we supplied non-NULL init
> argument then we need VCPU been initialized, don't we? If we pass NULL
> as init then we actually skip this code.

Though current we will never go into this else... I agree that we
should raise error if stepped into it.

Will fix. Thanks!

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [PATCH v6 0/4] ARM: add query-gic-capabilities SMP command

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 11:32:06AM +0100, Christoffer Dall wrote:
> Hi Peter,
> 
> Stupid question: The subject says "SMP command".  Did you mean "QMP
> command" ?

It's for sure a good question... My fault. It should be QMP.

Thanks to point out.

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v6 0/4] ARM: add query-gic-capabilities SMP command

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 09:54:03AM +0100, Andrea Bolognani wrote:
> On Wed, 2016-03-23 at 13:32 +0800, Peter Xu wrote:
> > v6 changes:
> > - patch 1 (squashed into patch 2)
> >   - explain more about the following in commit message: why we need
> > this command, and what does the entries mean [Markus]
> >   - explain what GIC is [Eric]
> >   - squash this patch into patch 2 [Eric]
> > - patch 2 (new patch 1)
> >   - fix "does not implement" english error [Eric]
> >   - remove useless headers in target-arm/monitor.c [Markus]
> >   - remove this command from whitelist [Markus, Eric]
> >   - return dict rather than array, with single key points to the
> > original array results [Markus, Eric]
> > - patch 5 (new patch 4)
> >   - tiny change in implementation to suite the new interface.
> 
> Tested again on two different aarch64 hosts, one with GIC
> v3 support and the other one with GIC v2 only; seems to be
> working just fine.
> 
> Now on to tweak my RFC libvirt patches to cope with the
> interface change :)

Andrea, 

Thank you very much for your quick follow-ups (merely every
time). :-)

Hope that the interface (at least) could suffice this time.

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v6 0/4] ARM: add query-gic-capabilities SMP command

2016-03-23 Thread Peter Xu
On Wed, Mar 23, 2016 at 01:32:29PM +0800, Peter Xu wrote:
> The new command can report which kind of GIC device the host/QEMU
> support. The returned result is in the form of array.
> 
> Sample command and output:
> 
> {"execute": "query-gic-capability"}
> {"return": [{"emulated": false, "version": 3, "kernel": false},
> {"emulated": true, "version": 2, "kernel": true}]}

Sorry! This is wrong, which is v5 version of interface.

The new interface should return a dict rather than array:

-> { "execute": "query-gic-capabilities" }
<- { "return": { "capabilities": [
{ "version": 2, "emulated": true, "kernel": false },
{ "version": 3, "emulated": false, "kernel": true } ] } }

Thanks.

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v6 0/4] ARM: add query-gic-capabilities SMP command

2016-03-22 Thread Peter Xu
v6 changes:
- patch 1 (squashed into patch 2)
  - explain more about the following in commit message: why we need
this command, and what does the entries mean [Markus]
  - explain what GIC is [Eric]
  - squash this patch into patch 2 [Eric]
- patch 2 (new patch 1)
  - fix "does not implement" english error [Eric]
  - remove useless headers in target-arm/monitor.c [Markus]
  - remove this command from whitelist [Markus, Eric]
  - return dict rather than array, with single key points to the
original array results [Markus, Eric]
- patch 5 (new patch 4)
  - tiny change in implementation to suite the new interface.

v5 changes:
- patch 2: moved to target-arm/monitor.c (from target-arm/machine.c)
   [Peter]
- patch 3: splitted into three patches: [all from Peter's comments]
  - patch 3 (new): leverage kvm_arm_create_scratch_host_vcpu(), tiny
enhancement of old one to suite our need
  - patch 4: introduce kvm_support_device() in kvm-all.c
  - patch 5: do the implementation.

v4 changes:
- all: rename query-gic-capability to query-gic-capabilities [Andrea]
- patch 3: rename helper function to kvm_support_device, make it
  inline and lighter. [Drew]

v3 changes:
- patch 2: remove func declaration, add qmp header [Drew]
- patch 3: being able to detect KVM GIC capabilities even without
  kvm enabled [Andrea]: this is a little bit hacky, need some more
  review on this.

v2 changes:
- result layout change: use array and dict for the capability bits
  rather than a single array of strings [Andrea/Markus]
- spelling out what GIC is in doc [Eric]

This patch is to add ARM-specific command "query-gic-capability".

The new command can report which kind of GIC device the host/QEMU
support. The returned result is in the form of array.

Sample command and output:

{"execute": "query-gic-capability"}
{"return": [{"emulated": false, "version": 3, "kernel": false},
{"emulated": true, "version": 2, "kernel": true}]}

Testing:

Smoke tests on both x86 (emulated) and another moonshot ARM server.

Peter Xu (4):
  arm: qmp: add query-gic-capabilities interface
  arm: enhance kvm_arm_create_scratch_host_vcpu
  kvm: add kvm_support_device() helper function
  arm: implement query-gic-capabilities

 include/sysemu/kvm.h |  9 +
 kvm-all.c| 15 +
 monitor.c|  8 +
 qapi-schema.json | 50 
 qmp-commands.hx  | 28 
 target-arm/Makefile.objs |  2 +-
 target-arm/kvm.c | 10 +-
 target-arm/kvm_arm.h |  6 ++--
 target-arm/monitor.c | 86 
 9 files changed, 210 insertions(+), 4 deletions(-)
 create mode 100644 target-arm/monitor.c

-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v6 2/4] arm: enhance kvm_arm_create_scratch_host_vcpu

2016-03-22 Thread Peter Xu
Some more lines to make sure we allow NULL for 1st/3rd parameter.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 target-arm/kvm.c | 10 +-
 target-arm/kvm_arm.h |  6 --
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/target-arm/kvm.c b/target-arm/kvm.c
index 969ab0b..0a7f9a6 100644
--- a/target-arm/kvm.c
+++ b/target-arm/kvm.c
@@ -62,13 +62,17 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
*cpus_to_try,
 goto err;
 }
 
+if (!init) {
+goto finish;
+}
+
 ret = ioctl(vmfd, KVM_ARM_PREFERRED_TARGET, init);
 if (ret >= 0) {
 ret = ioctl(cpufd, KVM_ARM_VCPU_INIT, init);
 if (ret < 0) {
 goto err;
 }
-} else {
+} else if (cpus_to_try) {
 /* Old kernel which doesn't know about the
  * PREFERRED_TARGET ioctl: we know it will only support
  * creating one kind of guest CPU which is its preferred
@@ -85,8 +89,12 @@ bool kvm_arm_create_scratch_host_vcpu(const uint32_t 
*cpus_to_try,
 if (ret < 0) {
 goto err;
 }
+} else {
+/* Not providing cpus_to_try, do nothing. */
+;
 }
 
+finish:
 fdarray[0] = kvmfd;
 fdarray[1] = vmfd;
 fdarray[2] = cpufd;
diff --git a/target-arm/kvm_arm.h b/target-arm/kvm_arm.h
index 07f0c72..6bcfe6c 100644
--- a/target-arm/kvm_arm.h
+++ b/target-arm/kvm_arm.h
@@ -124,9 +124,11 @@ void kvm_arm_reset_vcpu(ARMCPU *cpu);
  * kvm_arm_create_scratch_host_vcpu:
  * @cpus_to_try: array of QEMU_KVM_ARM_TARGET_* values (terminated with
  * QEMU_KVM_ARM_TARGET_NONE) to try as fallback if the kernel does not
- * know the PREFERRED_TARGET ioctl
+ * know the PREFERRED_TARGET ioctl. If NULL is provided, will try
+ * nothing.
  * @fdarray: filled in with kvmfd, vmfd, cpufd file descriptors in that order
- * @init: filled in with the necessary values for creating a host vcpu
+ * @init: filled in with the necessary values for creating a host
+ * vcpu. If NULL is provided, will not init the vCPU.
  *
  * Create a scratch vcpu in its own VM of the type preferred by the host
  * kernel (as would be used for '-cpu host'), for purposes of probing it
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v6 3/4] kvm: add kvm_support_device() helper function

2016-03-22 Thread Peter Xu
This can be used when probing whether KVM support specific device. Here,
a raw vmfd is used.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 include/sysemu/kvm.h |  9 +
 kvm-all.c| 15 +++
 2 files changed, 24 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 6695fa7..8738fa1 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -306,6 +306,15 @@ void kvm_device_access(int fd, int group, uint64_t attr,
  */
 int kvm_create_device(KVMState *s, uint64_t type, bool test);
 
+/**
+ * kvm_support_device - probe whether KVM support specific device
+ *
+ * @vmfd: The fd handler for VM
+ * @type: type of device
+ *
+ * @return: true if supported, otherwise false.
+ */
+bool kvm_support_device(int vmfd, uint64_t type);
 
 /* Arch specific hooks */
 
diff --git a/kvm-all.c b/kvm-all.c
index 44c0464..77deccc 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2339,6 +2339,21 @@ int kvm_create_device(KVMState *s, uint64_t type, bool 
test)
 return test ? 0 : create_dev.fd;
 }
 
+bool kvm_support_device(int vmfd, uint64_t type)
+{
+struct kvm_create_device create_dev = {
+.type = type,
+.fd = -1,
+.flags = KVM_CREATE_DEVICE_TEST,
+};
+
+if (ioctl(vmfd, KVM_CHECK_EXTENSION, KVM_CAP_DEVICE_CTRL) <= 0) {
+return false;
+}
+
+return (ioctl(vmfd, KVM_CREATE_DEVICE, _dev) >= 0);
+}
+
 int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source)
 {
 struct kvm_one_reg reg;
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [PATCH v6 1/4] arm: qmp: add query-gic-capabilities interface

2016-03-22 Thread Peter Xu
This patch add "query-gic-capabilities" but does not implemnet it. The
command is ARM-only. The command will return one GICCapabilityResult
object, which contains a array of GICCapability struct that describes
all GIC versions that current QEMU and system support.

Libvirt is possibly the first consumer of this new command.

Before this patch, user will successfully configure all kinds of GIC
devices for ARM guests, no matter whether current QEMU/kernel support
it. If the specified GIC version/type is not supported, user will got an
ambiguous "QEMU boot failure" when trying to start the VM. This is not
user-friendly.

With this patch, libvirt should be able to query which type (and which
version) of GIC device that we support. Use this information, libvirt
can warn the user during configuration of guests when specified GIC
device type is not supported. Or better, we can just list those versions
that we support, and filter out those not-supported ones.

Besides, the community is working on a more generic way to query these
kind of information. However, due to the eagerness of this command, we
decided to first implement this ad-hoc one, then when the generic method
is ready, we can move on to that one smoothly.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 monitor.c|  8 
 qapi-schema.json | 50 
 qmp-commands.hx  | 28 +++
 target-arm/Makefile.objs |  2 +-
 target-arm/monitor.c | 28 +++
 5 files changed, 115 insertions(+), 1 deletion(-)
 create mode 100644 target-arm/monitor.c

diff --git a/monitor.c b/monitor.c
index 4c02f0f..3845213 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4258,3 +4258,11 @@ void qmp_dump_skeys(const char *filename, Error **errp)
 error_setg(errp, QERR_FEATURE_DISABLED, "dump-skeys");
 }
 #endif
+
+#ifndef TARGET_ARM
+GICCapabilityResult *qmp_query_gic_capabilities(Error **errp)
+{
+error_setg(errp, QERR_FEATURE_DISABLED, "query-gic-capabilities");
+return NULL;
+}
+#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 88f9b81..72359dd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4126,3 +4126,53 @@
 ##
 { 'enum': 'ReplayMode',
   'data': [ 'none', 'record', 'play' ] }
+
+##
+# @GICCapability:
+#
+# The struct describes capability for a specific GIC (Generic
+# Interrupt Controller) version. These bits are not only decided by
+# QEMU/KVM software version, but also decided by the hardware that
+# the program is running upon.
+#
+# @version:  version of GIC to be described. Currently, only 2 and 3
+#are supported.
+#
+# @emulated: whether current QEMU/hardware supports emulated GIC
+#device in user space.
+#
+# @kernel:   whether current QEMU/hardware supports hardware
+#accelerated GIC device in kernel.
+#
+# Since: 2.6
+##
+{ 'struct': 'GICCapability',
+  'data': { 'version': 'int',
+'emulated': 'bool',
+'kernel': 'bool' } }
+
+##
+# @GICCapabilityResult:
+#
+# Query result for GIC capabilities.
+#
+# @data:  currently with a single key 'capabilities', which contains
+# a list of GICCapability elements, describing all supported
+# versions/kinds of GIC devices.
+#
+# Since: 2.6
+##
+{ 'struct': 'GICCapabilityResult',
+  'data': { 'capabilities': ['GICCapability'] } }
+
+##
+# @query-gic-capabilities:
+#
+# This command is ARM-only. It will return a GICCapabilityResult
+# object that describes its capability bits.
+#
+# Returns: GICCapabilityResult object.
+#
+# Since: 2.6
+##
+{ 'command': 'query-gic-capabilities', 'returns': 'GICCapabilityResult' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 9e05365..4144982 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4853,3 +4853,31 @@ Example:
  {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
   "pop-vlan": 1, "id": 251658240}
]}
+
+EQMP
+
+#if defined TARGET_ARM
+{
+.name   = "query-gic-capabilities",
+.args_type  = "",
+.mhandler.cmd_new = qmp_marshal_query_gic_capabilities,
+},
+#endif
+
+SQMP
+query-gic-capabilities
+---
+
+Return a GICCapabilityResult object, which contains a list of
+GICCapability elements, describing supported GIC versions.
+
+Arguments: None
+
+Example:
+
+-> { "execute": "query-gic-capabilities" }
+<- { "return": { "capabilities": [
+{ "version": 2, "emulated": true, "kernel": false },
+{ "version": 3, "emulated": false, "kernel": true } ] } }
+
+EQMP
diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index a80eb39..334074c 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs

[libvirt] [PATCH v6 4/4] arm: implement query-gic-capabilities

2016-03-22 Thread Peter Xu
For emulated GIC capabilities, currently only gicv2 is supported. We
need to add gicv3 in when emulated gicv3 ready. For KVM accelerated ARM
VM, we detect the capability bits by creating a scratch VM.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 target-arm/monitor.c | 60 +++-
 1 file changed, 59 insertions(+), 1 deletion(-)

diff --git a/target-arm/monitor.c b/target-arm/monitor.c
index 254a9c9..4a2db59 100644
--- a/target-arm/monitor.c
+++ b/target-arm/monitor.c
@@ -21,8 +21,66 @@
  */
 #include "qemu/osdep.h"
 #include "qmp-commands.h"
+#include "hw/boards.h"
+#include "kvm_arm.h"
+
+static GICCapability *gic_cap_new(int version)
+{
+GICCapability *cap = g_new0(GICCapability, 1);
+cap->version = version;
+/* by default, support none */
+cap->emulated = false;
+cap->kernel = false;
+return cap;
+}
+
+static GICCapabilityList *gic_cap_list_add(GICCapabilityList *head,
+   GICCapability *cap)
+{
+GICCapabilityList *item = g_new0(GICCapabilityList, 1);
+item->value = cap;
+item->next = head;
+return item;
+}
 
 GICCapabilityResult *qmp_query_gic_capabilities(Error **errp)
 {
-return NULL;
+GICCapabilityResult *result = g_new0(GICCapabilityResult, 1);
+GICCapabilityList *head = NULL;
+GICCapability *v2 = gic_cap_new(2), *v3 = gic_cap_new(3);
+
+v2->emulated = true;
+/* FIXME: we'd change to true after we get emulated GICv3. */
+v3->emulated = false;
+
+#ifdef CONFIG_KVM
+{
+int fdarray[3];
+
+if (!kvm_arm_create_scratch_host_vcpu(NULL, fdarray, NULL)) {
+goto out;
+}
+
+/* Test KVM GICv2 */
+if (kvm_support_device(fdarray[1], KVM_DEV_TYPE_ARM_VGIC_V2)) {
+v2->kernel = true;
+}
+
+/* Test KVM GICv3 */
+if (kvm_support_device(fdarray[1], KVM_DEV_TYPE_ARM_VGIC_V3)) {
+v3->kernel = true;
+}
+
+kvm_arm_destroy_scratch_host_vcpu(fdarray);
+out:
+;
+}
+#endif
+
+head = gic_cap_list_add(head, v2);
+head = gic_cap_list_add(head, v3);
+
+result->capabilities = head;
+
+return result;
 }
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [PATCH v5 0/5] ARM: add query-gic-capabilities SMP command

2016-03-21 Thread Peter Xu
On Mon, Mar 21, 2016 at 04:56:07PM +0100, Andrea Bolognani wrote:
> On Fri, 2016-03-18 at 11:27 +0800, Peter Xu wrote:
> > v5 changes:
> > - patch 2: moved to target-arm/monitor.c (from target-arm/machine.c)
> >    [Peter]
> > - patch 3: splitted into three patches: [all from Peter's comments]
> >   - patch 3 (new): leverage kvm_arm_create_scratch_host_vcpu(), tiny
> > enhancement of old one to suite our need
> >   - patch 4: introduce kvm_support_device() in kvm-all.c
> >   - patch 5: do the implementation.
> 
> Tested on two separate aarch64 hosts, seems to work fine.

Thanks Andrea!

Sorry I forgot to CC libvir-list. To avoid duplication, not
re-sending but CCing in this reply. If anyone interested, please
review in the following link:

https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg04465.html

(which points to exactly current thread.)

Sorry for the inconvenience!

-- peterx

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-29 Thread Peter Xu
On Mon, Feb 29, 2016 at 05:30:36PM +0100, Andrea Bolognani wrote:
> On Mon, 2016-02-22 at 09:35 +0800, Peter Xu wrote:
> > On Fri, Feb 19, 2016 at 01:33:09PM +0100, Andrea Bolognani wrote:
> > > 
> > > I didn't say it would be hard :)
> > > 
> > > I just said that such compatibility code would have to be kept
> > > around forever. We already support lots and lots of similar cases
> > > in libvirt, the difference being that in this case we would add
> > > support for a new command *knowing in advance* that it will become
> > > obsolete as soon as a proper implementation is available.
> > > 
> > > It might still be the right thing to do! I just want to make sure
> > > everything's been properly considered and discussed beforehand.
> > 
> > I totally agree with you to think more before doing. :)
> 
> So, anyone else willing to give their $0.2 on how to implement
> this The Right Way™?
> 
> I just skimmed the whole thread again and it doesn't look to me
> like any consensus has been reached.

Hi, Andrea,

I have sent another non-rfc patchset about this already:

https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg05270.html

I am not sure whether you have received it or not (you should be in
the CC list :), just pasting it again here. Do you think this would
work for us?

Btw, the thread is receiving none reviews till now. Looking forward
to have any of your further comments!

Thanks!
Peter

> 
> Cheers.
> 
> -- 
> Andrea Bolognani
> Software Engineer - Virtualization Team

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-21 Thread Peter Xu
On Fri, Feb 19, 2016 at 01:33:09PM +0100, Andrea Bolognani wrote:
> I didn't say it would be hard :)
> 
> I just said that such compatibility code would have to be kept
> around forever. We already support lots and lots of similar cases
> in libvirt, the difference being that in this case we would add
> support for a new command *knowing in advance* that it will become
> obsolete as soon as a proper implementation is available.
> 
> It might still be the right thing to do! I just want to make sure
> everything's been properly considered and discussed beforehand.

I totally agree with you to think more before doing. :)

Then I will try to move on. Appreciate for all the review comments!

Peter

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-18 Thread Peter Xu
On Thu, Feb 18, 2016 at 06:10:21PM +0100, Andrea Bolognani wrote:
> On Thu, 2016-02-18 at 17:52 +0100, Andrew Jones wrote:
> > > Is this work on any of our todo list (or anyone has started the
> > > prototyping)?
> > > 
> > > It seems reasonable to provide such a generic interface, rather than
> > > adding a "query-gic-capability" for GIC versions only. The problem
> > > is that, I am not sure how eagerly we are wanting this GIC
> > > interface, and when will this framework be there in QOM.
> > 
> > We want it eagerly :-) This type of a rabbit hole is likely why Daniel
> > was suggesting we do more in libvirt. I'm still not sure we want to
> > probe both kvm and qemu from libvirt though, so I'm still in favor of
> > an improved qemu probing method being worked out.
> > 
> > I don't know what the policy is for deprecating QMP commands, but I
> > wonder if we can't introduce a QMP command now, and then, after working
> > out the QOM extensions, we could shift to it, deprecating this QMP
> > command and any others that would no longer be needed.
> 
> AFAIK, the current situation of libvirt passing the GIC version to
> QEMU and simply reporting in case of failure is not unprecedented
> and there are a few cases where probing in advance would simply not
> be feasible.
> 
> Any probing code added to libvirt would have to be kept around
> forever to ensure compatibility with current QEMU versions, so it
> should IMHO be seen as a last resort in case we can't live without
> GIC version probing while it's being implemented, properly, in QEMU.

If libvirt is the most possible consumer for the new command, I
think it might not be too hard to keep the compatibility of all
possible versions of QEMU. E.g., after we have got a better way to
query GIC version other than query-gic-capability, we can do
something like this in libvirt:

- try query-gic-capability
  - if supported -> [got GIC version]
  - if not supported -> try the new method
- if supported -> [got GIC version]
- if not supported -> [not support]

During the time when QEMU has both methods working (before
obsoleting the query-gic-capability QMP command), QEMU will make
sure querying in both way will get exactly the same results.

Does this work?

Thanks.
Peter

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [Qemu-devel] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-17 Thread Peter Xu
On Mon, Feb 15, 2016 at 03:22:05PM +, Daniel P. Berrange wrote:
> On Mon, Feb 15, 2016 at 04:08:33PM +0100, Markus Armbruster wrote:
> > Peter Xu <pet...@redhat.com> writes:
> > 
> > > On Mon, Feb 15, 2016 at 10:52:01AM +0100, Markus Armbruster wrote:
> > >> Peter Xu <pet...@redhat.com> writes:
> > >> 
> > >> > For ARM platform, we still do not have any interface to query
> > >> > whether current QEMU/host support specific GIC version. This
> > >> > patchset is trying to add one QMP interface for that. By querying
> > >> > the GIC capability using the new interface, one should know exactly
> > >> > what GIC version(s) the platform will support. The capability bits
> > >> > will be decided by both QEMU and host kernel.
> > >> >
> > >> > The current patchset only provides interface for review. Its handler
> > >> > is a fake one which returns empty always.
> > >> >
> > >> > The command interface I am planning to add is something like this:
> > >> >
> > >> > -> { "execute": "query-gic-capability" }
> > >> > <- { "return": [ "gicv2", "gicv2-kvm", "gicv3-kvm" ] }
> > >> >
> > >> > Currently, all the possible supported GIC versions are:
> > >> >
> > >> > - gicv2:  GIC version 2 without kernel IRQ chip
> > >> > - gicv2-kvm:  GIC version 2 with kernel IRQ chip
> > >> > - gicv3:  GIC version 3 without kernel IRQ chip (not supported)
> > >> > - gicv3-kvm:  GIC version 3 with kernel IRQ chip
> > >> >
> > >> > Since "gicv3" is still not supported (to use GICv3, kernel irqchip
> > >> > support is required for now, which corresponds to "gicv3-kvm"),
> > >> > currently the maximum superset of the result should be:
> > >> >
> > >> > ["gicv2", "gicv2-kvm", "gicv3-kvm"]
> > >> >
> > >> > Please help review whether the interface suits our need, also please
> > >> > point out any error I have made.
> > >> 
> > >> Adding ad hoc queries as we go won't scale.  Is there really no generic
> > >> way to get this information, e.g. with qom-get?
> > >
> > > Haven't used "qom-get" before, but it seems to fetch one property
> > > for a specific object. If so, will it be strange to hide some
> > > capability bits into every GIC objects (though there is possibly
> > > one object)?
> > 
> > Pardon my ignorance...  what are these "GIC objects"?
> > 
> > What exactly is the "GIC type", and how would the result of
> > query-gic-capability be used?
> > 
> > > I agree that we should keep the interface as simple as
> > > possible. I see that there are already commands that works just like
> > > this one, which is to query some capabilities from QEMU, like:
> > >
> > > - query-dump-guest-memory-capability
> > > - query-migrate-capabilities
> > 
> > I'm not saying we must not add another ad hoc query.  I'm trying to
> > figure out whether existing generic infrastructure can serve, or be
> > adapted to serve.  Once we establish the answer is "no" or "badly", I'm
> > willing to consider the ad hoc query.
> 
> Looking at this from the POV of solving the generic problem, what we
> have here is an object with an integer property, for which only certain
> values are permitted based on what host kernel/hardware we're runing
> on.
> 
> So to solve this generically, we would need a way to provide information
> in QOM as to what permitted values are for any given property. This would
> make sense for at least bool, int and enum properties, since they can all
> potentially have scenarios where the possible range of values is greater
> than the currently permissible range of values.

Is this work on any of our todo list (or anyone has started the
prototyping)?

It seems reasonable to provide such a generic interface, rather than
adding a "query-gic-capability" for GIC versions only. The problem
is that, I am not sure how eagerly we are wanting this GIC
interface, and when will this framework be there in QOM.

Thanks.
Peter

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [Qemu-devel] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-15 Thread Peter Xu
On Mon, Feb 15, 2016 at 10:52:01AM +0100, Markus Armbruster wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > For ARM platform, we still do not have any interface to query
> > whether current QEMU/host support specific GIC version. This
> > patchset is trying to add one QMP interface for that. By querying
> > the GIC capability using the new interface, one should know exactly
> > what GIC version(s) the platform will support. The capability bits
> > will be decided by both QEMU and host kernel.
> >
> > The current patchset only provides interface for review. Its handler
> > is a fake one which returns empty always.
> >
> > The command interface I am planning to add is something like this:
> >
> > -> { "execute": "query-gic-capability" }
> > <- { "return": [ "gicv2", "gicv2-kvm", "gicv3-kvm" ] }
> >
> > Currently, all the possible supported GIC versions are:
> >
> > - gicv2:  GIC version 2 without kernel IRQ chip
> > - gicv2-kvm:  GIC version 2 with kernel IRQ chip
> > - gicv3:  GIC version 3 without kernel IRQ chip (not supported)
> > - gicv3-kvm:  GIC version 3 with kernel IRQ chip
> >
> > Since "gicv3" is still not supported (to use GICv3, kernel irqchip
> > support is required for now, which corresponds to "gicv3-kvm"),
> > currently the maximum superset of the result should be:
> >
> > ["gicv2", "gicv2-kvm", "gicv3-kvm"]
> >
> > Please help review whether the interface suits our need, also please
> > point out any error I have made.
> 
> Adding ad hoc queries as we go won't scale.  Is there really no generic
> way to get this information, e.g. with qom-get?

Haven't used "qom-get" before, but it seems to fetch one property
for a specific object. If so, will it be strange to hide some
capability bits into every GIC objects (though there is possibly
one object)?

I agree that we should keep the interface as simple as
possible. I see that there are already commands that works just like
this one, which is to query some capabilities from QEMU, like:

- query-dump-guest-memory-capability
- query-migrate-capabilities

So... besides the original proposal, what about adding a generic QMP
command to query all kinds of capabilities (and let GIC be the first
item)? Or any other way to avoid adding a new command?

Thanks.
Peter

> 
> > One question: how should I make this command "ARM only"? I see that
> > in qmp-commands.hx, I can use something like "#if defined
> > TARGET_ARM" to block out ARM specified commands, however how should
> > I do the similiar thing in qapi-schema.json?
> 
> Have a look at the #if in qmp-commands.hx.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-15 Thread Peter Xu
On Mon, Feb 15, 2016 at 10:35:39AM +0100, Martin Kletzander wrote:
> On Sun, Feb 14, 2016 at 01:41:41PM +0800, Peter Xu wrote:
> >
> >["gicv2", "gicv2-kvm", "gicv3-kvm"]
> >
> >Please help review whether the interface suits our need, also please
> >point out any error I have made.
> >
> 
> This looks nice.  I have some questions, but I'm not an expert in this
> area, so excuse me if they are stupid.
> 
> So hardware itself supports some GIC version, let's say 3 for our case.
> Does that mean it can be triggered to do v2 as well?  I mean is it
> possible that HW supports multiple versions?  If yes, then I suspect
> there is (will be) HW that does *not* do it and that's where QEMU (or
> KVM) must emulate that version.  If all I'm having in mind is true, then
> you are trying to reply with two orthogonal types of information. A)
> versions kernel/HW supports and B) qemu/kvm can emulate.  That is fine
> if libvirt can choose all the versions specified (e.g. gicv2-kvm,
> gicv2), but if we can only select a version, then it might be worth just
> returning those two types of information separately, e.g.:
> 
> ["v2": {"emulated": true, "kvm":true}, "v3": {"emulated": false, "kvm": true}]

Yes, I'd say that this is more clear especially when the matrix is
very big. Luckily for GIC versions, there is only 2x2 (2x3 if there
is v4) and it keeps a small one. So IMHO this is a flavor issue and
both work for me. :)

Thanks.
Peter

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-15 Thread Peter Xu
On Mon, Feb 15, 2016 at 12:54:57AM -0600, Wei Huang wrote:
> On 2/13/16 23:41, Peter Xu wrote:
> > Please help review whether the interface suits our need, also please
> > point out any error I have made.
> 
> I tested QEMU with these patches and they were able to work on a native
> ARM64 machine.

Thanks for the verification. :)

> 
> > 
> > One question: how should I make this command "ARM only"? I see that
> > in qmp-commands.hx, I can use something like "#if defined
> > TARGET_ARM" to block out ARM specified commands, however how should
> > I do the similiar thing in qapi-schema.json?
> 
> This situation is similar to "rtc-reset-reinjection", right? I think You
> can disable this feature on other arch's by setting
> QERR_FEATURE_DISABLED in file like monitor.c.

Yes. Will include this when post the real patches.

Thanks!
Peter

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH 1/2] arm: gic: add GICType

2016-02-15 Thread Peter Xu
A new enum type is added to define ARM GIC types.

Signed-off-by: Peter Xu <pet...@redhat.com>
---
 qapi-schema.json | 17 +
 1 file changed, 17 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 8d04897..81654bd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4083,3 +4083,20 @@
 ##
 { 'enum': 'ReplayMode',
   'data': [ 'none', 'record', 'play' ] }
+
+##
+# @GICType:
+#
+# An enumeration of GIC types
+#
+# @gicv2: GICv2 support without kernel irqchip
+#
+# @gicv3: GICv3 support without kernel irqchip
+#
+# @gicv2-kvm: GICv3 support with kernel irqchip
+#
+# @gicv3-kvm: GICv3 support with kernel irqchip
+#
+# Since: 2.6
+##
+{ 'enum': 'GICType', 'data': [ 'gicv2', 'gicv3', 'gicv2-kvm', 'gicv3-kvm' ] }
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH 2/2] arm: gic: add "query-gic-capability" interface

2016-02-15 Thread Peter Xu
Signed-off-by: Peter Xu <pet...@redhat.com>
---
 qapi-schema.json | 11 +++
 qmp-commands.hx  | 25 +
 qmp.c|  5 +
 scripts/qapi.py  |  1 +
 4 files changed, 42 insertions(+)

diff --git a/qapi-schema.json b/qapi-schema.json
index 81654bd..7e3b8cd 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -4100,3 +4100,14 @@
 # Since: 2.6
 ##
 { 'enum': 'GICType', 'data': [ 'gicv2', 'gicv3', 'gicv2-kvm', 'gicv3-kvm' ] }
+
+##
+# @query-gic-capability:
+#
+# Return a list of supported GIC types
+#
+# Returns: a list of GICType
+#
+# Since: 2.6
+##
+{ 'command': 'query-gic-capability', 'returns': ['GICType'] }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 020e5ee..55930d3 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -4818,3 +4818,28 @@ Example:
  {"type": 0, "out-pport": 0, "pport": 0, "vlan-id": 3840,
   "pop-vlan": 1, "id": 251658240}
]}
+
+EQMP
+
+#if defined TARGET_ARM
+{
+.name   = "query-gic-capability",
+.args_type  = "",
+.mhandler.cmd_new = qmp_marshal_query_gic_capability,
+},
+#endif
+
+SQMP
+query-gic-capability
+---
+
+Return a list of supported ARM GIC types.
+
+Arguments: None
+
+Example:
+
+-> { "execute": "query-gic-capability" }
+<- { "return": [ "gicv2", "gicv2-kvm", "gicv3-kvm" ] }
+
+EQMP
diff --git a/qmp.c b/qmp.c
index 6ae7230..9ad74c7 100644
--- a/qmp.c
+++ b/qmp.c
@@ -753,3 +753,8 @@ ACPIOSTInfoList *qmp_query_acpi_ospm_status(Error **errp)
 
 return head;
 }
+
+GICTypeList *qmp_query_gic_capability(Error **errp)
+{
+return NULL;
+}
diff --git a/scripts/qapi.py b/scripts/qapi.py
index 7dec611..f4900a1 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -46,6 +46,7 @@ returns_whitelist = [
 'query-tpm-models',
 'query-tpm-types',
 'ringbuf-read',
+'query-gic-capability',
 
 # From QGA:
 'guest-file-open',
-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


[libvirt] [RFC PATCH 0/2] ARM: add QMP command to query GIC version

2016-02-15 Thread Peter Xu
For ARM platform, we still do not have any interface to query
whether current QEMU/host support specific GIC version. This
patchset is trying to add one QMP interface for that. By querying
the GIC capability using the new interface, one should know exactly
what GIC version(s) the platform will support. The capability bits
will be decided by both QEMU and host kernel.

The current patchset only provides interface for review. Its handler
is a fake one which returns empty always.

The command interface I am planning to add is something like this:

-> { "execute": "query-gic-capability" }
<- { "return": [ "gicv2", "gicv2-kvm", "gicv3-kvm" ] }

Currently, all the possible supported GIC versions are:

- gicv2:  GIC version 2 without kernel IRQ chip
- gicv2-kvm:  GIC version 2 with kernel IRQ chip
- gicv3:  GIC version 3 without kernel IRQ chip (not supported)
- gicv3-kvm:  GIC version 3 with kernel IRQ chip

Since "gicv3" is still not supported (to use GICv3, kernel irqchip
support is required for now, which corresponds to "gicv3-kvm"),
currently the maximum superset of the result should be:

["gicv2", "gicv2-kvm", "gicv3-kvm"]

Please help review whether the interface suits our need, also please
point out any error I have made.

One question: how should I make this command "ARM only"? I see that
in qmp-commands.hx, I can use something like "#if defined
TARGET_ARM" to block out ARM specified commands, however how should
I do the similiar thing in qapi-schema.json?

Thanks!
Peter

Peter Xu (2):
  arm: gic: add GICType
  arm: gic: add "query-gic-capability" interface

 qapi-schema.json | 28 
 qmp-commands.hx  | 25 +
 qmp.c|  5 +
 scripts/qapi.py  |  1 +
 4 files changed, 59 insertions(+)

-- 
2.4.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list