On Wed, 28 Oct 2020 14:31:35 +1100 David Gibson <dgib...@redhat.com> wrote:
> On Tue, 27 Oct 2020 13:54:26 +0100
> Igor Mammedov <imamm...@redhat.com> wrote:
> 
> > On Tue, 27 Oct 2020 07:26:44 -0400
> > "Michael S. Tsirkin" <m...@redhat.com> wrote:
> > 
> > [...]
> > [...]
> > [...]
> > [...]
> > [...]
> > 
> > > > It certainly shouldn't wait an unbounded time. But a wait with timeout
> > > > seems worth investigating to me.
> > racy, timeout is bound to break once it's in an overcommitted env.
> 
> Hm. That's no less true at the management layer than it is at the qemu
> layer.

true, but it's user policy, which is defined by the user, not by QEMU.

> > > If it's helpful, I'd add a query to check state
> > > so management can figure out why the guest doesn't see the device yet.
> > that means mgmt would have to poll it and forward it to the user
> > somehow.
> 
> If that even makes sense. In the case of Kata, it's supposed to be
> autonomously creating the VM, so there's nothing meaningful it can
> forward to the user other than "failed to create the container because
> of some hotplug problem that means nothing to you".
> 
> > [...]
> > I have more questions wrt the suggestion/workflow:
> >  * at what place would you suggest buffering it?
> >  * what would the request be in this case, i.e. create the PCI device anyway
> >    and try to signal the hotplug event later?
> >  * what would baremetal do in such a case?
> >  * what to do in case the guest is never ready; what should the user do then?
> >  * can such a device be removed?
> > 
> > not sure that all of this is worth the effort and added complexity.
> > 
> > alternatively:
> > maybe ports can send QMP events about their state changes, which the end
> > user would be able to see + an error like in this patch.
> > 
> > On top of it, mgmt could build a better UX, like retry/notify logic, if
> > that's what the user really wishes for and configures (it would be up to
> > the user to define the behaviour).
> 
> That kind of makes sense if the user is explicitly requesting hotplugs,
> but that's not necessarily the case.
the user doesn't have to be a human; it could be some mgmt layer that would
automate retry logic, depending on what the user actually needs for a
particular task (i.e. fail immediately; retry N times, then fail; retry with
a timeout, then fail; don't care, just succeed; ...). The point is for QEMU
to provide the means for mgmt to implement whatever policy the user needs.

PS: but then, I know close to nothing about PCI, so all of the above might
be nonsense.
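To make the idea concrete, here is a minimal sketch (in Python, with all
names invented for illustration) of how a mgmt layer could wrap a hotplug
readiness probe, e.g. a hypothetical QMP query reporting whether the guest
has acknowledged the device, in a user-chosen retry policy, keeping the
policy itself out of QEMU:

```python
import time

def hotplug_with_policy(probe, retries=0, timeout=None, interval=0.01):
    """Poll `probe` (a hypothetical readiness check, e.g. a QMP query)
    until it returns True, per the user-configured policy.

    retries=0, timeout=None -> fail immediately if guest is not ready;
    retries=N               -> retry N times, then fail;
    timeout=T               -> retry until T seconds elapse, then fail.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    attempts = 0
    while True:
        if probe():
            return True
        attempts += 1
        out_of_retries = deadline is None and attempts > retries
        out_of_time = deadline is not None and time.monotonic() >= deadline
        if out_of_retries or out_of_time:
            return False
        time.sleep(interval)

# Toy probe standing in for the real query: guest "becomes ready"
# only on the third poll.
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3

print(hotplug_with_policy(fake_probe, retries=5))  # True: ready on a retry
```

The same wrapper expresses all the policies above purely through its
parameters, which is the point: QEMU only needs to expose the event/query,
and each user (or Kata, or libvirt) picks its own behaviour.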