On Fri, Sep 06, 2019 at 05:32:02PM +0200, Johannes Berg wrote:
> Hi,
>
> > Oh. Apparently qemu mailman chose this time to kick me out
> > of list subscription (too many bounces or something?)
> > so I didn't see it.
>
> D'oh. Well, it's really my mistake, I should've CC'ed you.
>
> > What worries me is the load this places on the socket.
> > ATM if socket buffer is full qemu locks up, so we
> > need to be careful not to send too many messages.
>
> Right, sure. I really don't think you ever want to use this extension in
> a "normal VM" use case. :-)
>
> I think the only use for this extension would be for simulation
> purposes, and even then only combined with the REPLY_ACK and SLAVE_REQ
> extensions, i.e. you explicitly *want* your virtual machine to lock up /
> wait for a response to the KICK command (and respectively, the device to
> wait for a response to the CALL command).

OK, so when combined with these, it's OK I think.
Do we want to enforce this restriction in code maybe then?

> Note that this is basically its sole purpose: ensuring exactly this
> synchronisation! Yes, it's bad for speed, but it's needed in simulation
> when time isn't "real".
>
> Let me try to explain again, most likely my previous explanation was too
> long winded. WLOG, I'll focus on the "kick" use case, the "call" is the
> same, just the other way around. I'm sure you know that the call is
> asynchronous, i.e. the VM will increment the eventfd counter, and
> "eventually" it becomes readable to the device. Now the device does
> something (as fast as it can, presumably) and returns the buffer to the
> VM.
>
> Now, imagine you're running in simulation time, i.e. "time travel" mode.
> Briefly, this hacks the idle loop of the (UML) VM to just skip forward
> when there's nothing to do, i.e. if you have a timer firing in 100ms and
> get to idle, time is immediately incremented by 100ms and the timer
> fires. For a single VM/device this is already implemented in UML, and
> while it's already very useful that's only half the story to me.
>
> Once you have multiple devices and/or VMs, you basically have to keep a
> "simulation calendar" where each participant (VM/device) can put an
> entry, and then whenever they become idle they don't immediately move
> time forward, but instead ask the calendar what's next, and the calendar
> determines who runs.
>
> Now, for these simulation cases, consider vhost-user again. It's
> absolutely necessary that the calendar is updated all the time, and the
> asynchronous nature of the call breaks that - the device cannot update
> the calendar to put an event there to process the call message.
>
> With this extension, the device would work in the following way. Assume
> that the device is idle, and waiting for the simulation calendar to tell
> it to run.
> Now,
>
> 1) it has an incoming call (message) from VM (which waits for reply)
> 2) the device will now put a new event on the simulation scheduler for
>    a time slot to process the message
> 3) return reply to VM
> 4) device goes back to sleep - this stuff was asynchronously handled
>    outside of the simulation basically.
>
> In a sense, the code that just ran isn't considered part of the
> simulated device, it's just the transport protocol and part of the
> simulation environment.
>
> At this point, the device is still waiting for its calendar event to be
> triggered, but now it has a new one to process the message. Now, once
> the VM goes to sleep, the scheduler will check the calendar and
> presumably tell the device to run, which runs and processes the message.
> This repeats for as long as the simulation runs, going both ways (or
> multiple ways if there are more than 2 participants).
>
>
> Now, what if you didn't have this synchronisation, ie. we don't have
> this extension or we don't have REPLY_ACK or whatnot?
>
> In that case, after the step 1 above, the VM will immediately continue
> running. Let's say it'll wait for a response from the device for a few
> hundred milliseconds (of now simulated time). However, depending on the
> scheduling, the device has quite likely not yet put the new event on the
> simulation calendar (that happens in step 2 above). This means that the
> VM's calendar event to wake it up after a few hundred milliseconds will
> immediately trigger, and the simulation ends with the driver getting a
> timeout from the device.
>
>
> So - yes, while I understand your concern, I basically think this is not
> something anyone will want to use outside of such simulations. OTOH,
> there are various use cases (I'm doing device simulation, others are
> doing network simulation) that use such a behaviour, and it might be
> nice to support it in a more standard way, rather than everyone having
> their own local hacks for everything, like e.g.
> the VMSimInt paper(**).
>
>
> But again, like I said, no hard feelings if you think such simulation
> has no place in upstream vhost-user.
>
>
> (**) I put a copy of their qemu changes on top of 1.6.0 here:
> https://p.sipsolutions.net/af9a68ded948c07e.txt
>
> johannes
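For illustration only, the ordering problem described above can be
reproduced with a toy model. This is a minimal Python sketch, not QEMU or
UML code: the Calendar class, the simulate() function and the 10ms/300ms
figures are all hypothetical names and values chosen for the example. The
"synchronous" flag stands in for the kick being acknowledged before the VM
continues (steps 1-3 in the quoted list).

```python
import heapq


class Calendar:
    """Hypothetical shared simulation calendar ordered by simulated time."""

    def __init__(self):
        self.now = 0
        self._seq = 0      # tie-breaker so callables are never compared
        self._heap = []

    def add(self, delay, owner, action):
        heapq.heappush(self._heap, (self.now + delay, self._seq, owner, action))
        self._seq += 1

    def run(self):
        """Jump from entry to entry ("time travel") until nothing is left."""
        log = []
        while self._heap:
            when, _, owner, action = heapq.heappop(self._heap)
            self.now = when
            log.append((when, owner))
            action()
        return log


def simulate(synchronous):
    cal = Calendar()
    state = {"done": False, "result": None}

    def device_process():
        # Simulated device work, runs in its own calendar slot.
        state["done"] = True

    def device_on_kick():
        # Transport code outside simulated time: only books a slot (step 2).
        cal.add(10, "device", device_process)   # 10ms is an assumed latency

    def driver_timeout():
        state["result"] = "ok" if state["done"] else "timeout"

    # The VM kicks the device.
    if synchronous:
        device_on_kick()   # VM waits until the device has booked its slot
    # The VM arms a driver timeout (assumed 300ms) and goes idle.
    cal.add(300, "vm", driver_timeout)
    log = cal.run()
    if not synchronous:
        device_on_kick()   # device notices the kick "eventually": too late
    return state["result"], log


print(simulate(True))
print(simulate(False))
```

In the synchronous run the device's entry is on the calendar before the VM
goes idle, so it is processed first and the driver sees a completed request.
In the asynchronous run the calendar only contains the VM's timeout, time
jumps straight to it, and the driver reports a timeout.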