On Tue, Aug 18, 2020 at 02:44:50PM +0200, Cornelia Huck wrote: > On Mon, 17 Aug 2020 15:11:28 +0200 > Stefano Garzarella <sgarz...@redhat.com> wrote: > > > On Mon, Aug 17, 2020 at 12:27:46PM +0200, Cornelia Huck wrote: > > > On Thu, 13 Aug 2020 14:04:15 +0200 > > > Stefano Garzarella <sgarz...@redhat.com> wrote: > > > > > > > On Thu, Aug 13, 2020 at 12:37:37PM +0200, Cornelia Huck wrote: > > > > > On Thu, 13 Aug 2020 12:24:30 +0200 > > > > > Stefano Garzarella <sgarz...@redhat.com> wrote: > > > > > > > > > > > On Thu, Aug 13, 2020 at 11:28:20AM +0200, Cornelia Huck wrote: > > > > > > > We basically have three possible ways to deal with this: > > > > > > > > > > > > > > - Force it to modern (i.e., what you have been doing; would need > > > > > > > the > > > > > > > equivalent changes in ccw as well.) > > > > > > > > > > > > Oo, thanks for pointing out ccw! > > > > > > I don't know ccw well, in this case should we set dev->max_rev to 1 > > > > > > or 2 > > > > > > to force to modern? > > > > > > > > > > No, ->max_rev is the wrong side of the limit :) You want > > > > > > > > Well :-) Thanks! > > > > > > > > > > > > > > ccw_dev->force_revision_1 = true; > > > > > > > > > > in _instance_init() (see e.g. virtio-ccw-gpu.c). > > > > > > > > > > > > > > > > > > Pro: looks like the cleanest approach. > > > > > > > Con: not sure if we would need backwards compatibility support, > > > > > > > which looks hairy. > > > > > > > > > > > > Not sure too. > > > > > > > > > > Yes, I'm not sure at all how to handle user-specified values for > > > > > legacy/modern. > > > > > > Thinking a bit more about it, I'm not sure whether we even *can* > > > provide backwards compatibility: we have different autoconfigurations > > > for PCI based upon where it is plugged, and ccw does not have a way to > > > turn legacy on/off, except from within the code. > > > > Yes, I discovered today for example that the PCIe bus set auto-legacy > > mode to off. > > And vhost-vsock actually really seems to be modern-only, see below. > > > > > > > > > > > > > > > > > > > > > > > > - Add vsock to the list of devices with legacy support. > > > > > > > Pro: Existing setups continue to work. > > > > > > > Con: If vsock is really virtio-1-only, we still carry around > > > > > > > possibly broken legacy support. > > > > > > > > > > > > I'm not sure it is virtio-1-only, but virtio-vsock was introduced in > > > > > > 2016, so I supposed it is modern-only. > > > > > > > > > > Yes, I would guess so as well. > > > > > > > > > > > > > > > > > How can I verify that? Maybe forcing legacy mode and run some > > > > > > tests. > > > > > > > > > > Probably yes. The likeliest area with issues is probably endianness, > > > > > so > > > > > maybe with something big endian in the mix? > > > > > > > > > > > > > Yeah, I'll try this setup! > > Ok, I tried this now with an x86 host and an s390x guest. Reverted the > checking commit, tried both with a -ccw and a -pci device and your ncat > example. > - When using virtio-1, both devices work fine. > - When using the -pci device with disable-modern=yes, I get "reset by > peer". > - When using the -ccw device with max_revision=0, I get an instant > timeout.
Great, thanks for testing! > > Smells like endianness problems (aka weird things are happening). Yeah. > > Also noticed that vhost-vsock-ccw does not have an immediate problem, > even with the commit: The code only checks whether the device has been > forced to legacy, not whether legacy is allowed (which cannot be > controlled by the user anyway). Probably best to address after we've > dealt with the vhost-vsock issue and made sure that there are no other > problems. Make sense! > > > > > > > > > > > > > > > > > > - Do nothing, have users force legacy off. Bad idea, as ccw has > > > > > > > no way > > > > > > > to do that on the command line. > > > > > > > > > > > > > > The first option is probably best. > > > > > > The first option is now "force modern, but with no backwards > > > compatibility", which is not that great; but "allow legacy, even though > > > it should not exist" is not particularly appealing, either... what a > > > mess :( > > > > Yeah, it's a mess :-( anyway I still prefer option 1, it seems a little > > bit more correct to me. > > It seems to me that the status before this was "works by accident, but > only if we're not negotiating to legacy, or the guest/host are both > little endian". IOW, no visible breakage for most people (or we'd > probably have heard of it already). Now we have a setup that's correct, > but forces users to adapt their QEMU command lines. Option 1 would > eliminate the need to do that, but would cause possibly > not-really-fixable migration issues (you can probably deal with that > manually, detaching and re-attaching the device as a last resort.) > > So, force modern, probably also remove the -transitional device type, > and put a prominent explanation into the change log? > I completely agree with your analysis and solution. So, for now we need to patch vhost-vsock-pci and vhost-user-vsock-pci, and queue the patches in stable. Do you prefer to send them? Otherwise I can do that. Thanks again for the help and the test with s390x guest! Stefano