Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, Aug 19, 2009 at 08:40:33AM +0300, Avi Kivity wrote:
On 08/19/2009 03:38 AM, Ira W. Snyder wrote:
On Wed, Aug 19, 2009 at 12:26:23AM +0300, Avi Kivity wrote:
On 08/18/2009 11:59 PM, Ira W. Snyder wrote:

On a non shared-memory system (where the guest's RAM is not just a chunk of userspace RAM in the host system), virtio's management model seems to fall apart. Feature negotiation doesn't work as one would expect.

In your case, virtio-net on the main board accesses PCI config space registers to perform the feature negotiation; software on your PCI cards needs to trap these config space accesses and respond to them according to the virtio ABI.

Is this real PCI (physical hardware) or fake PCI (software PCI emulation) that you are describing?

Real PCI. The host (x86, PCI master) must use real PCI to actually configure the boards, enable bus mastering, etc. Just like any other PCI device, such as a network card. On the guests (ppc, PCI agents) I cannot add/change PCI functions (the last .[0-9] in the PCI address) nor can I change PCI BARs once the board has started. I'm pretty sure that would violate the PCI spec, since the PCI master would need to re-scan the bus and re-assign addresses, which is a task for the BIOS.

Yes. Can the boards respond to PCI config space cycles coming from the host, or is the config space implemented in silicon and immutable? (Reading on, I see the answer is no.) virtio-pci uses the PCI config space to configure the hardware.

Yes, the PCI config space is implemented in silicon. I can change a few things (mostly PCI BAR attributes), but not much.

(There's no real guest on your setup, right? Just a kernel running on an x86 system and other kernels running on the PCI cards?)

Yes, the x86 (PCI master) runs Linux (booted via PXELinux). The ppc's (PCI agents) also run Linux (booted via U-Boot). They are independent Linux systems with a physical PCI interconnect. The x86 has CONFIG_PCI=y; however, the ppc's have CONFIG_PCI=n.
Linux's PCI stack does bad things as a PCI agent. It always assumes it is a PCI master. It is possible for me to enable CONFIG_PCI=y on the ppc's by removing the PCI bus from their list of devices provided by OpenFirmware. They cannot access PCI via normal methods. PCI drivers cannot work on the ppc's, because Linux assumes it is a PCI master. To the best of my knowledge, I cannot trap configuration space accesses on the PCI agents. I haven't needed that for anything I've done thus far.

Well, if you can't do that, you can't use virtio-pci on the host. You'll need another virtio transport (equivalent to the fake PCI you mentioned above).

Ok. Is there something similar that I can study as an example? Should I look at virtio-pci? This does appear to be solved by vbus, though I haven't written a vbus-over-PCI implementation, so I cannot be completely sure.

Even if virtio-pci doesn't work out for some reason (though it should), you can write your own virtio transport and implement its config space however you like. This is what I did with virtio-over-PCI. The way virtio-net negotiates features makes this work non-intuitively.

I think you tried to take two virtio-nets and make them talk together? That won't work. You need the code from qemu to talk to virtio-net config space, and vhost-net to pump the rings.

It *is* possible to make two unmodified virtio-nets talk together. I've done it, and it is exactly what the virtio-over-PCI patch does. Study it and you'll see how I connected the rx/tx queues together. The feature negotiation code also works, but in a very unintuitive manner. I made it work in the virtio-over-PCI patch, but the devices are hardcoded into the driver. It would be quite a bit of work to swap virtio-net and virtio-console, for example. I'm not at all clear on how to get feature negotiation to work on a system like mine. From my study of lguest and kvm (see below) it looks like userspace will need to be involved, via a miscdevice.

I don't see why.
Is the kernel on the PCI cards in full control of all accesses?

I'm not sure what you mean by this. Could you be more specific? This is a normal, unmodified vanilla Linux kernel running on the PCI agents.

I meant, does board software implement the config space accesses issued from the host, and it seems the answer is no.

In my virtio-over-PCI patch, I hooked two virtio-net's together. I wrote an algorithm to pair the tx/rx queues together. Since virtio-net pre-fills its rx queues with buffers, I was able to use the DMA engine to copy from the tx queue into the pre-allocated memory in the rx queue.

Please find a name other than virtio-over-PCI, since it conflicts with virtio-pci. You're tunnelling virtio config cycles (which are usually done on PCI config cycles) on a new protocol which is itself tunnelled over PCI shared memory.
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 06:28 PM, Ira W. Snyder wrote:

Well, if you can't do that, you can't use virtio-pci on the host. You'll need another virtio transport (equivalent to the fake PCI you mentioned above).

Ok. Is there something similar that I can study as an example? Should I look at virtio-pci?

There's virtio-lguest, virtio-s390, and virtio-vbus.

I think you tried to take two virtio-nets and make them talk together? That won't work. You need the code from qemu to talk to virtio-net config space, and vhost-net to pump the rings.

It *is* possible to make two unmodified virtio-nets talk together. I've done it, and it is exactly what the virtio-over-PCI patch does. Study it and you'll see how I connected the rx/tx queues together.

Right, crossing the cables works, but feature negotiation is screwed up, and both sides think the data is in their RAM. vhost-net doesn't do negotiation and doesn't assume the data lives in its address space.

Please find a name other than virtio-over-PCI, since it conflicts with virtio-pci. You're tunnelling virtio config cycles (which are usually done on PCI config cycles) on a new protocol which is itself tunnelled over PCI shared memory.

Sorry about that. Do you have suggestions for a better name?

virtio-$yourhardware or maybe virtio-dma

I called it virtio-over-PCI in my previous postings to LKML, so until a new patch is written and posted, I'll keep referring to it by the name used in the past, so people can search for it. When I post virtio patches, should I CC another mailing list in addition to LKML?

virtualizat...@lists.linux-foundation.org is virtio's home.

That said, I'm not sure how qemu-system-ppc running on x86 could possibly communicate using virtio-net. This would mean the guest is an emulated big-endian PPC, while the host is a little-endian x86. I haven't actually tested this situation, so perhaps I am wrong.

I'm confused now. You don't actually have any guest, do you, so why would you run qemu at all?
The x86 side only needs to run virtio-net, which is present in RHEL 5.3.

You'd only need to run virtio-tunnel or however it's called. All the eventfd magic takes place on the PCI agents.

I can upgrade the kernel to anything I want on both the x86 and ppc's. I'd like to avoid changing the x86 (RHEL5) userspace, though. On the ppc's, I have full control over the userspace environment.

You don't need any userspace on virtio-net's side. Your ppc boards emulate a virtio-net device, so all you need is the virtio-net module (and virtio bindings). If you chose to emulate, say, an e1000 card, all you'd need is the e1000 driver.

-- error compiling committee.c: too many arguments to function

-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, Aug 19, 2009 at 06:37:06PM +0300, Avi Kivity wrote:
On 08/19/2009 06:28 PM, Ira W. Snyder wrote:

Well, if you can't do that, you can't use virtio-pci on the host. You'll need another virtio transport (equivalent to the fake PCI you mentioned above).

Ok. Is there something similar that I can study as an example? Should I look at virtio-pci?

There's virtio-lguest, virtio-s390, and virtio-vbus.

I think you tried to take two virtio-nets and make them talk together? That won't work. You need the code from qemu to talk to virtio-net config space, and vhost-net to pump the rings.

It *is* possible to make two unmodified virtio-nets talk together. I've done it, and it is exactly what the virtio-over-PCI patch does. Study it and you'll see how I connected the rx/tx queues together.

Right, crossing the cables works, but feature negotiation is screwed up, and both sides think the data is in their RAM. vhost-net doesn't do negotiation and doesn't assume the data lives in its address space.

Yes, that is exactly what I did: crossed the cables (in software). I'll take a closer look at vhost-net now, and make sure I understand how it works.

Please find a name other than virtio-over-PCI, since it conflicts with virtio-pci. You're tunnelling virtio config cycles (which are usually done on PCI config cycles) on a new protocol which is itself tunnelled over PCI shared memory.

Sorry about that. Do you have suggestions for a better name?

virtio-$yourhardware or maybe virtio-dma

How about virtio-phys? Arnd and BenH are both looking at PPC systems (similar to mine). Grant Likely is looking at talking to a processor core running on an FPGA, IIRC. Most of the code can be shared; very little should need to be board-specific, I hope.

I called it virtio-over-PCI in my previous postings to LKML, so until a new patch is written and posted, I'll keep referring to it by the name used in the past, so people can search for it.
When I post virtio patches, should I CC another mailing list in addition to LKML?

virtualizat...@lists.linux-foundation.org is virtio's home.

That said, I'm not sure how qemu-system-ppc running on x86 could possibly communicate using virtio-net. This would mean the guest is an emulated big-endian PPC, while the host is a little-endian x86. I haven't actually tested this situation, so perhaps I am wrong.

I'm confused now. You don't actually have any guest, do you, so why would you run qemu at all?

I do not run qemu. I am just stating a problem with virtio-net that I noticed, so someone more knowledgeable can be aware of it.

The x86 side only needs to run virtio-net, which is present in RHEL 5.3.

You'd only need to run virtio-tunnel or however it's called. All the eventfd magic takes place on the PCI agents.

I can upgrade the kernel to anything I want on both the x86 and ppc's. I'd like to avoid changing the x86 (RHEL5) userspace, though. On the ppc's, I have full control over the userspace environment.

You don't need any userspace on virtio-net's side. Your ppc boards emulate a virtio-net device, so all you need is the virtio-net module (and virtio bindings). If you chose to emulate, say, an e1000 card, all you'd need is the e1000 driver.

Thanks for the replies.
Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 07:29 PM, Ira W. Snyder wrote:

virtio-$yourhardware or maybe virtio-dma

How about virtio-phys?

Could work.

Arnd and BenH are both looking at PPC systems (similar to mine). Grant Likely is looking at talking to a processor core running on an FPGA, IIRC. Most of the code can be shared; very little should need to be board-specific, I hope.

Excellent.

That said, I'm not sure how qemu-system-ppc running on x86 could possibly communicate using virtio-net. This would mean the guest is an emulated big-endian PPC, while the host is a little-endian x86. I haven't actually tested this situation, so perhaps I am wrong.

I'm confused now. You don't actually have any guest, do you, so why would you run qemu at all?

I do not run qemu. I am just stating a problem with virtio-net that I noticed, so someone more knowledgeable can be aware of it.

Ah, it certainly doesn't byteswap. Maybe nobody tried it. Hollis?

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 11:46:06AM +0300, Michael S. Tsirkin wrote:
On Mon, Aug 17, 2009 at 04:17:09PM -0400, Gregory Haskins wrote:
Michael S. Tsirkin wrote:
On Mon, Aug 17, 2009 at 10:14:56AM -0400, Gregory Haskins wrote:

Case in point: Take an upstream kernel and you can modprobe the vbus-pcibridge in, and virtio devices will work over that transport unmodified. See http://lkml.org/lkml/2009/8/6/244 for details.

The modprobe you are talking about would need to be done in the guest kernel, correct?

Yes, and your point is?

"Unmodified" (pardon the pseudo pun) modifies virtio, not the guest.

It means you can take an off-the-shelf kernel with off-the-shelf virtio (a la a distro kernel), modprobe vbus-pcibridge, and get alacrityvm acceleration.

Heh, by that logic ksplice does not modify the running kernel either :)

It is not a design goal of mine to forbid the loading of a new driver, so I am ok with that requirement. OTOH, Michael's patch is purely targeted at improving virtio-net on kvm, and it's likewise constrained by various limitations of that decision (such as its reliance on the PCI model, and the kvm memory scheme).

vhost is actually not related to PCI in any way. It simply leaves all setup for userspace to do. And the memory scheme was intentionally separated from kvm so that it can easily support e.g. lguest.

I think you have missed my point. I mean that vhost requires a separate bus-model (a la qemu-pci).

So? That can be in userspace, and can be anything including vbus.

And no, your memory scheme is not separated, at least not very well. It still assumes memory-regions and copy_to_user(), which is very kvm-esque.

I don't think so: it works for lguest, kvm, UML and containers.

Vbus has people using things like userspace containers (no regions)

vhost by default works without regions

and physical hardware (dma controllers, so no regions or copy_to_user), so your scheme quickly falls apart once you get away from KVM.

Someone took a driver and is building hardware for it ...
so what?

I think Greg is referring to something like my virtio-over-PCI patch. I'm pretty sure that vhost is completely useless for my situation. I'd like to see vhost work for my use, so I'll try to explain what I'm doing.

I've got a system where I have about 20 computers connected via PCI. The PCI master is a normal x86 system, and the PCI agents are PowerPC systems. The PCI agents act just like any other PCI card, except they are running Linux, and have their own RAM and peripherals.

I wrote a custom driver which imitated a network interface and a serial port. I tried to push it towards mainline, and DavidM rejected it, with the argument: use virtio, don't add another virtualization layer to the kernel. I think he has a decent argument, so I wrote virtio-over-PCI.

Now, there are some things about virtio that don't work over PCI. Mainly, memory is not truly shared. It is extremely slow to access memory that is far away, meaning across the PCI bus. This can be worked around by using a DMA controller to transfer all data, along with an intelligent scheme to perform only writes across the bus. If you're careful, reads are never needed.

So, in my system, copy_(to|from)_user() is completely wrong. There is no userspace, only a physical system. In fact, because normal x86 computers do not have DMA controllers, the host system doesn't actually handle any data transfer!

I used virtio-net in both the guest and host systems in my example virtio-over-PCI patch, and succeeded in getting them to communicate. However, the lack of any setup interface means that the devices must be hardcoded into both drivers, when the decision could be up to userspace. I think this is a problem that vbus could solve.

For my own selfish reasons (I don't want to maintain an out-of-tree driver) I'd like to see *something* useful in mainline Linux. I'm happy to answer questions about my setup, just ask.
Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 06:53 PM, Ira W. Snyder wrote:

So, in my system, copy_(to|from)_user() is completely wrong. There is no userspace, only a physical system. In fact, because normal x86 computers do not have DMA controllers, the host system doesn't actually handle any data transfer!

In fact, modern x86s do have DMA engines these days (google for Intel I/OAT), and one of our plans for vhost-net is to allow their use for packets above a certain size. So a patch allowing vhost-net to optionally use a DMA engine is a good thing.

I used virtio-net in both the guest and host systems in my example virtio-over-PCI patch, and succeeded in getting them to communicate. However, the lack of any setup interface means that the devices must be hardcoded into both drivers, when the decision could be up to userspace. I think this is a problem that vbus could solve.

Exposing a knob to userspace is not an insurmountable problem; vhost-net already allows changing the memory layout, for example.

-- error compiling committee.c: too many arguments to function
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 07:51:21PM +0300, Avi Kivity wrote:
On 08/18/2009 06:53 PM, Ira W. Snyder wrote:

So, in my system, copy_(to|from)_user() is completely wrong. There is no userspace, only a physical system. In fact, because normal x86 computers do not have DMA controllers, the host system doesn't actually handle any data transfer!

In fact, modern x86s do have DMA engines these days (google for Intel I/OAT), and one of our plans for vhost-net is to allow their use for packets above a certain size. So a patch allowing vhost-net to optionally use a DMA engine is a good thing.

Yes, I'm aware that very modern x86 PCs have general-purpose DMA engines, even though I don't have any capable hardware. However, I think it is better to support using any PC (with or without a DMA engine, any architecture) as the PCI master, and just handle the DMA all from the PCI agent, which is known to have DMA.

I used virtio-net in both the guest and host systems in my example virtio-over-PCI patch, and succeeded in getting them to communicate. However, the lack of any setup interface means that the devices must be hardcoded into both drivers, when the decision could be up to userspace. I think this is a problem that vbus could solve.

Exposing a knob to userspace is not an insurmountable problem; vhost-net already allows changing the memory layout, for example.

Let me explain the most obvious problem I ran into: setting the MAC addresses used in virtio. On the host (PCI master), I want eth0 (virtio-net) to get a random MAC address. On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC address, aa:bb:cc:dd:ee:ff. The virtio feature negotiation code handles this, by seeing the VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC address. This is because the feature negotiation code only accepts a feature if it is offered by both sides of the connection.
In this case, I must have the guest generate a random MAC address and have the host put aa:bb:cc:dd:ee:ff into the guest's configuration space. This basically means hardcoding the MAC addresses in the Linux drivers, which is a big no-no. What would I expose to userspace to make this situation manageable?

Thanks for the response,
Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 08:27 PM, Ira W. Snyder wrote:

In fact, modern x86s do have DMA engines these days (google for Intel I/OAT), and one of our plans for vhost-net is to allow their use for packets above a certain size. So a patch allowing vhost-net to optionally use a DMA engine is a good thing.

Yes, I'm aware that very modern x86 PCs have general-purpose DMA engines, even though I don't have any capable hardware. However, I think it is better to support using any PC (with or without a DMA engine, any architecture) as the PCI master, and just handle the DMA all from the PCI agent, which is known to have DMA.

Certainly; but if your PCI agent will support the DMA API, then the same vhost code will work with both I/OAT and your specialized hardware.

Exposing a knob to userspace is not an insurmountable problem; vhost-net already allows changing the memory layout, for example.

Let me explain the most obvious problem I ran into: setting the MAC addresses used in virtio. On the host (PCI master), I want eth0 (virtio-net) to get a random MAC address. On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC address, aa:bb:cc:dd:ee:ff. The virtio feature negotiation code handles this, by seeing the VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC address. This is because the feature negotiation code only accepts a feature if it is offered by both sides of the connection.

In this case, I must have the guest generate a random MAC address and have the host put aa:bb:cc:dd:ee:ff into the guest's configuration space. This basically means hardcoding the MAC addresses in the Linux drivers, which is a big no-no. What would I expose to userspace to make this situation manageable?

I think in this case you want one side to be virtio-net (I'm guessing the x86) and the other side vhost-net (the ppc boards with the dma engine).
virtio-net on x86 would communicate with userspace on the ppc board to negotiate features and get a MAC address; the fast path would be between virtio-net and vhost-net (which would use the DMA engine to push and pull data).

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 08:47:04PM +0300, Avi Kivity wrote:
On 08/18/2009 08:27 PM, Ira W. Snyder wrote:

In fact, modern x86s do have DMA engines these days (google for Intel I/OAT), and one of our plans for vhost-net is to allow their use for packets above a certain size. So a patch allowing vhost-net to optionally use a DMA engine is a good thing.

Yes, I'm aware that very modern x86 PCs have general-purpose DMA engines, even though I don't have any capable hardware. However, I think it is better to support using any PC (with or without a DMA engine, any architecture) as the PCI master, and just handle the DMA all from the PCI agent, which is known to have DMA.

Certainly; but if your PCI agent will support the DMA API, then the same vhost code will work with both I/OAT and your specialized hardware.

Yes, that's true. My ppc is a Freescale MPC8349EMDS. It has a Linux DMAEngine driver in mainline, which I've used.

That's excellent.

Exposing a knob to userspace is not an insurmountable problem; vhost-net already allows changing the memory layout, for example.

Let me explain the most obvious problem I ran into: setting the MAC addresses used in virtio. On the host (PCI master), I want eth0 (virtio-net) to get a random MAC address. On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC address, aa:bb:cc:dd:ee:ff. The virtio feature negotiation code handles this, by seeing the VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC address. This is because the feature negotiation code only accepts a feature if it is offered by both sides of the connection.

In this case, I must have the guest generate a random MAC address and have the host put aa:bb:cc:dd:ee:ff into the guest's configuration space. This basically means hardcoding the MAC addresses in the Linux drivers, which is a big no-no. What would I expose to userspace to make this situation manageable?
I think in this case you want one side to be virtio-net (I'm guessing the x86) and the other side vhost-net (the ppc boards with the dma engine). virtio-net on x86 would communicate with userspace on the ppc board to negotiate features and get a MAC address; the fast path would be between virtio-net and vhost-net (which would use the DMA engine to push and pull data).

Ah, that seems backwards, but it should work after vhost-net learns how to use the DMAEngine API. I haven't studied vhost-net very carefully yet. As soon as I saw the copy_(to|from)_user() I stopped reading, because it seemed useless for my case. I'll look again and try to find where vhost-net supports setting MAC addresses and other features.

Also, in my case I'd like to boot Linux with my rootfs over NFS. Is vhost-net capable of this?

I've had Arnd, BenH, and Grant Likely (and others, privately) contact me about devices they are working with that would benefit from something like virtio-over-PCI. I'd like to see vhost-net be merged with the capability to support my use case. There are plenty of others that would benefit, not just myself.

I'm not sure vhost-net is being written with this kind of future use in mind. I'd hate to see it get merged, and then have to change the ABI to support physical-device-to-device usage. It would be better to keep future use in mind now, rather than try and hack it in later.

Thanks for the comments.
Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 09:27 PM, Ira W. Snyder wrote:

I think in this case you want one side to be virtio-net (I'm guessing the x86) and the other side vhost-net (the ppc boards with the dma engine). virtio-net on x86 would communicate with userspace on the ppc board to negotiate features and get a MAC address; the fast path would be between virtio-net and vhost-net (which would use the DMA engine to push and pull data).

Ah, that seems backwards, but it should work after vhost-net learns how to use the DMAEngine API. I haven't studied vhost-net very carefully yet. As soon as I saw the copy_(to|from)_user() I stopped reading, because it seemed useless for my case. I'll look again and try to find where vhost-net supports setting MAC addresses and other features.

It doesn't; all it does is pump the rings, leaving everything else to userspace.

Also, in my case I'd like to boot Linux with my rootfs over NFS. Is vhost-net capable of this?

It's just another network interface. You'd need an initramfs though to contain the needed userspace.

I've had Arnd, BenH, and Grant Likely (and others, privately) contact me about devices they are working with that would benefit from something like virtio-over-PCI. I'd like to see vhost-net be merged with the capability to support my use case. There are plenty of others that would benefit, not just myself. I'm not sure vhost-net is being written with this kind of future use in mind. I'd hate to see it get merged, and then have to change the ABI to support physical-device-to-device usage. It would be better to keep future use in mind now, rather than try and hack it in later.

Please review and comment then. I'm fairly confident there won't be any ABI issues since vhost-net does so little outside pumping the rings. Note the signalling paths go through eventfd: when vhost-net wants the other side to look at its ring, it tickles an eventfd which is supposed to trigger an interrupt on the other side.
Conversely, when another eventfd is signalled, vhost-net will look at the ring and process any data there. You'll need to wire your signalling to those eventfds, either in userspace or in the kernel.

-- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 11:27:35AM -0700, Ira W. Snyder wrote:

I haven't studied vhost-net very carefully yet. As soon as I saw the copy_(to|from)_user() I stopped reading, because it seemed useless for my case. I'll look again and try to find where vhost-net supports setting MAC addresses and other features.

vhost-net doesn't do this at all. You bind a raw socket to a network device, and program that with the usual userspace interfaces.

Also, in my case I'd like to boot Linux with my rootfs over NFS. Is vhost-net capable of this?

I've had Arnd, BenH, and Grant Likely (and others, privately) contact me about devices they are working with that would benefit from something like virtio-over-PCI. I'd like to see vhost-net be merged with the capability to support my use case. There are plenty of others that would benefit, not just myself. I'm not sure vhost-net is being written with this kind of future use in mind. I'd hate to see it get merged, and then have to change the ABI to support physical-device-to-device usage. It would be better to keep future use in mind now, rather than try and hack it in later.

I still need to think your usage over. I am not so sure this fits what vhost is trying to do. If not, possibly it's better to just have a separate driver for your device.

Thanks for the comments.
Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 10:27:52AM -0700, Ira W. Snyder wrote:
On Tue, Aug 18, 2009 at 07:51:21PM +0300, Avi Kivity wrote:
On 08/18/2009 06:53 PM, Ira W. Snyder wrote:

So, in my system, copy_(to|from)_user() is completely wrong. There is no userspace, only a physical system. In fact, because normal x86 computers do not have DMA controllers, the host system doesn't actually handle any data transfer!

In fact, modern x86s do have DMA engines these days (google for Intel I/OAT), and one of our plans for vhost-net is to allow their use for packets above a certain size. So a patch allowing vhost-net to optionally use a DMA engine is a good thing.

Yes, I'm aware that very modern x86 PCs have general-purpose DMA engines, even though I don't have any capable hardware. However, I think it is better to support using any PC (with or without a DMA engine, any architecture) as the PCI master, and just handle the DMA all from the PCI agent, which is known to have DMA.

I used virtio-net in both the guest and host systems in my example virtio-over-PCI patch, and succeeded in getting them to communicate. However, the lack of any setup interface means that the devices must be hardcoded into both drivers, when the decision could be up to userspace. I think this is a problem that vbus could solve.

Exposing a knob to userspace is not an insurmountable problem; vhost-net already allows changing the memory layout, for example.

Let me explain the most obvious problem I ran into: setting the MAC addresses used in virtio. On the host (PCI master), I want eth0 (virtio-net) to get a random MAC address. On the guest (PCI agent), I want eth0 (virtio-net) to get a specific MAC address, aa:bb:cc:dd:ee:ff. The virtio feature negotiation code handles this, by seeing the VIRTIO_NET_F_MAC feature in its configuration space. If BOTH drivers do not have VIRTIO_NET_F_MAC set, then NEITHER will use the specified MAC address.
This is because the feature negotiation code only accepts a feature if it is offered by both sides of the connection. In this case, I must have the guest generate a random MAC address and have the host put aa:bb:cc:dd:ee:ff into the guest's configuration space. This basically means hardcoding the MAC addresses in the Linux drivers, which is a big no-no. What would I expose to userspace to make this situation manageable? Thanks for the response, Ira This calls for some kind of change in guest virtio. vhost, being a host-kernel-only feature, does not deal with this problem. But assuming virtio in guest supports this somehow, vhost will not interfere: you do the setup in qemu userspace anyway, vhost will happily use a network device however you choose to set it up. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 09:52:48PM +0300, Avi Kivity wrote: On 08/18/2009 09:27 PM, Ira W. Snyder wrote: I think in this case you want one side to be virtio-net (I'm guessing the x86) and the other side vhost-net (the ppc boards with the dma engine). virtio-net on x86 would communicate with userspace on the ppc board to negotiate features and get a mac address, the fast path would be between virtio-net and vhost-net (which would use the dma engine to push and pull data). Ah, that seems backwards, but it should work after vhost-net learns how to use the DMAEngine API. I haven't studied vhost-net very carefully yet. As soon as I saw the copy_(to|from)_user() I stopped reading, because it seemed useless for my case. I'll look again and try to find where vhost-net supports setting MAC addresses and other features. It doesn't; all it does is pump the rings, leaving everything else to userspace. Ok. On a non shared-memory system (where the guest's RAM is not just a chunk of userspace RAM in the host system), virtio's management model seems to fall apart. Feature negotiation doesn't work as one would expect. This does appear to be solved by vbus, though I haven't written a vbus-over-PCI implementation, so I cannot be completely sure. I'm not at all clear on how to get feature negotiation to work on a system like mine. From my study of lguest and kvm (see below) it looks like userspace will need to be involved, via a miscdevice. Also, in my case I'd like to boot Linux with my rootfs over NFS. Is vhost-net capable of this? It's just another network interface. You'd need an initramfs though to contain the needed userspace. Ok. I'm using an initramfs already, so adding some more userspace to it isn't a problem. I've had Arnd, BenH, and Grant Likely (and others, privately) contact me about devices they are working with that would benefit from something like virtio-over-PCI. I'd like to see vhost-net be merged with the capability to support my use case. 
There are plenty of others that would benefit, not just myself. I'm not sure vhost-net is being written with this kind of future use in mind. I'd hate to see it get merged, and then have to change the ABI to support physical-device-to-device usage. It would be better to keep future use in mind now, rather than try and hack it in later. Please review and comment then. I'm fairly confident there won't be any ABI issues since vhost-net does so little outside pumping the rings. Ok. I thought I should at least express my concerns while we're discussing this, rather than being too late after finding the time to study the driver. Off the top of my head, I would think that transporting userspace addresses in the ring (for copy_(to|from)_user()) vs. physical addresses (for DMAEngine) might be a problem. Pinning userspace pages into memory for DMA is a bit of a pain, though it is possible. There is also the problem of different endianness between host and guest in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h) defines fields in host byte order. Which totally breaks if the guest has a different endianness. This is a virtio-net problem though, and is not transport specific. Note the signalling paths go through eventfd: when vhost-net wants the other side to look at its ring, it tickles an eventfd which is supposed to trigger an interrupt on the other side. Conversely, when another eventfd is signalled, vhost-net will look at the ring and process any data there. You'll need to wire your signalling to those eventfds, either in userspace or in the kernel. Ok. I've never used eventfd before, so that'll take yet more studying. I've browsed over both the kvm and lguest code, and it looks like they each re-invent a mechanism for transporting interrupts between the host and guest, using eventfd. They both do this by implementing a miscdevice, which is basically their management interface. 
See drivers/lguest/lguest_user.c (see write() and LHREQ_EVENTFD) and kvm-kmod-devel-88/x86/kvm_main.c (see kvm_vm_ioctl(), called via kvm_dev_ioctl()) for how they hook up eventfd's. I can now imagine how two userspace programs (host and guest) could work together to implement a management interface, including hotplug of devices, etc. Of course, this would basically reinvent the vbus management interface into a specific driver. I think this is partly what Greg is trying to abstract out into generic code. I haven't studied the actual data transport mechanisms in vbus, though I have studied virtio's transport mechanism. I think a generic management interface for virtio might be a good thing to consider, because it seems there are at least two implementations already: kvm and lguest. Thanks for answering my questions. It helps to talk with someone more familiar with the issues than I am. Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 08:53:29AM -0700, Ira W. Snyder wrote: I think Greg is referring to something like my virtio-over-PCI patch. I'm pretty sure that vhost is completely useless for my situation. I'd like to see vhost work for my use, so I'll try to explain what I'm doing. I've got a system where I have about 20 computers connected via PCI. The PCI master is a normal x86 system, and the PCI agents are PowerPC systems. The PCI agents act just like any other PCI card, except they are running Linux, and have their own RAM and peripherals. I wrote a custom driver which imitated a network interface and a serial port. I tried to push it towards mainline, and DavidM rejected it, with the argument: use virtio, don't add another virtualization layer to the kernel. I think he has a decent argument, so I wrote virtio-over-PCI. Now, there are some things about virtio that don't work over PCI. Mainly, memory is not truly shared. It is extremely slow to access memory that is far away, meaning across the PCI bus. This can be worked around by using a DMA controller to transfer all data, along with an intelligent scheme to perform only writes across the bus. If you're careful, reads are never needed. So, in my system, copy_(to|from)_user() is completely wrong. There is no userspace, only a physical system. Can guests do DMA to random host memory? Or is there some kind of IOMMU and DMA API involved? If the latter, then note that you'll still need some kind of driver for your device. The question we need to ask ourselves then is whether this driver can reuse bits from vhost. In fact, because normal x86 computers do not have DMA controllers, the host system doesn't actually handle any data transfer! Is it true that PPC has to initiate all DMA then? How do you manage not to do DMA reads then? I used virtio-net in both the guest and host systems in my example virtio-over-PCI patch, and succeeded in getting them to communicate. 
However, the lack of any setup interface means that the devices must be hardcoded into both drivers, when the decision could be up to userspace. I think this is a problem that vbus could solve. What you describe (passing setup from host to guest) seems like a feature that guest devices need to support. It seems unlikely that vbus, being a transport layer, can address this. For my own selfish reasons (I don't want to maintain an out-of-tree driver) I'd like to see *something* useful in mainline Linux. I'm happy to answer questions about my setup, just ask. Ira Thanks Ira, I'll think about it. A couple of questions: - Could you please describe what kind of communication needs to happen? - I'm not familiar with DMA engine in question. I'm guessing it's the usual thing: in/out buffers need to be kernel memory, interface is asynchronous, small limited number of outstanding requests? Is there a userspace interface for it and if yes how does it work? -- MST
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tuesday 18 August 2009 20:35:22 Michael S. Tsirkin wrote: On Tue, Aug 18, 2009 at 10:27:52AM -0700, Ira W. Snyder wrote: Also, in my case I'd like to boot Linux with my rootfs over NFS. Is vhost-net capable of this? I've had Arnd, BenH, and Grant Likely (and others, privately) contact me about devices they are working with that would benefit from something like virtio-over-PCI. I'd like to see vhost-net be merged with the capability to support my use case. There are plenty of others that would benefit, not just myself. yes. I'm not sure vhost-net is being written with this kind of future use in mind. I'd hate to see it get merged, and then have to change the ABI to support physical-device-to-device usage. It would be better to keep future use in mind now, rather than try and hack it in later. I still need to think your usage over. I am not so sure this fits what vhost is trying to do. If not, possibly it's better to just have a separate driver for your device. I now think we need both. virtio-over-PCI does it the right way for its purpose and can be rather generic. It could certainly be extended to support virtio-net on both sides (host and guest) of KVM, but I think it better fits the use where a kernel wants to communicate with some other machine where you normally wouldn't think of using qemu. Vhost-net OTOH is great in the way that it serves as an easy way to move the virtio-net code from qemu into the kernel, without changing its behaviour. It should even be straightforward to do live-migration between hosts with and without it, something that would be much harder with the virtio-over-PCI logic. Also, its internal state is local to the process owning its file descriptor, which makes it much easier to manage permissions and cleanup of its resources. Arnd
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/18/2009 11:59 PM, Ira W. Snyder wrote: On a non shared-memory system (where the guest's RAM is not just a chunk of userspace RAM in the host system), virtio's management model seems to fall apart. Feature negotiation doesn't work as one would expect. In your case, virtio-net on the main board accesses PCI config space registers to perform the feature negotiation; software on your PCI cards needs to trap these config space accesses and respond to them according to virtio ABI. (There's no real guest on your setup, right? just a kernel running on an x86 system and other kernels running on the PCI cards?) This does appear to be solved by vbus, though I haven't written a vbus-over-PCI implementation, so I cannot be completely sure. Even if virtio-pci doesn't work out for some reason (though it should), you can write your own virtio transport and implement its config space however you like. I'm not at all clear on how to get feature negotiation to work on a system like mine. From my study of lguest and kvm (see below) it looks like userspace will need to be involved, via a miscdevice. I don't see why. Is the kernel on the PCI cards in full control of all accesses? Ok. I thought I should at least express my concerns while we're discussing this, rather than being too late after finding the time to study the driver. Off the top of my head, I would think that transporting userspace addresses in the ring (for copy_(to|from)_user()) vs. physical addresses (for DMAEngine) might be a problem. Pinning userspace pages into memory for DMA is a bit of a pain, though it is possible. Oh, the ring doesn't transport userspace addresses. It transports guest addresses, and it's up to vhost to do something with them. Currently vhost supports two translation modes: 1. virtio address == host virtual address (using copy_to_user) 2. virtio address == offsetted host virtual address (using copy_to_user) The latter mode is used for kvm guests (with multiple offsets, skipping some details). 
I think you need to add a third mode, virtio address == host physical address (using dma engine). Once you do that, and wire up the signalling, things should work. There is also the problem of different endianness between host and guest in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h) defines fields in host byte order. Which totally breaks if the guest has a different endianness. This is a virtio-net problem though, and is not transport specific. Yeah. You'll need to add byteswaps. I've browsed over both the kvm and lguest code, and it looks like they each re-invent a mechanism for transporting interrupts between the host and guest, using eventfd. They both do this by implementing a miscdevice, which is basically their management interface. See drivers/lguest/lguest_user.c (see write() and LHREQ_EVENTFD) and kvm-kmod-devel-88/x86/kvm_main.c (see kvm_vm_ioctl(), called via kvm_dev_ioctl()) for how they hook up eventfd's. I can now imagine how two userspace programs (host and guest) could work together to implement a management interface, including hotplug of devices, etc. Of course, this would basically reinvent the vbus management interface into a specific driver. You don't need anything in the guest userspace (virtio-net) side. I think this is partly what Greg is trying to abstract out into generic code. I haven't studied the actual data transport mechanisms in vbus, though I have studied virtio's transport mechanism. I think a generic management interface for virtio might be a good thing to consider, because it seems there are at least two implementations already: kvm and lguest. Management code in the kernel doesn't really help unless you plan to manage things with echo and cat. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. 
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 12:26 AM, Avi Kivity wrote: Off the top of my head, I would think that transporting userspace addresses in the ring (for copy_(to|from)_user()) vs. physical addresses (for DMAEngine) might be a problem. Pinning userspace pages into memory for DMA is a bit of a pain, though it is possible. Oh, the ring doesn't transport userspace addresses. It transports guest addresses, and it's up to vhost to do something with them. Currently vhost supports two translation modes: 1. virtio address == host virtual address (using copy_to_user) 2. virtio address == offsetted host virtual address (using copy_to_user) The latter mode is used for kvm guests (with multiple offsets, skipping some details). I think you need to add a third mode, virtio address == host physical address (using dma engine). Once you do that, and wire up the signalling, things should work. You don't in fact need a third mode. You can mmap the x86 address space into your ppc userspace and use the second mode. All you need then is the dma engine glue and byte swapping. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Tue, Aug 18, 2009 at 11:57:48PM +0300, Michael S. Tsirkin wrote: On Tue, Aug 18, 2009 at 08:53:29AM -0700, Ira W. Snyder wrote: I think Greg is referring to something like my virtio-over-PCI patch. I'm pretty sure that vhost is completely useless for my situation. I'd like to see vhost work for my use, so I'll try to explain what I'm doing. I've got a system where I have about 20 computers connected via PCI. The PCI master is a normal x86 system, and the PCI agents are PowerPC systems. The PCI agents act just like any other PCI card, except they are running Linux, and have their own RAM and peripherals. I wrote a custom driver which imitated a network interface and a serial port. I tried to push it towards mainline, and DavidM rejected it, with the argument: use virtio, don't add another virtualization layer to the kernel. I think he has a decent argument, so I wrote virtio-over-PCI. Now, there are some things about virtio that don't work over PCI. Mainly, memory is not truly shared. It is extremely slow to access memory that is far away, meaning across the PCI bus. This can be worked around by using a DMA controller to transfer all data, along with an intelligent scheme to perform only writes across the bus. If you're careful, reads are never needed. So, in my system, copy_(to|from)_user() is completely wrong. There is no userspace, only a physical system. Can guests do DMA to random host memory? Or is there some kind of IOMMU and DMA API involved? If the latter, then note that you'll still need some kind of driver for your device. The question we need to ask ourselves then is whether this driver can reuse bits from vhost. Mostly. All of my systems are 32 bit (both x86 and ppc). From the view of the ppc (and DMAEngine), I can view the first 1GB of host memory. This limited view is due to address space limitations on the ppc. The view of PCI memory must live somewhere in the ppc address space, along with the ppc's SDRAM, flash, and other peripherals. 
Since this is a 32-bit processor, I only have 4GB of address space to work with. The PCI address space could be up to 4GB in size. If I tried to allow the ppc boards to view all 4GB of PCI address space, then they would have no address space left for their onboard SDRAM, etc. Hopefully that makes sense. I use dma_set_mask(dev, DMA_BIT_MASK(30)) on the host system to ensure that when dma_map_sg() is called, it returns addresses that can be accessed directly by the device. The DMAEngine can access any local (ppc) memory without any restriction. I have used the Linux DMAEngine API (include/linux/dmaengine.h) to handle all data transfer across the PCI bus. The Intel I/OAT (and many others) use the same API. In fact, because normal x86 computers do not have DMA controllers, the host system doesn't actually handle any data transfer! Is it true that PPC has to initiate all DMA then? How do you manage not to do DMA reads then? Yes, the ppc initiates all DMA. It handles all data transfer (both reads and writes) across the PCI bus, for speed reasons. A CPU cannot create burst transactions on the PCI bus. This is the reason that most (all?) network cards (as a familiar example) use DMA to transfer packet contents into RAM. Sorry if I made a confusing statement (no reads are necessary) earlier. What I meant to say was: If you are very careful, it is not necessary for the CPU to do any reads over the PCI bus to maintain state. Writes are the only necessary CPU-initiated transaction. I implemented this in my virtio-over-PCI patch, copying as much as possible from the virtio vring structure. The descriptors in the rings are only changed by one side of the connection, therefore they can be cached as they are written (via the CPU) across the PCI bus, with the knowledge that both sides will have a consistent view. I'm sorry, this is hard to explain via email. It is much easier in a room with a whiteboard. 
:) I used virtio-net in both the guest and host systems in my example virtio-over-PCI patch, and succeeded in getting them to communicate. However, the lack of any setup interface means that the devices must be hardcoded into both drivers, when the decision could be up to userspace. I think this is a problem that vbus could solve. What you describe (passing setup from host to guest) seems like a feature that guest devices need to support. It seems unlikely that vbus, being a transport layer, can address this. I think I explained this poorly as well. Virtio needs two things to function: 1) a set of descriptor rings (1 or more) 2) a way to kick each ring. With the amount of space available in the ppc's PCI BAR's (which point at a small chunk of SDRAM), I could potentially make ~6 virtqueues + 6 kick interrupts available. Right now, my virtio-over-PCI driver hardcoded the first and second virtqueues to be for virtio-net only, and nothing else. What if the user wanted 2
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, Aug 19, 2009 at 12:26:23AM +0300, Avi Kivity wrote: On 08/18/2009 11:59 PM, Ira W. Snyder wrote: On a non shared-memory system (where the guest's RAM is not just a chunk of userspace RAM in the host system), virtio's management model seems to fall apart. Feature negotiation doesn't work as one would expect. In your case, virtio-net on the main board accesses PCI config space registers to perform the feature negotiation; software on your PCI cards needs to trap these config space accesses and respond to them according to virtio ABI. Is this real PCI (physical hardware) or fake PCI (software PCI emulation) that you are describing? The host (x86, PCI master) must use real PCI to actually configure the boards, enable bus mastering, etc. Just like any other PCI device, such as a network card. On the guests (ppc, PCI agents) I cannot add/change PCI functions (the last .[0-9] in the PCI address) nor can I change PCI BARs once the board has started. I'm pretty sure that would violate the PCI spec, since the PCI master would need to re-scan the bus, and re-assign addresses, which is a task for the BIOS. (There's no real guest on your setup, right? just a kernel running on an x86 system and other kernels running on the PCI cards?) Yes, the x86 (PCI master) runs Linux (booted via PXELinux). The ppc's (PCI agents) also run Linux (booted via U-Boot). They are independent Linux systems, with a physical PCI interconnect. The x86 has CONFIG_PCI=y, however the ppc's have CONFIG_PCI=n. Linux's PCI stack does bad things as a PCI agent. It always assumes it is a PCI master. It is possible for me to enable CONFIG_PCI=y on the ppc's by removing the PCI bus from their list of devices provided by OpenFirmware. They cannot access PCI via normal methods. PCI drivers cannot work on the ppc's, because Linux assumes it is a PCI master. To the best of my knowledge, I cannot trap configuration space accesses on the PCI agents. I haven't needed that for anything I've done thus far. 
This does appear to be solved by vbus, though I haven't written a vbus-over-PCI implementation, so I cannot be completely sure. Even if virtio-pci doesn't work out for some reason (though it should), you can write your own virtio transport and implement its config space however you like. This is what I did with virtio-over-PCI. The way virtio-net negotiates features makes this work non-intuitively. I'm not at all clear on how to get feature negotiation to work on a system like mine. From my study of lguest and kvm (see below) it looks like userspace will need to be involved, via a miscdevice. I don't see why. Is the kernel on the PCI cards in full control of all accesses? I'm not sure what you mean by this. Could you be more specific? This is a normal, unmodified vanilla Linux kernel running on the PCI agents. Ok. I thought I should at least express my concerns while we're discussing this, rather than being too late after finding the time to study the driver. Off the top of my head, I would think that transporting userspace addresses in the ring (for copy_(to|from)_user()) vs. physical addresses (for DMAEngine) might be a problem. Pinning userspace pages into memory for DMA is a bit of a pain, though it is possible. Oh, the ring doesn't transport userspace addresses. It transports guest addresses, and it's up to vhost to do something with them. Currently vhost supports two translation modes: 1. virtio address == host virtual address (using copy_to_user) 2. virtio address == offsetted host virtual address (using copy_to_user) The latter mode is used for kvm guests (with multiple offsets, skipping some details). I think you need to add a third mode, virtio address == host physical address (using dma engine). Once you do that, and wire up the signalling, things should work. Ok. In my virtio-over-PCI patch, I hooked two virtio-net's together. I wrote an algorithm to pair the tx/rx queues together. 
Since virtio-net pre-fills its rx queues with buffers, I was able to use the DMA engine to copy from the tx queue into the pre-allocated memory in the rx queue. I have an intuitive idea about how I think vhost-net works in this case. There is also the problem of different endianness between host and guest in virtio-net. The struct virtio_net_hdr (include/linux/virtio_net.h) defines fields in host byte order, which totally breaks if the guest has a different endianness. This is a virtio-net problem though, and is not transport specific. Yeah. You'll need to add byteswaps. I wonder if Rusty would accept a new feature: VIRTIO_F_NET_LITTLE_ENDIAN, which would allow the virtio-net driver to use LE for all of its multi-byte fields. I don't think the transport should have to care about the endianness. I've browsed over both the kvm and lguest code, and it looks like they each re-invent a mechanism for transporting interrupts between
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On Wed, Aug 19, 2009 at 01:06:45AM +0300, Avi Kivity wrote: On 08/19/2009 12:26 AM, Avi Kivity wrote: Off the top of my head, I would think that transporting userspace addresses in the ring (for copy_(to|from)_user()) vs. physical addresses (for DMAEngine) might be a problem. Pinning userspace pages into memory for DMA is a bit of a pain, though it is possible. Oh, the ring doesn't transport userspace addresses. It transports guest addresses, and it's up to vhost to do something with them. Currently vhost supports two translation modes: 1. virtio address == host virtual address (using copy_to_user) 2. virtio address == offsetted host virtual address (using copy_to_user) The latter mode is used for kvm guests (with multiple offsets, skipping some details). I think you need to add a third mode, virtio address == host physical address (using dma engine). Once you do that, and wire up the signalling, things should work. You don't in fact need a third mode. You can mmap the x86 address space into your ppc userspace and use the second mode. All you need then is the dma engine glue and byte swapping. Hmm, I'll have to think about that. The ppc is a 32-bit processor, so it has 4GB of address space for everything, including PCI, SDRAM, flash memory, and all other peripherals. This is exactly like 32-bit x86, where you cannot have a PCI card that exposes a 4GB PCI BAR. The system would have no address space left for its own SDRAM. On my x86 computers, I only have 1GB of physical RAM, and so the ppc's have plenty of room in their address spaces to map the entire x86 RAM into their own address space. That is exactly what I do now. Accesses to ppc physical address 0x8000 magically hit x86 physical address 0x0. Ira
Re: [Alacrityvm-devel] [PATCH v3 3/6] vbus: add a vbus-proxy bus model for vbus_driver objects
On 08/19/2009 03:44 AM, Ira W. Snyder wrote: You don't in fact need a third mode. You can mmap the x86 address space into your ppc userspace and use the second mode. All you need then is the dma engine glue and byte swapping. Hmm, I'll have to think about that. The ppc is a 32-bit processor, so it has 4GB of address space for everything, including PCI, SDRAM, flash memory, and all other peripherals. This is exactly like 32-bit x86, where you cannot have a PCI card that exposes a 4GB PCI BAR. The system would have no address space left for its own SDRAM. (you actually can, since x86 has a 36-40 bit physical address space even with a 32-bit virtual address space, but that doesn't help you). On my x86 computers, I only have 1GB of physical RAM, and so the ppc's have plenty of room in their address spaces to map the entire x86 RAM into their own address space. That is exactly what I do now. Accesses to ppc physical address 0x8000 magically hit x86 physical address 0x0. So if you mmap() that, you could work with virtual addresses. It may be more efficient to work with physical addresses directly though. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.