Re: A set of standard virtual devices?
On Wednesday 04 April 2007, H. Peter Anvin wrote:
> Configuration space access is platform-dependent. It's only defined to work in a specific way on x86 platforms.
>
> Interrupt swizzling is really totally independent of PCI. ALL PCI really provides is up to four interrupts per device (not counting MSI/MSI-X) and an 8-bit writable field which the platform can choose to use to hold interrupt information. That's all. The rest is all platform information.
>
> PCI enumeration is hardly complex. Most of the stuff that doesn't apply to you you can generally ignore, as is done by other busses like HyperTransport when they emulate PCI.

You still don't get my point: on a platform that doesn't have interrupt numbers, and where most of the fields in the config space don't correspond to anything that is already there, you really don't want to invent a set of new hcalls that implement emulation, just to get something as simple as a pipe.

wc drivers/pci/*.[ch] include/asm-i386/{pci,io}.h lib/iomap*.c \
   arch/i386/pci/*.c kernel/irq/*.c
  17015  59037 463967 total

Even if you only need half of that code in reality, reimplementing all that in both the kernel and in the hypervisor is an enormous effort. We've seen that before on the ps3, which initially faked a virtual PCI bus just for the USB controller, but doing something like that requires adding abstraction layers, to decide whether to implement e.g. an inb as a hypercall or as a memory read.

> That being said, on platforms which are PCI-centric, such as x86, this of course makes it a lot easier to produce virtual devices which work across hypervisors, since the device model of *any* operating system is set up to handle them.

Yes, as I said there are two separate problems. I really think that a standardized virtual driver interface should be modeled after kernel - user interfaces, not hardware - kernel interfaces.

Once we know what operations we want (e.g. read, write and SIGIO, or some other set of primitives), it will be good to provide a virtual PCI device that can be used as one transport mechanism below it. Using PCI device IDs to tell what functionality is provided by the device would provide a reasonable method for autoprobing.

Arnd
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: A set of standard virtual devices?
On Tuesday 03 April 2007 10:29:06 Christian Borntraeger wrote:
> On Monday 02 April 2007 23:12, Andi Kleen wrote:
> > > How would that work in the case where virtualized guests don't have a visible PCI bus, and the virtual environment doesn't pretend to emulate a PCI bus?
> >
> > If they emulated one with the appropriate device then distribution driver auto probing would just work transparently for them.
>
> Still, that would only make sense for virtualized platforms that usually have a PCI bus. Thinking about seeing a PCI device on, let's say, s390 is strange.

If it gets the job done surely you can tolerate a little strangeness?

-Andi
Re: A set of standard virtual devices?
> On s390, it would be more than strangeness. There's no implementation of PCI at all, someone would have to cook it up - and it wouldn't have any use beyond those special devices. Since there isn't any bus type that is available on *all* architectures, a generic virtual bus with very simple probing seems much saner...

You just have to change all the distribution installers then. OK, I suppose on s390 that's not that big an issue because there are not that many for s390. But for x86 there exist quite a lot. I suppose it's easier to change it in the kernel.

-Andi
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually means with virtualization in the absence of a lot of this stuff. PCI is probably the closest thing we have to a lowest common denominator for device detection.

I think that's true outside of s390, but a standardized virtual device interface should be able to work there as well. Interestingly, the s390 channel I/O also uses two 16 bit numbers to identify a device (type and model), just like PCI or USB, so in that light, we might be able to use the same number space for something entirely different depending on the virtual bus.

Arnd
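The point about PCI, USB and s390 channel I/O all identifying a device by a pair of 16 bit numbers can be illustrated with a bus-neutral match table. This is only a minimal user-space sketch: `struct virt_id`, `virt_match()` and the `VIRT_ID_ANY` wildcard are hypothetical names invented for illustration, not an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus-neutral device ID: two 16-bit numbers. A PCI
 * transport could fill them from vendor/device, a channel I/O
 * transport from type/model. */
struct virt_id {
    uint16_t major;   /* e.g. PCI vendor, or channel I/O type */
    uint16_t minor;   /* e.g. PCI device, or channel I/O model */
};

#define VIRT_ID_ANY 0xffffu   /* assumed wildcard value */

/* Return the index of the first table entry matching `dev`, or -1. */
static int virt_match(const struct virt_id *table, size_t n,
                      struct virt_id dev)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int major_ok = table[i].major == VIRT_ID_ANY ||
                       table[i].major == dev.major;
        int minor_ok = table[i].minor == VIRT_ID_ANY ||
                       table[i].minor == dev.minor;
        if (major_ok && minor_ok)
            return (int)i;
    }
    return -1;
}
```

The same table could be consulted no matter which bus reported the device, which is the property the message is after; how the two numbers are obtained stays bus specific.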
Re: A set of standard virtual devices?
On Tue, 3 Apr 2007 11:41:49 +0200, Arnd Bergmann wrote:
> On Tuesday 03 April 2007, H. Peter Anvin wrote:
> > However, one probably wants to think about what the heck one actually means with virtualization in the absence of a lot of this stuff. PCI is probably the closest thing we have to a lowest common denominator for device detection.
>
> I think that's true outside of s390, but a standardized virtual device interface should be able to work there as well. Interestingly, the s390 channel I/O also uses two 16 bit numbers to identify a device (type and model), just like PCI or USB, so in that light, we might be able to use the same number space for something entirely different depending on the virtual bus.

Even if we used those ids for cu_type and dev_type, it would still be ugly IMO. It would be much cleaner to just define a very simple, easy to implement virtual bus without dragging implementation details for other types of devices around.
Re: A set of standard virtual devices?
On Tue, 3 Apr 2007 11:26:52 +0200, Andi Kleen wrote:
> > On s390, it would be more than strangeness. There's no implementation of PCI at all, someone would have to cook it up - and it wouldn't have any use beyond those special devices. Since there isn't any bus type that is available on *all* architectures, a generic virtual bus with very simple probing seems much saner...
>
> You just have to change all the distribution installers then. OK, I suppose on s390 that's not that big an issue because there are not that many for s390. But for x86 there exist quite a lot. I suppose it's easier to change it in the kernel.

Huh? I don't follow you here. Why should this be easier for s390 vs. x86? (And since there seems to be a trend to use HAL as a device discovery tool recently: a new bus type is easy enough to add there.)

And I really think we should have a clean design in the kernel instead of trying to wedge virtual devices into a known system. Exposing virtual devices (which may be handled totally differently) as PCI devices just seems hackish to me.
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Cornelia Huck wrote:
> > I think that's true outside of s390, but a standardized virtual device interface should be able to work there as well. Interestingly, the s390 channel I/O also uses two 16 bit numbers to identify a device (type and model), just like PCI or USB, so in that light, we might be able to use the same number space for something entirely different depending on the virtual bus.
>
> Even if we used those ids for cu_type and dev_type, it would still be ugly IMO. It would be much cleaner to just define a very simple, easy to implement virtual bus without dragging implementation details for other types of devices around.

Right, but an interesting point is the question of what to do when running another operating system as a guest under Linux, e.g. with kvm. Ideally, you'd want to use the same interface to announce the presence of the device, which can be done far more easily with PCI than with a new bus type that you'd need to implement for every OS, instead of just implementing the virtual PCI driver.

Using a 16 bit number to identify a specific interface sounds like a good idea to me, if only for the reason that it is a widely used approach. The alternative would be to use an ascii string, like we have for open-firmware devices on powerpc or sparc.

Either way, I think we need to abstract the driver for the virtual device from the underlying bus infrastructure, which is hypervisor and/or platform dependent.
The abstraction could work roughly like this:

== virt_dev.h ==

struct virt_driver { /* platform independent */
    struct device_driver drv;
    struct pci_device_id *ids; /* not necessarily PCI */
};

struct virt_bus { /* platform dependent */
    long (*transfer)(struct virt_dev *dev, void *buffer,
                     unsigned long size, int type);
};

struct virt_dev {
    struct device dev;
    struct virt_driver *driver;
    struct virt_bus *bus;
    struct pci_device_id id;
    int irq;
};

== virt_example.c ==

static ssize_t virt_pipe_read(struct file *filp, char __user *buffer,
                              size_t len, loff_t *off)
{
    struct virt_dev *dev = filp->private_data;
    ssize_t ret = dev->bus->transfer(dev, buffer, len, READ);

    if (ret > 0)    /* only advance the offset on success */
        *off += ret;
    return ret;
}

static struct file_operations virt_pipe_fops = {
    .open = nonseekable_open,
    .read = virt_pipe_read,
};

static int virt_pipe_probe(struct device *dev)
{
    struct virt_dev *vdev = to_virt_dev(dev);
    struct miscdev *mdev = kmalloc(sizeof(*mdev), GFP_KERNEL);

    if (!mdev)
        return -ENOMEM;
    mdev->name = "virt_pipe";
    mdev->fops = &virt_pipe_fops;
    mdev->parent = dev;
    return register_miscdev(mdev);
}

/* ID table, terminated by an empty entry */
static struct pci_device_id virt_pipe_id[] = {
    { .vendor = PCI_VENDOR_LINUX, .device = 0x3456, },
    { },
};
MODULE_DEVICE_TABLE(pci, virt_pipe_id);

static struct virt_driver virt_pipe_driver = {
    .drv = {
        .name = "virt_pipe",
        .probe = virt_pipe_probe,
    },
    .ids = virt_pipe_id,
};

static int virt_pipe_init(void)
{
    return virt_driver_register(&virt_pipe_driver);
}
module_init(virt_pipe_init);

== virt_devtree.c ==

static long virt_devtree_transfer(struct virt_dev *dev, void *buffer,
                                  unsigned long size, int type)
{
    long ret;

    switch (type) {
    case READ:
        ret = hcall(HV_READ, dev->dev.platform_data, buffer, size);
        break;
    case WRITE:
        ret = hcall(HV_WRITE, dev->dev.platform_data, buffer, size);
        break;
    default:
        BUG();
    }
    return ret;
}

static struct virt_bus virt_devtree_bus = {
    .transfer = virt_devtree_transfer,
};

static int virt_devtree_probe(struct of_device *ofdev,
                              struct of_device_id *match)
{
    struct virt_dev *vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);

    if (!vdev)
        return -ENOMEM;
    vdev->bus = &virt_devtree_bus;
    vdev->dev.parent = &ofdev->dev;
    vdev->id.vendor = PCI_VENDOR_LINUX;
    vdev->id.device = *of_get_property(ofdev, "virt_dev_id");
    vdev->irq = of_irq_parse_and_map(ofdev, 0);
    return device_register(&vdev->dev);
}

struct of_device_id virt_devtree_ids[] = {
    { .compatible = "virt-dev", },
    { },
};

static struct of_platform_driver virt_devtree_driver = {
    .probe = virt_devtree_probe,
    .match_table = virt_devtree_ids,
};

== virt_pci.c ==

static long virt_pci_transfer(struct virt_dev *dev, void *buffer,
                              unsigned long size, int type)
{
    struct virt_pci_regs __iomem *regs = dev->dev.platform_data;

    switch (type) {
    case READ:
        mmio_insb(&regs->read_port, buffer, size);
        break;
    case WRITE:
        mmio_outsb(&regs->write_port, buffer, size);
        break;
    default:
        BUG();
    }
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Cornelia Huck wrote:
> On Tue, 3 Apr 2007 14:15:37 +0200, Arnd Bergmann wrote:
>
> That's OK for a virtualized architecture where the base architecture already supports PCI. But a traditional s390 OS would be as unhappy with a PCI device as with a device of a completely new type :)

Sure, that was my point from the start.

> There are several options for virtualized devices (and I don't know why they shouldn't coexist):
>
> 1. Emulate a well-known device (like an e1000 network card on PCI or a model 3390 dasd on CCW). Existing operating systems can just use them, but it's a lot of work in the hypervisor.

Most hypervisors already do this, and it's an unrelated topic. What we're trying to achieve is to make sure not every hypervisor and simulator has to introduce its own set of drivers.

> > struct virt_bus { /* platform dependent */
> >     long (*transfer)(struct virt_dev *dev, void *buffer,
> >                      unsigned long size, int type);
> > };
>
> Should this embed a struct bus_type? Or reference a generic_virt_bus?

Yes, that should embed the bus_type.

> > struct virt_dev {
> >     struct device dev;
> >     struct virt_driver *driver;
> >     struct virt_bus *bus;
> >     struct pci_device_id id;
> >     int irq;
> > };
>
> And that's where I have problems :) The notion of irq is far too platform specific. I can bend my mind round using PCI-like ids for non-PCI virtualized devices, but an integer is far too small and too specific for a way to access the device.

Sorry, I've been working too long on the lesser architectures. IRQ numbers are evil indeed. However, I'm pretty sure that we need _some_ abstraction of an interrupt mechanism here. The easiest way is probably to have a callback function like

    int (*irq_handler)(struct virt_dev *, unsigned long message);

in the virt_dev.

Arnd
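The callback proposed at the end of this message can be tried out in plain user-space C. The sketch below is an illustration under assumptions, not the code from the thread: `struct virt_dev` is cut down to the fields needed here, and `virt_dev_notify()` and `example_handler()` are hypothetical names for the transport-side entry point and a driver-side handler.

```c
#include <assert.h>
#include <stddef.h>

struct virt_dev;

/* Callback type as proposed in the message: the transport passes an
 * opaque message word instead of a platform IRQ number. */
typedef int (*virt_irq_handler_t)(struct virt_dev *, unsigned long message);

/* Simplified stand-in for struct virt_dev; the `int irq` member is
 * replaced by the callback. */
struct virt_dev {
    virt_irq_handler_t irq_handler;
    unsigned long last_message;   /* illustration only */
    unsigned int events;          /* illustration only */
};

/* Hypothetical entry point that a transport backend (PCI interrupt,
 * hcall completion, channel I/O interrupt) would call on an event. */
static int virt_dev_notify(struct virt_dev *dev, unsigned long message)
{
    assert(dev != NULL);
    if (!dev->irq_handler)
        return -1;                /* no driver bound yet */
    return dev->irq_handler(dev, message);
}

/* Example driver-side handler: record the message and count events. */
static int example_handler(struct virt_dev *dev, unsigned long message)
{
    dev->last_message = message;
    dev->events++;
    return 0;
}
```

With this shape the driver never sees an interrupt number; whether the event originated from a PCI interrupt line or a hypervisor upcall stays inside the transport.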
Re: A set of standard virtual devices?
On Tue, Apr 03, 2007 at 11:26:52AM +0200, Andi Kleen wrote:
> > On s390, it would be more than strangeness. There's no implementation of PCI at all, someone would have to cook it up - and it wouldn't have any use beyond those special devices. Since there isn't any bus type that is available on *all* architectures, a generic virtual bus with very simple probing seems much saner...
>
> You just have to change all the distribution installers then. OK, I suppose on s390 that's not that big an issue because there are not that many for s390. But for x86 there exist quite a lot. I suppose it's easier to change it in the kernel.

I don't get this point. Compared to whatever will be done in the kernel, any change to a distribution installer should be trivial. And a new release of a distribution with a new kernel might usually require some updates to an installer anyway.

> -Andi

cu
Adrian

--
Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days.
Only a promise, Lao Er said.
Pearl S. Buck - Dragon Seed
Re: A set of standard virtual devices?
Arnd Bergmann wrote:
> I think we need to separate two problems here:
>
> 1. Probing: That's really what triggered the discussion, PCI probing is well-understood and implemented on _most_ platforms, so there is some value in reusing it. When you talk about 'very simple probing', I'm not sure what the most simple approach could be.

Is probing an interesting problem to consider on its own? If there's some hypervisor-agnostic device driver in Linux, then obviously it needs some way to find the corresponding (virtual) hardware for it to talk to. But that probing mechanism will depend on the actual interface structure, and is just one of the many problems that need to be solved. There's no point in overloading PCI to probe for the device unless you're actually using PCI to talk to the device.

> Ideas that have been implemented before include:
>
> a) have a limited set of device IDs (e.g. 65535 devices, or a hierarchic tree), and try to access each one of them in order to find out if it's there. We do that for PCI or CCW, for instance.
> b) Have an iterator in the hypervisor (or firmware), to return a handle to the first, next or child of a device. We do that for open firmware.
> c) ask the hypervisor for an unused device of a given class, which needs to be returned to the hypervisor when no longer used. This is how the PS3 hypervisor works, but it does not play well with the Linux driver model.

Xen has xenbus, which is essentially a filesystem-like namespace which can be walked to find the devices being exposed to a guest. It is fairly similar to OFW's device tree.

> 2. Device access: When talking to a virtual device, you want to have at least a way to give commands to it and a way to get interrupts back. Again, multiple ideas have been used in the past, and we should choose a subset:

Let me say up front that I'm skeptical that we can come up with a single bus-like abstraction which can be both a simple and an efficient interface to all the virtual architectures. I think a more fruitful path is to find what pieces of functionality can be made common, with the aim of having small, simple and self-contained hypervisor-specific backends.

I think this needs to be considered on a class by class basis. This thread started with a discussion about entropy sources. In theory you could implement it as simply as exposing a mmapped ringbuffer. There are some extra complexities deriving from the security requirements though; for example, all the entropy needs to be kept strictly private to the domain that consumes it.

But beyond that, there are 3 other important classes of device:

* console
* disk
* networking

(There are obviously more, but these are the must-have.)

Console already provides us with a model to work on, in the form of hvc-console. The hvc-console code itself has the bulk of the common console code, along with a set of very small hypervisor-specific backends. The Xen console implementation shrank considerably when we switched to using it.

If we could do the same thing with disk and net, I would be very happy. For example, if we wanted to change the Xen frontend/backend disk interface, we could use SCSI as the basic protocol, and then convert blkfront into a relatively simple scsi driver. There would still be a Xen-specific piece, but it should be fairly small and have a clean interface. Though the existing interface is a pretty simple shove-this-block-there affair.

I'm not sure what similar common code could be extracted for network devices. I haven't looked into it all that closely.

J
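The split described above, with common code on top and a very small hypervisor-specific backend underneath, can be sketched in user-space C. Everything here is a hypothetical illustration: `struct virt_console_ops`, `virt_console_write()` and the toy backend are invented names and do not reproduce the actual hvc-console interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend interface: the only hypervisor-specific part. */
struct virt_console_ops {
    /* Write up to `count` bytes; return bytes accepted, or < 0 on error. */
    int (*put_chars)(void *priv, const char *buf, int count);
};

/* Common code: retry partial writes until everything is written, the
 * backend stops accepting data, or the backend reports an error. */
static int virt_console_write(const struct virt_console_ops *ops, void *priv,
                              const char *buf, int count)
{
    int done = 0;

    assert(ops != NULL && ops->put_chars != NULL);
    while (done < count) {
        int n = ops->put_chars(priv, buf + done, count - done);

        if (n < 0)
            return done ? done : n;   /* report progress before errors */
        if (n == 0)
            break;                    /* backend is full for now */
        done += n;
    }
    return done;
}

/* Toy backend standing in for a hypercall: accepts at most 4 bytes per
 * call and appends them to a fixed buffer. */
struct toy_backend {
    char data[64];
    int len;
};

static int toy_put_chars(void *priv, const char *buf, int count)
{
    struct toy_backend *b = priv;
    int room = (int)sizeof(b->data) - b->len;
    int n = count < 4 ? count : 4;

    if (n > room)
        n = room;
    memcpy(b->data + b->len, buf, (size_t)n);
    b->len += n;
    return n;
}
```

The backend here is a dozen lines; the loop that copes with short writes lives in the shared part, which is the kind of saving the message attributes to hvc-console.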
Re: A set of standard virtual devices?
Arnd Bergmann wrote:
> We already have device drivers for physical devices that can be attached to different buses. The EHCI USB is an example of a driver that can be for instance PCI, OF or an on-chip device. Moreover, you can have an abstracted device behind it that does not need to know about the transport, like the SCSI disk driver does not care if it is talking to an ATA, parallel SCSI or SAS chip, or even which controller that is.

Yes, that kind of layering is useful when there's enough of an abstraction gap to fit the layers into. USB is particularly simple in that way, since it can be made to travel nicely over any number of transports.

> console is also the least problematic interface, you can do it over practically anything.

Sure. But it's interesting that there are savings to be had.

> Doing a SCSI driver has been tried before, with ibmvscsi. Not good.

OK, interesting. People had proposed using SCSI as the interface, but I wasn't aware of any results from doing that. How is it not good?

> The interesting question about block devices is how to handle concurrency and interrupt mitigation. An efficient interface should
> - have asynchronous notification, not sleep until the transfer is complete
> - allow multiple blocks to be in flight simultaneously, so the host can reorder the requests if it is smart enough
> - give only a single interrupt when multiple transfers have completed

Yes. The Xen block interface is already pretty efficient in these respects.

> > I'm not sure what similar common code could be extracted for network devices. I haven't looked into it all that closely.
>
> One way to do networking would be to simply provide a shared memory area that everyone can write to, then use a ring buffer and atomic operations to synchronize between the guests, and a method to send interrupts to the others for flow control.

Yes, and that's the core of the Xen netfront. But is there really much code which can be shared between different hypervisors? When you get down to it, all the real code is hypervisor-specific stuff for setting up ringbuffers and dealing with interrupts. Like all the other network drivers.

J
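The shared-memory ring with atomic operations mentioned in this exchange can be sketched with C11 atomics. This is a minimal single-producer, single-consumer version under stated assumptions: the names (`struct ring`, `ring_put`, `ring_get`, `ring_drain`) and the slot layout are hypothetical, it is not the Xen ring format, and a ring that "everyone can write to" would need more than this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* must be a power of two */

struct ring {
    _Atomic uint32_t head;       /* next slot the producer writes */
    _Atomic uint32_t tail;       /* next slot the consumer reads */
    uint32_t slot[RING_SLOTS];   /* payload, e.g. buffer descriptors */
};

/* Producer side. Returns 1 on success, 0 if the ring is full. */
static int ring_put(struct ring *r, uint32_t value)
{
    assert(r != NULL);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SLOTS)
        return 0;
    r->slot[head % RING_SLOTS] = value;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer side. Returns 1 and fills *value, or 0 if the ring is empty. */
static int ring_get(struct ring *r, uint32_t *value)
{
    assert(r != NULL && value != NULL);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return 0;
    *value = r->slot[tail % RING_SLOTS];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Drain everything available, so that one notification (interrupt) can
 * cover several completed entries. */
static unsigned ring_drain(struct ring *r, uint32_t *out, unsigned max)
{
    unsigned n = 0;

    while (n < max && ring_get(r, &out[n]))
        n++;
    return n;
}
```

`ring_drain()` is the consumer half of the "single interrupt for multiple transfers" idea: the producer may queue several entries before notifying, and the handler empties the ring in one pass. Sending the notification itself is the hypervisor-specific part and is left out.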
Re: A set of standard virtual devices?
Jeremy Fitzhardinge wrote:
> Yes, and that's the core of the Xen netfront. But is there really much code which can be shared between different hypervisors? When you get down to it, all the real code is hypervisor-specific stuff for setting up ringbuffers and dealing with interrupts. Like all the other network drivers.

One thing, Jeremy, which I think is a bit misleading here: you're focusing on big, performance-critical stuff. Those things are going to be the ones which have the most to gain from being implemented in hypervisor-specific ways. Although we can offer models for some hypervisors (and G-d knows there are enough implementations out there of virtual disk which are almost identical), they're clearly not going to be universal.

However, there are other things; console is one, or my original example, which was random number generation. For those, the benefit of unification is proportionally greater, simply because the win of anything hypervisor-specific is much smaller.

-hpa
Re: A set of standard virtual devices?
On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
> That said, something like USB is probably the best bet for this kind of low-performance device. I think. Not that I really know anything about USB.

USB has the disadvantage that it is more complex than PCI and requires significantly more code to simulate on the host side. On the plus side, I think it should be possible to implement a virtual USB host on s390, which is not possible with PCI, but that again takes a lot of work to implement.

One interesting aspect of the PS3 hypervisor is that some of the low-speed interfaces are implemented as a virtual UART, meaning something that only has read and write operations and uses an interrupt for flow control. The implementation in drivers/ps3/vuart.c is probably more complex than what we want as a generic transport mechanism, but simply having a bidirectional data stream sounds like an ideal abstraction for the simple case. Some more or less obvious users of this include:

- console
- additional tty
- random
- slow network (using ppp)
- printer
- watchdog
- hid (e.g. mouse)
- system management (like ps3)
- fast network (in combination with shared memory segment)

The transport can be hypervisor specific, e.g. there could be a virtual PCI serial port on kvm, an hcall interface on the ps3 and a virtual CTC on s390 (kidding), while all of them can have the same kind of hardware _behind_ the serial connection.

Arnd
Re: A set of standard virtual devices?
Arnd Bergmann wrote:
> On Wednesday 04 April 2007, H. Peter Anvin wrote:
> > Note that at least for PIO-based devices, there is nothing that says you can't implement PCI over another transport, if you wish. It's really just a very simple RPC protocol.
>
> The PIO aspect of PCI is simple, yes, except on architectures that don't have the concept of PIO or even uncached memory, but even that can be done by defining readl/writel/inl/outl/... as hcalls.
>
> The tricky part about PCI is the device probing, everything about config space accesses, interrupt swizzling, bus/device/function numbers and base address registers becomes a pointless exercise when the other side is just faking it.

Configuration space access is platform-dependent. It's only defined to work in a specific way on x86 platforms.

Interrupt swizzling is really totally independent of PCI. ALL PCI really provides is up to four interrupts per device (not counting MSI/MSI-X) and an 8-bit writable field which the platform can choose to use to hold interrupt information. That's all. The rest is all platform information.

PCI enumeration is hardly complex. Most of the stuff that doesn't apply to you you can generally ignore, as is done by other busses like HyperTransport when they emulate PCI.

That being said, on platforms which are PCI-centric, such as x86, this of course makes it a lot easier to produce virtual devices which work across hypervisors, since the device model of *any* operating system is set up to handle them.

-hpa
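The idea quoted above, that register access is "just a very simple RPC protocol" and that readl/writel could be defined as hcalls, can be sketched as follows. This is a user-space illustration under assumptions: the request layout, `virt_readl()`/`virt_writel()` and the toy device model are hypothetical, and the function-pointer transport merely stands in for a real hypercall.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum pio_op { PIO_READ32 = 1, PIO_WRITE32 = 2 };

/* Hypothetical request format: one register access per message. */
struct pio_request {
    uint32_t op;      /* enum pio_op */
    uint32_t offset;  /* byte offset into the device's register window */
    uint32_t value;   /* data for writes, result for reads */
};

/* Transport hook: deliver one request to the device model. A real
 * guest would issue a hypercall here; that part is platform specific. */
typedef int (*pio_transport_t)(void *priv, struct pio_request *req);

struct pio_dev {
    pio_transport_t send;
    void *priv;
};

/* readl()/writel() look-alikes implemented on top of the transport. */
static uint32_t virt_readl(struct pio_dev *dev, uint32_t offset)
{
    struct pio_request req = { PIO_READ32, offset, 0 };

    assert(dev != NULL && dev->send != NULL);
    if (dev->send(dev->priv, &req) != 0)
        return 0xffffffffu;   /* assumed error convention: all ones */
    return req.value;
}

static void virt_writel(struct pio_dev *dev, uint32_t offset, uint32_t value)
{
    struct pio_request req = { PIO_WRITE32, offset, value };

    assert(dev != NULL && dev->send != NULL);
    (void)dev->send(dev->priv, &req);
}

/* Toy device model standing in for the hypervisor side: four registers. */
struct toy_regs {
    uint32_t reg[4];
};

static int toy_send(void *priv, struct pio_request *req)
{
    struct toy_regs *t = priv;
    uint32_t idx = req->offset / 4;

    if (req->offset % 4 != 0 || idx >= 4)
        return -1;
    if (req->op == PIO_READ32)
        req->value = t->reg[idx];
    else if (req->op == PIO_WRITE32)
        t->reg[idx] = req->value;
    else
        return -1;
    return 0;
}
```

A driver written against `virt_readl()`/`virt_writel()` does not change when `send` is swapped for a different transport. The sketch says nothing about probing, config space or interrupt routing, which is where the two sides of this exchange disagree.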
Re: A set of standard virtual devices?
On Monday 02 April 2007 23:33:01 Jeff Garzik wrote:
> Andi Kleen wrote:
> > > How would that work in the case where virtualized guests don't have a visible PCI bus, and the virtual environment doesn't pretend to emulate a PCI bus?
> >
> > If they emulated one with the appropriate device then distribution driver auto probing would just work transparently for them.
>
> Yes, but, ideally with paravirtualization you should be able to avoid the overhead of emulating many major classes of device (storage, network, RNG, etc.) by developing a low-overhead passthrough interface that does not involve PCI at all.

The implementation wouldn't need to use PCI at all. There wouldn't even need to be PCI-like registers internally. Just a PCI device with an ID somewhere in sysfs.

PCI with unique IDs is just a convenient and well established key into the driver module collection. Once you have the right driver it can do what it wants.

-Andi
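To make "a key into the driver module collection" concrete: module auto-loading on Linux matches an alias string built from the device's IDs against aliases exported by driver modules. The sketch below formats such a string in user space. The format string is written from memory of what a PCI device's sysfs `modalias` attribute looks like and should be checked against the kernel sources; `struct pci_ids`, `pci_modalias()` and the ID values are placeholders for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the IDs a (possibly fake) PCI device
 * would expose. */
struct pci_ids {
    uint16_t vendor, device;
    uint16_t subvendor, subdevice;
    uint8_t base_class, sub_class, prog_if;
};

/* Build the alias string used as the lookup key for a driver module.
 * Returns the snprintf() result. */
static int pci_modalias(char *buf, size_t len, const struct pci_ids *id)
{
    assert(buf != NULL && id != NULL);
    return snprintf(buf, len,
                    "pci:v%08Xd%08Xsv%08Xsd%08Xbc%02Xsc%02Xi%02X",
                    (unsigned)id->vendor, (unsigned)id->device,
                    (unsigned)id->subvendor, (unsigned)id->subdevice,
                    (unsigned)id->base_class, (unsigned)id->sub_class,
                    (unsigned)id->prog_if);
}
```

Nothing in this lookup touches PCI registers: only the identifying numbers are needed, which is the argument made in the message.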
Re: A set of standard virtual devices?
Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > The implementation wouldn't need to use PCI at all. There wouldn't even need to be PCI-like registers internally. Just a PCI device with an ID somewhere in sysfs.
> >
> > PCI with unique IDs is just a convenient and well established key into the driver module collection. Once you have the right driver it can do what it wants.
>
> But I understood hpa's suggestion to mean that there would be a standard PCI interface for a hardware RNG, and a single linux driver for that device, which all hypervisors would be expected to implement. But that's only reasonable if the virtualization environment has some notion of PCI to expose to the Linux guest.

The actual PCI bus could be paravirtualized. It's just a question of whether one reinvents a device discovery mechanism (like XenBus) or whether one piggybacks on existing mechanisms.

Furthermore, in the future, I strongly suspect that HVM will become much more important for Xen than PV, and since that already has a PCI bus it's not really that big of a deal.

Regards,

Anthony Liguori

> J
Re: A set of standard virtual devices?
H. Peter Anvin wrote:
> However, one probably wants to think about what the heck one actually means with virtualization in the absence of a lot of this stuff. PCI is probably the closest thing we have to a lowest common denominator for device detection.

Sure, but let's look beyond device detection. For instance, it does not necessarily follow that emulating PCI DMA is the best way to go for communication with a virtual device, once detected.

Outside of pci_device_id driver matching, is there much value here?

Jeff