Re: [PATCH] Add I/O hypercalls for i386 paravirt
> We have an X driver that does minimal performance costing operations.
> As we should and will have for our other drivers.

Ok, so you use your own DDX and prevent X vgacrapware from kicking in ?
Makes sense.

Ben.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Benjamin Herrenschmidt wrote:
> On Wed, 2007-08-22 at 16:25 +1000, Rusty Russell wrote:
>> On Wed, 2007-08-22 at 08:34 +0300, Avi Kivity wrote:
>>> Zachary Amsden wrote:
>>>> This patch provides hypercalls for the i386 port I/O instructions,
>>>> which vastly helps guests which use native-style drivers. For certain
>>>> VMI workloads, this provides a performance boost of up to 30%. We
>>>> expect KVM and lguest to be able to achieve similar gains on I/O
>>>> intensive workloads.
>>> Won't these workloads be better off using paravirtualized drivers?
>>> i.e., do the native drivers with paravirt I/O instructions get anywhere
>>> near the performance of paravirt drivers?
>> This patch also means I can kill off the emulation code in
>> drivers/lguest/core.c, which is a real relief.
> Hrm... how do you deal with X doing IOs ?
>
> Ben.

We have an X driver that does minimal performance costing operations.
As we should and will have for our other drivers.

Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, 2007-08-22 at 16:25 +1000, Rusty Russell wrote:
> On Wed, 2007-08-22 at 08:34 +0300, Avi Kivity wrote:
> > Zachary Amsden wrote:
> > > This patch provides hypercalls for the i386 port I/O instructions,
> > > which vastly helps guests which use native-style drivers. For certain
> > > VMI workloads, this provides a performance boost of up to 30%. We
> > > expect KVM and lguest to be able to achieve similar gains on I/O
> > > intensive workloads.
> >
> > Won't these workloads be better off using paravirtualized drivers?
> > i.e., do the native drivers with paravirt I/O instructions get anywhere
> > near the performance of paravirt drivers?
>
> This patch also means I can kill off the emulation code in
> drivers/lguest/core.c, which is a real relief.

Hrm... how do you deal with X doing IOs ?

Ben.
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Tue, 2007-08-21 at 22:23 -0700, Zachary Amsden wrote:
> In general, I/O in a virtual guest is subject to performance problems.
> The I/O can not be completed physically, but must be virtualized. This
> means trapping and decoding port I/O instructions from the guest OS.
> Not only is the trap for a #GP heavyweight, both in the processor and
> the hypervisor (which usually has a complex #GP path), but this forces
> the hypervisor to decode the individual instruction which has faulted.
> Worse, even with hardware assist such as VT, the exit reason alone is
> not sufficient to determine the true nature of the faulting instruction,
> requiring a complex and costly instruction decode and simulation.

.../...

How about userland ? Things like X do IO's typically... You still need
to trap/emulate for these no ?

Cheers,
Ben.
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Hi!

> >>In general, I/O in a virtual guest is subject to performance
> >>problems. The I/O can not be completed physically, but must be
> >>virtualized. This means trapping and decoding port I/O instructions
> >>from the guest OS. Not only is the trap for a #GP heavyweight, both
> >>in the processor and the hypervisor (which usually has a complex #GP
> >>path), but this forces the hypervisor to decode the individual
> >>instruction which has faulted. Worse, even with hardware assist such
> >>as VT, the exit reason alone is not sufficient to determine the true
> >>nature of the faulting instruction, requiring a complex and costly
> >>instruction decode and simulation.
> >>
> >>This patch provides hypercalls for the i386 port I/O instructions,
> >>which vastly helps guests which use native-style drivers. For certain
> >>VMI workloads, this provides a performance boost of up to 30%. We
> >>expect KVM and lguest to be able to achieve similar gains on I/O
> >>intensive workloads.
> >
> >What about cost on hardware?
>
> On modern hardware, port I/O is about the most expensive thing you can
> do. The extra function call cost is totally masked by the stall. We
> have measured with port I/O converted like this on real hardware, and
> have seen zero measurable impact on macro-benchmarks.
> Micro-benchmarks that generate massively repeated port I/O might show
> some effect on ancient hardware, but I can't even imagine a workload
> which does such a thing, other than a polling port I/O loop perhaps -
> which would not be performance critical in any case I can reasonably
> imagine.

SCSI controller in ISA slot? IDE without DMA enabled? Yes, those are
performance-critical. The second case seems common with compactflash
cards.

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, 2007-08-22 at 22:25 +0100, Alan Cox wrote:
> > I still think it's preferable to change some drivers than everybody.
> >
> > AFAIK BusLogic as real hardware is pretty much dead anyways,
> > so you're probably the only primary user of it anyways.
> > Go wild on it!
>
> I don't believe anyone is materially maintaining the buslogic driver and
> in time it's going to break completely.
>
> > Well that might be. I just think it would be a mistake
> > to design paravirt_ops based on someone's short term release engineering
> > considerations.
>
> Agreed, especially as an interface where each in or out traps into the
> hypervisor is broken even for the model of virtualising hardware.

I'd really like lguest guests not to do ins and outs, but that's likely
to be more invasive a change than this. We do it to find the PCI bus
IIRC, and a couple of other early probe bits.

It's just unfortunate that it's the one place lguest has to emulate
because of lack of paravirt_ops coverage.

Rusty.
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, Aug 22, 2007 at 05:38:31PM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > On Wed, Aug 22, 2007 at 04:14:41PM -0700, Jeremy Fitzhardinge wrote:
> >
> >> (which would also have VT, since
> >> all new processors do).
> >>
> >
> > Not true unfortunately. The Intel low end parts like Celerons (which
> > are actually shipped in very large numbers) don't. Also Intel
> > is still shipping some CPUs that don't support it at all, like
> > the ULV Centrinos which are based on an older core.
> >
>
> Likely to be missing VT-d too, right?

VT-d is chipset functionality. So it depends on the chipset.
At least initially the non Intel chipsets and lowend chips are
unlikely to get IOMMUs I guess.

There might be some exceptions. e.g. the GPU vendors seem to want
to do their own IOMMUs, so perhaps graphic devices might have them
anyways.

-Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote:
> On Wed, Aug 22, 2007 at 04:14:41PM -0700, Jeremy Fitzhardinge wrote:
>
>> (which would also have VT, since
>> all new processors do).
>>
>
> Not true unfortunately. The Intel low end parts like Celerons (which
> are actually shipped in very large numbers) don't. Also Intel
> is still shipping some CPUs that don't support it at all, like
> the ULV Centrinos which are based on an older core.
>

Likely to be missing VT-d too, right?

    J
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, Aug 22, 2007 at 04:14:41PM -0700, Jeremy Fitzhardinge wrote:
> (which would also have VT, since
> all new processors do).

Not true unfortunately. The Intel low end parts like Celerons (which
are actually shipped in very large numbers) don't. Also Intel
is still shipping some CPUs that don't support it at all, like
the ULV Centrinos which are based on an older core.

-Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
James Courtier-Dutton wrote:
> Ok, so I need to get a new CPU like the Intel Core Duo that has VT
> features? I have an old Pentium 4 at the moment, without any VT features.
>

No, VT-d (as opposed to VT) is a chipset feature which allows the
hypervisor to control who's allowed to DMA where. So you'd need a very
new machine with a VT-d capable chipset (which would also have VT, since
all new processors do).

    J
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Chris Wright wrote:
> * James Courtier-Dutton ([EMAIL PROTECTED]) wrote:
>> If one could directly expose a device to the guest, this feature could
>> be extremely useful for me.
>> Is it possible? How would it manage to handle the DMA bus mastering?
>
> Yes it's possible (Xen supports pci pass through). Without an IOMMU
> (like Intel VT-d or AMD IOMMU) it's not DMA safe.
>
> thanks,
> -chris

Ok, so I need to get a new CPU like the Intel Core Duo that has VT
features? I have an old Pentium 4 at the moment, without any VT features.

Kind Regards

James
Re: [PATCH] Add I/O hypercalls for i386 paravirt
* James Courtier-Dutton ([EMAIL PROTECTED]) wrote:
> Ok, so I need to get a new CPU like the Intel Core Duo that has VT
> features? I have an old Pentium 4 at the moment, without any VT features.

Depends on your goals. You can certainly give a paravirt Xen guest[1]
physical hardware without any VT extensions. But that guest will be
able to DMA anywhere in memory without VT-d, so if it's an untrusted
guest you'd be taking a huge risk.

thanks,
-chris

[1] Note: this is with the xenbits.xensource.com kernel, not with a
kernel you'll get from kernel.org ATM.
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> This patch provides hypercalls for the i386 port I/O instructions,
>> which vastly helps guests which use native-style drivers. For certain
>> VMI workloads, this provides a performance boost of up to 30%. We
>> expect KVM and lguest to be able to achieve similar gains on I/O
>> intensive workloads.
>
> Two comments:
>
> - I should dust off my "break up paravirt_ops" patch, and this would fit
> nicely into it (I think we already discussed this)
>
> - What happens if you *don't* want to pv some of the io instructions?
> What if you have a device which is directly exposed to the guest?

If one could directly expose a device to the guest, this feature could
be extremely useful for me.
Is it possible? How would it manage to handle the DMA bus mastering?

James
Re: [PATCH] Add I/O hypercalls for i386 paravirt
* James Courtier-Dutton ([EMAIL PROTECTED]) wrote:
> If one could directly expose a device to the guest, this feature could
> be extremely useful for me.
> Is it possible? How would it manage to handle the DMA bus mastering?

Yes it's possible (Xen supports pci pass through). Without an IOMMU
(like Intel VT-d or AMD IOMMU) it's not DMA safe.

thanks,
-chris
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Zachary Amsden wrote:
> This patch provides hypercalls for the i386 port I/O instructions,
> which vastly helps guests which use native-style drivers. For certain
> VMI workloads, this provides a performance boost of up to 30%. We
> expect KVM and lguest to be able to achieve similar gains on I/O
> intensive workloads.

Two comments:

- I should dust off my "break up paravirt_ops" patch, and this would fit
  nicely into it (I think we already discussed this)

- What happens if you *don't* want to pv some of the io instructions?
  What if you have a device which is directly exposed to the guest?

    J
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote:
>> We might benefit from it, but would the BusLogic driver? It sets a
>> nasty precedent for maintenance as different hypervisors and emulators
>> hack up different drivers for their own performance.
>
> I still think it's preferable to change some drivers than everybody.
>
> AFAIK BusLogic as real hardware is pretty much dead anyways,
> so you're probably the only primary user of it anyways.
> Go wild on it!

It is looking juicy. Maybe another day.

Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Alan Cox wrote:
>> I still think it's preferable to change some drivers than everybody.
>>
>> AFAIK BusLogic as real hardware is pretty much dead anyways,
>> so you're probably the only primary user of it anyways.
>> Go wild on it!
>
> I don't believe anyone is materially maintaining the buslogic driver and
> in time it's going to break completely.

I think I was actually the last person to touch it ;)

>> Well that might be. I just think it would be a mistake
>> to design paravirt_ops based on someone's short term release engineering
>> considerations.
>
> Agreed, especially as an interface where each in or out traps into the
> hypervisor is broken even for the model of virtualising hardware.

Well, it's not necessarily broken, it's just a different model. At some
point the cost of maintaining a whole suite of virtual drivers becomes
greater than leveraging a bunch of legacy drivers. If you can eliminate
most of the performance cost of that by changing something at a layer
below (port I/O), it is a win even if it is not a perfect solution.

But I think I've lost the argument anyways; it doesn't seem to be for
the greater good of Linux, and there are alternatives we can take.
Unfortunately for me, they require a lot more work.

Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
> I still think it's preferable to change some drivers than everybody.
>
> AFAIK BusLogic as real hardware is pretty much dead anyways,
> so you're probably the only primary user of it anyways.
> Go wild on it!

I don't believe anyone is materially maintaining the buslogic driver and
in time it's going to break completely.

> Well that might be. I just think it would be a mistake
> to design paravirt_ops based on someone's short term release engineering
> considerations.

Agreed, especially as an interface where each in or out traps into the
hypervisor is broken even for the model of virtualising hardware.

> I thought we had that already? But can't find it now :/

pci_iomap() and friends.

Alan
Re: [PATCH] Add I/O hypercalls for i386 paravirt
> No, you can't ignore it. The page protections won't change between the
> GP and the decoder execution, but the instruction can, causing you to
> decode into the next page where the processor would not have. !P
> becomes obvious, but failure to respect NX or U/S is an exploitable
> bug. Put a 1 byte instruction at the end of a page crossing into a NX
> (or supervisor page). Remotely, keep switching between the
> instruction and a segment override.

Then you do this check only if the instruction crosses a page.
That is very cheap to test.

> Result: user executes instruction on supervisor code page, learning data
> as a result of this; code on NX page gets executed.

Most systems probably have gaps between user and supervisor (like Linux),
but ok.

> We already have drivers for all of our hardware in Linux. Most of the
> hardware we emulate is physical hardware, and there are no virtual
> drivers for it. Should we take the BusLogic driver and "paravirtualize"
> it by adding VMI hypercalls?

You're proposing instead to paravirtualize all drivers, even if 99.99% of
those will never ever have a driver model.

> We might benefit from it, but would the
> BusLogic driver? It sets a nasty precedent for maintenance as different
> hypervisors and emulators hack up different drivers for their own
> performance.

I still think it's preferable to change some drivers than everybody.

AFAIK BusLogic as real hardware is pretty much dead anyways,
so you're probably the only primary user of it anyways.
Go wild on it!

If you worry about it do your own drivers like the other hypervisors.

I still suspect you could go faster if you use a paravirtually optimized
driver, but I'm not going to speculate on the reasons why you don't
want to do that.

> Our SCSI and IDE emulation and thus the drivers used by Linux are pretty
> much fixed in stone; we are not going to go about changing a tricky
> hardware interface to a virtual one, it is simply too risky for

You wouldn't need to change it; just add a very simple new one
(e.g. the lguest interface is nearly trivial)

> There is great advantage in talking to our existing device layer faster,
> and this is something that is valuable today.

Well that might be. I just think it would be a mistake
to design paravirt_ops based on someone's short term release engineering
considerations.

> >Really LinuxHAL^wparavirt ops is already so complicated that
> >any new hooks need an extremely good justification and that is
> >just not here for this.
> >
> >We can add it if you find an equivalent number of hooks
> >to eliminate.
> >
>
> Interesting trade. What if I sanitized the whole I/O messy macros into
> something fun and friendly:

That would be a cool project anyways. e.g. just moving the NUMAQ
support separately would clean it up. But probably not enough on its
own, sorry.

But you'll need separate interfaces anyways if you want to go down
the BusLogic change path. That could well be coupled with a cleanup.

> We might even be able to get rid of the umpteen different
> places where drivers wrap iospace access with their own byte / word /
> long functions so they can switch between port I/O and memory mapped I/O
> by moving it all into common infrastructure.

I thought we had that already? But can't find it now :/

> We could make similar (unwelcome?) advances on the pte functions if it
> were not for the regrettable disconnect between pte_high / pte_low and
> the rest. Perhaps if it was hidden in macros?

You want to do what exactly? If you mean PAE and non PAE in the same
binary: that would likely need abstracted page tables first.

-Andi
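[Archive note] The "check only if the instruction crosses a page" test mentioned above can be sketched in a few lines. This is a minimal sketch assuming 4 KiB pages; the name insn_crosses_page is hypothetical and does not come from the thread or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumes 4 KiB pages, as on i386 */

/*
 * Return true if an instruction of `len` bytes starting at linear
 * address `eip` spans two pages. `len` must be at least 1.
 */
static bool insn_crosses_page(uint32_t eip, unsigned int len)
{
    uint32_t last = eip + len - 1;   /* address of the final byte */

    return (eip >> PAGE_SHIFT) != (last >> PAGE_SHIFT);
}
```

Only when this returns true would a decoder need the extra page-table walk for the second page; whether that is enough to close the race described in the quoted text is exactly what the two posters disagree on.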
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote:
> How is that measured? In a loop? In the same pipeline state?
>
> It seems a little dubious to me.

I did the experiments in a controlled environment, with interrupts
disabled and care to get the pipeline in the same state. It was a
perfectly repeatable experiment. I don't have exact cycle time anymore,
but they were the tightest measurements I've ever seen on cycle counts
because of the unique nature of serializing the processor for the fault
/ privilege transition. I tested a variety of different conditions,
including different types of #GP (yes, the cost does vary), #NP, #PF,
sysenter, int $0xxx. Sysenter was the fastest, by far. Int was about
5x the cost. #GP and friends were all about similar costs. #PF was the
most expensive.

>>>> to verify protection in the page tables mapping the page allows
>>>> execution (P, !NX, and U/S check). This is a lot more expensive than a
>>>
>>> When the page is not executable or not present you get #PF not #GP.
>>> So the hardware already checks that.
>>>
>>> The only case where you would need to check yourself is if you emulate
>>> NX on non NX capable hardware, but I can't see you doing that.
>>
>> No, it doesn't. Between the #GP and decode, you have an SMP race where
>> another processor can rewrite the instruction.
>
> That can be ignored imho. If the page goes away you'll notice
> when you handle the page fault on read. If it becomes NX then the
> execution just happened to be logically a little earlier.

No, you can't ignore it. The page protections won't change between the
GP and the decoder execution, but the instruction can, causing you to
decode into the next page where the processor would not have. !P
becomes obvious, but failure to respect NX or U/S is an exploitable
bug. Put a 1 byte instruction at the end of a page crossing into a NX
(or supervisor page). Remotely, keep switching between the
instruction and a segment override.

Result: user executes instruction on supervisor code page, learning data
as a result of this; code on NX page gets executed.

> Or easier to just write a backend for the lguest virtio drivers,
> that will be likely faster in the end anyways than this gross hack.

We already have drivers for all of our hardware in Linux. Most of the
hardware we emulate is physical hardware, and there are no virtual
drivers for it. Should we take the BusLogic driver and "paravirtualize"
it by adding VMI hypercalls? We might benefit from it, but would the
BusLogic driver? It sets a nasty precedent for maintenance as different
hypervisors and emulators hack up different drivers for their own
performance.

Our SCSI and IDE emulation and thus the drivers used by Linux are pretty
much fixed in stone; we are not going to go about changing a tricky
hardware interface to a virtual one, it is simply too risky for
something as critical as storage. We might be able to move our network
driver over to virtio, but that is not a short-term prospect either.

There is great advantage in talking to our existing device layer faster,
and this is something that is valuable today.

> Really LinuxHAL^wparavirt ops is already so complicated that
> any new hooks need an extremely good justification and that is
> just not here for this.
>
> We can add it if you find an equivalent number of hooks
> to eliminate.

Interesting trade. What if I sanitized the whole I/O messy macros into
something fun and friendly:

native_port_in(int port, iosize_t opsize, int delay)
native_port_out(int port, iosize_t opsize, u32 output, int delay)
native_port_string_in(int port, void *ptr, iosize_t opsize, unsigned count, int delay)
native_port_string_out(int port, void *ptr, iosize_t opsize, unsigned count, int delay)

Then we can be rid of all the macro goo in io.h, which frightens my
mother. We might even be able to get rid of the umpteen different
places where drivers wrap iospace access with their own byte / word /
long functions so they can switch between port I/O and memory mapped I/O
by moving it all into common infrastructure.

We could make similar (unwelcome?) advances on the pte functions if it
were not for the regrettable disconnect between pte_high / pte_low and
the rest. Perhaps if it was hidden in macros?

Zach
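[Archive note] To make the proposed interface shape concrete, here is a minimal user-space sketch of how the per-size accessors could funnel into two hooks taking an operand size. Everything beyond the prototypes quoted above is an assumption: the thread does not define iosize_t or a return type, the struct port_io_ops table and the sketch_* wrappers are hypothetical names, and the backend is an in-memory fake port space rather than real in/out instructions or hypercalls.

```c
#include <stdint.h>

/* Hypothetical definition; the thread names iosize_t but never defines it. */
typedef enum { IO_BYTE = 1, IO_WORD = 2, IO_LONG = 4 } iosize_t;

/* One pair of hooks instead of one hook per inb/inw/inl/outb/outw/outl. */
struct port_io_ops {
    uint32_t (*port_in)(int port, iosize_t opsize, int delay);
    void (*port_out)(int port, iosize_t opsize, uint32_t output, int delay);
};

/* Stand-in backend: a fake 64K port space held in memory (little endian). */
static uint8_t fake_ports[65536 + 3];

static uint32_t sim_port_in(int port, iosize_t opsize, int delay)
{
    uint32_t val = 0;

    (void)delay;  /* a native backend would insert its I/O delay here */
    for (unsigned int i = 0; i < (unsigned int)opsize; i++)
        val |= (uint32_t)fake_ports[(port & 0xffff) + i] << (8 * i);
    return val;
}

static void sim_port_out(int port, iosize_t opsize, uint32_t output, int delay)
{
    (void)delay;
    for (unsigned int i = 0; i < (unsigned int)opsize; i++)
        fake_ports[(port & 0xffff) + i] = (uint8_t)(output >> (8 * i));
}

/* A hypervisor port would swap in hypercall-backed functions here. */
static struct port_io_ops io_ops = {
    .port_in  = sim_port_in,
    .port_out = sim_port_out,
};

/* Driver-facing wrappers keep their familiar shape and call the hooks. */
static inline uint8_t sketch_inb(int port)
{
    return (uint8_t)io_ops.port_in(port, IO_BYTE, 0);
}

static inline void sketch_outl(uint32_t value, int port)
{
    io_ops.port_out(port, IO_LONG, value, 0);
}
```

With this layout a driver calling sketch_outl(0x11223344, 0x300) and then sketch_inb(0x300) reads back the low byte, and only the two function pointers differ between a native and a paravirtualized build. The string variants from the proposal are left out to keep the sketch short.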
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, Aug 22, 2007 at 10:07:47AM -0700, Zachary Amsden wrote:
> >Also I fail to see the fundamental speed difference between
> >
> >mov index,register
> >int 0x...
> >...
> >switch (register)
> >case : do emulation
> >
>
> Int (on p4 == ~680 cycles).
>
> >versus
> >
> >out ...
> >#gp
> >-> switch (*eip) {
> >case 0xee: /* etc. */
> >     do emulation
> >
>
> GP = ~2000 cycles.

How is that measured? In a loop? In the same pipeline state?

It seems a little dubious to me.

> >>to verify protection in the page tables mapping the page allows
> >>execution (P, !NX, and U/S check). This is a lot more expensive than a
> >>
> >
> >When the page is not executable or not present you get #PF not #GP.
> >So the hardware already checks that.
> >
> >The only case where you would need to check yourself is if you emulate
> >NX on non NX capable hardware, but I can't see you doing that.
> >
>
> No, it doesn't. Between the #GP and decode, you have an SMP race where
> another processor can rewrite the instruction.

That can be ignored imho. If the page goes away you'll notice
when you handle the page fault on read. If it becomes NX then the
execution just happened to be logically a little earlier.

My other objection to this scheme is that you'll change a zillion drivers
you'll never emulate, which seems just stupid. You could just change
the small handful that you emulate to use hypercalls.

Or easier to just write a backend for the lguest virtio drivers,
that will be likely faster in the end anyways than this gross hack.

Really LinuxHAL^wparavirt ops is already so complicated that
any new hooks need an extremely good justification and that is
just not here for this.

We can add it if you find an equivalent number of hooks
to eliminate.

-Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
> out is usually a single byte. Shouldn't be very expensive
> to decode. In fact it should be roughly equivalent to your
> hypercall multiplex.

Why is a performance critical path on a paravirt kernel even using I/O
instructions and not paravirtual device drivers ?

It clearly makes sense to virtualise I/O operations if you are doing that
(so you can do posting, triggers and predicted reply handling guest side
to keep the trap rate sane) but I don't see why this situation occurs in
the first place for paravirt.
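[Archive note] The "posting" idea mentioned above, queueing port writes on the guest side and handing them to the hypervisor in one transition, might look roughly like the following. This is only an illustrative sketch: the message gives no implementation, all names (posted_out, post_out, post_flush, hypercall_flush_batch) are hypothetical, and the "hypercall" is a stub that merely counts how many guest-to-hypervisor transitions would have happened.

```c
#include <stdint.h>

#define POST_QUEUE_LEN 16

/* One queued port write. */
struct posted_out {
    uint16_t port;
    uint8_t  size;    /* operand size in bytes: 1, 2 or 4 */
    uint32_t value;
};

static struct posted_out post_queue[POST_QUEUE_LEN];
static unsigned int post_count;
static unsigned int trap_count;  /* simulated guest->hypervisor transitions */

/* Stub for a single hypercall that would hand the whole batch over. */
static void hypercall_flush_batch(const struct posted_out *q, unsigned int n)
{
    (void)q;
    (void)n;
    trap_count++;
}

/* Push all queued writes to the hypervisor in one transition. */
static void post_flush(void)
{
    if (post_count) {
        hypercall_flush_batch(post_queue, post_count);
        post_count = 0;
    }
}

/* Queue a port write; flush first if the queue is already full. */
static void post_out(uint16_t port, uint8_t size, uint32_t value)
{
    if (post_count == POST_QUEUE_LEN)
        post_flush();
    post_queue[post_count++] = (struct posted_out){ port, size, value };
}
```

Ten calls to post_out() followed by one post_flush() cost a single simulated transition instead of ten. A real scheme would also have to flush before any port read or other operation whose ordering against the queued writes matters.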
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote:
> On Wed, Aug 22, 2007 at 09:48:25AM -0700, Zachary Amsden wrote:
>> Andi Kleen wrote:
>>> On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote:
>>>> In general, I/O in a virtual guest is subject to performance problems.
>>>> The I/O can not be completed physically, but must be virtualized. This
>>>> means trapping and decoding port I/O instructions from the guest OS.
>>>> Not only is the trap for a #GP heavyweight, both in the processor and
>>>> the hypervisor (which usually has a complex #GP path), but this forces
>>>> the hypervisor to decode the individual instruction which has faulted.
>>>
>>> Is that really that expensive? Hard to imagine.
>>
>> You have an expensive (16x cost of hypercall on some processors)
>
> Where is the difference coming from? Are you using SYSENTER for the
> hypercall? I can't really see you using SYSENTER, because how would you
> do system calls then? I bet system calls are more frequent than in/out,
> so if you have to decide between the two using them for syscalls is
> likely faster.

We use sysenter for hypercalls and also for system calls. :)

> Also I fail to see the fundamental speed difference between
>
> mov index,register
> int 0x...
> ...
> switch (register)
> case : do emulation

Int (on p4 == ~680 cycles).

> versus
>
> out ...
> #gp
> -> switch (*eip) {
> case 0xee: /* etc. */
>      do emulation

GP = ~2000 cycles.

>> to verify protection in the page tables mapping the page allows
>> execution (P, !NX, and U/S check). This is a lot more expensive than a
>
> When the page is not executable or not present you get #PF not #GP.
> So the hardware already checks that.
>
> The only case where you would need to check yourself is if you emulate
> NX on non NX capable hardware, but I can't see you doing that.

No, it doesn't. Between the #GP and decode, you have an SMP race where
another processor can rewrite the instruction.

Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, Aug 22, 2007 at 09:48:25AM -0700, Zachary Amsden wrote:
> Andi Kleen wrote:
> >On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote:
> >
> >>In general, I/O in a virtual guest is subject to performance problems.
> >>The I/O can not be completed physically, but must be virtualized. This
> >>means trapping and decoding port I/O instructions from the guest OS.
> >>Not only is the trap for a #GP heavyweight, both in the processor and
> >>the hypervisor (which usually has a complex #GP path), but this forces
> >>the hypervisor to decode the individual instruction which has faulted.
> >>
> >
> >Is that really that expensive? Hard to imagine.
> >
>
> You have an expensive (16x cost of hypercall on some processors)

Where is the difference coming from? Are you using SYSENTER for the hypercall? I can't really see you using SYSENTER, because how would you do system calls then? I bet system calls are more frequent than in/out, so if you have to decide between the two, using them for syscalls is likely faster.

For an int XYZ gate I wouldn't expect that much difference to a #GP fault.

Also I fail to see the fundamental speed difference between

    mov index,register
    int 0x...
    ...
    switch (register)
    case : do emulation

versus

    out ...
    #gp
    -> switch (*eip) {
    case 0xee: /* etc. */
        do emulation

> privilege transition, you have to decode the instruction, then you have

out is usually a single byte. Shouldn't be very expensive to decode. In fact it should be roughly equivalent to your hypercall multiplex.

> to verify protection in the page tables mapping the page allows
> execution (P, !NX, and U/S check). This is a lot more expensive than a

When the page is not executable or not present you get #PF not #GP. So the hardware already checks that. The only case where you would need to check yourself is if you emulate NX on non NX capable hardware, but I can't see you doing that.

> There are 24 different possible I/O operations; sometimes with a port
> encoded in the instruction, sometimes with input in the DX register,
> sometimes with a rep prefix, and for 3 different operand sizes.

Most of this is a single byte which is the same as the hypercall demux. Essentially a table lookup if you use the obvious switch()

-Andi
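For readers following the decode argument, here is a minimal user-space sketch of the kind of switch() being discussed. It is not taken from the patch or from any hypervisor: the names pio_insn and decode_pio are made up for illustration, and it assumes 32-bit code and recognizes only the 0x66 (operand size) and 0xf3 (rep) prefixes. The opcode values are the standard x86 encodings for IN, OUT, INS and OUTS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of one port I/O instruction. */
struct pio_insn {
    bool is_out;        /* OUT/OUTS if true, IN/INS if false */
    bool is_string;     /* INS/OUTS */
    bool has_rep;       /* rep prefix (string forms only) */
    bool port_in_dx;    /* port comes from DX, not an immediate */
    uint8_t size;       /* operand size in bytes: 1, 2 or 4 */
    uint8_t imm_port;   /* immediate port, valid if !port_in_dx */
    uint8_t length;     /* instruction bytes consumed */
};

/*
 * Decode a port I/O instruction from 32-bit code.
 * Returns 0 on success, -1 if the bytes are not port I/O or are truncated.
 */
static int decode_pio(const uint8_t *code, size_t len, struct pio_insn *out)
{
    struct pio_insn insn = { 0 };
    bool opsize16 = false;
    size_t i = 0;
    uint8_t op;

    assert(code != NULL && out != NULL);

    /* Only the two prefixes relevant here; others are rejected below. */
    while (i < len && (code[i] == 0x66 || code[i] == 0xf3)) {
        if (code[i] == 0x66)
            opsize16 = true;
        else
            insn.has_rep = true;
        i++;
    }
    if (i >= len)
        return -1;
    op = code[i++];

    switch (op) {
    case 0xe4: case 0xe5: case 0xe6: case 0xe7:   /* in/out with imm8 port */
        if (i >= len)
            return -1;
        insn.imm_port = code[i++];
        break;
    case 0xec: case 0xed: case 0xee: case 0xef:   /* in/out with port in DX */
        insn.port_in_dx = true;
        break;
    case 0x6c: case 0x6d: case 0x6e: case 0x6f:   /* ins/outs */
        insn.is_string = true;
        insn.port_in_dx = true;
        break;
    default:
        return -1;
    }

    insn.is_out = (op & 0x02) != 0;                    /* bit 1: direction */
    insn.size = (op & 0x01) ? (opsize16 ? 2 : 4) : 1;  /* bit 0: byte vs wide */
    if (!insn.is_string)
        insn.has_rep = false;
    insn.length = (uint8_t)i;
    *out = insn;
    return 0;
}
```

One plausible way to arrive at the figure of 24 is 2 directions x 2 port sources x 3 sizes for in/out, plus 2 directions x rep/no-rep x 3 sizes for ins/outs. The sketch deliberately leaves out what the other side of the argument considers expensive: fetching the guest bytes safely, page-table checks, and other prefixes such as segment overrides.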
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote: On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote: In general, I/O in a virtual guest is subject to performance problems. The I/O can not be completed physically, but must be virtualized. This means trapping and decoding port I/O instructions from the guest OS. Not only is the trap for a #GP heavyweight, both in the processor and the hypervisor (which usually has a complex #GP path), but this forces the hypervisor to decode the individual instruction which has faulted. Is that really that expensive? Hard to imagine. You have an expensive (16x cost of hypercall on some processors) privilege transition, you have to decode the instruction, then you have to verify protection in the page tables mapping the page allows execution (P, !NX, and U/S check). This is a lot more expensive than a hypercall. e.g. you could always have a fast check for inb/outb at the beginning of the #GP handler. And is your initial #GP entry really more expensive than a hypercall? The number of reasons for #GP is enormous, and there are too many paths to optimize with fast checks. We do have a fast check for inb/outb; it's just not fast enough. On P4, hypercall entry is 120 cycles. #GP is about 2000. Modern processors are better, but a hypercall is always faster than a fault. Many times, the hypercall can be handled and ready to return before a #GP would even complete. On workloads that sit there and hammer network cards, these costs become significant, and latency sensitive network benchmarks suffer. Worse, even with hardware assist such as VT, the exit reason alone is not sufficient to determine the true nature of the faulting instruction, requiring a complex and costly instruction decode and simulation. It's unclear to me why that should be that costly. 
Worst case it's a switch() There are 24 different possible I/O operations; sometimes with a port encoded in the instruction, sometimes with input in the DX register, sometimes with a rep prefix, and for 3 different operand sizes. Combine that with the MMU checks required, and it's complex and branchy enough to justify short-circuiting the whole thing with a simple hypercall. Zach
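The P4 figures quoted in this message (about 120 cycles for hypercall entry, about 2000 for a #GP) can be turned into a rough per-operation estimate. The sketch below only does that arithmetic. The function names are invented here, the constants are the numbers from the message, and everything after entry (decode, emulation, return) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Entry costs on P4 as quoted in the thread; approximate by nature. */
#define HYPERCALL_ENTRY_CYCLES 120u
#define GP_FAULT_ENTRY_CYCLES  2000u

/* How many times more expensive #GP entry is than hypercall entry. */
static double entry_cost_ratio(void)
{
    return (double)GP_FAULT_ENTRY_CYCLES / HYPERCALL_ENTRY_CYCLES;
}

/* Entry cycles saved when io_ops port accesses use a hypercall instead. */
static uint64_t entry_cycles_saved(uint64_t io_ops)
{
    assert(io_ops <= UINT64_MAX / GP_FAULT_ENTRY_CYCLES);  /* no overflow */
    return io_ops * (GP_FAULT_ENTRY_CYCLES - HYPERCALL_ENTRY_CYCLES);
}
```

The ratio comes out near 16.7, which is consistent with the "16x" mentioned earlier in the thread. For a hypothetical burst of 100,000 port accesses, entry_cycles_saved() gives 188,000,000 cycles on entry alone.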
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Avi Kivity wrote: Since this is only for newer kernels, won't updating the driver to use a hypercall be more efficient? Or is this for existing out-of-tree drivers? Actually, it is for in-tree drivers that we emulate but don't want to pollute, and one out of tree driver (that will hopefully be in tree soon!) that has no way to determine if making hypercalls is acceptable. Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Avi Kivity wrote: And even then, all the performance sensitive stuff uses mmio, no? Depends on the hardware. Jeff
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote: Ah. But that's mostly modules, so real in-core changes should be very Yes that's the big difference. Near all paravirt ops are concentrated on the core kernel, but this one affects lots of people. And why "but"? -- modules are as important as the core kernel. They're not second citizens. It's not being second class; simply few modules are loaded at runtime, so most of the code impact is on disk. The in-code impact is small. If paravirt i/o insns are worthwhile, I don't think code size is an issue. -- error compiling committee.c: too many arguments to function
Re: [PATCH] Add I/O hypercalls for i386 paravirt
> Ah. But that's mostly modules, so real in-core changes should be very Yes that's the big difference. Near all paravirt ops are concentrated on the core kernel, but this one affects lots of people. And why "but"? -- modules are as important as the core kernel. They're not second citizens. -Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote: On Wed, Aug 22, 2007 at 01:23:43PM +0300, Avi Kivity wrote: Andi Kleen wrote: I don't see why it's intrusive -- they all use the APIs, right? Yes, but it still changes them. It might have a larger impact on code size for example. Only if CONFIG_PARAVIRT is defined. Which eventually distribution kernels will do. And even then, all the performance sensitive stuff uses mmio, no? Not worried about performance, but just impact on code size etc. Ah. But that's mostly modules, so real in-core changes should be very small (say 10 bytes per call site X 10 callsites per driver X 10 drivers... even if off by an order of magnitude it's still tiny) -- error compiling committee.c: too many arguments to function
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, Aug 22, 2007 at 01:23:43PM +0300, Avi Kivity wrote: > Andi Kleen wrote: > >>I don't see why it's intrusive -- they all use the APIs, right? > >> > > > >Yes, but it still changes them. It might have a larger impact > >on code size for example. > > > > Only if CONFIG_PARAVIRT is defined. Which eventually distribution kernels will do. > And even then, all the performance > sensitive stuff uses mmio, no? Not worried about performance, but just impact on code size etc. -Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote: I don't see why it's intrusive -- they all use the APIs, right? Yes, but it still changes them. It might have a larger impact on code size for example. Only if CONFIG_PARAVIRT is defined. And even then, all the performance sensitive stuff uses mmio, no? -- error compiling committee.c: too many arguments to function
Re: [PATCH] Add I/O hypercalls for i386 paravirt
> I don't see why it's intrusive -- they all use the APIs, right? Yes, but it still changes them. It might have a larger impact on code size for example. -Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Andi Kleen wrote: This patch also means I can kill off the emulation code in drivers/lguest/core.c, which is a real relief. But would it be faster? If not or only insignificant amount I think I would prefer you keep it. Hooking IO is quite intrusive because it's done by so many drivers. I don't see why it's intrusive -- they all use the APIs, right? -- error compiling committee.c: too many arguments to function
Re: [PATCH] Add I/O hypercalls for i386 paravirt
> This patch also means I can kill off the emulation code in > drivers/lguest/core.c, which is a real relief. But would it be faster? If not or only insignificant amount I think I would prefer you keep it. Hooking IO is quite intrusive because it's done by so many drivers. -Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote: > In general, I/O in a virtual guest is subject to performance problems. > The I/O can not be completed physically, but must be virtualized. This > means trapping and decoding port I/O instructions from the guest OS. > Not only is the trap for a #GP heavyweight, both in the processor and > the hypervisor (which usually has a complex #GP path), but this forces > the hypervisor to decode the individual instruction which has faulted. Is that really that expensive? Hard to imagine. e.g. you could always have a fast check for inb/outb at the beginning of the #GP handler. And is your initial #GP entry really more expensive than a hypercall? > Worse, even with hardware assist such as VT, the exit reason alone is > not sufficient to determine the true nature of the faulting instruction, > requiring a complex and costly instruction decode and simulation. It's unclear to me why that should be that costly. Worst case it's a switch() -Andi
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Zachary Amsden wrote: Avi Kivity wrote: Zachary Amsden wrote: In general, I/O in a virtual guest is subject to performance problems. The I/O can not be completed physically, but must be virtualized. This means trapping and decoding port I/O instructions from the guest OS. Not only is the trap for a #GP heavyweight, both in the processor and the hypervisor (which usually has a complex #GP path), but this forces the hypervisor to decode the individual instruction which has faulted. Worse, even with hardware assist such as VT, the exit reason alone is not sufficient to determine the true nature of the faulting instruction, requiring a complex and costly instruction decode and simulation. This patch provides hypercalls for the i386 port I/O instructions, which vastly helps guests which use native-style drivers. For certain VMI workloads, this provides a performance boost of up to 30%. We expect KVM and lguest to be able to achieve similar gains on I/O intensive workloads. Won't these workloads be better off using paravirtualized drivers? i.e., do the native drivers with paravirt I/O instructions get anywhere near the performance of paravirt drivers? Yes, in general, this is true (better off with paravirt drivers). However, we have "paravirt" drivers which run in both fully-paravirtualized and fully traditionally virtualized environments. As a result, they use native port I/O operations to interact with virtual hardware. Suffering from terminology overdose here: "fully traditionally virtualized, fully-paravirtuallized, para-fullyvirtualized". Since this is only for newer kernels, won't updating the driver to use a hypercall be more efficient? Or is this for existing out-of-tree drivers? Since not all hypervisors have paravirtualized driver infrastructures and guest O/S support yet, these hypercalls can be advantages to a wide range of scenarios. 
Using I/O hypercalls as such gives exactly the same performance as paravirt drivers for us, by eliminating the costly decode path, and the simplicity of using the same driver code makes this a huge win in code complexity. Ah, seems the answer to the last question is yes. -- error compiling committee.c: too many arguments to function
Re: [PATCH] Add I/O hypercalls for i386 paravirt
On Wed, 2007-08-22 at 08:34 +0300, Avi Kivity wrote: > Zachary Amsden wrote: > > This patch provides hypercalls for the i386 port I/O instructions, > > which vastly helps guests which use native-style drivers. For certain > > VMI workloads, this provides a performance boost of up to 30%. We > > expect KVM and lguest to be able to achieve similar gains on I/O > > intensive workloads. > > Won't these workloads be better off using paravirtualized drivers? > i.e., do the native drivers with paravirt I/O instructions get anywhere > near the performance of paravirt drivers? This patch also means I can kill off the emulation code in drivers/lguest/core.c, which is a real relief. Cheers, Rusty.
Re: [PATCH] Add I/O hypercalls for i386 paravirt
H. Peter Anvin wrote: Zachary Amsden wrote: In general, I/O in a virtual guest is subject to performance problems. The I/O can not be completed physically, but must be virtualized. This means trapping and decoding port I/O instructions from the guest OS. Not only is the trap for a #GP heavyweight, both in the processor and the hypervisor (which usually has a complex #GP path), but this forces the hypervisor to decode the individual instruction which has faulted. Worse, even with hardware assist such as VT, the exit reason alone is not sufficient to determine the true nature of the faulting instruction, requiring a complex and costly instruction decode and simulation. This patch provides hypercalls for the i386 port I/O instructions, which vastly helps guests which use native-style drivers. For certain VMI workloads, this provides a performance boost of up to 30%. We expect KVM and lguest to be able to achieve similar gains on I/O intensive workloads. What about cost on hardware? On modern hardware, port I/O is about the most expensive thing you can do. The extra function call cost is totally masked by the stall. We have measured with port I/O converted like this on real hardware, and have seen zero measurable impact on macro-benchmarks. Micro-benchmarks that generate massively repeated port I/O might show some effect on ancient hardware, but I can't even imagine a workload which does such a thing, other than a polling port I/O loop perhaps - which would not be performance critical in any case I can reasonably imagine. Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Zachary Amsden wrote: > In general, I/O in a virtual guest is subject to performance problems. > The I/O can not be completed physically, but must be virtualized. This > means trapping and decoding port I/O instructions from the guest OS. > Not only is the trap for a #GP heavyweight, both in the processor and > the hypervisor (which usually has a complex #GP path), but this forces > the hypervisor to decode the individual instruction which has faulted. > Worse, even with hardware assist such as VT, the exit reason alone is > not sufficient to determine the true nature of the faulting instruction, > requiring a complex and costly instruction decode and simulation. > > This patch provides hypercalls for the i386 port I/O instructions, which > vastly helps guests which use native-style drivers. For certain VMI > workloads, this provides a performance boost of up to 30%. We expect > KVM and lguest to be able to achieve similar gains on I/O intensive > workloads. > What about cost on hardware? -hpa
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Avi Kivity wrote: Zachary Amsden wrote: In general, I/O in a virtual guest is subject to performance problems. The I/O can not be completed physically, but must be virtualized. This means trapping and decoding port I/O instructions from the guest OS. Not only is the trap for a #GP heavyweight, both in the processor and the hypervisor (which usually has a complex #GP path), but this forces the hypervisor to decode the individual instruction which has faulted. Worse, even with hardware assist such as VT, the exit reason alone is not sufficient to determine the true nature of the faulting instruction, requiring a complex and costly instruction decode and simulation. This patch provides hypercalls for the i386 port I/O instructions, which vastly helps guests which use native-style drivers. For certain VMI workloads, this provides a performance boost of up to 30%. We expect KVM and lguest to be able to achieve similar gains on I/O intensive workloads. Won't these workloads be better off using paravirtualized drivers? i.e., do the native drivers with paravirt I/O instructions get anywhere near the performance of paravirt drivers? Yes, in general, this is true (better off with paravirt drivers). However, we have "paravirt" drivers which run in both fully-paravirtualized and fully traditionally virtualized environments. As a result, they use native port I/O operations to interact with virtual hardware. Since not all hypervisors have paravirtualized driver infrastructures and guest O/S support yet, these hypercalls can be advantageous in a wide range of scenarios. Using I/O hypercalls as such gives exactly the same performance as paravirt drivers for us, by eliminating the costly decode path, and the simplicity of using the same driver code makes this a huge win in code complexity.
Zach
Re: [PATCH] Add I/O hypercalls for i386 paravirt
Zachary Amsden wrote: > In general, I/O in a virtual guest is subject to performance > problems. The I/O can not be completed physically, but must be > virtualized. This means trapping and decoding port I/O instructions > from the guest OS. Not only is the trap for a #GP heavyweight, both > in the processor and the hypervisor (which usually has a complex #GP > path), but this forces the hypervisor to decode the individual > instruction which has faulted. Worse, even with hardware assist such > as VT, the exit reason alone is not sufficient to determine the true > nature of the faulting instruction, requiring a complex and costly > instruction decode and simulation. > > This patch provides hypercalls for the i386 port I/O instructions, > which vastly helps guests which use native-style drivers. For certain > VMI workloads, this provides a performance boost of up to 30%. We > expect KVM and lguest to be able to achieve similar gains on I/O > intensive workloads. > Won't these workloads be better off using paravirtualized drivers? i.e., do the native drivers with paravirt I/O instructions get anywhere near the performance of paravirt drivers? -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.