Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Nakajima, Jun wrote: > To me such atomicity is provided by the "sti" instruction (i.e. the > processor begins responding to external, maskable interrupts _after_ the > next instruction is executed), and there is nothing special with that > combination "sti; hlt" (you can also have like "sti; ret", for example). > Sure, but there's no particular value in "sti; ret". While the sti mask window works everywhere, its only cases like "sti; hlt" where it's needed to avoid a race condition. > So if you define a PV ops like STI(next_instruction), "safe_halt" for > the native should be defined as STI("hlt"), and inlined as "sti; hlt". > That's only meaningful if the pv_op is implemented directly in x86 instructions - ie, the native (or almost native) case. > If it's hard or we don't need to expose the semantics of "sti" other > than that, I think it's okay to have a PV operation for safe_halt. > Yeah, the general form would be hard to support for a hypervisor. Xen, for example, has an "atomically enable events and block" operation, but no other "atomically enable events and do X" operations. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Jeremy Fitzhardinge wrote: > Nakajima, Jun wrote: > > Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is > > just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops, > > and the "sti" part stays in pv_irq_ops. > > > > By "sti part", you mean the full "sti; hlt" sequence of safe_halt, > right? Since it needs to be an atomic sequence to avoid race > conditions, so the native sequence has to be precisely "sti; hlt" to > take advantage of the sti shadow, and other pv-backends will need their > own way to guarantee this atomicity. To me such atomicity is provided by the "sti" instruction (i.e. the processor begins responding to external, maskable interrupts _after_ the next instruction is executed), and there is nothing special with that combination "sti; hlt" (you can also have like "sti; ret", for example). So if you define a PV ops like STI(next_instruction), "safe_halt" for the native should be defined as STI("hlt"), and inlined as "sti; hlt". If it's hard or we don't need to expose the semantics of "sti" other than that, I think it's okay to have a PV operation for safe_halt. Jun --- Intel Open Source Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Nakajima, Jun wrote: > Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is > just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops, > and the "sti" part stays in pv_irq_ops. > By "sti part", you mean the full "sti; hlt" sequence of safe_halt, right? Since it needs to be an atomic sequence to avoid race conditions, so the native sequence has to be precisely "sti; hlt" to take advantage of the sti shadow, and other pv-backends will need their own way to guarantee this atomicity. But I'm quite happy to put plain "hlt" into cpu_ops as halt_cpu() or something (and perhaps rename safe_halt to something a bit more descriptive). > Actually my concern was that such misc ops might grow to include the > things don't fit well anywhere else. To me, then pv_lazy_ops (with just > .set_mode) might be better. > The lazy interface has needed a rethink anyway. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Jeremy Fitzhardinge wrote: > Nakajima, Jun wrote: > > Jeremy Fitzhardinge wrote: > > > > > > > + .pv_irq_ops = { > > > + .init_IRQ = native_init_IRQ, > > > + .save_fl = native_save_fl, > > > + .restore_fl = native_restore_fl, > > > + .irq_disable = native_irq_disable, > > > + .irq_enable = native_irq_enable, > > > + .safe_halt = native_safe_halt, > > > + .halt = native_halt, > > > + }, > > > > > > > I think the halt stuff should be moved to pv_cpu_ops? > > > You mean halt's alternate "shutdown vcpu" meaning if you call it with > interrupts disabled? Yeah, I'd be happy to have an explicit op for > that, rather than making it a secondary overloaded meaning. And use > "safe_halt" for all uses of "wait for next interrupt". Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops, and the "sti" part stays in pv_irq_ops. > > > > + .pv_misc_ops = { > > > + .set_lazy_mode = paravirt_nop, > > > + }, > > > > > > > Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they > > don't need to interact with each other in terms of the lazy handling. > > > > You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch) ops? > Possible, but they're still exclusive. (I think VMI, at least, assumes > that you can't have lazy_mmu and lazy_cpu active at the same time, and > its nice to enforce this in the interface.) Okay I understand what you are saying. > > But having a whole misc structure for this interface is pretty warty, I > admit. > > J Actually my concern was that such misc ops might grow to include the things don't fit well anywhere else. To me, then pv_lazy_ops (with just .set_mode) might be better. Jun --- Intel Open Source Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Nakajima, Jun wrote: > Jeremy Fitzhardinge wrote: > >> This patch refactors the paravirt_ops structure into groups of >> functionally related ops: >> >> pv_info - random info, rather than function entrypoints >> pv_init_ops - functions used at boot time (some for module_init too) >> pv_misc_ops - lazy mode, which didn't fit well anywhere else >> pv_time_ops - time-related functions >> pv_cpu_ops - various privileged instruction ops >> pv_irq_ops - operations for managing interrupt state >> pv_apic_ops - APIC operations >> pv_mmu_ops - operations for managing pagetables >> >> > > Good. These make sense to me. > > >> +.pv_irq_ops = { >> + .init_IRQ = native_init_IRQ, >> + .save_fl = native_save_fl, >> + .restore_fl = native_restore_fl, >> + .irq_disable = native_irq_disable, >> + .irq_enable = native_irq_enable, >> + .safe_halt = native_safe_halt, >> + .halt = native_halt, >> + }, >> > > I think the halt stuff should be moved to pv_cpu_ops? > You mean halt's alternate "shutdown vcpu" meaning if you call it with interrupts disabled? Yeah, I'd be happy to have an explicit op for that, rather than making it a secondary overloaded meaning. And use "safe_halt" for all uses of "wait for next interrupt". >> +.pv_misc_ops = { >> + .set_lazy_mode = paravirt_nop, >> + }, >> > > Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they > don't need to interact with each other in terms of the lazy handling. > You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch) ops? Possible, but they're still exclusive. (I think VMI, at least, assumes that you can't have lazy_mmu and lazy_cpu active at the same time, and its nice to enforce this in the interface.) But having a whole misc structure for this interface is pretty warty, I admit. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Jeremy Fitzhardinge wrote: > This patch refactors the paravirt_ops structure into groups of > functionally related ops: > > pv_info - random info, rather than function entrypoints > pv_init_ops - functions used at boot time (some for module_init too) > pv_misc_ops - lazy mode, which didn't fit well anywhere else > pv_time_ops - time-related functions > pv_cpu_ops - various privileged instruction ops > pv_irq_ops - operations for managing interrupt state > pv_apic_ops - APIC operations > pv_mmu_ops - operations for managing pagetables > Good. These make sense to me. > + .pv_irq_ops = { > + .init_IRQ = native_init_IRQ, > + .save_fl = native_save_fl, > + .restore_fl = native_restore_fl, > + .irq_disable = native_irq_disable, > + .irq_enable = native_irq_enable, > + .safe_halt = native_safe_halt, > + .halt = native_halt, > + }, I think the halt stuff should be moved to pv_cpu_ops? > + .pv_misc_ops = { > + .set_lazy_mode = paravirt_nop, > + }, Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they don't need to interact with each other in terms of the lazy handling. Jun --- Intel Open Source Technology Center - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
On Fri, 2007-09-28 at 11:49 -0700, Jeremy Fitzhardinge wrote: > > We shouldn't need to export pv_init_ops. > > No. The only ones I export are: > > EXPORT_SYMBOL_GPL(pv_time_ops); > EXPORT_SYMBOL_GPL(pv_cpu_ops); > EXPORT_SYMBOL_GPL(pv_mmu_ops); > EXPORT_SYMBOL_GPL(pv_apic_ops); > EXPORT_SYMBOL(pv_irq_ops); Nicely done. I know of some out of tree modules which use part of the pv_cpu_ops and pv_mmu_ops, but we should not worry about such things, and it turns out those modules don't need to be virtualized anyway. > > > > It is debatable whether > > CR2/CR3 should be part of CPU or MMU ops. > > Yeah, I was in two minds. CR3, at least, should be grouped with the > other tlb operations, wherever they go. And while they're privileged > CPU instructions (cpu_ops), they're more logically related to the rest > of the mmu state. On the other hand, we could have an ops structure > specifically dedicated to pagetable manipulations, and put the cr3/tlb > ops elsewhere. I'm not against either approach. I think the way you did it is fine. If it were up to me, I would probably have driven myself crazy splitting hairs on it until I was bald. > > Also, can we drop write_cr2? > > It isn't used anywhere, so the only reason to keep it is symmetry. > > Which was a fine argument when it was an inline, but now it just adds > > unused junk to the code. > > > > I think its used in some cpu state save/restore code, but its not > relevant to pv-ops. Ah yes, it is used there. We actually exercise some of those paths, but they don't need to be strictly virtualized. Zach - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
Zachary Amsden wrote: > On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote: > >> This patch refactors the paravirt_ops structure into groups of >> functionally related ops: >> >> pv_info - random info, rather than function entrypoints >> pv_init_ops - functions used at boot time (some for module_init too) >> pv_misc_ops - lazy mode, which didn't fit well anywhere else >> pv_time_ops - time-related functions >> pv_cpu_ops - various privileged instruction ops >> pv_irq_ops - operations for managing interrupt state >> pv_apic_ops - APIC operations >> pv_mmu_ops - operations for managing pagetables >> >> There are several motivations for this: >> >> 1. Some of these ops will be general to all x86, and some will be >>i386/x86-64 specific. This makes it easier to share common stuff >>while allowing separate implementations where needed. >> >> 2. At the moment we must export all of paravirt_ops, but modules only >>need selected parts of it. This allows us to export on a case by case >>basis (and also choose which export license we want to apply). >> > > We shouldn't need to export pv_init_ops. No. The only ones I export are: EXPORT_SYMBOL_GPL(pv_time_ops); EXPORT_SYMBOL_GPL(pv_cpu_ops); EXPORT_SYMBOL_GPL(pv_mmu_ops); EXPORT_SYMBOL_GPL(pv_apic_ops); EXPORT_SYMBOL(pv_irq_ops); > It is debatable whether > CR2/CR3 should be part of CPU or MMU ops. Yeah, I was in two minds. CR3, at least, should be grouped with the other tlb operations, wherever they go. And while they're privileged CPU instructions (cpu_ops), they're more logically related to the rest of the mmu state. On the other hand, we could have an ops structure specifically dedicated to pagetable manipulations, and put the cr3/tlb ops elsewhere. > Also, can we drop write_cr2? > It isn't used anywhere, so the only reason to keep it is symmetry. > Which was a fine argument when it was an inline, but now it just adds > unused junk to the code. > I think its used in some cpu state save/restore code, but its not relevant to pv-ops. J - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops
On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote: > This patch refactors the paravirt_ops structure into groups of > functionally related ops: > > pv_info - random info, rather than function entrypoints > pv_init_ops - functions used at boot time (some for module_init too) > pv_misc_ops - lazy mode, which didn't fit well anywhere else > pv_time_ops - time-related functions > pv_cpu_ops - various privileged instruction ops > pv_irq_ops - operations for managing interrupt state > pv_apic_ops - APIC operations > pv_mmu_ops - operations for managing pagetables > > There are several motivations for this: > > 1. Some of these ops will be general to all x86, and some will be >i386/x86-64 specific. This makes it easier to share common stuff >while allowing separate implementations where needed. > > 2. At the moment we must export all of paravirt_ops, but modules only >need selected parts of it. This allows us to export on a case by case >basis (and also choose which export license we want to apply). We shouldn't need to export pv_init_ops. It is debatable whether CR2/CR3 should be part of CPU or MMU ops. Also, can we drop write_cr2? It isn't used anywhere, so the only reason to keep it is symmetry. Which was a fine argument when it was an inline, but now it just adds unused junk to the code. > This still needs to be reconciled with the x86 merge and glommer's > paravirt_ops unification work. Those issues aside, this looks great. I'll give it a whirl today. Zach - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/