Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-29 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
> To me such atomicity is provided by the "sti" instruction (i.e. the
> processor begins responding to external, maskable interrupts _after_ the
> next instruction is executed), and there is nothing special with that
> combination "sti; hlt" (you can also have like "sti; ret", for example).
>   

Sure, but there's no particular value in "sti; ret".  While the sti mask
window works everywhere, its only cases like "sti; hlt" where it's
needed to avoid a race condition.

> So if you define a PV ops like STI(next_instruction), "safe_halt" for
> the native should be defined as STI("hlt"), and inlined as "sti; hlt". 
>   

That's only meaningful if the pv_op is implemented directly in x86
instructions - ie, the native (or almost native) case.

> If it's hard or we don't need to expose the semantics of "sti" other
> than that, I think it's okay to have a PV operation for safe_halt.
>   

Yeah, the general form would be hard to support for a hypervisor.  Xen,
for example, has an "atomically enable events and block" operation, but
no other "atomically enable events and do X" operations.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-29 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is
> > just "hlt". So the para_virt part of "hlt" could be moved to
pv_cpu_ops,
> > and the "sti" part stays in pv_irq_ops.
> > 
> 
> By "sti part", you mean the full "sti; hlt" sequence of safe_halt,
> right?  Since it needs to be an atomic sequence to avoid race
> conditions, so the native sequence has to be precisely "sti; hlt" to
> take advantage of the sti shadow, and other pv-backends will need
their
> own way to guarantee this atomicity.

To me such atomicity is provided by the "sti" instruction (i.e. the
processor begins responding to external, maskable interrupts _after_ the
next instruction is executed), and there is nothing special with that
combination "sti; hlt" (you can also have like "sti; ret", for example).
So if you define a PV ops like STI(next_instruction), "safe_halt" for
the native should be defined as STI("hlt"), and inlined as "sti; hlt". 

If it's hard or we don't need to expose the semantics of "sti" other
than that, I think it's okay to have a PV operation for safe_halt.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
> Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is
> just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops,
> and the "sti" part stays in pv_irq_ops.
>   

By "sti part", you mean the full "sti; hlt" sequence of safe_halt,
right?  Since it needs to be an atomic sequence to avoid race
conditions, so the native sequence has to be precisely "sti; hlt" to
take advantage of the sti shadow, and other pv-backends will need their
own way to guarantee this atomicity.

But I'm quite happy to put plain "hlt" into cpu_ops as halt_cpu() or
something (and perhaps rename safe_halt to something a bit more
descriptive).

> Actually my concern was that such misc ops might grow to include the
> things don't fit well anywhere else. To me, then pv_lazy_ops (with just
> .set_mode) might be better.
>   

The lazy interface has needed a rethink anyway.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > Jeremy Fitzhardinge wrote:
> > 
> > 
> > > + .pv_irq_ops = {
> > > +  .init_IRQ = native_init_IRQ,
> > > +  .save_fl = native_save_fl,
> > > +  .restore_fl = native_restore_fl,
> > > +  .irq_disable = native_irq_disable,
> > > +  .irq_enable = native_irq_enable,
> > > +  .safe_halt = native_safe_halt,
> > > +  .halt = native_halt,
> > > +  },
> > > 
> > 
> > I think the halt stuff should be moved to pv_cpu_ops?
> > 
> You mean halt's alternate "shutdown vcpu" meaning if you call it with
> interrupts disabled?  Yeah, I'd be happy to have an explicit op for
> that, rather than making it a secondary overloaded meaning.  And use
> "safe_halt" for all uses of "wait for next interrupt".

Yes. For the native, "safe_halt" is "sti; hlt". The "native_halt" is
just "hlt". So the para_virt part of "hlt" could be moved to pv_cpu_ops,
and the "sti" part stays in pv_irq_ops.

> 
> > > + .pv_misc_ops = {
> > > +  .set_lazy_mode = paravirt_nop,
> > > +  },
> > > 
> > 
> > Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
> > don't need to interact with each other in terms of the lazy
handling.
> > 
> 
> You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch)
ops?
> Possible, but they're still exclusive.  (I think VMI, at least,
assumes
> that you can't have lazy_mmu and lazy_cpu active at the same time, and
> its nice to enforce this in the interface.)

Okay I understand what you are saying.

> 
> But having a whole misc structure for this interface is pretty warty,
I
> admit.
> 
> J

Actually my concern was that such misc ops might grow to include the
things don't fit well anywhere else. To me, then pv_lazy_ops (with just
.set_mode) might be better.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Nakajima, Jun wrote:
> Jeremy Fitzhardinge wrote:
>   
>> This patch refactors the paravirt_ops structure into groups of
>> functionally related ops:
>>
>> pv_info - random info, rather than function entrypoints
>> pv_init_ops - functions used at boot time (some for module_init too)
>> pv_misc_ops - lazy mode, which didn't fit well anywhere else
>> pv_time_ops - time-related functions
>> pv_cpu_ops - various privileged instruction ops
>> pv_irq_ops - operations for managing interrupt state
>> pv_apic_ops - APIC operations
>> pv_mmu_ops - operations for managing pagetables
>>
>> 
>
> Good. These make sense to me.
>
>   
>> +.pv_irq_ops = {
>> + .init_IRQ = native_init_IRQ,
>> + .save_fl = native_save_fl,
>> + .restore_fl = native_restore_fl,
>> + .irq_disable = native_irq_disable,
>> + .irq_enable = native_irq_enable,
>> + .safe_halt = native_safe_halt,
>> + .halt = native_halt,
>> + },
>> 
>
> I think the halt stuff should be moved to pv_cpu_ops?
>   
You mean halt's alternate "shutdown vcpu" meaning if you call it with
interrupts disabled?  Yeah, I'd be happy to have an explicit op for
that, rather than making it a secondary overloaded meaning.  And use
"safe_halt" for all uses of "wait for next interrupt".

>> +.pv_misc_ops = {
>> + .set_lazy_mode = paravirt_nop,
>> + },
>> 
>
> Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
> don't need to interact with each other in terms of the lazy handling.
>   

You mean have separate lazy_mmu and lazy_cpu (lazy_context_switch) ops? 
Possible, but they're still exclusive.  (I think VMI, at least, assumes
that you can't have lazy_mmu and lazy_cpu active at the same time, and
its nice to enforce this in the interface.)

But having a whole misc structure for this interface is pretty warty, I
admit.

J

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Nakajima, Jun
Jeremy Fitzhardinge wrote:
> This patch refactors the paravirt_ops structure into groups of
> functionally related ops:
> 
> pv_info - random info, rather than function entrypoints
> pv_init_ops - functions used at boot time (some for module_init too)
> pv_misc_ops - lazy mode, which didn't fit well anywhere else
> pv_time_ops - time-related functions
> pv_cpu_ops - various privileged instruction ops
> pv_irq_ops - operations for managing interrupt state
> pv_apic_ops - APIC operations
> pv_mmu_ops - operations for managing pagetables
> 

Good. These make sense to me.

> + .pv_irq_ops = {
> +  .init_IRQ = native_init_IRQ,
> +  .save_fl = native_save_fl,
> +  .restore_fl = native_restore_fl,
> +  .irq_disable = native_irq_disable,
> +  .irq_enable = native_irq_enable,
> +  .safe_halt = native_safe_halt,
> +  .halt = native_halt,
> +  },

I think the halt stuff should be moved to pv_cpu_ops?

> + .pv_misc_ops = {
> +  .set_lazy_mode = paravirt_nop,
> +  },

Or you can split it to pv_cpu_ops and pv_mmu_ops, assuming that they
don't need to interact with each other in terms of the lazy handling.

Jun
---
Intel Open Source Technology Center
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Zachary Amsden
On Fri, 2007-09-28 at 11:49 -0700, Jeremy Fitzhardinge wrote:
> > We shouldn't need to export pv_init_ops.  
> 
> No.  The only ones I export are:
> 
> EXPORT_SYMBOL_GPL(pv_time_ops);
> EXPORT_SYMBOL_GPL(pv_cpu_ops);
> EXPORT_SYMBOL_GPL(pv_mmu_ops);
> EXPORT_SYMBOL_GPL(pv_apic_ops);
> EXPORT_SYMBOL(pv_irq_ops);

Nicely done.  I know of some out of tree modules which use part of the
pv_cpu_ops and pv_mmu_ops, but we should not worry about such things,
and it turns out those modules don't need to be virtualized anyway.

> 
> 
> > It is debatable whether
> > CR2/CR3 should be part of CPU or MMU ops.
> 
> Yeah, I was in two minds.  CR3, at least, should be grouped with the
> other tlb operations, wherever they go.  And while they're privileged
> CPU instructions (cpu_ops), they're more logically related to the rest
> of the mmu state.  On the other hand, we could have an ops structure
> specifically dedicated to pagetable manipulations, and put the cr3/tlb
> ops elsewhere.

I'm not against either approach.  I think the way you did it is fine.
If it were up to me, I would probably have driven myself crazy splitting
hairs on it until I was bald.

> >   Also, can we drop write_cr2?
> > It isn't used anywhere, so the only reason to keep it is symmetry.
> > Which was a fine argument when it was an inline, but now it just adds
> > unused junk to the code.
> >   
> 
> I think its used in some cpu state save/restore code, but its not
> relevant to pv-ops.

Ah yes, it is used there.  We actually exercise some of those paths, but
they don't need to be strictly virtualized.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote:
>   
>> This patch refactors the paravirt_ops structure into groups of
>> functionally related ops:
>>
>> pv_info - random info, rather than function entrypoints
>> pv_init_ops - functions used at boot time (some for module_init too)
>> pv_misc_ops - lazy mode, which didn't fit well anywhere else
>> pv_time_ops - time-related functions
>> pv_cpu_ops - various privileged instruction ops
>> pv_irq_ops - operations for managing interrupt state
>> pv_apic_ops - APIC operations
>> pv_mmu_ops - operations for managing pagetables
>>
>> There are several motivations for this:
>>
>> 1. Some of these ops will be general to all x86, and some will be
>>i386/x86-64 specific.  This makes it easier to share common stuff
>>while allowing separate implementations where needed.
>>
>> 2. At the moment we must export all of paravirt_ops, but modules only
>>need selected parts of it.  This allows us to export on a case by case
>>basis (and also choose which export license we want to apply).
>> 
>
> We shouldn't need to export pv_init_ops.  

No.  The only ones I export are:

EXPORT_SYMBOL_GPL(pv_time_ops);
EXPORT_SYMBOL_GPL(pv_cpu_ops);
EXPORT_SYMBOL_GPL(pv_mmu_ops);
EXPORT_SYMBOL_GPL(pv_apic_ops);
EXPORT_SYMBOL(pv_irq_ops);


> It is debatable whether
> CR2/CR3 should be part of CPU or MMU ops.

Yeah, I was in two minds.  CR3, at least, should be grouped with the
other tlb operations, wherever they go.  And while they're privileged
CPU instructions (cpu_ops), they're more logically related to the rest
of the mmu state.  On the other hand, we could have an ops structure
specifically dedicated to pagetable manipulations, and put the cr3/tlb
ops elsewhere.

>   Also, can we drop write_cr2?
> It isn't used anywhere, so the only reason to keep it is symmetry.
> Which was a fine argument when it was an inline, but now it just adds
> unused junk to the code.
>   

I think its used in some cpu state save/restore code, but its not
relevant to pv-ops.

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC] paravirt_ops: refactor struct paravirt_ops into smaller pv_*_ops

2007-09-28 Thread Zachary Amsden
On Fri, 2007-09-28 at 11:10 -0700, Jeremy Fitzhardinge wrote:
> This patch refactors the paravirt_ops structure into groups of
> functionally related ops:
> 
> pv_info - random info, rather than function entrypoints
> pv_init_ops - functions used at boot time (some for module_init too)
> pv_misc_ops - lazy mode, which didn't fit well anywhere else
> pv_time_ops - time-related functions
> pv_cpu_ops - various privileged instruction ops
> pv_irq_ops - operations for managing interrupt state
> pv_apic_ops - APIC operations
> pv_mmu_ops - operations for managing pagetables
> 
> There are several motivations for this:
> 
> 1. Some of these ops will be general to all x86, and some will be
>i386/x86-64 specific.  This makes it easier to share common stuff
>while allowing separate implementations where needed.
> 
> 2. At the moment we must export all of paravirt_ops, but modules only
>need selected parts of it.  This allows us to export on a case by case
>basis (and also choose which export license we want to apply).

We shouldn't need to export pv_init_ops.  It is debatable whether
CR2/CR3 should be part of CPU or MMU ops.  Also, can we drop write_cr2?
It isn't used anywhere, so the only reason to keep it is symmetry.
Which was a fine argument when it was an inline, but now it just adds
unused junk to the code.

> This still needs to be reconciled with the x86 merge and glommer's
> paravirt_ops unification work.

Those issues aside, this looks great.  I'll give it a whirl today.

Zach

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/