Re: [PATCH 9/10] Vmi timer update.patch

2007-04-11 Thread Zachary Amsden

Chris Wright wrote:

* Zachary Amsden ([EMAIL PROTECTED]) wrote:
  

+void __init vmi_time_init(void)
+{
+   /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
+   outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */


That shouldn't be necessary using clockevents.
  
Actually, I'm not so sure.  If clockevents simply masks the PIT when 
disabling it, we still have overhead of keeping the latch in sync, which 
requires a timer at the PIT frequency.  I can instrument to see how 
exactly the PIT gets disabled.



It should switch from pit to vmi-timer, and the switch should do the state
transistions on pit to go to unused mode.
  


Ok, here's why we need it: the reason is even more basic.  PIT 
clockevents never get setup; the time_init paravirt-op makes it 
conditional whether the PIT or VMI timer get invoked.  But our BIOS 
still sets it up to run at 18.2 HZ, like any good BIOS would.  We need 
the disable hack, in fact it is actually a good thing to do for native 
hardware.  Why leave the PIT enabled with junk programming from the BIOS 
once we are in the protected mode kernel?  Eventually, on hardware that 
doesn't want to use the PIT at all, this might be wanted to conserve 
power (casually joking but potentially correct argument).


Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Zachary Amsden

Chris Wright wrote:

* Zachary Amsden ([EMAIL PROTECTED]) wrote:
  

Yes, but unfortunately that is a nop:

/*
 * Avoid unnecessary state transitions, as it confuses
 * Geode / Cyrix based boxen.
 */
case CLOCK_EVT_MODE_SHUTDOWN:
if (evt->mode == CLOCK_EVT_MODE_UNUSED)
break;
case CLOCK_EVT_MODE_UNUSED:
if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
break;



This one should be fallthrough case during exchange (mode == PERIODIC)
  


Yes, seems PERIODIC->SHUTDOWN should do the right thing.



case CLOCK_EVT_MODE_ONESHOT:
/* One shot setup */
outb_p(0x38, PIT_MODE);

So switching from PIT to VMI does not disable PIT timer interrupts.  
Thus I have to keep this part of the patch.



Since I misread the code, I can drop this now.




Oh, I was looking at this (x86_64 work I have here):

case CLOCK_EVT_MODE_SHUTDOWN:
case CLOCK_EVT_MODE_UNUSED:
outb_p(0x30, PIT_MODE);
outb_p(0, PIT_CH0); /* LSB */
outb_p(0, PIT_CH0); /* MSB */
break;

That's mode 0, not mode 5, but I think the end result is the same.
  


Yes, mode 0, 4, 5 all should behave similarly.

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> Jeremy Fitzhardinge wrote:
> >Why not submit a patch to do what you need here?  (The Geode comment is
> >a bit worrying though.)
> 
> Why should VMI add workaround into PIT code?

I'm not sure it's a workaround, seems more like a subtle diff (perhaps
it's just an oversight).  I need to rectify it anyway when merging in
the x86_64 version.  It's the way the x86_64 code is working already.
Shutdown for most clockevents tells device to stop ticking.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> Yes, but unfortunately that is a nop:
> 
> /*
>  * Avoid unnecessary state transitions, as it confuses
>  * Geode / Cyrix based boxen.
>  */
> case CLOCK_EVT_MODE_SHUTDOWN:
> if (evt->mode == CLOCK_EVT_MODE_UNUSED)
> break;
> case CLOCK_EVT_MODE_UNUSED:
> if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
> break;

This one should be fallthrough case during exchange (mode == PERIODIC)

> case CLOCK_EVT_MODE_ONESHOT:
> /* One shot setup */
> outb_p(0x38, PIT_MODE);
> 
> So switching from PIT to VMI does not disable PIT timer interrupts.  
> Thus I have to keep this part of the patch.

Oh, I was looking at this (x86_64 work I have here):

case CLOCK_EVT_MODE_SHUTDOWN:
case CLOCK_EVT_MODE_UNUSED:
outb_p(0x30, PIT_MODE);
outb_p(0, PIT_CH0); /* LSB */
outb_p(0, PIT_CH0); /* MSB */
break;

That's mode 0, not mode 5, but I think the end result is the same.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Why not submit a patch to do what you need here?  (The Geode comment is
a bit worrying though.)
  


Why should VMI add workaround into PIT code?  PIT code wants to know 
nothing about VMI.  It understands PIT timers on hardware.  VMI, on the 
other hand, is special - it knows exactly what hardware platform it has 
and can manipulate hardware freely.  On a generic platform, surely touch 
PIT I/O ports would be quite ill behavior.  But side effects of that on 
a vastly restricted platform are predictable based on the hardware and 
kernel, and we would not add a workaround outside of VMI unless it was 
really necessary or generally useful.


Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
>/*
> * Avoid unnecessary state transitions, as it confuses
> * Geode / Cyrix based boxen.
> */
>case CLOCK_EVT_MODE_SHUTDOWN:
>if (evt->mode == CLOCK_EVT_MODE_UNUSED)
>break;
>case CLOCK_EVT_MODE_UNUSED:
>if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
>break;
>case CLOCK_EVT_MODE_ONESHOT:
>/* One shot setup */
>outb_p(0x38, PIT_MODE);
>
> So switching from PIT to VMI does not disable PIT timer interrupts. 
> Thus I have to keep this part of the patch.

Why not submit a patch to do what you need here?  (The Geode comment is
a bit worrying though.)

J
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Zachary Amsden

Chris Wright wrote:

* Zachary Amsden ([EMAIL PROTECTED]) wrote:
  

+void __init vmi_time_init(void)
+{
+   /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
+   outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */


That shouldn't be necessary using clockevents.
  
Actually, I'm not so sure.  If clockevents simply masks the PIT when 
disabling it, we still have overhead of keeping the latch in sync, which 
requires a timer at the PIT frequency.  I can instrument to see how 
exactly the PIT gets disabled.



It should switch from pit to vmi-timer, and the switch should do the state
transistions on pit to go to unused mode.
  


Yes, but unfortunately that is a nop:

   /*
* Avoid unnecessary state transitions, as it confuses
* Geode / Cyrix based boxen.
*/
   case CLOCK_EVT_MODE_SHUTDOWN:
   if (evt->mode == CLOCK_EVT_MODE_UNUSED)
   break;
   case CLOCK_EVT_MODE_UNUSED:
   if (evt->mode == CLOCK_EVT_MODE_SHUTDOWN)
   break;
   case CLOCK_EVT_MODE_ONESHOT:
   /* One shot setup */
   outb_p(0x38, PIT_MODE);

So switching from PIT to VMI does not disable PIT timer interrupts.  
Thus I have to keep this part of the patch.



isn't this just your ->startup?
  
Which structure has a ->startup function we can use?  Sorry if this 
seems ignorant, I'm not quite sure what you mean.



The irq_chip.  IOW, it looks like a liberal sprinkling of LVTT vector
initialization.
  


Ahh, ok.

  

+   local_irq_enable();
+   clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);


...and resume?  Instead of letting clockevents core handle all of that,
and just registering right here?
  
It wasn't clear that clockevents would issue a resume notify for us; if 
so we could handle this setup in the callback, but it has to be done on 
the correct CPU.  I can try it and see if that works.



I would've expected to simply register the clockevents device right here,
and that should do the proper state transitions on the old device, as well
as the new device.  Why do you need resume notify?


I was confused.  The problem is, time init is also highly confused on 
i386; the system comes up running on the PIT, then switches to the 
APIC.  This changes when the APIC gets activated, which is why we 
suspend and resume clockevents while making the irq wiring changes for 
the switch to APIC mode.  Internally, clockevents does the proper state 
transition for us on VMI clock events - they get suspended properly, 
subject to all the proper precautions needed to avoid spurious interrupt 
that are pending in hardware, then they get resumed properly without us 
having to worry about missed interrupts or failure to restart the clock.


But to get the clockevents to do state transitions for us without this 
explicit suspend / resume, we would need to model two clockevents; a 
clockevent running through the PIT, and a higher priority clockevent 
running through the APIC.  Then the logic could conceivably switch over 
for us, but there is an unavoidable race, since we are using the same 
IRQ in each case - hardware IRQ-0.  And an IRQ can only have one irq 
chip, so we can't put the code to switch to the new irq chip in the 
->startup for that chip, since it will never get called unless we set 
the chip before shutting down the old irq chip (through the PIT), which 
means the old PIT irq never gets shut down.


We can't workaround it however without one of these options:

1) stealing a different IRQ and knowing the fixed 1-1 IRQ<->IDT vector 
mapping for it.  We could conceivably hijack any number of IRQs, in any 
order by reserving them during VMI platform initialization, but due to 
the non-linearity with which IRQs are re-mapped to IDT vectors when the 
IO-APIC is activated, I though it simpler to just continue using IRQ-0, 
as this is linearly mapped by constants (instead of offset by 8, 
skipping some vectors and wrapping around).


2) Reusing the local timer IDT vector, since APIC won't be using it.   
Reset the IDT handler to point to our own handler.  We used to do this, 
providing smp_vmi_timer_interrupt and entry.S assembler code for our own 
low-level interrupt handlers.  There were objections to our adding 
low-level interrupt handling code instead of using the proper genirq 
infrastructure.


3) Reuse the local timer IDT vector as our fixed IDT vector.  We must 
reset the IDT vector for this to point to a do_IRQ style handler which 
enters the genirq code.  So, reserve a non-zero platform IRQ and set the 
IDT vector for local timer to vector to interrupt[IRQ].  Now, set the 
irq chip for this IRQ to be a per-cpu handler and give it the VMI 
interrupt chip.


I can't really think of anything else that is feasible.  And keep in 
mind, all of these options require modeling the VMI timer as two 
separate clockevents.  Is it worth the complexity 

Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> >>+void __init vmi_time_init(void)
> >>+{
> >>+   /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
> >>+   outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */
> >
> >That shouldn't be necessary using clockevents.
> 
> Actually, I'm not so sure.  If clockevents simply masks the PIT when 
> disabling it, we still have overhead of keeping the latch in sync, which 
> requires a timer at the PIT frequency.  I can instrument to see how 
> exactly the PIT gets disabled.

It should switch from pit to vmi-timer, and the switch should do the state
transistions on pit to go to unused mode.

> >>+   vmi_time_init_clockevent();
> >>+   setup_irq(0, &vmi_clock_action);
> >>+}
> >>+
> >>+#ifdef CONFIG_X86_LOCAL_APIC
> >>+void __devinit vmi_time_bsp_init(void)
> >>+{
> >>+   /*
> >>+* On APIC systems, we want local timers to fire on each cpu.  We do
> >>+* this by programming LVTT to deliver timer events to the IRQ 
> >>handler
> >>+* for IRQ-0, since we can't re-use the APIC local timer handler
> >>+* without interfering with that code.
> >>+*/
> >>+   clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
> >
> >Why do you do this suspend...
> 
> We need to cancel all pending PIT timer events and restart then local 
> timer, which requires atomically taking over IRQ-0.  We use the IDT gate 
> for IRQ-0 because it is already an exclusive interrupt, but we can't 
> re-use the LVTT IDT gate for local timer since that requires a custom 
> custom SMP interrupt in entry.S.  So we must be absolutely sure when we 
> get an interrupt on IRQ-0 that it came from the VMI local (rather than 
> PIT) delivery path.

OK, this is why it seems odd.  Clockevents should put pit timer into
unused state.

> >>+   local_irq_disable();
> >>+#ifdef CONFIG_X86_SMP
> >>+   /*
> >>+* XXX handle_percpu_irq only defined for SMP; we need to switch over
> >>+* to using it, since this is a local interrupt, which each CPU must
> >>+* handle individually without locking out or dropping simultaneous
> >>+* local timers on other CPUs.  We also don't want to trigger the
> >>+* quirk workaround code for interrupts which gets invoked from
> >>+* handle_percpu_irq via eoi, so we use our own IRQ chip.
> >>+*/
> >>+   set_irq_chip_and_handler_name(0, &vmi_chip, handle_percpu_irq, 
> >>"lvtt");
> >>+#else
> >>+   set_irq_chip_and_handler_name(0, &vmi_chip, handle_edge_irq, "lvtt");
> >>+#endif
> >>+   vmi_wiring = VMI_ALARM_WIRED_LVTT;
> >>+   apic_write(APIC_LVTT, vmi_get_timer_vector());
> >>
> >
> >isn't this just your ->startup?
> 
> Which structure has a ->startup function we can use?  Sorry if this 
> seems ignorant, I'm not quite sure what you mean.

The irq_chip.  IOW, it looks like a liberal sprinkling of LVTT vector
initialization.

> >>+   local_irq_enable();
> >>+   clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
> >...and resume?  Instead of letting clockevents core handle all of that,
> >and just registering right here?
> 
> It wasn't clear that clockevents would issue a resume notify for us; if 
> so we could handle this setup in the callback, but it has to be done on 
> the correct CPU.  I can try it and see if that works.

I would've expected to simply register the clockevents device right here,
and that should do the proper state transitions on the old device, as well
as the new device.  Why do you need resume notify?

thanks,
-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-10 Thread Zachary Amsden

Chris Wright wrote:


Thanks for the review!  Comments inline.


+/* paravirt_ops.get_wallclock = vmi_get_wallclock */



Style nit, these pv_ops.foo = vmi_foo style comments aren't really useful.

  


Yeah, and easy to get out of sync.  I'll drop them.


+   .rating = 1000,



Heh, no messing around ;-)
  


Yes, VMI has 1000 hps.


+	printk(KERN_WARNING "vmi: registering clock event %s. mult=%lu shift=%u\n", 
+	   evt->name, evt->mult, evt->shift);



Why is this a warning? ;-)
  


Debug info, I can remove it.


+void __init vmi_time_init(void)
+{
+   /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
+   outb_p(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */



That shouldn't be necessary using clockevents.
  


Actually, I'm not so sure.  If clockevents simply masks the PIT when 
disabling it, we still have overhead of keeping the latch in sync, which 
requires a timer at the PIT frequency.  I can instrument to see how 
exactly the PIT gets disabled.




+   vmi_time_init_clockevent();
+   setup_irq(0, &vmi_clock_action);
+}
+
+#ifdef CONFIG_X86_LOCAL_APIC
+void __devinit vmi_time_bsp_init(void)
+{
+   /*
+* On APIC systems, we want local timers to fire on each cpu.  We do
+* this by programming LVTT to deliver timer events to the IRQ handler
+* for IRQ-0, since we can't re-use the APIC local timer handler
+* without interfering with that code.
+*/
+   clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);



Why do you do this suspend...
  


We need to cancel all pending PIT timer events and restart then local 
timer, which requires atomically taking over IRQ-0.  We use the IDT gate 
for IRQ-0 because it is already an exclusive interrupt, but we can't 
re-use the LVTT IDT gate for local timer since that requires a custom 
custom SMP interrupt in entry.S.  So we must be absolutely sure when we 
get an interrupt on IRQ-0 that it came from the VMI local (rather than 
PIT) delivery path.


  

+   local_irq_disable();
+#ifdef CONFIG_X86_SMP
+   /*
+* XXX handle_percpu_irq only defined for SMP; we need to switch over
+* to using it, since this is a local interrupt, which each CPU must
+* handle individually without locking out or dropping simultaneous
+* local timers on other CPUs.  We also don't want to trigger the
+* quirk workaround code for interrupts which gets invoked from
+* handle_percpu_irq via eoi, so we use our own IRQ chip.
+*/
+   set_irq_chip_and_handler_name(0, &vmi_chip, handle_percpu_irq, "lvtt");
+#else
+   set_irq_chip_and_handler_name(0, &vmi_chip, handle_edge_irq, "lvtt");
+#endif
+   vmi_wiring = VMI_ALARM_WIRED_LVTT;
+   apic_write(APIC_LVTT, vmi_get_timer_vector());



isn't this just your ->startup?
  


Which structure has a ->startup function we can use?  Sorry if this 
seems ignorant, I'm not quite sure what you mean.


  

+   local_irq_enable();
+   clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);



...and resume?  Instead of letting clockevents core handle all of that,
and just registering right here?
  


It wasn't clear that clockevents would issue a resume notify for us; if 
so we could handle this setup in the callback, but it has to be done on 
the correct CPU.  I can try it and see if that works.


Thanks,

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 9/10] Vmi timer update.patch

2007-04-09 Thread Chris Wright
* Zachary Amsden ([EMAIL PROTECTED]) wrote:
> diff -r c02ab981c99c arch/i386/kernel/vmiclock.c
> --- /dev/null Thu Jan 01 00:00:00 1970 +
> +++ b/arch/i386/kernel/vmiclock.c Mon Apr 09 15:47:17 2007 -0700
> @@ -0,0 +1,318 @@
> +/*
> + * VMI paravirtual timer support routines.
> + *
> + * Copyright (C) 2007, VMware, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful, but
> + * WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
> + * NON INFRINGEMENT.  See the GNU General Public License for more
> + * details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include "io_ports.h"
> +
> +#define VMI_ONESHOT  (VMI_ALARM_IS_ONESHOT  | VMI_CYCLES_REAL | 
> vmi_get_alarm_wiring())
> +#define VMI_PERIODIC (VMI_ALARM_IS_PERIODIC | VMI_CYCLES_REAL | 
> vmi_get_alarm_wiring())
> +
> +static DEFINE_PER_CPU(struct clock_event_device, local_events);
> +
> +static inline u32 vmi_counter(u32 flags)
> +{
> + /* Given VMI_ONESHOT or VMI_PERIODIC, return the corresponding
> +  * cycle counter. */
> + return flags & VMI_ALARM_COUNTER_MASK;
> +}
> +
> +/* paravirt_ops.get_wallclock = vmi_get_wallclock */

Style nit, these pv_ops.foo = vmi_foo style comments aren't really useful.

> +unsigned long vmi_get_wallclock(void)
> +{
> + unsigned long long wallclock;
> + wallclock = vmi_timer_ops.get_wallclock(); // nsec
> + (void)do_div(wallclock, 10);   // sec
> +
> + return wallclock;
> +}
> +
> +/* paravirt_ops.set_wallclock = vmi_set_wallclock */
> +int vmi_set_wallclock(unsigned long now)
> +{
> + return 0;
> +}
> +
> +/* paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles */
> +unsigned long long vmi_get_sched_cycles(void)
> +{
> + return vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE);
> +}
> +
> +/* paravirt_ops.get_cpu_khz = vmi_cpu_khz */
> +unsigned long vmi_cpu_khz(void)
> +{
> + unsigned long long khz;
> + khz = vmi_timer_ops.get_cycle_frequency();
> + (void)do_div(khz, 1000);
> + return khz;
> +}
> +
> +static inline unsigned int vmi_get_timer_vector(void)
> +{
> +#ifdef CONFIG_X86_IO_APIC
> + return FIRST_DEVICE_VECTOR;
> +#else
> + return FIRST_EXTERNAL_VECTOR;
> +#endif
> +}
> +
> +/** vmi clockchip */
> +#ifdef CONFIG_X86_LOCAL_APIC
> +static unsigned int startup_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, vmi_get_timer_vector());
> +
> + return (val & APIC_SEND_PENDING);
> +}
> +
> +static void mask_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, val | APIC_LVT_MASKED);
> +}
> +
> +static void unmask_timer_irq(unsigned int irq)
> +{
> + unsigned long val = apic_read(APIC_LVTT);
> + apic_write(APIC_LVTT, val & ~APIC_LVT_MASKED);
> +}
> +
> +static void ack_timer_irq(unsigned int irq)
> +{
> + ack_APIC_irq();
> +}
> +
> +static struct irq_chip vmi_chip __read_mostly = {
> + .name   = "VMI-LOCAL",
> + .startup= startup_timer_irq,
> + .mask   = mask_timer_irq,
> + .unmask = unmask_timer_irq,
> + .ack= ack_timer_irq
> +};
> +#endif
> +
> +/** vmi clockevent */
> +#define VMI_ALARM_WIRED_IRQ00x
> +#define VMI_ALARM_WIRED_LVTT0x0001
> +static int vmi_wiring = VMI_ALARM_WIRED_IRQ0;
> +
> +static inline int vmi_get_alarm_wiring(void)
> +{
> + return vmi_wiring;  
> +}
> +
> +static void vmi_timer_set_mode(enum clock_event_mode mode,
> +struct clock_event_device *evt)
> +{
> + cycle_t now, cycles_per_hz;
> + BUG_ON(!irqs_disabled());
> +
> + switch (mode) {
> + case CLOCK_EVT_MODE_ONESHOT:
> + break;
> + case CLOCK_EVT_MODE_PERIODIC:
> + cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
> + (void)do_div(cycles_per_hz, HZ);
> + now = 
> vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
> + vmi_timer_ops.set_alarm(VMI_PERIODIC, now, cycles_per_hz);
> + break;
> + case CLOCK_EVT_MODE_UNUSED:
> + case CLOCK_EVT_MODE_SHUTDOWN:
> + switch (evt->mode) {
> + case CLOCK_EVT_MODE_ONESHOT:
> + vmi_timer_ops.cancel_alarm

[PATCH 9/10] Vmi timer update.patch

2007-04-09 Thread Zachary Amsden
Convert VMI timer to use clock events, making it properly able to use the NO_HZ
infrastructure.  On UP systems, with no local APIC, we just continue to route
these events through the PIT.  On systems with a local APIC, or SMP, we provide
a single source interrupt chip which creates the local timer IRQ.  It actually
gets delivered by the APIC hardware, but we don't want to use the same local
APIC clocksource processing, so we create our own handler here.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r c02ab981c99c arch/i386/kernel/Makefile
--- a/arch/i386/kernel/Makefile Mon Apr 09 15:45:27 2007 -0700
+++ b/arch/i386/kernel/Makefile Mon Apr 09 15:45:27 2007 -0700
@@ -41,7 +41,7 @@ obj-$(CONFIG_K8_NB)   += k8.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_STACK_UNWIND) += unwind.o
 
-obj-$(CONFIG_VMI)  += vmi.o vmitime.o
+obj-$(CONFIG_VMI)  += vmi.o vmiclock.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-y  += pcspeaker.o
 
diff -r c02ab981c99c arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.cMon Apr 09 15:45:27 2007 -0700
+++ b/arch/i386/kernel/vmi.cMon Apr 09 15:49:37 2007 -0700
@@ -73,6 +73,9 @@ static struct {
void (*set_lazy_mode)(int mode);
 } vmi_ops;
 
+/* Cached VMI operations */
+struct vmi_timer_ops vmi_timer_ops;
+
 /*
  * VMI patching routines.
  */
@@ -231,18 +234,6 @@ static void vmi_nop(void)
 {
 }
 
-/* For NO_IDLE_HZ, we stop the clock when halting the kernel */
-static fastcall void vmi_safe_halt(void)
-{
-   int idle = vmi_stop_hz_timer();
-   vmi_ops.halt();
-   if (idle) {
-   local_irq_disable();
-   vmi_account_time_restart_hz_timer();
-   local_irq_enable();
-   }
-}
-
 #ifdef CONFIG_DEBUG_PAGE_TYPE
 
 #ifdef CONFIG_X86_PAE
@@ -714,7 +705,6 @@ do {
\
vmi_ops.cache = (void *)rel->eip;   \
}   \
 } while (0)
-
 
 /*
  * Activate the VMI interface and switch into paravirtualized mode
@@ -894,8 +884,8 @@ static inline int __init activate_vmi(vo
paravirt_ops.get_wallclock = vmi_get_wallclock;
paravirt_ops.set_wallclock = vmi_set_wallclock;
 #ifdef CONFIG_X86_LOCAL_APIC
-   paravirt_ops.setup_boot_clock = vmi_timer_setup_boot_alarm;
-   paravirt_ops.setup_secondary_clock = 
vmi_timer_setup_secondary_alarm;
+   paravirt_ops.setup_boot_clock = vmi_time_bsp_init; 
+   paravirt_ops.setup_secondary_clock = vmi_time_ap_init;
 #endif
paravirt_ops.get_scheduled_cycles = vmi_get_sched_cycles;
paravirt_ops.get_cpu_khz = vmi_cpu_khz;
@@ -907,11 +897,7 @@ static inline int __init activate_vmi(vo
disable_vmi_timer = 1;
}
 
-   /* No idle HZ mode only works if VMI timer and no idle is enabled */
-   if (disable_noidle || disable_vmi_timer)
-   para_fill(safe_halt, Halt);
-   else
-   para_wrap(safe_halt, vmi_safe_halt, halt, Halt);
+   para_fill(safe_halt, Halt);
 
/*
 * Alternative instruction rewriting doesn't happen soon enough
diff -r c02ab981c99c include/asm-i386/vmi_time.h
--- a/include/asm-i386/vmi_time.h   Mon Apr 09 15:45:27 2007 -0700
+++ b/include/asm-i386/vmi_time.h   Mon Apr 09 15:45:27 2007 -0700
@@ -53,22 +53,8 @@ extern unsigned long vmi_cpu_khz(void);
 extern unsigned long vmi_cpu_khz(void);
 
 #ifdef CONFIG_X86_LOCAL_APIC
-extern void __init vmi_timer_setup_boot_alarm(void);
-extern void __devinit vmi_timer_setup_secondary_alarm(void);
-extern void apic_vmi_timer_interrupt(void);
-#endif
-
-#ifdef CONFIG_NO_IDLE_HZ
-extern int vmi_stop_hz_timer(void);
-extern void vmi_account_time_restart_hz_timer(void);
-#else
-static inline int vmi_stop_hz_timer(void)
-{
-   return 0;
-}
-static inline void vmi_account_time_restart_hz_timer(void)
-{
-}
+extern void __devinit vmi_time_bsp_init(void);
+extern void __devinit vmi_time_ap_init(void);
 #endif
 
 /*
diff -r c02ab981c99c arch/i386/kernel/entry.S
--- a/arch/i386/kernel/entry.S  Mon Apr 09 15:45:27 2007 -0700
+++ b/arch/i386/kernel/entry.S  Mon Apr 09 16:03:18 2007 -0700
@@ -637,11 +637,6 @@ ENDPROC(name)
 /* The include is where all of the SMP etc. interrupts come from */
 #include "entry_arch.h"
 
-/* This alternate entry is needed because we hijack the apic LVTT */
-#if defined(CONFIG_VMI) && defined(CONFIG_X86_LOCAL_APIC)
-BUILD_INTERRUPT(apic_vmi_timer_interrupt,LOCAL_TIMER_VECTOR)
-#endif
-
 KPROBE_ENTRY(page_fault)
RING0_EC_FRAME
pushl $do_page_fault
diff -r c02ab981c99c arch/i386/kernel/vmiclock.c
--- /dev/null   Thu Jan 01 00:00:00 1970 +
+++ b/arch/i386/kernel/vmiclock.c   Mon Apr 09 15:47:17 2007 -0700
@@ -0,0 +1,318 @@
+/*
+ * VMI paravirtual timer support routines.
+ *
+ * Copyright (C) 2007, VMware, Inc