[tesla-dev] Request source code review for CPU Idle Notification enhancements

Li, Aubrey Mon, 6 Apr 2009 21:52:13 +0800

Liu, Jiang wrote:

>>>>    Thanks for your reminder.
>>>>    After reading the related code, I have some questions about the
>>>> DTrace probe trigger points in the deep C-state path. On SPARC and in
>>>> the non-deep-C idle path, the DTrace probes have been placed as close
>>>> as possible to the point where the CPU enters or exits the hardware
>>>> idle state. In the deep C-state path, the probe trigger points have
>>>> been pulled out a little. I heard there were some discussions about
>>>> the probe trigger points, but I missed them. Could anybody give some
>>>> hints about those discussions?
>>>> 
>>> 
>>> What does PowerTop measure and report on other operating systems?
>>> Does PowerTop's C-state data include the software latency to
>>> enter/exit C-state(s) on other operating systems?  My current
>>> thought is that Solaris should report the same measurement as other
>>> OSs.  :-)
>> 
>> Different OSes use different time sources.
>> My concern is that the idle-exit dtrace probe was added to do_interrupt,
>> which will add too much latency when enabled. That might affect the
>> current report.
>>
>> We'd better force the idle-exit dtrace probe back into the idle
>> thread.
> You pointed out a very important and interesting issue; let's do more
> investigation and discussion about it.
>
> First, based on the following factors, I think it's OK to trigger the
> DTrace probe in do_interrupt().
> 1) During every idle enter/exit loop, the DTrace probe will be triggered
> in do_interrupt at most once, in the case where an interrupt wakes the
> CPU up from the idle state.
> 2) The DTrace probe adds non-negligible latency only when it's enabled.
> 3) There are already DTrace probes in the interrupt path, which implies
> that a DTrace probe in the interrupt path is acceptable.
> 
> Second, I think your question has actually revealed a possible design
> flaw in the current deep C driver/powertop implementation under some
> extreme conditions.
> Consider the following possible extreme case:
> 1) The idle thread puts the CPU into the idle state.
> 2) The CPU sleeps in the idle state.
> 3) A hardware interrupt wakes the CPU up from the idle state.
> 4) do_interrupt calls the hardware interrupt handler.
> 5) More interrupts come in and are served by do_interrupt.
> 6) The CPU returns to the idle thread after serving all interrupts.
> 7) The deep C driver gets the wake-up timestamp, calculates CPU
> utilization, and also triggers the DTrace probe for powertop.
> 8) Go to step 1.
> 
> In the above example, let's say:
> steps 1), 7) and 8) occupy 10%, which is the latency introduced by
> software;
> step 2) occupies 30%, which is the actual CPU sleep time;
> steps 3), 4), 5) and 6) occupy 60%, which is used to serve interrupts.
> 
> The actual idle percentage is about 40%. With the current
> implementation, it will be calculated as about 100% idle, which will
> cause the CPU to falsely enter a deep C state and powertop to report
> the wrong idle percentage.
> 
> With the new patchset, the DTrace probes will be triggered more
> precisely to reflect the actual CPU idle time. It still needs more
> cooperation between the deep C driver and CPU idle notification to fix
> the possible flaw above.
> 
> Any comments here?
>
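
To make the arithmetic in the quoted example concrete, here is a minimal
sketch in Python. The 10/30/60 split is taken from the example; the
function and constant names are hypothetical, and the model simply assumes
that everything between idle entry and the exit timestamp is accounted as
idle. It is an illustration of the accounting, not kernel code.

```python
# Percentages of one idle loop, taken from the quoted example.
SOFTWARE = 10   # steps 1, 7, 8: idle enter/exit bookkeeping in software
SLEEP = 30      # step 2: CPU actually sits in the hardware idle state
INTERRUPT = 60  # steps 3-6: do_interrupt serving interrupt handlers


def idle_percent(exit_stamp_in_idle_thread: bool) -> int:
    """Idle percentage seen by the accounting (hypothetical model).

    If the exit timestamp is only taken once control returns to the idle
    thread, the interrupt service time is counted as idle as well.
    """
    measured = SOFTWARE + SLEEP
    if exit_stamp_in_idle_thread:
        measured += INTERRUPT
    return measured


late = idle_percent(exit_stamp_in_idle_thread=True)
early = idle_percent(exit_stamp_in_idle_thread=False)
print(f"exit stamp taken in idle thread: {late}% idle")
print(f"exit stamp taken at wake-up:     {early}% idle")
```

Under these assumptions the late timestamp overstates idle time by exactly
the interrupt service share, which is the gap between "about 40%" and
"about 100%" in the example.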


Actually, as long as interrupts are restored after the dtrace probe (that
is, we exchange the places of hpet.use_lapic_timer and the dtrace probe),
we shouldn't have the problem you described. I personally prefer to force
the dtrace probe into the idle path. We need more thought here, and some
benchmark results, e.g. from libmicro.
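
A toy model of the ordering argument, in Python, may help the discussion.
It is only a sketch: measured_idle and its parameters are hypothetical
names, time is in arbitrary units, and nothing here reflects the actual
idle-path or hpet code. It assumes pending interrupt handlers run as soon
as interrupts are restored.

```python
def measured_idle(sleep_time: int, handler_time: int,
                  probe_before_irq_restore: bool) -> int:
    """Return the idle interval seen by the exit probe (toy model)."""
    now = 0
    enter_stamp = now
    now += sleep_time          # CPU sleeps until an interrupt wakes it

    if probe_before_irq_restore:
        exit_stamp = now       # probe fires while interrupts are still off
        now += handler_time    # handlers run once interrupts are restored
    else:
        now += handler_time    # handlers run first, in do_interrupt
        exit_stamp = now       # probe fires back in the idle thread

    return exit_stamp - enter_stamp


print(measured_idle(30, 60, probe_before_irq_restore=True))
print(measured_idle(30, 60, probe_before_irq_restore=False))
```

In this model, firing the probe before interrupts are restored keeps the
handler time out of the measured idle interval, while still leaving the
probe in the idle path.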

Thanks,
-Aubrey
