Liu, Jiang wrote:
>>>> Thanks for your reminder.
>>>> After reading the relevant code, I have some questions about the DTrace
>>>> probe trigger points in the deep C-state path. On SPARC and on the
>>>> non-deep-C idle path, the DTrace probes have been placed as close as
>>>> possible to the point where the CPU enters/exits the hardware idle
>>>> state. On the deep C-state path, the probe trigger points have been
>>>> pulled out a little. I heard there were some discussions about probe
>>>> trigger points, but I missed them. Could anybody give some hints about
>>>> those discussions?
>>>>
>>>
>>> What does PowerTop measure and report on other operating systems?
>>> Does PowerTop's C-state data include the software latency to
>>> enter/exit C-state(s) on other operating systems? My current
>>> thought is that Solaris should report the same measurement as
>>> other OSes. :-)
>>
>> Different OSes use different time sources.
>> My concern is that the idle-exit DTrace probe was added to
>> do_interrupt(), which will add too much latency when enabled. That
>> might affect the current report.
>>
>> We'd better move the idle-exit DTrace probe back into the idle
>> thread.
> You pointed out a very important and interesting issue; let's do more
> investigation and discussion about it.
>
> First, based on the following factors, I think it's OK to trigger the
> DTrace probe in do_interrupt():
> 1) During every idle enter/exit loop, the DTrace probe in do_interrupt
>    fires at most once, and only when an interrupt wakes the CPU from
>    the idle state.
> 2) A DTrace probe adds non-negligible latency only when it is enabled.
> 3) There are already DTrace probes in the interrupt path, which
>    implies that a DTrace probe in the interrupt path is acceptable.
>
> Second, I think your question has actually revealed a possible design
> flaw in the current deep C driver/powertop implementation under some
> extreme conditions.
> Consider the following possible extreme case:
> 1) The idle thread puts the CPU into an idle state.
> 2) The CPU sleeps in the idle state.
> 3) A hardware interrupt wakes the CPU from the idle state.
> 4) do_interrupt calls the hardware interrupt handler.
> 5) More interrupts arrive and are served by do_interrupt.
> 6) The CPU returns to the idle thread after serving all interrupts.
> 7) The deep C driver gets the wake-up timestamp, calculates CPU
>    utilization, and triggers the DTrace probe for powertop.
> 8) Go to step 1.
>
> In the above example, let's say:
> steps 1), 7) and 8) occupy 10%, which is the latency introduced by
> software;
> step 2) occupies 30%, which is the actual CPU sleep time;
> steps 3), 4), 5) and 6) occupy 60%, which is spent serving interrupts.
>
> The actual idle percentage is about 40%. With the current
> implementation, it will be calculated as about 100% idle, which will
> cause the CPU to enter a deep C-state when it should not, and powertop
> to report a wrong idle percentage.
>
> With the new patchset, the DTrace probes are triggered at points that
> reflect the actual CPU idle time more precisely. Fixing the possible
> flaw above will still need more cooperation between the deep C driver
> and the CPU idle notification.
>
> Any comments here?
>
Actually, as long as interrupts are restored after the DTrace probe (i.e., swap the order of hpet.use_lapic_timer and the DTrace probe), we shouldn't have the problem you described. I personally prefer to keep the DTrace probe in the idle path. We need more thought here, and some benchmark results, e.g. from libmicro.

Thanks,
-Aubrey
