Hi Bill,
Thanks for your support. Please see my comments below.
Bill.Holler at Sun.COM wrote:
> Liu, Jiang wrote:
>> Hi Aubrey,
>> Thanks for your reminder.
>> After reading the related code, I have some questions about the DTrace
>> probe trigger points in the deep C-state path. On SPARC and on the
>> non-deep-C idle path, the DTrace probes have been placed as close as
>> possible to the point where the CPU enters or exits the hardware idle
>> state. On the deep C-state path, the probe trigger points have been
>> pulled out a little. I heard there were some discussions about the
>> probe trigger points, but I missed them. Could anybody give some hints
>> about those discussions?
>>
>
> What does PowerTop measure and report on other operating systems?
> Does PowerTop's C-state data include the software latency to
> enter/exit C-state(s) on other operating systems? My current thought
> is that Solaris should report the same measurement as other OSs. :-)
>
> Does LatencyTop show C-state entry/exit software-latency?
>
>
>> Second, with the new patchset, the DTrace probes will be triggered
>> later on the entering side and earlier on the exiting side. On the
>> entering side, the difference is small, about ten machine
>> instructions. On the exiting side, the difference is bigger because
>> the exit path may need to reprogram the LAPIC timer. I have no
>> concrete idea of the real difference on the exiting side; do you have
>> any data about that?
>>
>
> The Deep C-state exit side was optimized for speed because the CPU
> likely has real work to do. C-state exit is lock-less and only does a
> few memory writes and 1 write to the LAPIC Timer. LAPIC write
> performance was measured while doing the C-state work. I do not
> remember the exact write time. IIRC LAPIC access is faster than an
> un-cached memory write.
That seems like good news for us: it means there won't be a big difference
even on the exiting side, right?
>
> We put back support for Intel's Always Running APIC Timer on Friday.
> Solaris does not re-initialize the LAPIC Timer on future Intel
> Processors when exiting Deep C-states. :-)
I have an idea for optimizing the deep C-state path in the idle thread. As
we know, on some existing CPUs the LAPIC timer stops after entering a
C-state deeper than C2, while future CPUs will most likely support ARAT.
With ARAT, the idle thread doesn't need to deal with the timer at all.
So the idea is to use CPU idle notification to optimize timer-related
operations in the idle path. On platforms without ARAT, we register the
timer manipulation functions (hpet_use_lapic_timer/hpet_use_hpet_timer)
as idle callbacks, so they are called when entering and exiting the
idle state. On platforms with ARAT, we don't register those functions as
idle callbacks, so no timer-related operations are incurred in the idle
thread. That way, we can optimize for future systems with ARAT without
sacrificing current systems without ARAT, and the code implementing
deep C-state idle becomes simpler and easier to understand.
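To make the idea concrete, here is a minimal user-space sketch in C. It is
only a model under stated assumptions, not the code in the webrev:
idle_register_callback(), idle_timer_setup(), deep_idle_once() and the
sketch_use_*_timer() stubs are hypothetical names, and the real
hpet_use_hpet_timer()/hpet_use_lapic_timer() signatures may differ. It also
assumes the HPET takes over on idle entry and the LAPIC timer is restored
on idle exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: invoked on idle entry and on idle exit. */
typedef void (*idle_cb_t)(void *arg);

typedef struct idle_callback {
	idle_cb_t	enter;	/* called just before entering the idle state */
	idle_cb_t	exit;	/* called just after leaving the idle state */
	void		*arg;
} idle_callback_t;

#define	IDLE_CB_MAX	8

static idle_callback_t idle_cbs[IDLE_CB_MAX];
static size_t idle_cb_count;

/* Counters standing in for real timer reprogramming. */
static int timer_switches_to_hpet;
static int timer_switches_to_lapic;

/* Stand-in for hpet_use_hpet_timer(); the real signature is an assumption. */
static void
sketch_use_hpet_timer(void *arg)
{
	(void) arg;
	timer_switches_to_hpet++;
}

/* Stand-in for hpet_use_lapic_timer(); the real signature is an assumption. */
static void
sketch_use_lapic_timer(void *arg)
{
	(void) arg;
	timer_switches_to_lapic++;
}

/* Hypothetical registration interface of the CPU idle framework. */
static void
idle_register_callback(idle_cb_t enter, idle_cb_t exit, void *arg)
{
	assert(idle_cb_count < IDLE_CB_MAX);
	idle_cbs[idle_cb_count].enter = enter;
	idle_cbs[idle_cb_count].exit = exit;
	idle_cbs[idle_cb_count].arg = arg;
	idle_cb_count++;
}

/* Register the timer handlers only when the LAPIC timer can stop. */
static void
idle_timer_setup(bool cpu_has_arat)
{
	if (!cpu_has_arat) {
		idle_register_callback(sketch_use_hpet_timer,
		    sketch_use_lapic_timer, NULL);
	}
}

/* One pass through the deep C-state idle path. */
static void
deep_idle_once(void)
{
	size_t i;

	for (i = 0; i < idle_cb_count; i++)
		idle_cbs[i].enter(idle_cbs[i].arg);

	/* The CPU would enter and later leave the deep C-state here. */

	/* Run the exit callbacks in reverse registration order. */
	for (i = idle_cb_count; i > 0; i--)
		idle_cbs[i - 1].exit(idle_cbs[i - 1].arg);
}
```

With cpu_has_arat set, nothing is registered, so deep_idle_once() runs no
timer code at all. Without ARAT, each pass switches to the HPET once on
entry and back to the LAPIC timer once on exit.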
Any comments on the idea?
>
> Please send me a pointer to the latest webrev or a patch.
There are three webrevs available.
The original draft implementation is at
http://cr.opensolaris.org/~gerry/cpuidle_20090327/
The second is a patch against the original version with several enhancements,
at http://cr.opensolaris.org/~gerry/cpuidle_20090402/
The third combines the original and the patch for convenience, at
http://cr.opensolaris.org/~gerry/cpuidle_20090404/
Thanks!
>
> Thank you,
> Bill
>
>
>> I'm not familiar with the PowerTOP implementation. Could anybody give
>> us an estimate of the impact the above changes will have on
>> PowerTOP? Thanks!
>>
>> Li, Aubrey wrote:
>>
>>> Liu, Jiang wrote:
>>>
>>>
>>>> 5) Removed cpu_dtrace_idle_probe() and moved dtrace probe for
>>>> idle event into CPU idle framework as built-in callback.
>>>>
>>> The behavior is changed here. Previously we counted idle time as
>>> all the time the CPU was not executing threads; now it is just the
>>> time the CPU was in a C-state. The PowerTOP report will probably
>>> show a noticeable difference, because previously PowerTop would
>>> report time spent doing C-state setup+cleanup as part of idle time
>>> (not executing other threads).
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> -Aubrey
>>>
>>
>> Liu Jiang (Gerry)
>> OpenSolaris, OTC, SSG, Intel
Liu Jiang (Gerry)
OpenSolaris, OTC, SSG, Intel