Re: How to get deeper C states working?

2022-10-29 Thread Alexander Motin

Hi,

FreeBSD reports ACPI C-states, while Linux reports CPU C-states. The
mapping between the two is controlled by the BIOS and is not exposed to
the OS. It is quite likely that ACPI C2 means CPU C3, and ACPI C3 means
CPU C6/C7. When you plug in the AC adapter, the BIOS likely hides the
ACPI C3 state from the OS, since saving that little energy makes little
sense given the potential performance loss.
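
One way to probe the mapping is to compare the latencies in the quoted output. The sketch below assumes that each dev.cpu.N.cx_supported entry reads as "C<state>/<type>/<latency>" and that the latency unit matches the one used by cpupower; both are assumptions, and the helper names are hypothetical. With the numbers from this thread, ACPI C2 on AC (latency 104) lines up with Linux C6, while on battery ACPI C2 (80) lines up with Linux C3 and ACPI C3 (109) with Linux C7. Equal latencies are suggestive, not proof of the mapping.

```c
#include <assert.h>
#include <stdio.h>

/* One entry of dev.cpu.N.cx_supported, assumed to be "C<n>/<type>/<latency>". */
struct cx_entry {
    int state;    /* ACPI C-state number as reported by FreeBSD */
    int type;     /* ACPI C-state type */
    int latency;  /* transition latency (unit assumed to match cpupower) */
};

/* Parse a string such as "C1/1/1 C2/3/104" into out[]; returns the entry count. */
static int
parse_cx_supported(const char *s, struct cx_entry *out, int max)
{
    int n = 0, used = 0;

    assert(s != NULL && out != NULL);
    while (n < max && sscanf(s, " C%d/%d/%d%n", &out[n].state,
        &out[n].type, &out[n].latency, &used) == 3) {
        s += used;    /* advance past the entry just parsed */
        n++;
    }
    return (n);
}

/* Linux idle states and latencies copied from the cpupower output in this thread. */
struct linux_state {
    const char *name;
    int latency;
};

static const struct linux_state linux_states[] = {
    { "C1", 2 }, { "C1E", 10 }, { "C3", 80 }, { "C6", 104 }, { "C7", 109 },
};

/* Return the Linux state name with the same latency, or NULL if none matches. */
static const char *
match_by_latency(int latency)
{
    size_t i;

    for (i = 0; i < sizeof(linux_states) / sizeof(linux_states[0]); i++)
        if (linux_states[i].latency == latency)
            return (linux_states[i].name);
    return (NULL);
}
```

Feeding it the AC string "C1/1/1 C2/3/104" and the battery string "C1/1/1 C2/2/80 C3/3/109" reproduces the pairings described above.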


On 29.10.2022 11:52, Lester wrote:

Hi,

I'm using FreeBSD 13.1 on a Thinkpad T420 and noticed 1) with AC plugged 
in I only get C1 and C2 recognized 2) with only battery I get C1, C2 and 
C3. I also have Debian Linux installed on the same machine, under which 
I can get C6 and C7 too (I noticed there's a ssdt6 for Cpu0Cst which 
defines all the C states).


I was wondering if Debian has some SSDT override that provides the 
additional states? From reading FreeBSD's acpi doc, I got the sense that 
I can override the DSDT, but don't know what I need to change, and how 
to get all the override files combined into a single aml file...


Questions: 1) How can I get C3 working on AC? 2) How can I get C6 and C7 
working too? I'm sharing my acpidump results in this folder: 
https://drive.google.com/drive/folders/1q0pY_2fO96RcQCN929sLLtYPpiokVTC3?usp=sharing


Many thanks!

== AC
hw.acpi.cpu.cx_lowest: C8
dev.cpu.1.cx_method: C1/hlt C2/io
dev.cpu.1.cx_usage_counters: 124 817
dev.cpu.1.cx_usage: 13.17% 86.82% last 54us
dev.cpu.1.cx_lowest: C8
dev.cpu.1.cx_supported: C1/1/1 C2/3/104
dev.cpu.0.cx_method: C1/hlt C2/io
dev.cpu.0.cx_usage_counters: 70 520
dev.cpu.0.cx_usage: 11.86% 88.13% last 5508us
dev.cpu.0.cx_lowest: C8
dev.cpu.0.cx_supported: C1/1/1 C2/3/104

== Battery
hw.acpi.cpu.cx_lowest: C8
dev.cpu.1.cx_method: C1/hlt C2/io C3/io
dev.cpu.1.cx_usage_counters: 1946 106 11173
dev.cpu.1.cx_usage: 14.71% 0.80% 84.48% last 85us
dev.cpu.1.cx_lowest: C8
dev.cpu.1.cx_supported: C1/1/1 C2/2/80 C3/3/109
dev.cpu.0.cx_method: C1/hlt C2/io C3/io
dev.cpu.0.cx_usage_counters: 1767 105 7127
dev.cpu.0.cx_usage: 19.63% 1.16% 79.19% last 15us
dev.cpu.0.cx_lowest: C8
dev.cpu.0.cx_supported: C1/1/1 C2/2/80 C3/3/109


== Linux
cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu
analyzing CPU 0:

Number of idle states: 6
Available idle states: POLL C1 C1E C3 C6 C7
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 16099
Duration: 264781
C1:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 7103
Duration: 1039428
C1E:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 30433
Duration: 6118359
C3:
Flags/Description: MWAIT 0x10
Latency: 80
Usage: 11891
Duration: 4311399
C6:
Flags/Description: MWAIT 0x20
Latency: 104
Usage: 77
Duration: 26683
C7:
Flags/Description: MWAIT 0x30
Latency: 109
Usage: 157291
Duration: 433120357


--
Alexander Motin



Re: suspend issues with latest -HEAD, ahci failing to complete something?

2014-05-05 Thread Alexander Motin

On 05.05.2014 20:37, Adrian Chadd wrote:

(I know, I just emailed out asking about setting S3 for the default
lid suspend state, however I just updated to the very latest head and
things went a little backwards.)

Suspend no longer works for me:

May  5 10:33:10 lucy-11i386 acpi: suspend at 20140505 10:33:10
May  5 10:33:47 lucy-11i386 kernel: ahcich0: Timeout on slot 19 port 0
May  5 10:33:47 lucy-11i386 kernel: ahcich0: is  cs fff80fff
ss fff80fff rs fff80fff tfd d0 serr  cmd d317
May  5 10:33:47 lucy-11i386 kernel: (ada0:ahcich0:0:0:0):
WRITE_FPDMA_QUEUED. ACB: 61 08 e0 b0 fa 40 42 00 00 00 00 00
May  5 10:33:47 lucy-11i386 kernel: (ada0:ahcich0:0:0:0): CAM status:
Command timeout
May  5 10:33:47 lucy-11i386 kernel: (ada0:ahcich0:0:0:0): Retrying command
May  5 10:33:13 lucy-11i386 acpi: resumed at 20140505 10:33:13

May  5 10:33:59 lucy-11i386 acpi: suspend at 20140505 10:33:59
May  5 10:34:37 lucy-11i386 kernel: ahcich0: Timeout on slot 9 port 0
May  5 10:34:37 lucy-11i386 kernel: ahcich0: is  cs ff83
ss ff83 rs ff83 tfd d0 serr  cmd c717
May  5 10:34:37 lucy-11i386 kernel: (ada0:ahcich0:0:0:0):
WRITE_FPDMA_QUEUED. ACB: 61 08 18 5c f7 40 42 00 00 00 00 00
May  5 10:34:37 lucy-11i386 kernel: (ada0:ahcich0:0:0:0): CAM status:
Command timeout
May  5 10:34:37 lucy-11i386 kernel: (ada0:ahcich0:0:0:0): Retrying command
May  5 10:34:03 lucy-11i386 acpi: resumed at 20140505 10:34:03

What has recently changed that'd possibly break ahci's ability to
correctly suspend?


When I last tested it (a while ago), it was working for me.
ahci_ch_suspend() should block all I/O on the channel and wait until all
active commands complete. On resume, the channel should be reinitialized
and the device reset, and only then should I/O be released. Do you see
those timeouts on suspend or on resume?


Do you have kern.cam.ada.spindown_suspend enabled? Can you try
disabling it?
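
To see whether commands were still outstanding when the timeouts fired, the "cs" values in the log can be decoded. This is a minimal sketch that assumes "cs" is a per-port mask with one bit per command slot; that interpretation, and the helper names, are assumptions. The second value is taken as printed (ff83), although leading digits may have been lost in the archive.

```c
#include <assert.h>
#include <stdint.h>

/* Count the command slots marked in a 32-bit slot mask. */
static int
slots_outstanding(uint32_t mask)
{
    int n = 0;

    while (mask != 0) {
        mask &= mask - 1;    /* clear the lowest set bit */
        n++;
    }
    return (n);
}

/* Return 1 if the given slot (0..31) is marked in the mask, else 0. */
static int
slot_is_set(uint32_t mask, int slot)
{
    assert(slot >= 0 && slot < 32);
    return ((int)((mask >> slot) & 1u));
}
```

Under that assumption, 0xfff80fff has slot 19 set and 0xff83 has slot 9 set, matching the "Timeout on slot" lines, and in both cases many other slots are still marked, so the channel had not drained.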


--
Alexander Motin
___
freebsd-acpi@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-acpi
To unsubscribe, send any mail to "freebsd-acpi-unsubscr...@freebsd.org"


Re: Using bintime() in acpi_cpu_idle()?

2012-07-30 Thread Alexander Motin

On 30.07.2012 09:25, Alexander Motin wrote:

On 30.07.2012 07:33, Bruce Evans wrote:

On Sun, 29 Jul 2012, Alexander Motin wrote:


On 29.07.2012 15:26, Bruce Evans wrote:

On Sun, 29 Jul 2012, Alexander Motin wrote:


On 29.07.2012 11:37, Bruce Evans wrote:
...

binuptime() is more accurate than uncalibrated scaling.  Is accuracy
required?


Accuracy is not required at all. +-20% is not a problem.


If not, the CPU ticker might work, and is faster than HPET,
and is not under user control for perverse settings.  It normally
reduces to readtsc() with no serializing instruction even in proposed
changes.  This is good enough for process times (not very good) and
depends on the CPU not changing.  Its calibration is very accurate
(similar to timecounters) modulo bugs, but not always up to date.


The problem with the ticker is that it may stop during idle periods, and
idle is exactly what happens here. Unlike timecounter usage, here we
don't need synchronization between CPUs, but we do need it to keep
working during deep sleeps.


The ticker is the same as the timecounter in many cases of interest.  If
the TSC stops then it cannot be used for timecounting unless timecounting
is reinitialized.  Timecounting should be reinitialized after deep sleeps,
but you say you need it to work during deep sleeps.


Timecounter already has detection logic to disable TSC in cases where
it is unreliable. I don't want to replicate it here. I need a time
source that need not be precise or synchronized, but must be reliable
and fast.


Yes, this logic gives exactly what you don't want (an inefficient
timecounter), by preventing use of the TSC for the timecounter, although
the TSC is perfectly usable for the ticker and here.


Can you teach me how to use a ticker that is not ticking? If the TSC was
considered unusable as a timecounter for reasons unrelated to SMP, how
can I use it as a ticker?


I wouldn't trust timecounters for some time after waking up after a
deep sleep.  If their clock stopped then the times read might only be
very out of date.  If their clock didn't stop, then they might have
wrapped or otherwise overflowed and the times read would be garbage.
Is there any locking or ordering to prevent them being used before they
are reinitialized?


I am not sure what reinitialization you are talking about. IIRC, there
is no wake-up code for the TSC. None of the other timecounters have
problems with C-states.


It is the timecounter code that needs reinitializing.  If the TSC stops,
or wraps mod 2**32, then its counts become garbage for the purpose of
timecounting.  Maybe it is not used for timecounting in either of these
cases.  But these cases shouldn't prevent its use for timecounting.

The 2**32 number is because timecounters only use 32 bits of hardware
counters (for efficiency).  So even if the hardware has some magic to
not stop the TSC while sleeping (maybe it fakes not stopping it by
reloading on wakeup), it is still unusable by timecounters after sleeping
for a second or 2 so that it wraps.  The software needs similar faking
to reload the timecounter on wakeup.  This makes use of timecounters in
sleep/wakeup code fragile.


At the moment I am not talking about S-states, sleeping for hours. I am
talking about C-states, lasting milliseconds. That means the TSC may stop
and start 10K times per second or even more. Trying to save and restore
its state would consume so many resources that it would probably make it
useless.

As for the wrap after 2 seconds, I would be happy to make the CPU sleep
for that long, but for now 100ms is all I can hope for, even on an idle
system.
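
The "second or 2" figure follows from simple arithmetic: a counter truncated to 32 bits wraps after 2**32 ticks. The sketch below computes the wrap period for a given frequency. The frequencies passed to it are hypothetical examples, not measurements from this thread, and the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds until a counter running at freq_hz wraps its low 32 bits. */
static uint64_t
wrap_ms_32bit(uint64_t freq_hz)
{
    assert(freq_hz > 0);
    /* 2**32 ticks, scaled to milliseconds before dividing to keep precision. */
    return ((UINT64_C(1) << 32) * 1000 / freq_hz);
}
```

For a hypothetical 2 GHz TSC this gives about 2147 ms, roughly twenty times longer than the 100 ms idle periods mentioned above.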


At boot time there is a dummy timecounter that returns bogo-times.
Apparently sleeping doesn't occur before the timecounter is switched to
a real one.  The dummy timecounter isn't switched back to after boot
time.  But it probably should be, since the hardware timecounter may
have stopped or wrapped.  Sleeping could just set a flag to indicate
this state, but then you would have to provide a fake time anyway on
finding the flag set.  Boot time just points to the dummy timecounter
so as not to check this flag in all early timecounter "hardware" calls.


And how can a dummy timecounter that counts something, but not time,
help me measure sleep time?


Never mind, let it be a compromise solution: the ticker for the C1
state, where performance matters most and where the TSC works, and the
ACPI timer for the others.
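
The compromise can be sketched as a selection keyed on the C-state type. This is only an illustration of the rule stated above, not the code that was committed; the enum and function names are hypothetical.

```c
#include <assert.h>

enum time_source {
    SRC_CPU_TICKER,    /* fast to read, but may stop in deeper C-states */
    SRC_ACPI_TIMER     /* slower to read, keeps counting in C-states */
};

/* Pick the clock used to measure one idle period, given the C-state type. */
static enum time_source
idle_time_source(int cx_type)
{
    assert(cx_type >= 1);
    return (cx_type == 1 ? SRC_CPU_TICKER : SRC_ACPI_TIMER);
}
```

Only C1 takes the fast path; every deeper state falls back to the timer that is known to keep running.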


--
Alexander Motin


Re: Using bintime() in acpi_cpu_idle()?

2012-07-29 Thread Alexander Motin

On 30.07.2012 07:33, Bruce Evans wrote:

On Sun, 29 Jul 2012, Alexander Motin wrote:


On 29.07.2012 15:26, Bruce Evans wrote:

On Sun, 29 Jul 2012, Alexander Motin wrote:


On 29.07.2012 11:37, Bruce Evans wrote:
...

binuptime() is more accurate than uncalibrated scaling.  Is accuracy
required?


Accuracy is not required at all. +-20% is not a problem.


If not, the CPU ticker might work, and is faster than HPET,
and is not under user control for perverse settings.  It normally
reduces to readtsc() with no serializing instruction even in proposed
changes.  This is good enough for process times (not very good) and
depends on the CPU not changing.  Its calibration is very accurate
(similar to timecounters) modulo bugs, but not always up to date.


The problem with the ticker is that it may stop during idle periods, and
idle is exactly what happens here. Unlike timecounter usage, here we
don't need synchronization between CPUs, but we do need it to keep
working during deep sleeps.


The ticker is the same as the timecounter in many cases of interest.  If
the TSC stops then it cannot be used for timecounting unless timecounting
is reinitialized.  Timecounting should be reinitialized after deep sleeps,
but you say you need it to work during deep sleeps.


Timecounter already has detection logic to disable TSC in cases where
it is unreliable. I don't want to replicate it here. I need a time
source that need not be precise or synchronized, but must be reliable
and fast.


Yes, this logic gives exactly what you don't want (an inefficient
timecounter), by preventing use of the TSC for the timecounter, although
the TSC is perfectly usable for the ticker and here.


Can you teach me how to use a ticker that is not ticking? If the TSC was
considered unusable as a timecounter for reasons unrelated to SMP, how
can I use it as a ticker?



I wouldn't trust timecounters for some time after waking up after a
deep sleep.  If their clock stopped then the times read might only be
very out of date.  If their clock didn't stop, then they might have
wrapped or otherwise overflowed and the times read would be garbage.
Is there any locking or ordering to prevent them being used before they
are reinitialized?


I am not sure what reinitialization you are talking about. IIRC, there
is no wake-up code for the TSC. None of the other timecounters have
problems with C-states.


It is the timecounter code that needs reinitializing.  If the TSC stops,
or wraps mod 2**32, then its counts become garbage for the purpose of
timecounting.  Maybe it is not used for timecounting in either of these
cases.  But these cases shouldn't prevent its use for timecounting.

The 2**32 number is because timecounters only use 32 bits of hardware
counters (for efficiency).  So even if the hardware has some magic to
not stop the TSC while sleeping (maybe it fakes not stopping it by
reloading on wakeup), it is still unusable by timecounters after sleeping
for a second or 2 so that it wraps.  The software needs similar faking
to reload the timecounter on wakeup.  This makes use of timecounters in
sleep/wakeup code fragile.


At the moment I am not talking about S-states, sleeping for hours. I am
talking about C-states, lasting milliseconds. That means the TSC may stop
and start 10K times per second or even more. Trying to save and restore
its state would consume so many resources that it would probably make it
useless.


As for the wrap after 2 seconds, I would be happy to make the CPU sleep
for that long, but for now 100ms is all I can hope for, even on an idle
system.



At boot time there is a dummy timecounter that returns bogo-times.
Apparently sleeping doesn't occur before the timecounter is switched to
a real one.  The dummy timecounter isn't switched back to after boot
time.  But it probably should be, since the hardware timecounter may
have stopped or wrapped.  Sleeping could just set a flag to indicate
this state, but then you would have to provide a fake time anyway on
finding the flag set.  Boot time just points to the dummy timecounter
so as not to check this flag in all early timecounter "hardware" calls.


And how can a dummy timecounter that counts something, but not time,
help me measure sleep time?


--
Alexander Motin


Re: Using bintime() in acpi_cpu_idle()?

2012-07-29 Thread Alexander Motin

On 29.07.2012 15:26, Bruce Evans wrote:

On Sun, 29 Jul 2012, Alexander Motin wrote:


On 29.07.2012 11:37, Bruce Evans wrote:
...

binuptime() is more accurate than uncalibrated scaling.  Is accuracy
required?


Accuracy is not required at all. +-20% is not a problem.


If not, the CPU ticker might work, and is faster than HPET,
and is not under user control for perverse settings.  It normally
reduces to readtsc() with no serializing instruction even in proposed
changes.  This is good enough for process times (not very good) and
depends on the CPU not changing.  Its calibration is very accurate
(similar to timecounters) modulo bugs, but not always up to date.


The problem with the ticker is that it may stop during idle periods, and
idle is exactly what happens here. Unlike timecounter usage, here we
don't need synchronization between CPUs, but we do need it to keep
working during deep sleeps.


The ticker is the same as the timecounter in many cases of interest.  If
the TSC stops then it cannot be used for timecounting unless timecounting
is reinitialized.  Timecounting should be reinitialized after deep sleeps,
but you say you need it to work during deep sleeps.


Timecounter already has detection logic to disable TSC in cases where
it is unreliable. I don't want to replicate it here. I need a time
source that need not be precise or synchronized, but must be reliable
and fast.



I wouldn't trust timecounters for some time after waking up after a
deep sleep.  If their clock stopped then the times read might only be
very out of date.  If their clock didn't stop, then they might have
wrapped or otherwise overflowed and the times read would be garbage.
Is there any locking or ordering to prevent them being used before they
are reinitialized?


I am not sure what reinitialization you are talking about. IIRC, there
is no wake-up code for the TSC. None of the other timecounters have
problems with C-states.


--
Alexander Motin


Re: Using bintime() in acpi_cpu_idle()?

2012-07-29 Thread Alexander Motin

On 29.07.2012 11:37, Bruce Evans wrote:

On Sun, 29 Jul 2012, Alexander Motin wrote:


With the ACPI timer gradually becoming one of the slowest in the system,
is there some reason to use it directly in acpi_cpu_idle()? I've made a
patch:
http://people.freebsd.org/~mav/sleep_time.patch
to use binuptime() instead. Even with HPET as the system timecounter
(not to mention TSC), that significantly improves performance on some
workloads when this code is not covered by the MWAIT optimization in
cpu_idle().


Does it work with a perverse timecounter like the i8254?


At least on my test system it does, though predictably much slower
than the others.



The user is permitted to switch to any supported timecounter.  There are
other perverse ones:
- ACPI.  This seems to be unavailable if the system thinks ACPI-fast
   works.  Bug.  The user should be able to downgrade to it if ACPI-fast
   in fact doesn't work.  Since it reads the hardware more than once,
   it is much slower than direct use of the hardware.
- ACPI-fast.  Even this is perverse.  It only reads the hardware once,
   but goes through many software layers.

binuptime() is more accurate than uncalibrated scaling.  Is accuracy
required?


Accuracy is not required at all. +-20% is not a problem.
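
To put the 20% tolerance in context, here is a rough sketch of what uncalibrated scaling of ACPI timer ticks could cost. It assumes the scaling is a plain shift that treats the 3.579545 MHz ACPI PM timer as 4 ticks per microsecond; that form is an assumption about the code under discussion, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ACPI_PM_TIMER_HZ 3579545u    /* nominal ACPI PM timer frequency */

/* Uncalibrated scaling (assumed form): 4 ticks per microsecond. */
static uint32_t
ticks_to_usec_shift(uint32_t ticks)
{
    return (ticks >> 2);
}

/* Scaling with the nominal frequency, for comparison. */
static uint32_t
ticks_to_usec_exact(uint32_t ticks)
{
    return ((uint32_t)((uint64_t)ticks * 1000000u / ACPI_PM_TIMER_HZ));
}

/* Error of the shift relative to the exact value, in tenths of a percent. */
static int
shift_error_permille(uint32_t ticks)
{
    uint32_t exact = ticks_to_usec_exact(ticks);

    assert(exact > 0);
    return ((int)(((int64_t)ticks_to_usec_shift(ticks) - (int64_t)exact) *
        1000 / (int64_t)exact));
}
```

Under that assumption the shift underestimates elapsed time by about 10.5%, which is inside the stated tolerance.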


If not, the CPU ticker might work, and is faster than HPET,
and is not under user control for perverse settings.  It normally
reduces to readtsc() with no serializing instruction even in proposed
changes.  This is good enough for process times (not very good) and
depends on the CPU not changing.  Its calibration is very accurate
(similar to timecounters) modulo bugs, but not always up to date.


The problem with the ticker is that it may stop during idle periods, and
idle is exactly what happens here. Unlike timecounter usage, here we
don't need synchronization between CPUs, but we do need it to keep
working during deep sleeps.


--
Alexander Motin


Using bintime() in acpi_cpu_idle()?

2012-07-28 Thread Alexander Motin

Hi.

With the ACPI timer gradually becoming one of the slowest in the system,
is there some reason to use it directly in acpi_cpu_idle()? I've made a
patch:

http://people.freebsd.org/~mav/sleep_time.patch
to use binuptime() instead. Even with HPET as the system timecounter
(not to mention TSC), that significantly improves performance on some
workloads when this code is not covered by the MWAIT optimization in
cpu_idle().


--
Alexander Motin


Re: [stable 9] broken hwpstate calls

2012-06-07 Thread Alexander Motin

On 06/07/12 21:04, Andriy Gapon wrote:

on 07/06/2012 11:38 Alexander Motin said the following:

On 06/07/12 11:10, Andriy Gapon wrote:

on 07/06/2012 02:02 Jung-uk Kim said the following:

Any way, hwpstate still isn't quite right even without your patch.

sys/kern/kern_cpu.c cpufreq_curr_sysctl() ->   CPUFREQ_SET() -> /* for all
CPU devices */ cf_set_method() -> /* thread_lock(), sched_bind(), ... */
CPUFREQ_DRV_SET() ->   sys/x86/cpufreq/hwpstate.c hwpstate_set() ->
hwpstate_goto_pstate()/* for each CPU unit */ /* thread_lock(),
sched_bind(), ... */


Oh, I didn't realize that there was the cpufreq-level loop over all CPUs!
That really sucks.

Maybe some day we should accept that different CPUs could legitimately be in
different P-states and provide support for that throughout the stack (from
powerd to drivers).


Support for different P-states on different CPUs can be useful if CPUs have
different capabilities.


Not sure what you mean... I was talking about setting different CPUs to
different P-states based on the per-CPU conditions (e.g. utilization).  I
certainly didn't mean to talk about heterogeneous P-state definitions or any
other heterogeneous silicon issues.


As you wish, but at the moment it is the only realistic application I
see. As I said below, setting different frequencies on different cores
without scheduler awareness is a bad idea.



I believe it is very rare, but possible. At the moment cpufreq should set each
CPU to the frequency closest to the one that was set on the BSP. It should be
possible to make powerd read the sets of frequencies from all CPUs and do the
same, just more intelligently.

At the same time, using very different frequencies for different CPUs can IMHO
be very problematic even in theory. For SMP systems it is quite difficult
(because of thread migration and possible interaction between multiple threads)
to identify cases where even the global frequency can be reduced without a
proportional performance penalty. Making it per-CPU multiplies the number of
options and requires awareness from the scheduler.


I humbly disagree.  I think that it's not a job of scheduler to be overly smart
when power-saving policies are in effect.  IMO, scheduler should just do its own
job and powerd should react to individual loads of CPUs.  Where latencies really
matter there powerd should not be used (or perhaps used with some different
policy skewed towards performance vs economy).


The scheduler usually operates on a scale of milliseconds or less. powerd
operates, at best, on a scale of fractions of a second (otherwise it
would consume more power than it saves). Unless you are doing heavy
CPU-bound math without any context switches, it won't work well unless
the scheduler is aware of the available computational resources.


> Also, Linux does it, so it must at least doable :-)

I don't know whether or how Linux does it. If you know how to do it 
effectively -- welcome, be my guest. :)


--
Alexander Motin


Re: [stable 9] broken hwpstate calls

2012-06-07 Thread Alexander Motin

On 06/07/12 11:10, Andriy Gapon wrote:

on 07/06/2012 02:02 Jung-uk Kim said the following:

Any way, hwpstate still isn't quite right even without your patch.

sys/kern/kern_cpu.c cpufreq_curr_sysctl() ->  CPUFREQ_SET() ->/* for all
CPU devices */ cf_set_method() ->/* thread_lock(), sched_bind(), ... */
CPUFREQ_DRV_SET() ->  sys/x86/cpufreq/hwpstate.c hwpstate_set() ->
hwpstate_goto_pstate()  /* for each CPU unit */ /* thread_lock(),
sched_bind(), ... */


Oh, I didn't realize that there was the cpufreq-level loop over all CPUs!
That really sucks.

Maybe some day we should accept that different CPUs could legitimately be in
different P-states and provide support for that throughout the stack (from
powerd to drivers).


Support for different P-states on different CPUs can be useful if CPUs
have different capabilities. I believe it is very rare, but possible. At
the moment cpufreq should set each CPU to the frequency closest to the
one that was set on the BSP. It should be possible to make powerd read
the sets of frequencies from all CPUs and do the same, just more
intelligently.
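
The "closest frequency" rule can be sketched as a nearest-value search over one CPU's supported frequencies. This is a minimal illustration, not the cpufreq code; the function name and the frequency tables used with it are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return the entry of freqs[] (in MHz) closest to target.
 * On a tie, the entry that appears earlier in the array wins.
 */
static int
closest_freq(const int *freqs, int n, int target)
{
    int best, i;

    assert(freqs != NULL && n > 0);
    best = freqs[0];
    for (i = 1; i < n; i++)
        if (abs(freqs[i] - target) < abs(best - target))
            best = freqs[i];
    return (best);
}
```

For a CPU that supports only 2200 and 1100 MHz, a request for 1800 MHz made on the BSP would resolve to 2200 MHz.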


At the same time, using very different frequencies for different CPUs
can IMHO be very problematic even in theory. For SMP systems it is quite
difficult (because of thread migration and possible interaction between
multiple threads) to identify cases where even the global frequency can
be reduced without a proportional performance penalty. Making it per-CPU
multiplies the number of options and requires awareness from the
scheduler.


--
Alexander Motin


Re: Tyan S3992-E: hpet no longer working

2011-01-10 Thread Alexander Motin
Bruce Evans wrote:
> On Tue, 11 Jan 2011, Alexander Motin wrote:
> 
>> Arno J. Klaassen wrote:
>>> Sure .. that said, the BIOS I use is the last official release for this
>>> board (Sept 2009) and not even a more recent beta-release is available.
>>>
>>> I would expect reporting a disabled device which cannot be enabled via
>>> de BIOS a bug deserving a newer release.
>>>
>>> Anyway, this bug isn't very harmful for me, but the non-hpet
>>> timecounters don't seem that fun either :
>>>
>>>  # uptime
>>>10:27PM  up 2 days,  5:44
>>>
>>>  # sysctl kern.timecounter.hardware kern.timecounter.choice
>>>kern.timecounter.hardware: ACPI-safe
>>>kern.timecounter.choice: TSC(-100) i8254(0) ACPI-safe(850)
>>> dummy(-100)
>>>
>>>  # vmstat -i | fgrep cpu:
>>>cpu0:timer  38599321  199
>>>cpu6:timer   2151003   11
>>>cpu1:timer   7121075   36
>>>cpu3:timer   1808269    9
>>>cpu5:timer   3832463   19
>>>cpu2:timer   2399988   12
>>>cpu7:timer   2013444   10
>>>cpu4:timer  21630368  111
>>>
>>>   (default HZ )
>>>
>>> Maybe I should try downgrading the BIOS?
>>
>> So what here seems not funny to you? Lower timer interrupt rate is not a
>> bug but feature of 9-CURRENT.
> 
> They (cpu*:timer) also aren't timecounters :-).

Sure. They've never been timecounters.

> Hmm, with hpet on FreeBSD cluster machines, there is now only hpet.  How
> is statclock distributed with hpet?

If there are enough timers for each CPU and their IRQs are not shareable,
they are assigned to CPUs one to one. The rest of the logic is the same
for all drivers: if a timer is not per-CPU, it is used for all CPUs and
events are redistributed via IPI by MI code. An advantage of one-shot
mode is that we don't need separate timer hardware for statclock.
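
The assignment rule above can be written as a small decision function. This is a sketch of the described policy only, with hypothetical names; it does not reproduce the actual driver or MI event timer code.

```c
#include <assert.h>

enum timer_mode {
    TIMER_PER_CPU,    /* one timer bound to each CPU */
    TIMER_GLOBAL      /* one timer; events forwarded to other CPUs via IPI */
};

/* Decide how timers are used, given their count and whether IRQs are shared. */
static enum timer_mode
pick_timer_mode(int ntimers, int ncpus, int irqs_shared)
{
    assert(ntimers > 0 && ncpus > 0);
    if (ntimers >= ncpus && !irqs_shared)
        return (TIMER_PER_CPU);
    return (TIMER_GLOBAL);
}
```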

> I never properly reviewed the latest "irqN"-printing changes in systat,
> and just noticed that they break printing of "irq" the usual case where
> the interrupt name starts with "irqN:" (then systat removes "irqN:" and
> never puts back "irq").
> 
> Not-so-quick fix:
> 
> % Index: vmstat.c
> % ===
> % RCS file: /home/ncvs/src/usr.bin/systat/vmstat.c,v
> % retrieving revision 1.93
> % diff -u -2 -r1.93 vmstat.c
> % --- vmstat.c11 Dec 2010 08:32:16 -1.93
> % +++ vmstat.c11 Jan 2011 06:20:01 -
> % @@ -244,5 +244,10 @@
> %  *--cp1 = '\0';
> %
> % -/* Convert "irqN: name" to "name irqN". */
> % +/*
> % + * Convert "irqN: name" to "name irqN", "name N" or
> % + * "name".  First reduce to "name"; then append
> % + * " irqN"  if that fits, else " N" if that fits,
> % + * else drop all of "N".
> % + */
> %  if (strncmp(cp, "irq", 3) == 0) {
> %  cp1 = cp + 3;
> % @@ -256,5 +261,8 @@
> %  cp2 = strdup(cp);
> %  bcopy(cp1, cp, sz - (cp1 - cp) + 1);
> % -if (sz <= 10 + 4) {
> % +if (sz <= 10 + 1) {
> % +strcat(cp, " ");
> % +strcat(cp, cp2);
> % +} else if (sz <= 10 + 4) {
> %  strcat(cp, " ");
> %  strcat(cp, cp2 + 3);
> % @@ -266,5 +274,9 @@
> %  /*
> %   * Convert "name irqN" to "name N" if the former is
> % - * longer than the field width.
> % + * longer than the field width.  This handles some
> % + * cases where the original name did not start with
> % + * "irqN".  We don't bother dropping partial "N"s in
> % + * this case.  The "name" part may be too long in
> % + * either case; then we blindly truncate it.
> %   */
> %  if ((cp1 = strstr(cp, "irq")) != NULL &&
> 
> This restores part of rev.1.90, updates the first comment to match the code
> changes in rev.1.90, and expands the last comment.

Heh. If I haven't forgotten some prehistory, you may be right.

-- 
Alexander Motin


Re: Tyan S3992-E: hpet no longer working

2011-01-10 Thread Alexander Motin
Ah, that's fine and indeed just funny. When CPUs are idle, they now
receive a minimal number of interrupts to allow reduced power
consumption. Your numbers just show that the load is not evenly
distributed. You can watch in systat how the rates change depending on
load.
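
The rate column in the quoted vmstat -i output appears to be a long-run average: the total count divided by the uptime. A small sketch under that assumption, with hypothetical function names, reproduces the quoted figures from the reported uptime of 2 days, 5:44.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an uptime of "D days, H:MM" into seconds. */
static uint64_t
uptime_seconds(unsigned days, unsigned hours, unsigned minutes)
{
    return ((uint64_t)days * 86400 + (uint64_t)hours * 3600 +
        (uint64_t)minutes * 60);
}

/* Average interrupts per second: total count divided by uptime in seconds. */
static uint64_t
avg_rate(uint64_t total, uint64_t uptime_sec)
{
    assert(uptime_sec > 0);
    return (total / uptime_sec);
}
```

With 193440 seconds of uptime, cpu0's 38599321 interrupts average 199 per second and cpu3's 1808269 average 9, so the spread reflects how busy each CPU has been since boot.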

-- 
Alexander Motin

On 11.01.2011 0:42, "Arno J. Klaassen" wrote:
> Alexander Motin  writes:
>
>> Arno J. Klaassen wrote:
>>> Sure .. that said, the BIOS I use is the last official release for this
>>> board (Sept 2009) and not even a more recent beta-release is available.
>>>
>>> I would expect reporting a disabled device which cannot be enabled via
>>> de BIOS a bug deserving a newer release.
>>>
>>> Anyway, this bug isn't very harmful for me, but the non-hpet
>>> timecounters don't seem that fun either :
>>>
>>> # uptime
>>> 10:27PM up 2 days, 5:44
>>>
>>> # sysctl kern.timecounter.hardware kern.timecounter.choice
>>> kern.timecounter.hardware: ACPI-safe
>>> kern.timecounter.choice: TSC(-100) i8254(0) ACPI-safe(850) dummy(-100)
>>>
>>> # vmstat -i | fgrep cpu:
>>> cpu0:timer 38599321 199
>>> cpu6:timer 2151003 11
>>> cpu1:timer 7121075 36
>>> cpu3:timer 1808269 9
>>> cpu5:timer 3832463 19
>>> cpu2:timer 2399988 12
>>> cpu7:timer 2013444 10
>>> cpu4:timer 21630368 111
>>>
>>> (default HZ )
>>>
>>> Maybe I should try downgrading the BIOS?
>>
>> So what here seems not funny to you? Lower timer interrupt rate is not a
>> bug but feature of 9-CURRENT.
>
> the standard deviation in the values; I don't have another 8-way by
> hand, but a 4-way 6-STABLE gives :
>
> cpu0: timer 3299774936 2000
> cpu2: timer 3299757640 2000
> cpu3: timer 3299757640 2000
> cpu1: timer 3299757640 2000
>
> and my 8-STABLE notebook (with kern.hz=100) :
>
> cpu0: timer 323161363 400
> cpu1: timer 323161114 400
>
> A range from 9 to 199 is 'funny', maybe I choose the wrong word, but
> I didn't see such discrepancies before. Sorry
>
> Best, Arno
>


Re: Tyan S3992-E: hpet no longer working

2011-01-10 Thread Alexander Motin
Arno J. Klaassen wrote:
> Sure .. that said, the BIOS I use is the last official release for this
> board (Sept 2009) and not even a more recent beta-release is available.
> 
> I would expect reporting a disabled device which cannot be enabled via
> de BIOS a bug deserving a newer release.
> 
> Anyway, this bug isn't very harmful for me, but the non-hpet
> timecounters don't seem that fun either :
> 
>  # uptime
>10:27PM  up 2 days,  5:44
> 
>  # sysctl kern.timecounter.hardware kern.timecounter.choice 
>kern.timecounter.hardware: ACPI-safe
>kern.timecounter.choice: TSC(-100) i8254(0) ACPI-safe(850) dummy(-100)
> 
>  # vmstat -i | fgrep cpu:
>cpu0:timer  38599321  199
>cpu6:timer   2151003   11
>cpu1:timer   7121075   36
>cpu3:timer   1808269    9
>cpu5:timer   3832463   19
>cpu2:timer   2399988   12
>cpu7:timer   2013444   10
>cpu4:timer  21630368  111
> 
>   (default HZ )
> 
> Maybe I should try downgrading the BIOS?

So what here seems not funny to you? Lower timer interrupt rate is not a
bug but feature of 9-CURRENT.

-- 
Alexander Motin


Re: Tyan S3992-E: hpet no longer working

2011-01-10 Thread Alexander Motin
John Baldwin wrote:
> On Saturday, January 08, 2011 11:46:02 am Alexander Motin wrote:
>> Arno J. Klaassen wrote:
>>> John Baldwin  writes:
>>>
>>>> On Thursday, January 06, 2011 5:32:08 pm Arno J. Klaassen wrote:
>>>>> John Baldwin  writes:
>>>>>
>>>>>> On Wednesday, January 05, 2011 4:39:24 pm Arno J. Klaassen wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have (a long-lasting) problem to get hpet attached to a Tyan S3992-E
>>>>>>> MB. My last known working kernel is 7.1-PRERELEASE Sep 2 2008" , I
>>>>>>> rarely cared about this board for a while...
>>>>>>>
>>>>>>> At that time the dmesg said :
>>>>>>>
>>>>>>>
>>>>>>>   acpi_hpet0:  iomem 0xfed0-0xfed003ff
>>>>>>>   on acpi0
>>>>>>>   Timecounter "HPET" frequency 2500 Hz quality 900
>>>>>>>
>>>>>>> now it says (debug.acpi.hpet_test="1", debug.acpi.layer="ACPI_TIMER",
>>>>>>> debug.acpi.level="ACPI_LV_ALL_EXCEPTIONS" enabled) :
>>>>>>>
>>>>>>>   hpet0:  iomem 0xfed00000-0xfed03fff on
>>>>>>>   acpi0
>>>>>>>   hpet0: vendor 0xffff, rev 0xff, 232831Hz 64bit, 32 timers, legacy 
>>>>>>> route
>>>>>>>   hpet0:  t0: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t1: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t2: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t3: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t4: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t5: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t6: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t7: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t8: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t9: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t10: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t11: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t12: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t13: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t14: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t15: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t16: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t17: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t18: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t19: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t20: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t21: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t22: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t23: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t24: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t25: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t26: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t27: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t28: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t29: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t30: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0:  t31: irqs 0xffffffff (31), MSI, 64bit, periodic
>>>>>>>   hpet0: 0.0: 4294967295 ... 4294967295 = 0
>>>>>>>   hpet0: time per call: 0 ns
>>>>>>>   hpet0: HPET never increments, disabling
>>>>>>>   device_attach: hpet0 attach returned 6
>>>>>>>
>>>>>>>
>>>>>>> Some things strike me :
>>>>>>>
>>>>>>>   'vendor 0xffff, rev 0xf' and '4294967295 (== 0xffffffff)' as well
>>>>>>> as 232831Hz
>>>>>>>

Re: Tyan S3992-E: hpet no longer working

2011-01-08 Thread Alexander Motin
 Method (_STA, 0, NotSerialized)
>>>> {
>>>> Return (0x0F)
>>>> }
>>>>
>>>> Method (_CRS, 0, NotSerialized)
>>>> {
>>>> Return (ResourceTemplate ()
>>>> {
>>>> Memory32Fixed (ReadWrite,
>>>> 0xFED00000, // Address Base
>>>> 0x4000, // Address Length
>>>> )
>>>> })
>>>> }
>>>> }
>>>>
>>>> So it does look like we are doing what the DSDT tells us in terms
>>>> of the memory address.
>>> yop. That said, I made yet another copy-paste error: the last known
>>> working kernel is 8.0-CURRENT Mar  1 2009 and the hpet says :
>>>
>>>   acpi_hpet0:  iomem 0xfed00000-0xfed003ff
>>>   on acpi0
>>>   Timecounter "HPET" frequency 14318180 Hz quality 900 
>>>
>>> [only the frequency differs, the memory range indeed then was reported as
>>> 0x400 and not 0x4000 ]
>>>
>>>> Arno, are there any BIOS options that mention the HPET or have you updated 
>>>> your BIOS since you booted the 7.1 kernel?
>>> yes .. I now use BIOS 1.06 released 06/09/09.
>>> Can I somehow 'overide' the bios and force the driver to use 0X400 as
>>> 'Address Length' in order to test if that makes the driver attach again?
>> Changing the length wouldn't make a difference as we would still read the 
>> same 
>> registers since the start address is identical.  I think the length is 
>> symptomatic of the BIOS doing something differently that has disabled the 
>> HPET.
> 
> good point : this failure probably is not related to the FreeBSD-driver
> : in the current BIOS under the submenu 'South Bridge Chipset
> Configuration', the option to enable the HPET has disappeared (no
> mention of that in the release-notes), whilst it was present in the
> original BIOS, *and* disabled by default.
> 
> Is it possible to write to some register during hpet_enable() and force
> the timer to tick, regardless of the BIOS?

The problem seems to be not about ticking, but about the HPET registers
working at all. Reading back FFh for everything most probably means that
the HPET is simply not present at the address where we look for it.
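
To make the "all FFh" symptom concrete, here is a minimal userland sketch (not the driver code) that decodes a raw 64-bit HPET General Capabilities value, using the field layout from the HPET specification. The helper names, including hpet_looks_absent(), are hypothetical. The rounding in hpet_freq_hz() is an assumption, chosen because it reproduces the 232831Hz shown in the quoted dmesg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of the 64-bit HPET General Capabilities and ID register. */
struct hpet_caps {
	uint8_t  rev;        /* bits 0-7: revision ID */
	unsigned num_timers; /* bits 8-12 hold (number of timers - 1) */
	uint16_t vendor;     /* bits 16-31: vendor ID */
	uint32_t period_fs;  /* bits 32-63: counter period, femtoseconds */
};

static struct hpet_caps
hpet_decode_caps(uint64_t raw)
{
	struct hpet_caps c;

	c.rev = (uint8_t)(raw & 0xff);
	c.num_timers = (unsigned)((raw >> 8) & 0x1f) + 1;
	c.vendor = (uint16_t)((raw >> 16) & 0xffff);
	c.period_fs = (uint32_t)(raw >> 32);
	return (c);
}

/* Counter frequency in Hz, rounded to nearest (10^15 fs per second). */
static uint64_t
hpet_freq_hz(uint32_t period_fs)
{
	assert(period_fs != 0);
	return ((1000000000000000ULL + period_fs / 2) / period_fs);
}

/* Hypothetical check: all-ones reads suggest nothing decodes the address. */
static bool
hpet_looks_absent(uint64_t raw_caps)
{
	return (raw_caps == UINT64_MAX);
}
```

Feeding UINT64_MAX into hpet_decode_caps() yields vendor 0xffff, rev 0xff, 32 timers and a period of 0xffffffff fs, which is what a read from unmapped address space would look like.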

-- 
Alexander Motin


Re: Event based scheduling and USB.

2010-10-29 Thread Alexander Motin
Alexander Motin wrote:
> Takanori Watanabe wrote:
>> I updated my FreeBSD tree on laptop, to the current
>> as of 18 Oct.2010, it works fine with CPU C3 state enabled,
>>
>> I think this is your achievement of event time scheduler,
>> thanks!
>>
>> But when USB driver is enabled, the load average is considerablly 
>> high (0.6 to 1.0) if sysctl oid kern.eventtimer.periodic is set to 0.
>>  Then kern.eventtimer.periodic is set to 1, the load average goes
>> to 0 quickly as before, but almost never transit to C3.
>>
>> Is this behavior expected, or something wrong?
>> I noticed one of usb host controller device shares HPET irq.
>> When I implement interrupt filter in uhci driver, the load average
>> goes to 0 as before.
>>
>> 
>> % vmstat -i
>> interrupt            total   rate
>> irq1: atkbd0           398      2
>> irq9: acpi0            408      2
>> irq12: psm0              3      0
>> irq19: ehci1            37      0
>> irq20: hpet0 uhci0   35970    230
>> irq22: ehci0             2      0
>> irq256: em0              4      0
>> irq257: ahci0         1692     10
>> Total                38514    246
>> ===
> 
> I haven't noticed that issue and it is surely not expected for me. I
> will try to reproduce it.

I've easily reproduced the problem. Scheduler tracing shows that it is
the result of aliasing between the "swi4: clock" thread on one CPU
(measuring load average) and the "irq21: hpet0 uhci1" thread on another.
Those two events are aliased by definition, because they share an
interrupt source. I am not sure what to do about it. Either we should
change the load average calculation algorithm or exclude the timer's
interrupt threads from load average accounting. Adding an interrupt
filter for USB also helps, but it is only a partial solution for this
specific sharing case.
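
The aliasing effect can be shown with a small simulation. This is only a sketch under simplified assumptions: one interrupt thread becomes runnable at every multiple of irq_period_us and stays runnable for busy_us, and a sampler checks it once per sample_period_us. The function name and parameters are hypothetical and have nothing to do with the real scheduler code.

```c
#include <assert.h>

/*
 * Return the share of samples (in percent) that find the interrupt
 * thread runnable. Samples are taken at t = 0, sample_period_us, ...
 */
static int
sampled_busy_percent(long irq_period_us, long busy_us,
    long sample_period_us, int nsamples)
{
	int hits = 0;

	assert(irq_period_us > 0 && sample_period_us > 0 && nsamples > 0);
	for (int k = 0; k < nsamples; k++) {
		long t = (long)k * sample_period_us;

		/* Position of this sample inside the interrupt period. */
		if (t % irq_period_us < busy_us)
			hits++;
	}
	return (hits * 100 / nsamples);
}
```

With a 1000us interrupt period and 10us of work per interrupt, the thread is really runnable about 1% of the time. Sampling with the same 1000us period (the shared source case) reports 100%, while sampling with an unrelated 1003us period over 1000 samples reports 1%.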

-- 
Alexander Motin


Re: Event based scheduling and USB.

2010-10-27 Thread Alexander Motin
Takanori Watanabe wrote:
> In message <4cc732c7.50...@freebsd.org>, Alexander Motin wrote:
>> Most likely you should be able to avoid interrupt sharing using some
>> additional HPET options, described at hpet(4).
> 
> Try to disable using shared IRQ with uhci, the IRQ used by HPET become cpu: 
> interrupt and certainly load average goes quite low, 

If you mean "cpuX:timer", then you probably did something wrong and the
system has fallen back to the LAPIC timer.

> but never transit to C3 state. Using legacy route, it works quite well.

The C3 state is blocked when the LAPIC timer is used.

-- 
Alexander Motin


Re: Event based scheduling and USB.

2010-10-27 Thread Alexander Motin
Nate Lawson wrote:
> On 10/26/2010 12:57 PM, Alexander Motin wrote:
>> Takanori Watanabe wrote:
>>> I updated my FreeBSD tree on laptop, to the current
>>> as of 18 Oct.2010, it works fine with CPU C3 state enabled,
>>>
>>> I think this is your achievement of event time scheduler,
>>> thanks!
> 
> Ah, so mav@ implemented a tickless-scheduler? That is nice.

Not exactly. I've only made the system delay empty ticks when idle and
execute them later, on wakeup, in a batch. Scheduler work is still wanted.
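
As a rough illustration of "delay empty ticks and run them in a batch", here is a userland sketch. It assumes time in microseconds and a fixed tick period; struct tick_state and run_pending_ticks() are hypothetical names, not the kernel's interfaces.

```c
#include <assert.h>
#include <stdint.h>

struct tick_state {
	uint64_t next_tick_us; /* time at which the next tick is due */
	uint64_t tick_us;      /* tick period, 1000000 / hz */
	uint64_t ticks_run;    /* ticks accounted so far */
};

/*
 * Called on wakeup (or on any timer event) at time now_us.
 * Accounts every tick that became due while the CPU was asleep
 * and returns how many were handled in this batch.
 */
static uint64_t
run_pending_ticks(struct tick_state *ts, uint64_t now_us)
{
	uint64_t n;

	assert(ts->tick_us > 0);
	if (now_us < ts->next_tick_us)
		return (0);
	n = (now_us - ts->next_tick_us) / ts->tick_us + 1;
	ts->ticks_run += n;
	ts->next_tick_us += n * ts->tick_us;
	return (n);
}
```

A CPU that slept from 1000us to 52500us with a 1000us tick would account 51 ticks at once on wakeup instead of being woken 51 times.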

>>> But when USB driver is enabled, the load average is considerablly 
>>> high (0.6 to 1.0) if sysctl oid kern.eventtimer.periodic is set to 0.
>>>  Then kern.eventtimer.periodic is set to 1, the load average goes
>>> to 0 quickly as before, but almost never transit to C3.
>>>
>>> Is this behavior expected, or something wrong?
> 
> The USB controller often keeps the bus mastering bit set. This keeps the
> system out of C3. The way to fix this is to implement global suspend.
> Put a device in suspend mode and then turn off power to the USB port it
> is on. Then the USB controller will stop polling the bus.

As I understand it, if the respective USB port is not used, the USB stack
should put it into power_save mode and not poll so often that it prevents
entering the C3 state.

>>> I noticed one of usb host controller device shares HPET irq.
>>> When I implement interrupt filter in uhci driver, the load average
>>> goes to 0 as before.
>>>
>>>
>>> 
>>> % vmstat -i
>>> interrupt            total   rate
>>> irq1: atkbd0           398      2
>>> irq9: acpi0            408      2
>>> irq12: psm0              3      0
>>> irq19: ehci1            37      0
>>> irq20: hpet0 uhci0   35970    230
>>> irq22: ehci0             2      0
>>> irq256: em0              4      0
>>> irq257: ahci0         1692     10
>>> Total                38514    246
>>> ===
>> I haven't noticed that issue and it is surely not expected for me. I
>> will try to reproduce it.
>>
>> Most likely you should be able to avoid interrupt sharing using some
>> additional HPET options, described at hpet(4).
> 
> This seems silly. The whole point of APIC is to avoid clustering on a
> single interrupt but the BIOS put the timer on the USB controller irq?

The HPET is not a regular ISA or PCI device. It allows several
different interrupt configurations. In most cases I remember, the BIOS
sets up interrupts 0 and 8, as for legacy_route mode. But this mode is
not really suitable as a default in our case at the moment, due to a
conflict with the atrtc and attimer drivers.

-- 
Alexander Motin


Re: Event based scheduling and USB.

2010-10-26 Thread Alexander Motin
Takanori Watanabe wrote:
> I updated my FreeBSD tree on laptop, to the current
> as of 18 Oct.2010, it works fine with CPU C3 state enabled,
> 
> I think this is your achievement of event time scheduler,
> thanks!
> 
> But when USB driver is enabled, the load average is considerablly 
> high (0.6 to 1.0) if sysctl oid kern.eventtimer.periodic is set to 0.
>  Then kern.eventtimer.periodic is set to 1, the load average goes
> to 0 quickly as before, but almost never transit to C3.
> 
> Is this behavior expected, or something wrong?
> I noticed one of usb host controller device shares HPET irq.
> When I implement interrupt filter in uhci driver, the load average
> goes to 0 as before.
> 
> 
> 
> % vmstat -i
> interrupt            total   rate
> irq1: atkbd0           398      2
> irq9: acpi0            408      2
> irq12: psm0              3      0
> irq19: ehci1            37      0
> irq20: hpet0 uhci0   35970    230
> irq22: ehci0             2      0
> irq256: em0              4      0
> irq257: ahci0         1692     10
> Total                38514    246
> ===

I haven't noticed that issue, and it is certainly not what I would
expect. I will try to reproduce it.

Most likely you should be able to avoid interrupt sharing using some
additional HPET options, described at hpet(4).

> BTW, when USB port is enabled C3 transition rate gets lower.
> I think it is likely to occur. But how can I supress power 
> consumption? 

I can't say about USB, but you may try this patch to optimize some other
subsystems: http://people.freebsd.org/~mav/tm6292_idle.patch

> It's time to implement powertop for freebsd, isn't it?

Surely it is. I was even thinking about porting the one from
OpenSolaris, but other work distracted me. You may take it, if you wish.

-- 
Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-24 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Dmitry Kubov 
Cc: Andriy Gapon , j...@freebsd.org, 
 bug-followup 
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Fri, 24 Sep 2010 10:22:41 +0300

 Dmitry Kubov wrote:
 > Is it possible to stick running threads to same CPU core for longer time
 > to avoid C-states latencies penalty?
 
 man 1 cpuset
 
 -- 
 Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-23 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Dmitry Kubov 
Cc: Andriy Gapon , j...@freebsd.org, 
 bug-followup 
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Thu, 23 Sep 2010 16:07:32 +0300

 Dmitry Kubov wrote:
 >> Try to kill powerd and manually set highest CPU frequency. 0.40s test
 >> time looks a bit suspicious, as powerd may just not react in time to set
 >> P0 state.
 >>
 > powerd does not enabled. Where/how set highest CPU frequency?
 
 sysctl dev.cpu |grep freq
 
 -- 
 Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-23 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Dmitry Kubov 
Cc: Andriy Gapon , j...@freebsd.org, 
 bug-followup 
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Thu, 23 Sep 2010 16:01:09 +0300

 Dmitry Kubov wrote:
 > 
 >> This CPU has only 266MHz TurboBoost speedup. And some part of it
 >> (probably half) could be enabled all the time. This benefit still could
 >> be overweighted by C-states latencies penalty. It could be interesting
 >> to test some other workloads, like compilation with different number of
 >> threads.
 >>
 > 
 > Actually tested 8.1-RELEASE with both TurboBoost options in BIOS:
 > 
 > TurboBoost OFF
 > Ubench Single CPU:   451935 (0.40s)
 > Ubench Single CPU:   450927 (0.40s)
 > Ubench Single CPU:   450486 (0.40s)
 > 
 > TurboBoost ON
 > Ubench Single CPU:   450890 (0.40s)
 > Ubench Single CPU:   450890 (0.40s)
 > Ubench Single CPU:   449926 (0.40s)
 > 
 > C-states latencies penalty is reasonable idea. But looks like P0-state
 > not activated at all.
 
 Try to kill powerd and manually set highest CPU frequency. 0.40s test
 time looks a bit suspicious, as powerd may just not react in time to set
 P0 state.
 
 > What about too high %% for C3 state during heavy load:
 > dev.cpu.0.cx_usage: 0.17% 0.06% 99.75% last 7560us
 
 It's not really strange. These numbers count the number of entries into
 each state, so while the CPU is completely busy they are not updated. The
 main case where the C1 state is actively used and counted is a load with
 a high interrupt rate or heavy context switching, such as disk I/O or
 network load.
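
Since cx_usage is derived from entry counters and not from time spent, the percentages can be sketched as a plain ratio of counters. The function below is an illustration only: the name cx_usage_shares() is hypothetical, and truncating to hundredths of a percent is an assumption about how the values are rounded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert per-state entry counters into hundredths of a percent
 * (1317 means 13.17%). Values are truncated, so the shares may sum
 * to slightly less than 10000.
 */
static void
cx_usage_shares(const uint64_t *counters, size_t n, unsigned *shares)
{
	uint64_t sum = 0;

	assert(counters != NULL && shares != NULL);
	for (size_t i = 0; i < n; i++)
		sum += counters[i];
	for (size_t i = 0; i < n; i++)
		shares[i] = sum ? (unsigned)(counters[i] * 10000 / sum) : 0;
}
```

For example, counters of 124 and 817 entries give 13.17% and 86.82%. A CPU that never goes idle adds nothing to any counter, so the shares stay frozen at whatever the last idle periods produced.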
 
 >> Disk performance fix is reasonable. Some recent improvements in
 >> 9-CURRENT should improve it even more. What's about ubench - try some
 >> different load.
 >>
 > Can you suggest other CPU only benchmark?
 > 
 > make -j 16 buildworld
 > can't load all cores, can't see less than 11% idle
 
 I don't think that completely loading all CPUs is the main goal. This
 test is realistic and gives a really usable result.
 
 -- 
 Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-23 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Dmitry Kubov 
Cc: Andriy Gapon , j...@freebsd.org, 
 bug-followup 
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Thu, 23 Sep 2010 15:34:23 +0300

 Dmitry Kubov wrote:
 >> It would be
 >> interesting to repeat same test if you updated to 8-STABLE or at least
 >> apply patch from SVN rev 209897 on 2010-07-11 11:58:46Z.
 > 
 > New system:
 > CPU: Intel(R) Xeon(R) CPU   X5680  @ 3.33GHz (3333.47-MHz
 > K8-class CPU)
 > FreeBSD/SMP: Multiprocessor System Detected: 12 CPUs
 > FreeBSD/SMP: 2 package(s) x 6 core(s)
 > HT disabled in BIOS.
 
 This CPU has only a 266MHz TurboBoost speedup, and some part of it
 (probably half) could be enabled all the time. This benefit could still
 be outweighed by the C-state latency penalty. It would be interesting
 to test some other workloads, like compilation with different numbers
 of threads.
 
 > Note 3333/3334 difference:
 > TurboBoost disabled:
 > dev.cpu.0.freq: 3333
 > dev.cpu.0.freq_levels: 3333/130000 3200/117000 3067/105000 2933/94000
 > 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000
 > 2000/43000 1867/39000 1733/35000 1600/32000 1400/28000 1200/24000
 > 1000/20000 800/16000 600/12000 400/8000 200/4000
 > dev.est.0.freq_settings: 3333/130000 3200/117000 3067/105000 2933/94000
 > 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000
 > 2000/43000 1867/39000 1733/35000 1600/32000
 > 
 > TurboBoost enabled:
 > dev.cpu.0.freq: 3334
 > dev.cpu.0.freq_levels: 3334/143000 3200/117000 3067/105000 2933/94000
 > 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000 2133/48000
 > 2000/43000 1867/39000 1733/35000 1600/32000 1400/28000 1200/24000
 > 1000/20000 800/16000 600/12000 400/8000 200/4000
 > dev.est.0.freq_settings: 3334/143000 3333/130000 3200/117000 3067/105000
 > 2933/94000 2800/85000 2667/76000 2533/68000 2400/61000 2267/54000
 > 2133/48000 2000/43000 1867/39000 1733/35000 1600/32000
 
 Intel writes that the BIOS may report an additional P-state with a 1MHz
 difference, to allow the OS to control TurboBoost. Dropping very close
 frequencies is just a behavior/limitation of the cpufreq subsystem.
 Actually, I am not sure how this additional P-state could be used,
 except for testing.
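
The "drop very close frequencies" behavior can be sketched as a simple filter over a sorted list of levels. This is an illustration, not the cpufreq code: drop_close_levels() is a hypothetical name, and the minimum gap is a parameter here because the exact threshold used by cpufreq is an assumption on my side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Copy frequency levels (MHz, sorted high to low) to out[], skipping any
 * level closer than min_gap_mhz to the previously kept one.
 * Returns the number of levels kept.
 */
static size_t
drop_close_levels(const int *in, size_t n, int min_gap_mhz, int *out)
{
	size_t kept = 0;

	assert(min_gap_mhz >= 0);
	for (size_t i = 0; i < n; i++) {
		if (kept > 0 && abs(out[kept - 1] - in[i]) < min_gap_mhz)
			continue;
		out[kept++] = in[i];
	}
	return (kept);
}
```

With illustrative input levels of 3334, 3333, 3200 and 3067 MHz and an assumed gap of 25 MHz, the 3333 entry is merged into 3334 and three levels remain, which is the kind of difference visible between freq_settings and freq_levels.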
 
 > In short: no 60% disk io performance drop in 8.1-STABLE. Other tests
 > give same results like 8.1-RELEASE, 5% average cpu performance drop.
 
 The disk performance fix is to be expected. Some recent improvements in
 9-CURRENT should improve it even more. As for ubench, try some
 different load.
 
 -- 
 Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-21 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Dmitry Kubov 
Cc: Andriy Gapon , j...@freebsd.org, 
 bug-follo...@freebsd.org
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Tue, 21 Sep 2010 14:16:46 +0300

 Dmitry Kubov wrote:
 > Ok, I am able to activate C3 state after loader.conf tweaks. According
 > to http://www.intel.com/technology/turboboost/
 > 
 > Intel Turbo Boost Technology is activated when the Operating System (OS)
 > requests the highest processor performance state (P0).
 > 
 > I have no clue about P0 state activation on FreeBSD.
 
 P0 is just the highest available CPU frequency. If you are not using
 powerd, it should be set all the time. If you are using powerd, it
 will set P0 within a fraction of a second after load appears.
 
 -- 
 Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-20 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Dmitry Kubov 
Cc: Andriy Gapon , j...@freebsd.org, 
 bug-follo...@freebsd.org
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Mon, 20 Sep 2010 18:49:54 +0300

 Dmitry Kubov wrote:
 >> 205 * 3 and 245 * 3 are both greater than 500, so this is the reason why 
 >> they are
 >> never entered.
 >>
 >> Perhaps Alexander can give some advice here.
 > 
 > Looks like I can simply update src to 8-stable?
 > 
 > SVN rev 212887 on 2010-09-20 05:39:50Z by avg
 > 
 > MFC r212549: acpi_cpu: do not apply P_LVLx_LAT rules to latencies
 > returned by _CST
 
 No, it's a different case. This won't help you.
 
 -- 
 Alexander Motin


Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new features not supported

2010-09-20 Thread Alexander Motin
The following reply was made to PR i386/135447; it has been noted by GNATS.

From: Alexander Motin 
To: Andriy Gapon 
Cc: Dmitry Kubov , j...@freebsd.org, 
 bug-follo...@freebsd.org
Subject: Re: i386/135447: [i386] [request] Intel Core i7 and Nehalem-EP new
 features not supported
Date: Mon, 20 Sep 2010 18:42:57 +0300

 Andriy Gapon wrote:
 > on 20/09/2010 17:54 Dmitry Kubov said the following:
 >> dev.cpu.7.cx_supported: C1/3 C2/205 C3/245
 > Note these^^^^^^
 >> dev.cpu.7.cx_lowest: C3
 >> dev.cpu.7.cx_usage: 100.00% 0.00% 0.00% last 500us
 > And this --^
 >> C2/C3 not used at all
 > 
 > 205 * 3 and 245 * 3 are both greater than 500, so this is the reason why 
 > they are
 > never entered.
 
 The only way to enter C-states with such high latency is to
 significantly increase the CPUs' continuous sleep time. The sleep time
 of 500us there is artificial and is calculated as 1000000/(2*hz). 8.1
 was not yet able to measure the real sleep time in C1. But 2*hz is
 quite a realistic estimate for an idle system.
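
The selection rule discussed above (latency * 3 compared against the last sleep time) can be sketched in userland as follows. The function names are hypothetical, and the sleep estimate assumes microseconds, i.e. 1000000/(2*hz), which gives the 500us shown in the quoted cx_usage line for hz=1000.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated idle sleep in microseconds, assuming 2*hz wakeups per second. */
static unsigned
estimated_sleep_us(unsigned hz)
{
	assert(hz > 0);
	return (1000000 / (2 * hz));
}

/*
 * Return the index of the deepest state whose transition latency,
 * multiplied by 3, still fits into the previous sleep time.
 * lat_us[] is ordered from C1 upward; index 0 (C1) is always allowed.
 */
static size_t
pick_cx(const unsigned *lat_us, size_t n, unsigned prev_sleep_us)
{
	assert(n > 0);
	for (size_t i = n; i-- > 1; ) {
		if (lat_us[i] * 3 <= prev_sleep_us)
			return (i);
	}
	return (0);
}
```

With the latencies from the report (3, 205 and 245) and a 500us estimate, only C1 qualifies, since 205 * 3 = 615 and 245 * 3 = 735. Once the measured sleep time reaches 1000us, C3 becomes usable.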
 
 Recently I committed to 9-CURRENT a large set of patches that stop idle
 CPUs from waking up on timer interrupts when it is not needed. It allows
 idle CPUs to sleep for as much as 100000us, making any available C-state
 effectively usable. I can confirm that TurboBoost on my Core i7
 870 gives about a 10% benefit when only one physical core is used:
 http://docs.freebsd.org/cgi/mid.cgi?4C959830.3060808
 
 I have requests, and the wish, to merge these changes into 8-STABLE, but
 most likely it won't happen in the next few months, as the code is very
 new and requires more testing.
 
 Until then I recommend that you follow this guide:
 http://wiki.freebsd.org/TuningPowerConsumption
 It was actually aimed at laptops, but effective usage of C2/C3 states
 was one of its goals. Also, on my Core i7 870 the LAPIC timer dies in
 C2/C3 states, so consider migrating to the i8254 timer, as also
 described in this guide.
 
 -- 
 Alexander Motin


Re: CPU C-state storange on Panasonic TOUGH BOOK CF-R9

2010-09-14 Thread Alexander Motin
Andriy Gapon wrote:
> on 14/09/2010 11:44 Andriy Gapon said the following:
>> on 13/09/2010 20:07 Andriy Gapon said the following:
>>> I am also going to take a look how Linux and OpenSolaris name the C-states.
>> Well, Linux does what you suggested, it uses index of a C-state as its name.
>> There is one difference from our current code - if a C-state is skipped for 
>> some
>> reason, then its index is not re-used, but the entry is marked as non-valid.
>> So, if we skip "C2" for some reason, then "C3" will become "C2".  Not so on 
>> Linux.
>> Also, they print a type/class of a C state using C1, C2, C3 and "--" for
>> higher/unknown types.
> 
> OpenSolaris, on the other hand, collapses multiple entries of the same type 
> into
> a single entry using the most power-saving alternative.

I don't think it is a perfect choice. In that case it would be useless
for the ACPI BIOS to report extra states. The only case where I think it
can be reasonable to drop some items is when they are equal except for
using different entry methods. For example, one OS may prefer to use a
port read, while another may use MWAIT to be able to wake up without
using an IPI.

> They also use the type as a C state reported name, index is not used in 
> interfacing.

In their case it is possible.

-- 
Alexander Motin


Re: CPU C-state storange on Panasonic TOUGH BOOK CF-R9

2010-09-12 Thread Alexander Motin
Andriy Gapon wrote:
> on 12/09/2010 18:22 Andriy Gapon said the following:
>> Observations are correct, but incomplete; the conclusions are wrong.
>> At the end of the boot there are message like this one:
>> PROCESSOR-0722 [402244] cpu_cx_cst: acpi_cpu0: Got C2 - 245 
>> latency
>> This is a result of re-evaluation of _CST because of a notification from 
>> ACPI.
> 
> But still, as you suggest, a patch like the following should be tested and
> committed:
> 
> --- a/sys/dev/acpica/acpi_cpu.c
> +++ b/sys/dev/acpica/acpi_cpu.c
> @@ -828,7 +828,8 @@ acpi_cpu_cx_list(struct acpi_cpu_softc *sc)
>  sbuf_new(&sb, sc->cpu_cx_supported, sizeof(sc->cpu_cx_supported),
>   SBUF_FIXEDLEN);
>  for (i = 0; i < sc->cpu_cx_count; i++) {
> - sbuf_printf(&sb, "C%d/%d ", i + 1, sc->cpu_cx_states[i].trans_lat);
> + sbuf_printf(&sb, "C%d/%d ", sc->cpu_cx_states[i].type,
> + sc->cpu_cx_states[i].trans_lat);
>   if (sc->cpu_cx_states[i].type < ACPI_STATE_C3)
>   sc->cpu_non_c3 = i;
>  }

I am not sure this patch is complete:
1) AFAIR I have seen an example somewhere where a system had several
C-states with different latencies but the same type, C3. The type only
defines enter/exit semantics, and there can be several states with the
same semantics. I am not sure how to name them properly in this case.
Maybe the existing approach was not so bad. These are ACPI C-states, not
CPU C-states; they are not the same. Maybe we should just mention the
type somewhere in addition.
2) This change makes the values of cx_lowest hard to understand.
3) If we touch cx_lowest, I would prefer to be able to set it to some
abstract C6 or whatever, letting the system automatically choose the
state it has available at the moment.
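
To show the naming problem from point 1, here is a small userland sketch that formats a cx_supported-style list either by index or by ACPI type. struct cx_state and cx_format() are hypothetical, and the latencies in the example are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct cx_state {
	int type;      /* ACPI C-state type: 1, 2 or 3 */
	int trans_lat; /* transition latency, microseconds */
};

/*
 * Format a cx_supported-style list into buf.
 * by_type == 0 names each state by its position (C1, C2, ...);
 * by_type != 0 names it by its ACPI type instead.
 */
static void
cx_format(const struct cx_state *st, size_t n, int by_type,
    char *buf, size_t buflen)
{
	char tok[32];

	assert(buflen > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		snprintf(tok, sizeof(tok), "C%d/%d ",
		    by_type ? st[i].type : (int)i + 1, st[i].trans_lat);
		if (strlen(buf) + strlen(tok) >= buflen)
			break; /* no room left, stop instead of overflowing */
		strcat(buf, tok);
	}
}
```

For four states with types 1, 2, 3, 3 and latencies 1, 104, 150, 245, naming by index gives "C1/1 C2/104 C3/150 C4/245 ", while naming by type gives "C1/1 C2/104 C3/150 C3/245 ", where two different states end up with the same name.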

-- 
Alexander Motin


Re: cpufreq_curr_sysctl: memory allocation

2010-06-20 Thread Alexander Motin
Andriy Gapon wrote:
> I noticed that cpufreq_curr_sysctl performs a substantial memory allocation 
> and
> deallocation on each call.  Its size is CF_MAX_LEVELS * sizeof(*levels), which
> is ~24KB.  This happens even for read-only calls to just query current level.
> And such calls happen quite frequently when powerd is running.

Worse, it not only consumes time but also causes a bunch of TLB flush
IPIs on free(). For a read-only call it doesn't even need CF_MAX_LEVELS *
sizeof(*levels); sizeof(*levels) should be enough there. Maybe then it
would fit into some existing UMA zone, minimizing the penalty.

> I think that this is an unnecessary and avoidable load for VM system.
> Couldn't a buffer be preallocated in sc and re-used for the calls?
> Even if not, for some reason, then wouldn't it be better to have a dedicated 
> uma
> zone for that rather than doing malloc+free?

A dedicated, rarely used UMA zone may eat much more memory than is
needed on SMP.

-- 
Alexander Motin


Re: Panic on S3 suspend call.

2010-06-08 Thread Alexander Motin
John Baldwin wrote:
> On Tuesday 08 June 2010 5:52:54 am Alexander Motin wrote:
>> Hi.
>>
>> Just noted that fresh HEAD i386 system panics on suspend request when
>> build with INVARIANTS and WITNESS:
>>
>> panic: mutex ACPI global lock owned at ../../../kern/kern_event.c:1899
>> cpuid = 1
>> KDB: enter: panic
>> [ thread pid 1047 tid 100138 ]
>> Stopped at  0x408d29df: movl$0,0x40dded34
>> db> bt
>> Tracing pid 1047 tid 100138 td 0x45fcb9c0
>> kdb_enter(40c75fe3,40c75fe3,40c74763,7c91fb1c,1,...) at 0x408d29df
>> panic(40c74763,40c26898,40c70d4e,76b,7c91fb40,...) at 0x4089ec96
>> _mtx_assert(40da08a0,0,40c70d4e,76b,7c91fb70,...) at 0x4088e227
>> knlist_mtx_assert_unlocked(40da08a0,4088ed2c,40da08a0,45d377c0,3,...) at
>> 0x4086b06e
>> knote(45d377dc,0,0,921,0,...) at 0x4086b9ff
>> acpi_ReqSleepState(456c3700,3,40c2633d,c76,0,...) at 0x404e8f4b
> 
> I think this should fix it:
> 
> Index: acpi.c
> ===
> --- acpi.c(revision 208893)
> +++ acpi.c(working copy)
> @@ -2346,7 +2346,7 @@
>   clone->notify_status = APM_EV_NONE;
>   if ((clone->flags & ACPI_EVF_DEVD) == 0) {
>   selwakeuppri(&clone->sel_read, PZERO);
> - KNOTE_UNLOCKED(&clone->sel_read.si_note, 0);
> + KNOTE_LOCKED(&clone->sel_read.si_note, 0);
>   }
>  }

With this patch it doesn't panic. A bit surprising, as the code was
written that way almost three years ago.

-- 
Alexander Motin


Panic on S3 suspend call.

2010-06-08 Thread Alexander Motin
Hi.

Just noticed that a fresh HEAD i386 system panics on a suspend request
when built with INVARIANTS and WITNESS:

panic: mutex ACPI global lock owned at ../../../kern/kern_event.c:1899
cpuid = 1
KDB: enter: panic
[ thread pid 1047 tid 100138 ]
Stopped at  0x408d29df: movl$0,0x40dded34
db> bt
Tracing pid 1047 tid 100138 td 0x45fcb9c0
kdb_enter(40c75fe3,40c75fe3,40c74763,7c91fb1c,1,...) at 0x408d29df
panic(40c74763,40c26898,40c70d4e,76b,7c91fb40,...) at 0x4089ec96
_mtx_assert(40da08a0,0,40c70d4e,76b,7c91fb70,...) at 0x4088e227
knlist_mtx_assert_unlocked(40da08a0,4088ed2c,40da08a0,45d377c0,3,...) at
0x4086b06e
knote(45d377dc,0,0,921,0,...) at 0x4086b9ff
acpi_ReqSleepState(456c3700,3,40c2633d,c76,0,...) at 0x404e8f4b
acpiioctl(45793400,80045004,45d07810,3,45fcb9c0,...) at 0x404e9118
devfs_ioctl_f(45d820e0,80045004,45d07810,45d34a80,45fcb9c0,...) at
0x4081d1e8
kern_ioctl(45fcb9c0,3,80045004,45d07810,91fcec,...) at 0x408ebdbd
ioctl(45fcb9c0,7c91fcec,0,40cb192e,0,...) at 0x408ebf47
syscallenter(45fcb9c0,7c91fce4,40b9fa00,40dcd190,0,...) at 0x408e0b23
syscall(7c91fd28) at 0x40b9f169
Xint0x80_syscall() at 0x40b7f49a
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x28183173, esp =
0x3fbfeb1c, ebp = 0x3fbfebf8 ---
db>

-- 
Alexander Motin