Re: Intel TurboBoost in practice

2010-07-27 Thread Alan Cox
On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin m...@freebsd.org wrote:

 Robert Watson wrote:
  On Sun, 25 Jul 2010, Alexander Motin wrote:
  The numbers that you are showing doesn't show much difference. Have
  you tried buildworld?
 
  If you mean relative difference -- as I have told, it's mostly because
  of my CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is
  enabled most of time if CPU is not overheated. It probably doesn't, as
  it works on clear table under air conditioner. So maximal effect I can
  expect on is 4.2%. In such situation 2.8% probably not so bad to
  illustrate that feature works and there is space for further
  improvements. If I had Core i5-750S I would expect 33% boost.
 
  Can I recommend the use of ministat(1) and sample sizes of at least 8
  runs per configuration?

 Thanks for pushing me to do it right. :) Here is 3*15 runs with fresh
 kernel with disabled debug. Results are quite close to original: -2.73%
 and -2.19% of time.
 x C1
 + C2
 * C3
 [ministat distribution plot not recoverable]
     N    Min    Max  Median        Avg        Stddev
 x  15  12.68  12.84   12.69  12.698667   0.039254966
 +  15  12.35  12.36   12.35  12.351333  0.0035186578
 Difference at 95.0% confidence
     -0.347333 +/- 0.0208409
     -2.7352% +/- 0.164119%
     (Student's t, pooled s = 0.0278687)
 *  15  12.41  12.44   12.42      12.42  0.0075592895
 Difference at 95.0% confidence
     -0.278667 +/- 0.0211391
     -2.19446% +/- 0.166467%
     (Student's t, pooled s = 0.0282674)

 I also checked one more aspect -- TurboBoost works only when CPU runs at
 highest EIST frequency (P0 state). I've reduced dev.cpu.0.freq from 3201
 to 3067 and repeated the test:
 x C1
 + C2
 * C3
 [ministat distribution plot not recoverable]
     N    Min    Max  Median     Avg        Stddev
 x  15  13.72  13.73   13.72   13.72  0.0048795004
 +  15  13.79  13.82    13.8   13.80  0.0072374686
 Difference at 95.0% confidence
     0.08 +/- 0.00461567
     0.582949% +/- 0.0336337%
     (Student's t, pooled s = 0.00617213)
 *  15  13.89   13.9   13.89  13.894  0.0050709255
 Difference at 95.0% confidence
     0.170667 +/- 0.00372127
     1.24362% +/- 0.0271164%
     (Student's t, pooled s = 0.00497613)

 In that case using C2 or C3 predictably caused small performance reduce,
 as after falling to sleep, CPU needs time to wakeup. Even if tested CPU0
 won't ever sleep during test, it's TLB shutdown IPIs to other cores
 still probably could suffer from waiting other cores' wakeup.


In the deeper sleep states, are the TLB contents actually maintained while
the processor sleeps?  (I notice that in some configurations, we actually
flush dirty data from the cache before sleeping.)

Alan
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to 

Re: Intel TurboBoost in practice

2010-07-27 Thread Alexander Motin
Alan Cox wrote:
 On Mon, Jul 26, 2010 at 9:11 AM, Alexander Motin m...@freebsd.org wrote:
 
 In that case using C2 or C3 predictably caused small performance reduce,
 as after falling to sleep, CPU needs time to wakeup. Even if tested CPU0
 won't ever sleep during test, it's TLB shutdown IPIs to other cores
 still probably could suffer from waiting other cores' wakeup.
 
 In the deeper sleep states, are the TLB contents actually maintained
 while the processor sleeps?  (I notice that in some configurations, we
 actually flush dirty data from the cache before sleeping.)

As I understand it, we flush caches only as a last resort, if the platform
does not support special techniques such as disabling arbitration or making
the CPU wake up on bus mastering. But the same ACPI C-states can map onto
different CPU C-states. Some of these CPU states (like C6) could imply
cache invalidation, though I am not sure that is visible from outside.

The ACPI 3.0 specification says nothing about TLBs, so I am not sure we can
count on their invalidation unless we do it ourselves, as is done for
caches when the CPU can't keep them coherent while sleeping.

-- 
Alexander Motin


Re: Intel TurboBoost in practice

2010-07-26 Thread Robert Watson

On Sun, 25 Jul 2010, Alexander Motin wrote:

The numbers that you are showing doesn't show much difference. Have you 
tried buildworld?


If you mean relative difference -- as I have told, it's mostly because of my 
CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is enabled most 
of time if CPU is not overheated. It probably doesn't, as it works on clear 
table under air conditioner. So maximal effect I can expect on is 4.2%. In 
such situation 2.8% probably not so bad to illustrate that feature works and 
there is space for further improvements. If I had Core i5-750S I would 
expect 33% boost.


Can I recommend the use of ministat(1) and sample sizes of at least 8 runs per 
configuration?


Robert



If you mean absolute difference, here are results or four buildworld runs:
hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
hw.acpi.cpu.cx_lowest=C1: 4679.83 sec
Benefit is about 2.1%. Each time results were erased and sources
pre-cached into RAM. Storage was SSD, so disk should not be an issue.



Re: Intel TurboBoost in practice

2010-07-26 Thread Alexander Motin
Robert Watson wrote:
 On Sun, 25 Jul 2010, Alexander Motin wrote:
 The numbers that you are showing doesn't show much difference. Have
 you tried buildworld?

 If you mean relative difference -- as I have told, it's mostly because
 of my CPU. It's maximal boost is 266MHz (8.3%), but 133MHz of them is
 enabled most of time if CPU is not overheated. It probably doesn't, as
 it works on clear table under air conditioner. So maximal effect I can
 expect on is 4.2%. In such situation 2.8% probably not so bad to
 illustrate that feature works and there is space for further
 improvements. If I had Core i5-750S I would expect 33% boost.
 
 Can I recommend the use of ministat(1) and sample sizes of at least 8
 runs per configuration?

Thanks for pushing me to do it right. :) Here are 3*15 runs with a fresh
kernel with debugging disabled. Results are quite close to the original:
-2.73% and -2.19% of time.
x C1
+ C2
* C3
[ministat distribution plot not recoverable]
    N    Min    Max  Median        Avg        Stddev
x  15  12.68  12.84   12.69  12.698667   0.039254966
+  15  12.35  12.36   12.35  12.351333  0.0035186578
Difference at 95.0% confidence
    -0.347333 +/- 0.0208409
    -2.7352% +/- 0.164119%
    (Student's t, pooled s = 0.0278687)
*  15  12.41  12.44   12.42      12.42  0.0075592895
Difference at 95.0% confidence
    -0.278667 +/- 0.0211391
    -2.19446% +/- 0.166467%
    (Student's t, pooled s = 0.0282674)
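
For readers who want to see where the "+/-" figures come from, here is a minimal Python sketch that recomputes the C1 vs C2 comparison from the summary statistics alone (N, Avg, Stddev). It assumes the usual pooled-variance Student's t interval, which is what the "(Student's t, pooled s = ...)" line suggests. The function names are hypothetical, and the critical value 2.048 for 28 degrees of freedom is taken from a standard t table rather than from the thread.

```python
import math

# Two-sided 95% critical value of Student's t for 28 degrees of freedom
# (15 + 15 - 2 samples). Assumption: taken from a standard t table and
# hardcoded so the sketch needs nothing beyond the standard library.
T_CRIT_95_DF28 = 2.048


def pooled_stddev(s1, n1, s2, n2):
    """Pooled standard deviation of two samples."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def difference_ci(mean1, s1, n1, mean2, s2, n2, t_crit=T_CRIT_95_DF28):
    """Return (difference, confidence-interval half-width, pooled s)."""
    sp = pooled_stddev(s1, n1, s2, n2)
    half_width = t_crit * sp * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean2 - mean1, half_width, sp


# Summary statistics for C1 (x) and C2 (+) from the table above.
diff, half, sp = difference_ci(12.698667, 0.039254966, 15,
                               12.351333, 0.0035186578, 15)
print(f"difference {diff:.6f} +/- {half:.7f} (pooled s = {sp:.7f})")
print(f"relative   {100 * diff / 12.698667:.4f}%")
```

Feeding in the C3 row instead of the C2 row should give the second "Difference at 95.0% confidence" block in the same way.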

I also checked one more aspect: TurboBoost works only when the CPU runs at
the highest EIST frequency (P0 state). I reduced dev.cpu.0.freq from 3201
to 3067 and repeated the test:
x C1
+ C2
* C3
[ministat distribution plot not recoverable]
    N    Min    Max  Median     Avg        Stddev
x  15  13.72  13.73   13.72   13.72  0.0048795004
+  15  13.79  13.82    13.8   13.80  0.0072374686
Difference at 95.0% confidence
    0.08 +/- 0.00461567
    0.582949% +/- 0.0336337%
    (Student's t, pooled s = 0.00617213)
*  15  13.89   13.9   13.89  13.894  0.0050709255
Difference at 95.0% confidence
    0.170667 +/- 0.00372127
    1.24362% +/- 0.0271164%
    (Student's t, pooled s = 0.00497613)

In that case using C2 or C3 predictably caused a small performance
reduction, since after falling asleep the CPU needs time to wake up. Even
though the tested CPU0 never sleeps during the test, its TLB shootdown IPIs
to other cores probably still suffer from waiting for those cores to wake up.

Obviously, in the first test these 0.58% and 1.24% were subtracted from
TurboBoost's maximal benefit of 4.3% on this CPU.
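
The subtraction argument can be made concrete with a small Python sketch. It simply adds the wake-up penalty measured without TurboBoost back onto the net gain measured with it. Treating the two percentages as additive is an approximation made for this sketch, and the dictionary and function names are hypothetical.

```python
# Percent figures copied from the two ministat runs in this message.
net_gain_with_turbo = {"C2": 2.7352, "C3": 2.19446}   # time saved vs C1 at 3201
wakeup_penalty = {"C2": 0.582949, "C3": 1.24362}      # time lost vs C1 at 3067


def estimated_turbo_benefit(state):
    """Approximate TurboBoost gain with the wake-up penalty added back (percent)."""
    return net_gain_with_turbo[state] + wakeup_penalty[state]


for state in ("C2", "C3"):
    print(f"{state}: roughly {estimated_turbo_benefit(state):.2f}% "
          f"against the 4.3% maximum quoted above")
```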

-- 
Alexander Motin


Intel TurboBoost in practice

2010-07-24 Thread Alexander Motin
Hi.

I've made some small observations of Intel TurboBoost technology under
FreeBSD. This technology allows Intel Core i5/i7 CPUs to raise the
frequency of some cores if other cores are idle and power/thermal
conditions permit. A CPU core is counted as idle if it has been put into
the C3 or a deeper power state (which may correspond to ACPI C2/C3 states).
So to reach maximal effectiveness, some tuning may be needed.

Here is my test case: FreeBSD 9-CURRENT on a Core i5 650 CPU, 3.2GHz + 1/2
TurboBoost steps (+133/+266MHz), with the boxed cooler in open air. I
measured the build time of net/mpd5 from sources, using only one CPU
core (cpuset -l 0 time make).

Untuned system (hz=1000):     14.15 sec
Enabled ACPI C2 (hz=1000+C2): 13.85 sec
Enabled ACPI C3 (hz=1000+C3): 13.91 sec
Reduced HZ (hz=100):          14.16 sec
Enabled ACPI C2 (hz=100+C2):  13.85 sec
Enabled ACPI C3 (hz=100+C3):  13.86 sec
Timers tuned* (hz=100):       14.10 sec
Enabled ACPI C2 (hz=100+C2):  13.71 sec
Enabled ACPI C3 (hz=100+C3):  13.73 sec

All numbers were measured several times and are repeatable to within
+/-0.01 sec.
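
To put the timings in relative terms, the following Python sketch computes the percentage of build time saved by each C-state setting against the baseline of its own group. The grouping of the nine rows into three configurations and all names in the code are assumptions made for illustration.

```python
# Build times in seconds, copied from the table above and grouped by
# timer configuration. "baseline" is the row without C2/C3 enabled.
timings = {
    "hz=1000": {"baseline": 14.15, "C2": 13.85, "C3": 13.91},
    "hz=100": {"baseline": 14.16, "C2": 13.85, "C3": 13.86},
    "hz=100, timers tuned": {"baseline": 14.10, "C2": 13.71, "C3": 13.73},
}


def saving_percent(baseline, tuned):
    """Percentage of build time saved relative to the baseline."""
    return (baseline - tuned) / baseline * 100.0


for config, times in timings.items():
    for state in ("C2", "C3"):
        saved = saving_percent(times["baseline"], times[state])
        print(f"{config:22s} {state}: {saved:.2f}% faster")
```

With a stated repeatability of +/-0.01 sec, differences of a few tenths of a second are well above the noise of these particular runs.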

*) Timers were tuned to reduce interrupt rates and correspondingly increase
the sleep time of idle cores. These lines were added to loader.conf:
sysctl kern.eventtimer.timer1=i8254
sysctl kern.eventtimer.timer2=NONE
kern.eventtimer.singlemul=1
kern.hz=100

PS: In this case the benefit is small, but it is the least that can be
achieved; the gain depends on the CPU model. Some models allow the
frequency to be raised by up to 6 steps (+798MHz).

PPS: I expect an even better effect from further reducing interrupt
rates on idle CPUs.

-- 
Alexander Motin


Re: Intel TurboBoost in practice

2010-07-24 Thread Alan Cox
2010/7/24 Alexander Motin m...@freebsd.org

 Hi.

 I've make small observations of Intel TurboBoost technology under
 FreeBSD. This technology allows Intel Core i5/i7 CPUs to rise frequency
 of some cores if other cores are idle and power/thermal conditions
 permit. CPU core counted as idle, if it has been put into C3 or deeper
 power state (may reflect ACPI C2/C3 states). So to reach maximal
 effectiveness, some tuning may be needed.


[snip]



 PPS: I expect even better effect achieved by further reducing interrupt
 rates on idle CPUs.


I'm currently testing a patch that eliminates another 31% of the global TLB
shootdowns for a buildworld on an amd64 machine.  So, you can expect
improvement in this area.

Alan


Re: Intel TurboBoost in practice

2010-07-24 Thread Rui Paulo

On 24 Jul 2010, at 14:53, Alexander Motin wrote:

 Hi.
 
 I've make small observations of Intel TurboBoost technology under
 FreeBSD. This technology allows Intel Core i5/i7 CPUs to rise frequency
 of some cores if other cores are idle and power/thermal conditions
 permit. CPU core counted as idle, if it has been put into C3 or deeper
 power state (may reflect ACPI C2/C3 states). So to reach maximal
 effectiveness, some tuning may be needed.
 
 Here is my test case: FreeBSD 9-CURRENT on Core i5 650 CPU, 3.2GHz + 1/2
 TurboBoost steps (+133/+266MHz) with boxed cooler at the open air. I was
 measuring building time of the net/mpd5 from sources, using only one CPU
 core (cpuset -l 0 time make).
 
 Untuned system (hz=1000): 14.15 sec
 Enabled ACPI C2 (hz=1000+C2): 13.85 sec
 Enabled ACPI C3 (hz=1000+C3): 13.91 sec
 Reduced HZ (hz=100):  14.16 sec
 Enabled ACPI C2 (hz=100+C2):  13.85 sec
 Enabled ACPI C3 (hz=100+C3):  13.86 sec
 Timers tuned* (hz=100):   14.10 sec
 Enabled ACPI C2 (hz=100+C2):  13.71 sec
 Enabled ACPI C3 (hz=100+C3):  13.73 sec
 
 All numbers tested few times and are repeatable up to +/-0.01sec.
 
 *) Timers were tuned to reduce interrupt rates and respectively increase
 idle cores sleep time. These lines were added to loader.conf:
 sysctl kern.eventtimer.timer1=i8254
 sysctl kern.eventtimer.timer2=NONE
 kern.eventtimer.singlemul=1
 kern.hz=100
 
 PS: In this case benefit is small, but it is the least that can be
 achieved, depending on CPU model. Some models allow frequency to be
 risen by up to 6 steps (+798MHz).

The numbers that you are showing don't show much difference. Have you tried
buildworld?


 
 PPS: I expect even better effect achieved by further reducing interrupt
 rates on idle CPUs.
 
 

Regards,
--
Rui Paulo




Re: Intel TurboBoost in practice

2010-07-24 Thread Garrett Cooper
On Sat, Jul 24, 2010 at 9:18 AM, Rui Paulo rpa...@lavabit.com wrote:

 On 24 Jul 2010, at 14:53, Alexander Motin wrote:

 Hi.

 I've make small observations of Intel TurboBoost technology under
 FreeBSD. This technology allows Intel Core i5/i7 CPUs to rise frequency
 of some cores if other cores are idle and power/thermal conditions
 permit. CPU core counted as idle, if it has been put into C3 or deeper
 power state (may reflect ACPI C2/C3 states). So to reach maximal
 effectiveness, some tuning may be needed.

 Here is my test case: FreeBSD 9-CURRENT on Core i5 650 CPU, 3.2GHz + 1/2
 TurboBoost steps (+133/+266MHz) with boxed cooler at the open air. I was
 measuring building time of the net/mpd5 from sources, using only one CPU
 core (cpuset -l 0 time make).

 Untuned system (hz=1000):     14.15 sec
 Enabled ACPI C2 (hz=1000+C2): 13.85 sec
 Enabled ACPI C3 (hz=1000+C3): 13.91 sec
 Reduced HZ (hz=100):          14.16 sec
 Enabled ACPI C2 (hz=100+C2):  13.85 sec
 Enabled ACPI C3 (hz=100+C3):  13.86 sec
 Timers tuned* (hz=100):       14.10 sec
 Enabled ACPI C2 (hz=100+C2):  13.71 sec
 Enabled ACPI C3 (hz=100+C3):  13.73 sec

 All numbers tested few times and are repeatable up to +/-0.01sec.

 *) Timers were tuned to reduce interrupt rates and respectively increase
 idle cores sleep time. These lines were added to loader.conf:
 sysctl kern.eventtimer.timer1=i8254
 sysctl kern.eventtimer.timer2=NONE
 kern.eventtimer.singlemul=1
 kern.hz=100

 PS: In this case benefit is small, but it is the least that can be
 achieved, depending on CPU model. Some models allow frequency to be
 risen by up to 6 steps (+798MHz).

 The numbers that you are showing doesn't show much difference. Have you tried 
 buildworld?

Agreed. The numbers are small enough that there could be a large
degree of variation just based on environmental factors alone; there
are other things that go into that as well, such as disk I/O, etc,
that probably shouldn't be factored into a CPU performance test.

Thanks,
-Garrett


Re: Intel TurboBoost in practice

2010-07-24 Thread Norikatsu Shigemura
Hi mav.

On Sat, 24 Jul 2010 16:53:10 +0300
Alexander Motin m...@freebsd.org wrote:
 PS: In this case benefit is small, but it is the least that can be
 achieved, depending on CPU model. Some models allow frequency to be
 risen by up to 6 steps (+798MHz).

I tested on Core i7 640UM (Arrandale 1.2GHz - 2.26GHz) with
openssl speed (w/o aesni(4)) and
/usr/src/tools/tools/crypto/cryptotest.c (w/ aesni(4)).

http://people.freebsd.org/~nork/aesni/aes128cbc-noaesni.pdf [1]
http://people.freebsd.org/~nork/aesni/aes128cbc-aesni.pdf [2]

[1] $ /usr/bin/cpuset -l$i /usr/bin/openssl speed -elapsed -mr -multi $n aes128-cbc
 $i = 0 1 2 3 0,1 0,2 0,3 1,2 1,3 2,3 0,1,2 0,1,3 0,2,3 1,2,3 0,1,2,3
 $n = number of cores, $((`echo $i | wc -c`/2))

[2] $ /usr/bin/cpuset -l$i ./cryptotest -t $n -z 5 8192
 $i = 0 1 2 3 0,1 0,2 0,3 1,2 1,3 2,3 0,1,2 0,1,3 0,2,3 1,2,3 0,1,2,3
 $n = number of cores, $((`echo $i | wc -c`/2))
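
The $i and $n values above can be generated rather than typed by hand. A minimal Python sketch, assuming four logical CPUs numbered 0 to 3 as in the lists above; the function names are hypothetical. The core count reproduces the `wc -c` arithmetic, which only works for single-digit CPU ids.

```python
from itertools import combinations


def cpu_lists(ncpus=4):
    """Return a cpuset -l argument for every non-empty subset of CPUs 0..ncpus-1."""
    return [",".join(str(cpu) for cpu in combo)
            for size in range(1, ncpus + 1)
            for combo in combinations(range(ncpus), size)]


def core_count(cpu_list):
    """Mirror $((`echo $i | wc -c`/2)).

    wc -c counts the digits, the commas and the trailing newline, so the
    byte count is len(cpu_list) + 1. Valid only for single-digit CPU ids.
    """
    return (len(cpu_list) + 1) // 2


for cpus in cpu_lists():
    print(f"cpuset -l{cpus}  (n = {core_count(cpus)})")
```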

In my environment, according to aes128cbc-noaesni.pdf, Turbo Boost gives
at least a 30% performance increase (I think).

And according to aes128cbc-aesni.pdf, it gives at least a 100%
performance increase (I think).

I also now understand the reduction in single-thread performance caused
by Hyper-Threading :-).

-- 
Norikatsu Shigemura n...@freebsd.org


Re: Intel TurboBoost in practice

2010-07-24 Thread Alexander Motin
Norikatsu Shigemura wrote:
 On Sat, 24 Jul 2010 16:53:10 +0300
 Alexander Motin m...@freebsd.org wrote:
 PS: In this case benefit is small, but it is the least that can be
 achieved, depending on CPU model. Some models allow frequency to be
 risen by up to 6 steps (+798MHz).
 
   I tested on Core i7 640UM (Arrandale 1.2GHz - 2.26GHz) with
   openssl speed (w/o aesni(4)) and
   /usr/src/tools/tools/crypto/cryptotest.c (w/ aesni(4)).
 
   http://people.freebsd.org/~nork/aesni/aes128cbc-noaesni.pdf [1]
   http://people.freebsd.org/~nork/aesni/aes128cbc-aesni.pdf [2]
 
   In my environment, according to aes128cbc-noaesni.pdf, at least,
   30% performace up by Turbo Boost (I think).

The numbers are interesting, though they do not prove much, because many
other factors may influence the result. It would be more informative to
run the tests with C1 and with C2/C3 states in use.

   And according to aes128cbc-aesni.pdf, at least, 100% performance
   up by Turbo Boost (I think).

This IMHO is even more questionable. A single core, even boosted, shouldn't
be faster than 2, 3 or 4 cores. I would say there is some scalability
problem. Maybe context switches, locking, or something else.

-- 
Alexander Motin


Re: Intel TurboBoost in practice

2010-07-24 Thread Alexander Motin
Rui Paulo wrote:
 On 24 Jul 2010, at 14:53, Alexander Motin wrote:
 Here is my test case: FreeBSD 9-CURRENT on Core i5 650 CPU, 3.2GHz + 1/2
 TurboBoost steps (+133/+266MHz) with boxed cooler at the open air. I was
 measuring building time of the net/mpd5 from sources, using only one CPU
 core (cpuset -l 0 time make).

 Untuned system (hz=1000): 14.15 sec
 Enabled ACPI C2 (hz=1000+C2): 13.85 sec
 Enabled ACPI C3 (hz=1000+C3): 13.91 sec
 Reduced HZ (hz=100):  14.16 sec
 Enabled ACPI C2 (hz=100+C2):  13.85 sec
 Enabled ACPI C3 (hz=100+C3):  13.86 sec
 Timers tuned* (hz=100):   14.10 sec
 Enabled ACPI C2 (hz=100+C2):  13.71 sec
 Enabled ACPI C3 (hz=100+C3):  13.73 sec

 All numbers tested few times and are repeatable up to +/-0.01sec.

 PS: In this case benefit is small, but it is the least that can be
 achieved, depending on CPU model. Some models allow frequency to be
 risen by up to 6 steps (+798MHz).
 
 The numbers that you are showing doesn't show much difference. Have you tried 
 buildworld?

If you mean relative difference: as I have said, it's mostly because of my
CPU. Its maximal boost is 266MHz (8.3%), but 133MHz of that is enabled most
of the time if the CPU is not overheated. It probably isn't, as it sits on
an open table under an air conditioner. So the maximal effect I can expect
is 4.2%. In that situation 2.8% is probably not so bad to illustrate that
the feature works and that there is room for further improvement. If I had
a Core i5-750S I would expect a 33% boost.
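
The percentages in this paragraph follow directly from the step size and the base clock. A short Python sketch of that arithmetic; the 3200 MHz base and 133 MHz step come from the original post, while the 2400 MHz base clock used for the Core i5-750S is an assumption not stated in this thread. The names are hypothetical.

```python
STEP_MHZ = 133          # one TurboBoost step, from the original post
I5_650_BASE_MHZ = 3200  # Core i5 650 nominal clock, from the original post
I5_750S_BASE_MHZ = 2400 # assumption: i5-750S base clock, not given in the thread


def boost_percent(steps, base_mhz, step_mhz=STEP_MHZ):
    """Frequency gain in percent for a given number of TurboBoost steps."""
    return steps * step_mhz / base_mhz * 100.0


print(f"i5 650, 2 steps:  {boost_percent(2, I5_650_BASE_MHZ):.1f}%")
print(f"i5 650, 1 step:   {boost_percent(1, I5_650_BASE_MHZ):.1f}%")
print(f"i5-750S, 6 steps: {boost_percent(6, I5_750S_BASE_MHZ):.1f}%")
```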

If you mean absolute difference, here are the results of four buildworld runs:
hw.acpi.cpu.cx_lowest=C1: 4654.23 sec
hw.acpi.cpu.cx_lowest=C2: 4556.37 sec
hw.acpi.cpu.cx_lowest=C2: 4570.85 sec
hw.acpi.cpu.cx_lowest=C1: 4679.83 sec
The benefit is about 2.1%. Each time the results were erased and the
sources pre-cached into RAM. Storage was an SSD, so disk should not be an
issue.
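
One way to arrive at a figure of this size is to average the two runs per setting and compare the means. The Python sketch below does that. It is an assumed method, since the message does not say how the 2.1% was computed, and the names are hypothetical. The result lands a little over 2%, in line with the figure quoted.

```python
# buildworld wall-clock times in seconds, in the order they were run.
runs = [("C1", 4654.23), ("C2", 4556.37), ("C2", 4570.85), ("C1", 4679.83)]


def mean_time(state):
    """Average build time over all runs with the given cx_lowest setting."""
    times = [seconds for setting, seconds in runs if setting == state]
    return sum(times) / len(times)


def benefit_percent(baseline="C1", tuned="C2"):
    """Time saved by the tuned setting relative to the baseline, in percent."""
    return (mean_time(baseline) - mean_time(tuned)) / mean_time(baseline) * 100.0


print(f"C1 mean {mean_time('C1'):.2f} s, C2 mean {mean_time('C2'):.2f} s")
print(f"benefit {benefit_percent():.2f}%")
```

With only two runs per setting this is an illustration rather than a statistical result.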

-- 
Alexander Motin