Over in the "add timing of buffer I/O requests" thread I mentioned having a system where EXPLAIN ANALYZE of a modest COUNT(*) takes 10X as long as just executing the statement. Attached are a bit of SQL and a script that runs it multiple times, which demonstrate the problem on systems that have it. It assumes you've installed the Dell Store 2 database. I'd expect it to work with a simpler example too (just create something with 20,000 rows), but I haven't actually confirmed that yet.
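
If you don't have Dell Store 2 handy, a synthetic stand-in ought to behave the same way. This is an untested sketch; the table and column names just mimic the dellstore2 ones:

-- hypothetical stand-in for the dellstore2 customers table
create table customers as
    select i as customerid from generate_series(1, 20000) as i;
vacuum analyze customers;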

From what I've been able to measure so far, here's the state of things:

-If you have a system with a working TSC clock source (timing data is pulled right from the CPU), timing overhead is reasonable enough that you might turn it on even for things that happen frequently, such as what the buffer I/O timing patch enables.

-Some systems have tsc, but it doesn't work right in multi-core setups. Newer kernel versions know this and disable it, older ones let you pick tsc anyway and bad things occur.

-Using any of the other timing mechanisms--hpet, acpi_pm, and jiffies--has extremely high timing overhead.
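
As a quick cross-check that doesn't involve EXPLAIN at all, you can estimate the per-call cost of the clock from psql by comparing a query that forces one clock read per row against the same query without one. A minimal sketch, relying on clock_timestamp() reading the clock each time it's evaluated:

\timing
select count(*) from generate_series(1, 1000000);
select count(clock_timestamp()) from generate_series(1, 1000000);

The difference between the two, divided by a million, approximates the cost of one clock read with the current time source.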

This doesn't conflict with the comment we got from Martijn van Oosterhout in the other thread: "Learned the hard way while tracking clock-skew on a multicore system. The hpet may not be the fastest (that would be the cpu timer), but it's the fastest (IME) that gives guaranteed monotonic time." There's more background about TSC, HPET, and an interesting caveat about VMs I came across:

http://en.wikipedia.org/wiki/Time_Stamp_Counter
http://en.wikipedia.org/wiki/High_Precision_Event_Timer
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007020

I'm starting to envision what a new documentation section discussing timer overhead would look like now, for Linux at least. And I'm guessing other people can simulate this problem even on hardware that doesn't have it in the default config, simply by manually switching to one of the slower timing methods at boot.

The test case is just doing this:

\timing
select count(*) from customers;
explain analyze select count(*) from customers;

Lots of times, with some patterns to try to avoid caching effects.
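
The attached script only runs the SQL file once per invocation; the repetition comes from calling it several times. A minimal driver loop along those lines, assuming the attachments at the end of this message are saved as explain-overhead.sql and run against the dellstore2 database:

for i in 1 2 3; do
    psql -d dellstore2 -e -f explain-overhead.sql | grep Time
done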

The system with the worst clock timing issues I've found, the one that was so obvious it kicked off this investigation, is my Thinkpad T60 laptop. The overhead is worst on the old 32-bit Ubuntu 9.04 installation I have there. Here's some info about the system, starting with the 4 clock sources it supports:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm jiffies tsc
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hpet
$ uname -a
Linux meddle 2.6.28-19-generic #61-Ubuntu SMP Wed May 26 23:35:15 UTC 2010 i686 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=9.04
DISTRIB_CODENAME=jaunty
DISTRIB_DESCRIPTION="Ubuntu 9.04"
$ psql -c "select version()"
PostgreSQL 8.3.11 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Ubuntu 4.3.3-5ubuntu4) 4.3.3

You change the time source by adding an option like this to the kernel boot parameters:

clocksource=acpi_pm

Martijn said you can just cat a new clock source into the setting instead; I haven't evaluated that yet. I wanted to get my baselines done in a situation where the kernel boots using the one I wanted, which turned out to be useful for a number of reasons--boot time is one of the things that varied a lot when the timer was switched.
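
For reference, the runtime switch Martijn described would presumably look like this--again, untested on my end, and it needs root:

$ echo acpi_pm | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
acpi_pm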

The hpet time source is the default on my T60 system, but there are 3 others available. Of those, it appears that only acpi_pm is a viable alternative. I gave up waiting for the system to boot after 15 minutes when using jiffies. I could boot with tsc, but everything was erratic; I could barely use the mouse, for example. tsc did have a much faster clock as measured by EXPLAIN overhead, though. Here are the results: two runs of the test program that I've de-interleaved, so there's a block of fast (count-only) runs, then a block of slow ones (with EXPLAIN ANALYZE):

Time Source:hpet

Time: 14.783 ms
Time: 9.021 ms
Time: 9.170 ms
Time: 8.634 ms
Time: 5.632 ms
Time: 5.520 ms
Time: 15.276 ms
Time: 9.477 ms
Time: 9.485 ms

Time: 70.720 ms
Time: 67.577 ms
Time: 57.959 ms
Time: 74.661 ms
Time: 73.557 ms
Time: 74.963 ms
Time: 87.898 ms
Time: 82.125 ms
Time: 81.443 ms

Time Source:acpi_pm

Time: 8.124 ms
Time: 5.020 ms
Time: 4.887 ms
Time: 8.125 ms
Time: 5.029 ms
Time: 4.881 ms

Time: 82.986 ms
Time: 82.366 ms
Time: 82.609 ms
Time: 83.089 ms
Time: 82.438 ms
Time: 82.539 ms

Time Source:tsc
Time: 8.371 ms
Time: 4.673 ms
Time: 4.901 ms
Time: 8.409 ms
Time: 4.943 ms
Time: 4.722 ms

Time: 16.436 ms
Time: 16.349 ms
Time: 16.139 ms
Time: 19.871 ms
Time: 17.175 ms
Time: 16.540 ms

There's the possible 10:1 slowdown I was talking about when using the default hpet clock, while the faster (but fundamentally unstable) tsc clock does better: only around a 4:1 slowdown there.
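
A back-of-the-envelope check on where that time goes, assuming I'm counting the instrumentation right: EXPLAIN ANALYZE brackets each fetch from a plan node with a pair of clock reads, so pulling 20,000 rows through the Seq Scan here means on the order of 40,000 gettimeofday() calls. That puts the per-read cost at roughly:

hpet: (75 ms - 9 ms) / 40,000 calls ~= 1.7 us
tsc:  (16.5 ms - 5 ms) / 40,000 calls ~= 0.3 us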

The first question I considered: is it the operating system or kernel causing this, rather than the hardware? I also have Debian Squeeze with a 64-bit 2.6.32 kernel installed on this laptop. It only has two time sources available, with hpet being the default:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hpet acpi_pm
$ uname -a
Linux meddle 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux
$ psql -c "select version()"
                                               version
------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit

It's no coincidence that these are the only two clock sources that actually worked in the older kernel. jiffies is old stuff that's been on the deprecation path for a while, and the kernel is eliminating TSC with this message during boot:

[    0.168139] checking TSC synchronization [CPU#0 -> CPU#1]:
[    0.172001] Measured 617820 cycles TSC warp between CPUs, turning off TSC clock.
[    0.172001] Marking TSC unstable due to check_tsc_sync_source failed

So the fact that TSC can be a problem on some hardware is well known enough that the newer kernel tests for it, then rules TSC out as unusable when it fails that test.
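
If you want to see whether your own kernel made the same decision, grepping the boot log should turn up the relevant messages:

$ dmesg | grep -i tsc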

Here's the same timing results on this kernel:

Time Source:hpet
Time: 5.701 ms
Time: 3.581 ms
Time: 3.560 ms
Time: 5.696 ms
Time: 3.587 ms
Time: 3.342 ms

Time: 43.492 ms
Time: 43.227 ms
Time: 43.319 ms
Time: 43.503 ms
Time: 43.332 ms
Time: 43.244 ms

Time Source:acpi_pm
Time: 6.350 ms
Time: 3.792 ms
Time: 3.490 ms
Time: 5.806 ms
Time: 3.672 ms
Time: 3.538 ms

Time: 80.093 ms
Time: 79.875 ms
Time: 79.789 ms
Time: 80.033 ms
Time: 80.330 ms
Time: 80.062 ms

That's much better in every way on hpet. But the time ratio hasn't really improved; it's still at least 10:1. Unless there's some blacker magic in an even newer kernel, it seems nothing I can do to this laptop will make it run gettimeofday() quickly.

Let's move on to something that's actually usable as a server. My main desktop has an Intel i7-860. It has the same 64-bit Debian Squeeze setup as the second set of T60 results I just showed. The only thing different about the two systems is the hardware; the software is as close to identical as I could make it. (That's for my own general sanity, not something special I did for this test--all my systems have that same Debian package set.)

But there are different clock options available:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm
$ uname -a
Linux grace 2.6.32-5-amd64 #1 SMP Thu Nov 3 03:41:26 UTC 2011 x86_64 GNU/Linux
$ psql -c "select version()"
PostgreSQL 8.4.9 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit

tsc is the default clock source here. A passing TSC synchronization test looks like this in the dmesg output:

[    0.000000] Fast TSC calibration using PIT
...
[    0.244640] checking TSC synchronization [CPU#0 -> CPU#1]: passed.
...
[    0.891992] checking TSC synchronization [CPU#0 -> CPU#7]: passed.
[    0.911993] Brought up 8 CPUs

And here's how it performs:

Time Source:tsc
Time: 2.915 ms
Time: 1.531 ms
Time: 1.500 ms
Time: 2.926 ms
Time: 1.518 ms
Time: 1.478 ms

Time: 2.945 ms
Time: 2.915 ms
Time: 2.882 ms
Time: 2.920 ms
Time: 2.862 ms
Time: 2.856 ms

That's more like it.

Next up I plan to survey a few AMD systems and a newer laptop, but I'm out of time for this tonight.

--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us

#!/bin/bash
# Report the active clock source, then run the test queries and keep
# just the timing lines from the output
echo -n "Time Source:"
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
psql -d dellstore2 -e -f explain-overhead.sql | grep Time

-- explain-overhead.sql
\timing

-- Count Time
select count(*) from customers;
select count(*) from customers;
select count(*) from customers;

-- EXPLAIN ANALYZE Count Time
explain analyze select count(*) from customers;
explain analyze select count(*) from customers;
explain analyze select count(*) from customers;