Hi Paul,

Thanks for the valuable feedback.  I suspect you're correct about the
serialization in some capacity, but the actual cause is eluding me.

Basically, every time a registered user requests a page, the site has to
authenticate them (with a query against a table with > 200,000 records).  It
doesn't update this table, however - it updates another table with "user
stats" information (last click, last IP, etc.).

From what I've seen, there don't seem to be any serious locking issues.
It does make sense that when a number of users whose information isn't in
cache hit the site, their requests could take a bit longer - but AFAIK this
shouldn't prevent other simultaneous queries.  What else could cause such
serialization?
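
(For reference, any query actually waiting on a lock should show up as an
ungranted entry in pg_locks - something along these lines:)

  SELECT c.relname, l.pid, l.mode
  FROM pg_locks l
  JOIN pg_class c ON c.oid = l.relation
  WHERE NOT l.granted;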

If I look at open locks (via a view over the pg_locks / pg_class system tables):

       relname        |       mode       | numlocks
----------------------+------------------+----------
 users                | AccessShareLock  |        4
 userstats            | AccessShareLock  |        4
 pg_statistic         | AccessShareLock  |        2
 users_ix_id          | AccessShareLock  |        2
 countries            | AccessShareLock  |        2
 comments             | AccessShareLock  |        2
 countries_ix_id      | AccessShareLock  |        2
 userstats_ix_id      | AccessShareLock  |        2
 comments_ix_parentid | AccessShareLock  |        2
 users                | RowExclusiveLock |        1
 filequeue_ix_id      | AccessShareLock  |        1
 pg_class             | AccessShareLock  |        1
 vopenlocks           | AccessShareLock  |        1
 pg_locks             | AccessShareLock  |        1
 userstats            | RowExclusiveLock |        1
 filequeue            | AccessShareLock  |        1
 pg_class_oid_index   | AccessShareLock  |        1
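
(The vopenlocks view above is essentially a rollup of pg_locks joined to
pg_class - roughly like the following sketch; the actual view definition
isn't included here:)

  CREATE VIEW vopenlocks AS
      SELECT c.relname, l.mode, count(*) AS numlocks
      FROM pg_locks l
      LEFT JOIN pg_class c ON c.oid = l.relation
      GROUP BY c.relname, l.mode;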

Also of note, executing a random "out of the blue" query on our "users"
table returns results very fast.  While there's no doubt that caching
helps, even returning a row that is definitely not cached is very fast:
< 0.05 sec.
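
(The kind of spot check I mean - the id value here is arbitrary, not taken
from real data:)

  EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1234567;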

Top tells me that the system isn't using much memory - almost always under
100MB (of the 2GB we have).  Is there a way to increase the amount of
physical RAM that PG uses?  It seems there's a lot of room there.

Postgresql.conf has:

shared_buffers = 16384
sort_mem = 8192
vacuum_mem = 8192
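
For reference, with the default 8 KB block size those settings work out to
roughly 128 MB of shared buffers (16384 x 8 KB) and 8 MB each for sort_mem
and vacuum_mem.  If there's headroom (and FreeBSD's shared memory limits
allow it), something along these lines would let PG and the planner make
more use of the 2 GB - values are only a sketch, not settings recommended
in this thread:

  shared_buffers = 32768          # 32768 x 8 KB = 256 MB of shared cache
  sort_mem = 16384                # 16 MB per sort operation
  effective_cache_size = 131072   # ~1 GB - hint for how much the OS caches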

Also, would queries becoming serialized affect connection delays?  I think
there's still something else at play here...

I've attached vmstat output captured while running dd.  The RAID array is
tw0.  It does show the tw0 device getting significantly more work - numbers
not seen during normal operation.

Thanks,

Jason Coene
Gotfrag eSports
585-598-6621 Phone
585-598-6633 Fax
[EMAIL PROTECTED]
http://www.gotfrag.com


-----Original Message-----
From: Paul Tuckfield [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, May 11, 2004 7:50 PM
To: Jason Coene
Subject: Re: [PERFORM] Intermittent slowdowns, connection delays

The things you point out suggest a heavy dependence on good cache
performance (typical of OLTP, mind you).  Do not be fooled if a query runs
in 2 seconds and then the second run takes < .01 seconds: the first run put
it in cache, the second got all cache hits :)

But beyond that, in an OLTP system and typical website-backing database,
"cache is king".  And serialization is the devil.

So look for reasons why your cache performance might deteriorate during
peak (like large historical tables that users pull up dozens of scattered
rows from, flooding cache), or why you may be serializing somewhere inside
postgres (e.g. if every page hit re-logs in, then there's probably
serialization trying to spawn what must be 40 processes/sec, assuming your
11 hits/sec average peaks at about 40/sec).

Also:
I am really surprised you see zero IO in the vmstat you sent, but I'm
unfamiliar with the BSD version of vmstat.  AFAIR, Solaris shows cached
filesystem reads as "page faults", which is rather confusing.  Since you
have 1500 page faults per second, yet no paging (bi bo), does this mean the
1500 page faults are filesystem IO that pg is doing?  Do an objective test
on an idle system by dd'ing a large file in and watching what vmstat does.





On May 11, 2004, at 3:10 PM, Jason Coene wrote:

> Hi All,
>
> We have a Postgres 7.4.1 server running on FreeBSD 5.2.  Hardware is a 
> Dual
> Xeon 2.6 (HT enabled), 2 GB Memory, 3Ware SATA RAID-5 w/ 4 7200 RPM 
> Seagate
> disks and gigabit Intel Server Ethernet.  The server is dedicated to 
> serving
> data to our web-based CMS.
>
> We have a few web servers load balanced, and we do around 1M page
> impressions per day.  Our website is highly personalized, and we've
> optimized it to limit the number of queries, but we still see between 
> 2 and
> 3 SELECT's (with JOIN's) and 1 UPDATE per page load, selectively more 
> - a
> fair volume.
>
> The single UPDATE per page load is updating a timestamp in a small 
> table
> (about 150,000 rows) with only 1 index (on the 1 field that needs to be
> matched).
>
> We're seeing some intermittent spikes in query time as well as actual
> connection time.  I.e., during these seemingly random spikes, our debug output 
> looks
> like this (times from start of HTTP request):
>
> SQL CONNECTION CREATING 'gf'
> 0.0015 - ESTABLISHING CONNECTION
> 1.7113 - CONNECTION OK
> SQL QUERY ID 1 COST 0.8155 ROWS 1
> SQL QUERY ID 2 COST 0.5607 ROWS 14
> .. etc.. (all queries taking more time than normal, see below)
>
> Refresh the page 2 seconds later, and we'll get:
>
> SQL CONNECTION CREATING 'gf'
> 0.0017 - ESTABLISHING CONNECTION
> 0.0086 - CONNECTION OK
> SQL QUERY ID 1 COST 0.0128 ROWS 1
> SQL QUERY ID 2 COST 0.0033 ROWS 14
> .. etc.. (with same queries)
>
> Indeed, during these times, it takes a moment for "psql" to connect on 
> the
> command line (from the same machine using a local file socket), so 
> it's not
> a network issue or a web-server issue.  During these spurts, there's 
> nothing
> too out of the ordinary in vmstat, systat or top.
>
> These programs show that we're not using much CPU (usually 60-80% 
> idle), and
> disks usage is virtually nil.  I've attached 60 seconds of "vmstat 5".
> Memory usage looks like this (constantly):
>
> Mem: 110M Active, 1470M Inact, 206M Wired, 61M Cache, 112M Buf, 26M 
> Free
>
> I've cleaned up and tested query after query, and nothing is a "hog".  
> On an
> idle server, every query will execute in < 0.05 sec.  Perhaps some of 
> you
> veterans have ideas?
>
> Thanks,
>
> Jason Coene
> Gotfrag eSports
> 585-598-6621 Phone
> 585-598-6633 Fax
> [EMAIL PROTECTED]
> http://www.gotfrag.com
>
>
> <vmstat51min.txt>

(dd if=/dev/zero of=X and vmstat 1)

 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr tw0 fd0   in   sy  cs us sy id
 1 0 0  245496 294952  652   0   0   0 322   4   0   0  584    0 481  3  5 92
 0 0 1  245772 294892 10681   0   0   0 2380   0   0   0  780    0 1634  4  4 92
 0 0 0  248404 293132 6783   0   0   0 1220   0   4   0  641    0 2005  3  4 93
 0 0 0  243740 296180   89   0   0   0 780   0   0   0  363    0 1038  0  1 99
 0 0 0  247576 293808 4253   0   0   1 897   0   1   0  568    0 1082  2  2 97
 0 0 0  243740 296172 1142   0   0   0 1009   0   0   0  408    0 681  1  1 99
 0 0 0  246072 294572 7128   0   0   0 1495   0   2   0  981    0 2501  4  3 93
 4 0 0  248988 278004 3977   0   0   0 1292   0 117   0  672    0 6082  2 11 87  <- begin dd
 0 0 4  253156 254952 19216   0   0   0 381   0 161   0  657    0 26432  4 22 74
 8 1 0  256124 239764 23349   0   0   0 827   0 115   0  621    0 32020  5 22 73
 0 6 4  263236 212068 9642   0   0   0 579   0 174   0  745    0 28661  4 25 71
 0 0 13  269392 193476 9983   0   0   0 278   0 137   0  685    0 36589  5 24 70
 1 6 1  260652 181072 12043   0   0   0 2517   0 130   0  683    0 33646  3 25 72
 0 2 7  265188 162412 7449   0   0   0  39   0 127   0  615    0 38233  6 20 73
 4 6 0  266368 144724 8641   0   0   0 822   0 133   0  624    0 35629  5 23 72
 0 8 3  268884 125808 4609   0   0   0 703   0 138   0  659    0 38966  3 24 73
 7 6 3  271564 111468 19948   0   0   0 1106   0  97   0  589    0 39840  4 25 71
 0 19 3  283872  87824 14926   0   0   0  61   0 105   0  736    0 48359  4 25 71
 0 12 10  291640 109440 7975  12   0   0 3469 11675 147   0  794    0 33581  6 25 69
 8 14 0  292732  89640 11033   6   0   0 6700   0 149   0  782    0 28399  6 24 70
 4 10 0  293364  71488 13177   0   0   0 8696   0 154   0 2623    0 29119  9 22 69
 1 5 0  271656  67016 8240   0   0   0 6490   0 170   0 4574    0 30951  5 25 70
 0 0 3  262840  99200 15399   0   0   0 6278 10814 145   0  879    0 21206  5 22 73
 0 4 0  261216  95812 12921   0   0   0 1830   0 147   0  595    0 13402  3  9 87
 1 6 0  262204  95228 1843   0   0   0 863   0 119   0  580    0 1266  1  1 98
 0 1 4  265508  82648 15552   0   0   0 4328   0 132   0  736    0 22650  3 18 78
 0 0 6  259920  76580 19770   0   0   0 2761   0  90   0  514    0 37724  3 24 72
 7 0 4  261600  67540 10702   0   0   0 2718   0  62   0  536    0 53733  4 24 71
 0 8 6  269752  98244 7281   0   0   0 5609 13252 139   0  614    0 40253  3 27 70  <- end dd
 6 10 0  274424  73056 6511   0   0   0 6758   0 161   0  673    0 33534  5 24 71
 0 8 5  283500 111436 20968   0   0   0 3808 12137  36   0  848    0 42361  7 22 70
 0 10 6  287016 457036 16186   0   0   0 89317   0  11   0  736    0 55810 13 29 58
 4 15 0  293108 451540 18923   0   0   0 1156   0   8   0  725    0 53572 13 22 64
 0 6 4  279592 460536 22728   0   0   0 4110   0   6   0  782    0 26818 15 15 69