Hi Paul,

Thanks for the valuable feedback. I suspect you're right about the serialization in some capacity, but the actual cause is eluding me.
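One quick way to test the serialization theory is to look for lock requests that are waiting rather than granted. A rough sketch, assuming the pg_locks layout in 7.4 (relation, pid, mode, granted) and the standard pg_class catalog:

```sql
-- Sketch: show lock requests that are actually blocked (not granted).
-- Assumes PostgreSQL 7.4's pg_locks columns; run while the slowdown occurs.
SELECT c.relname, l.pid, l.mode
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE NOT l.granted;
```

If this comes back empty under load, the stalls probably aren't table-level lock contention.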
Basically, every time a registered user checks a page, the site has to authenticate them (with a query against a table with > 200,000 records). It doesn't update this table, however - it updates another table with "user stats" information (last click, last IP, etc).

From what I've seen, there don't seem to be any serious locking issues. It does make sense that when a number of users' information isn't in cache, things could take a bit longer - but AFAIK this shouldn't prevent other simultaneous queries. What else could cause such serialization?

If I look at open locks (this is a view, built from info in the pg_ system tables):

 relname              | mode             | numlocks
----------------------+------------------+----------
 users                | AccessShareLock  |        4
 userstats            | AccessShareLock  |        4
 pg_statistic         | AccessShareLock  |        2
 users_ix_id          | AccessShareLock  |        2
 countries            | AccessShareLock  |        2
 comments             | AccessShareLock  |        2
 countries_ix_id      | AccessShareLock  |        2
 userstats_ix_id      | AccessShareLock  |        2
 comments_ix_parentid | AccessShareLock  |        2
 users                | RowExclusiveLock |        1
 filequeue_ix_id      | AccessShareLock  |        1
 pg_class             | AccessShareLock  |        1
 vopenlocks           | AccessShareLock  |        1
 pg_locks             | AccessShareLock  |        1
 userstats            | RowExclusiveLock |        1
 filequeue            | AccessShareLock  |        1
 pg_class_oid_index   | AccessShareLock  |        1

Also of note, executing a random "out of the blue" query on our "users" table returns results very fast. While there's no doubt that caching helps, returning a row that is definitely not cached is also very fast: < 0.05 sec.

Top tells me that the system isn't using much memory - almost always under 100MB (of the 2GB we have). Is there a way to increase the amount of physical RAM that PG uses? It seems there's a lot of room there. Postgresql.conf has:

shared_buffers = 16384
sort_mem = 8192
vacuum_mem = 8192

Also, would queries becoming serialized affect connection delays? I think there's still something else at large here...

I've attached a vmstat output, taken while running dd. The RAID array is tw0.
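Incidentally, the vopenlocks view mentioned above can be approximated like this - a guess at its definition, not the exact view, assuming only the standard pg_locks and pg_class catalogs:

```sql
-- Hypothetical reconstruction of the vopenlocks view:
-- count current locks per relation and lock mode.
CREATE VIEW vopenlocks AS
SELECT c.relname, l.mode, count(*) AS numlocks
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
GROUP BY c.relname, l.mode;
```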
It does show the tw0 device getting significantly more work - numbers not seen during normal operation.

Thanks,

Jason Coene
Gotfrag eSports
585-598-6621 Phone
585-598-6633 Fax
[EMAIL PROTECTED]
http://www.gotfrag.com

-----Original Message-----
From: Paul Tuckfield [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, May 11, 2004 7:50 PM
To: Jason Coene
Subject: Re: [PERFORM] Intermittent slowdowns, connection delays

The things you point out suggest a heavy dependence on good cache performance (typical of OLTP, mind you). Do not be fooled if a query runs in 2 seconds and the second run takes < 0.01 seconds: the first run put it in cache, the second got all cache hits :)

But beyond that, in an OLTP system, and a typical website-backing database, "cache is king" - and serialization is the devil.

So look for reasons why your cache performance might deteriorate during peak (like large historical tables that users pull up dozens of scattered rows from, flooding cache), or why you may be serializing somewhere inside postgres (e.g. if every page hit re-logs in, then there's probably serialization trying to spawn what must be 40 processes/sec, assuming your 11 hits/sec average peaks at about 40/sec).

Also: I am really surprised you see zero IO in the vmstat you sent, but I'm unfamiliar with the BSD version of vmstat. AFAIR, Solaris shows cached filesystem reads as "page faults", which is rather confusing. Since you have 1500 page faults per second, yet no paging (bi/bo), does this mean the 1500 page faults are filesystem IO that pg is doing? Do an objective test on an idle system by dd'ing a large file in and watching what vmstat does.

On May 11, 2004, at 3:10 PM, Jason Coene wrote:

> Hi All,
>
> We have a Postgres 7.4.1 server running on FreeBSD 5.2. Hardware is a
> Dual Xeon 2.6 (HT enabled), 2 GB Memory, 3Ware SATA RAID-5 w/ 4 7200
> RPM Seagate disks and gigabit Intel Server Ethernet. The server is
> dedicated to serving data to our web-based CMS.
> We have a few web servers load balanced, and we do around 1M page
> impressions per day. Our website is highly personalized, and we've
> optimized it to limit the number of queries, but we still see between
> 2 and 3 SELECTs (with JOINs) and 1 UPDATE per page load, selectively
> more - a fair volume.
>
> The single UPDATE per page load is updating a timestamp in a small
> table (about 150,000 rows) with only 1 index (on the 1 field that
> needs to be matched).
>
> We're seeing some intermittent spikes in query time as well as actual
> connection time. I.e., during these seemingly random spikes, our
> debug output looks like this (times from start of HTTP request):
>
> SQL CONNECTION CREATING 'gf'
> 0.0015 - ESTABLISHING CONNECTION
> 1.7113 - CONNECTION OK
> SQL QUERY ID 1 COST 0.8155 ROWS 1
> SQL QUERY ID 2 COST 0.5607 ROWS 14
> .. etc.. (all queries taking more time than normal, see below)
>
> Refresh the page 2 seconds later, and we'll get:
>
> SQL CONNECTION CREATING 'gf'
> 0.0017 - ESTABLISHING CONNECTION
> 0.0086 - CONNECTION OK
> SQL QUERY ID 1 COST 0.0128 ROWS 1
> SQL QUERY ID 2 COST 0.0033 ROWS 14
> .. etc.. (with the same queries)
>
> Indeed, during these spikes, it takes a moment for "psql" to connect
> on the command line (from the same machine, using a local file
> socket), so it's not a network issue or a web-server issue. During
> these spurts, there's nothing too out of the ordinary in vmstat,
> systat or top.
>
> These programs show that we're not using much CPU (usually 60-80%
> idle), and disk usage is virtually nil. I've attached 60 seconds of
> "vmstat 5". Memory usage looks like this (constantly):
>
> Mem: 110M Active, 1470M Inact, 206M Wired, 61M Cache, 112M Buf, 26M Free
>
> I've cleaned up and tested query after query, and nothing is a "hog".
> On an idle server, every query will execute in < 0.05 sec. Perhaps
> some of you veterans have ideas?
> Thanks,
>
> Jason Coene
> Gotfrag eSports
> 585-598-6621 Phone
> 585-598-6633 Fax
> [EMAIL PROTECTED]
> http://www.gotfrag.com
>
> <vmstat51min.txt>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>            http://archives.postgresql.org
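On the earlier question of letting PG use more of the 2GB: in 7.4, shared_buffers is counted in 8KB pages, so the shared_buffers = 16384 shown above is only about 128MB. A hedged postgresql.conf sketch - 7.4-era parameter names, and the values are illustrative guesses for a 2GB box, not a tuned recommendation:

```
# postgresql.conf sketch (PostgreSQL 7.4 names; sizes are guesses)
shared_buffers = 32768          # 32768 x 8KB = 256MB of shared cache
sort_mem = 8192                 # KB per sort operation
effective_cache_size = 131072   # ~1GB in 8KB pages; tells the planner
                                # how much the OS is likely caching
```

Note that the OS filesystem cache (the large "Inact" figure in top) already serves reads, so raising shared_buffers far beyond this range was generally not recommended on 7.4.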
(dd if=/dev/zero of=X and vmstat 1)

 procs   memory          page                        disks    faults       cpu
 r b w   avm     fre     flt   re pi po fr    sr    tw0 fd0  in   sy    cs us sy id
 1 0 0  245496  294952    652   0  0  0   322     4   0   0  584  0    481  3  5 92
 0 0 1  245772  294892  10681   0  0  0  2380     0   0   0  780  0   1634  4  4 92
 0 0 0  248404  293132   6783   0  0  0  1220     0   4   0  641  0   2005  3  4 93
 0 0 0  243740  296180     89   0  0  0   780     0   0   0  363  0   1038  0  1 99
 0 0 0  247576  293808   4253   0  0  1   897     0   1   0  568  0   1082  2  2 97
 0 0 0  243740  296172   1142   0  0  0  1009     0   0   0  408  0    681  1  1 99
 0 0 0  246072  294572   7128   0  0  0  1495     0   2   0  981  0   2501  4  3 93
 4 0 0  248988  278004   3977   0  0  0  1292     0 117   0  672  0   6082  2 11 87 <- begin dd
 0 0 4  253156  254952  19216   0  0  0   381     0 161   0  657  0  26432  4 22 74
 8 1 0  256124  239764  23349   0  0  0   827     0 115   0  621  0  32020  5 22 73
 0 6 4  263236  212068   9642   0  0  0   579     0 174   0  745  0  28661  4 25 71
 0 0 13 269392  193476   9983   0  0  0   278     0 137   0  685  0  36589  5 24 70
 1 6 1  260652  181072  12043   0  0  0  2517     0 130   0  683  0  33646  3 25 72
 0 2 7  265188  162412   7449   0  0  0    39     0 127   0  615  0  38233  6 20 73
 4 6 0  266368  144724   8641   0  0  0   822     0 133   0  624  0  35629  5 23 72
 0 8 3  268884  125808   4609   0  0  0   703     0 138   0  659  0  38966  3 24 73
 7 6 3  271564  111468  19948   0  0  0  1106     0  97   0  589  0  39840  4 25 71
 0 19 3 283872   87824  14926   0  0  0    61     0 105   0  736  0  48359  4 25 71
 0 12 10 291640 109440   7975  12  0  0  3469 11675 147   0  794  0  33581  6 25 69
 8 14 0 292732   89640  11033   6  0  0  6700     0 149   0  782  0  28399  6 24 70
 4 10 0 293364   71488  13177   0  0  0  8696     0 154   0 2623  0  29119  9 22 69
 1 5 0  271656   67016   8240   0  0  0  6490     0 170   0 4574  0  30951  5 25 70
 0 0 3  262840   99200  15399   0  0  0  6278 10814 145   0  879  0  21206  5 22 73
 0 4 0  261216   95812  12921   0  0  0  1830     0 147   0  595  0  13402  3  9 87
 1 6 0  262204   95228   1843   0  0  0   863     0 119   0  580  0   1266  1  1 98
 0 1 4  265508   82648  15552   0  0  0  4328     0 132   0  736  0  22650  3 18 78
 0 0 6  259920   76580  19770   0  0  0  2761     0  90   0  514  0  37724  3 24 72
 7 0 4  261600   67540  10702   0  0  0  2718     0  62   0  536  0  53733  4 24 71
 0 8 6  269752   98244   7281   0  0  0  5609 13252 139   0  614  0  40253  3 27 70 <- end dd
 6 10 0 274424   73056   6511   0  0  0  6758     0 161   0  673  0  33534  5 24 71
 0 8 5  283500  111436  20968   0  0  0  3808 12137  36   0  848  0  42361  7 22 70
 0 10 6 287016  457036  16186   0  0  0 89317     0  11   0  736  0  55810 13 29 58
 4 15 0 293108  451540  18923   0  0  0  1156     0   8   0  725  0  53572 13 22 64
 0 6 4  279592  460536  22728   0  0  0  4110     0   6   0  782  0  26818 15 15 69
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
       joining column's datatypes do not match