Hi All, It does sound like we should be pooling connections somehow. I'll be looking at implementing that shortly. I'd really like to understand what the actual problem is, though.
Sorry, I meant 30,000 with 300 connections - not 3,000. The 300 connections / second is realistic, if not underestimated. As is the nature of our site (realtime information about online gaming), there's a huge fan base and as a big upset happens, we'll do 50,000 page views in a span of 3-5 minutes. I get the same results with: ab -n 10000 -c 150 http://www.gotfrag.com/portal/news/ I've attached results from the above test, showing open locks, top output, and vmstat 5. Tom, I've run the test described in: http://archives.postgresql.org/pgsql-performance/2004-04/msg00280.php Results attached in mptest.txt. The box did experience the same problems as we've seen before. I ran it under a separate database (test), and it still caused our other queries to slow significantly from our production database (gf) - semwait again. It does look like the "cs" column under CPU (which I'd assume is Context Swap) does bump up significantly (10x or better) during both my ab test, and the test you suggested in that archived message. Reading the first thread you pointed out (2004-04/msg00249.php), Josh Berkus was questioning the ServerWorks chipsets. We're running on the Intel E7501 Chipset (MSI board). Our CPU's are 2.66 GHz with 533MHz FSB, Hyperthreading enabled. Unfortunately, I don't have physical access to the machine to turn HT off. Thanks, Jason > -----Original Message----- > From: Gaetano Mendola [mailto:[EMAIL PROTECTED] > Sent: Thursday, September 23, 2004 1:41 PM > To: Jason Coene > Subject: Re: Caching of Queries > > Jason Coene wrote: > > Hi Tom, > > > > Easily recreated with Apache benchmark, "ab -n 30000 -c 3000 > > http://webserver ". This runs 1 query per page, everything else is > cached > > on webserver. > > That test require 30000 access with 3000 connections that is not a normal > load. Describe us your HW. > > 3000 connections means a very huge load, may you provide also the result > of > "vmstat 5" my webserver trash already with -c 120 ! > > how many connection your postgres can manage ? > > You have to consider to use a connection pool with that ammount of > connections. > > > Regards > Gaetano Mendola
gf=# SELECT r.relname, l."mode", count(*) AS numlocks FROM pg_locks l, pg_class r WHERE r.oid = l.relation GROUP BY r.relname, l."mode" ORDER BY count(*) DESC; relname | mode | numlocks ----------------------------------+-----------------+---------- threads | AccessShareLock | 63 threads_ix_nuked_lastpost | AccessShareLock | 47 threads_ix_nuked_gameid_lastpost | AccessShareLock | 7 pg_class | AccessShareLock | 5 pg_opclass_am_name_nsp_index | AccessShareLock | 3 pg_opclass | AccessShareLock | 3 pg_class_oid_index | AccessShareLock | 3 pg_type | AccessShareLock | 2 pg_statistic | AccessShareLock | 2 pg_attribute | AccessShareLock | 2 pg_amop_opc_strategy_index | AccessShareLock | 2 pg_attrdef | AccessShareLock | 2 pg_trigger_tgrelid_tgname_index | AccessShareLock | 2 pg_trigger | AccessShareLock | 2 users | AccessShareLock | 2 pg_statistic_relid_att_index | AccessShareLock | 2 pg_type_oid_index | AccessShareLock | 2 pg_amop | AccessShareLock | 2 pg_attribute_relid_attnum_index | AccessShareLock | 2 comments | AccessShareLock | 2 pg_shadow | AccessShareLock | 2 acls | AccessShareLock | 1 pg_index_indexrelid_index | AccessShareLock | 1 pg_attrdef_adrelid_adnum_index | AccessShareLock | 1 surveyresults_ix_userid | AccessShareLock | 1 pg_cast | AccessShareLock | 1 pg_shadow_usesysid_index | AccessShareLock | 1 pg_index | AccessShareLock | 1 games | AccessShareLock | 1 usersessions | AccessShareLock | 1 surveyoptions | AccessShareLock | 1 countries | AccessShareLock | 1 surveyresults | AccessShareLock | 1 vopenlocks | AccessShareLock | 1 pg_class_relname_nsp_index | AccessShareLock | 1 pg_namespace | AccessShareLock | 1 pg_database | AccessShareLock | 1 surveys | AccessShareLock | 1 pg_namespace_nspname_index | AccessShareLock | 1 community | AccessShareLock | 1 surveygroups | AccessShareLock | 1 pg_cast_source_target_index | AccessShareLock | 1 pg_locks | AccessShareLock | 1 pg_shadow_usename_index | AccessShareLock | 1 (44 rows) last pid: 27021; load averages: 9.73, 3.68, 1.93 up 19+22:06:45 14:26:53 102 processes: 27 running, 47 sleeping, 28 lock CPU states: 16.5% user, 0.0% nice, 14.3% system, 0.4% interrupt, 68.8% idle Mem: 95M Active, 1443M Inact, 190M Wired, 94M Cache, 112M Buf, 101M Free Swap: 4096M Total, 4096M Free PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU CPU COMMAND 50557 pgsql 76 0 95276K 4860K select 0 24:33 1.03% 1.03% postgres 26968 pgsql -4 0 96356K 50828K semwai 0 0:00 1.08% 0.20% postgres 26992 pgsql 129 0 96048K 28616K *Giant 0 0:00 2.05% 0.20% postgres 26983 pgsql 127 0 96048K 21800K RUN 3 0:00 1.40% 0.20% postgres 26924 pgsql -4 0 96848K 38912K semwai 2 0:00 0.56% 0.15% postgres 26916 pgsql -4 0 96232K 40876K semwai 0 0:00 0.50% 0.15% postgres 26957 pgsql 126 0 96048K 32544K RUN 2 0:00 0.81% 0.15% postgres 26949 pgsql 125 0 96048K 32796K RUN 1 0:00 0.66% 0.15% postgres 26979 pgsql 128 0 96048K 35240K RUN 0 0:00 1.05% 0.15% postgres 26950 pgsql 126 0 96048K 29676K RUN 1 0:00 0.81% 0.15% postgres 26984 pgsql -4 0 96320K 12100K semwai 0 0:00 1.05% 0.15% postgres 50562 pgsql 76 0 6584K 4724K select 0 14:17 0.10% 0.10% postgres 26921 pgsql 4 0 96984K 58516K sbwait 1 0:00 0.38% 0.10% postgres 26911 pgsql 122 0 96984K 52856K RUN 1 0:00 0.33% 0.10% postgres 26946 pgsql 125 0 96904K 50328K RUN 1 0:00 0.44% 0.10% postgres 26913 pgsql -4 0 96984K 50536K semwai 0 0:00 0.33% 0.10% postgres 26908 pgsql -4 0 96272K 34812K semwai 0 0:00 0.30% 0.10% postgres 26965 pgsql 126 0 97028K 45736K RUN 0 0:00 0.54% 0.10% postgres 26926 pgsql -4 0 96376K 33628K semwai 0 0:00 0.38% 0.10% postgres 26944 pgsql -4 0 96228K 43220K semwai 3 0:00 0.44% 0.10% postgres 26936 pgsql 125 0 96048K 33600K RUN 1 0:00 0.44% 0.10% postgres 26947 pgsql -4 0 96048K 32544K semwai 1 0:00 0.44% 0.10% postgres 26964 pgsql 126 0 96048K 33720K RUN 2 0:00 0.54% 0.10% postgres 26956 pgsql 126 0 96048K 29644K RUN 0 0:00 0.54% 0.10% postgres 26960 pgsql 126 0 96048K 29000K RUN 1 0:00 0.54% 0.10% postgres 26977 pgsql 127 0 96048K 31988K RUN 3 0:00 0.70% 0.10% postgres 26969 pgsql 126 0 96048K 28896K RUN 1 0:00 0.54% 0.10% postgres 26978 pgsql 127 0 96240K 17868K RUN 0 0:00 0.70% 0.10% postgres 26973 pgsql 126 0 96048K 25872K RUN 0 0:00 0.54% 0.10% postgres 26989 pgsql 129 0 96048K 29680K RUN 2 0:00 1.03% 0.10% postgres 26993 pgsql 129 0 96048K 30276K *Giant 3 0:00 1.03% 0.10% postgres 26985 pgsql -4 0 96048K 20272K semwai 3 0:00 1.03% 0.10% postgres 27002 pgsql 131 0 96048K 22392K *Giant 3 0:00 2.00% 0.10% postgres 26997 pgsql 129 0 96240K 10128K RUN 0 0:00 1.03% 0.10% postgres 26998 pgsql 131 0 96008K 8560K *Giant 3 0:00 2.00% 0.10% postgres 27007 pgsql 131 0 96048K 15016K *Giant 3 0:00 2.00% 0.10% postgres 26937 pgsql 125 0 97028K 29152K RUN 3 0:00 0.22% 0.05% postgres 26959 pgsql 126 0 96048K 30932K RUN 1 0:00 0.27% 0.05% postgres 26981 pgsql -4 0 96048K 28652K semwai 0 0:00 0.35% 0.05% postgres 26955 pgsql 126 0 96048K 28892K RUN 2 0:00 0.27% 0.05% postgres 26972 pgsql 126 0 96272K 16792K RUN 3 0:00 0.27% 0.05% postgres 26976 pgsql 127 0 96048K 28860K RUN 0 0:00 0.35% 0.05% postgres 26994 pgsql 129 0 96048K 28648K *Giant 0 0:00 0.51% 0.05% postgres 26996 pgsql 129 0 96048K 29036K *Giant 0 0:00 0.51% 0.05% postgres 26995 pgsql 129 0 96048K 25360K *Giant 2 0:00 0.51% 0.05% postgres 26987 pgsql 129 0 96048K 21732K *Giant 2 0:00 0.51% 0.05% postgres 26991 pgsql 129 0 96048K 25324K *Giant 3 0:00 0.51% 0.05% postgres 26990 pgsql 129 0 96240K 10660K *Giant 0 0:00 0.51% 0.05% postgres 27003 pgsql 131 0 96048K 14644K *Giant 2 0:00 1.00% 0.05% postgres 50564 pgsql 76 0 5912K 3964K select 0 14:06 0.00% 0.00% postgres 1288 mysql 96 0 51092K 20280K select 1 5:21 0.00% 0.00% mysqld d01.int> vmstat 5 procs memory page disks faults cpu r b w avm fre flt re pi po fr sr da0 fd0 in sy cs us sy id 2 0 0 187748 268744 2332 0 0 0 1686 1 0 0 722 0 422 5 3 91 0 0 2 186664 269572 8795 0 0 0 1867 0 2 0 727 0 3190 5 4 91 0 0 0 183780 271296 8041 0 0 0 1633 0 3 0 665 0 1952 5 3 92 0 0 0 183780 271296 9555 0 0 0 1814 0 4 0 755 0 2850 5 4 90 [begin ab] 1 52 14 280896 213092 25974 0 0 0 2980 0 0 0 1202 0 26804 14 14 72 0 46 23 292164 205220 22854 0 0 0 3606 0 24 0 1082 0 40464 14 17 69 5 56 19 308216 194320 22839 0 0 0 4010 0 2 0 1102 0 41540 15 16 69 4 68 12 311540 194384 21314 0 0 0 3925 0 7 0 1038 0 49771 16 16 68 0 59 27 316904 187996 21678 0 0 0 3752 0 5 0 1057 0 43480 15 16 69 0 43 46 321064 186856 22168 0 0 0 4041 0 4 0 1047 0 43670 15 16 69 4 82 5 330776 180308 21773 0 0 0 3354 0 7 0 995 0 47354 13 18 68 [end ab] 0 29 5 252440 227260 19208 0 0 0 4966 0 34 0 1086 0 47183 16 19 66 0 0 0 183780 271280 10660 0 0 0 3944 0 3 0 760 0 10161 9 6 85 ^C
running two copies of: drop table test_data; create table test_data(f1 int); insert into test_data values (random() * 100); insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; insert into test_data select random() * 100 from test_data; create index test_index on test_data(f1); vacuum verbose analyze test_data; checkpoint; -- force nestloop indexscan plan set enable_seqscan to 0; set enable_mergejoin to 0; set enable_hashjoin to 0; explain select count(*) from test_data a, test_data b, test_data c where a.f1 = b.f1 and b.f1 = c.f1; select count(*) from test_data a, test_data b, test_data c where a.f1 = b.f1 and b.f1 = c.f1; -- Result from 1 of 2 executions: d01.int> psql -U test; Welcome to psql 7.4.5, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit test=> \i test.sql DROP TABLE CREATE TABLE INSERT 11296941 1 INSERT 11296942 1 INSERT 0 2 INSERT 0 4 INSERT 0 8 INSERT 0 16 INSERT 0 32 INSERT 0 64 INSERT 0 128 INSERT 0 256 INSERT 0 512 INSERT 0 1024 INSERT 0 2048 INSERT 0 4096 INSERT 0 8192 INSERT 0 16384 INSERT 0 32768 psql:test.sql:23: ERROR: relation "test_index" already exists psql:test.sql:25: INFO: vacuuming "public.test_data" psql:test.sql:25: INFO: index "test_index" now contains 65536 row versions in 198 pages DETAIL: 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 1.74 sec. psql:test.sql:25: INFO: "test_data": found 0 removable, 65536 nonremovable row versions in 289 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.01u sec elapsed 1.75 sec. psql:test.sql:25: INFO: analyzing "public.test_data" psql:test.sql:25: INFO: "test_data": 289 pages, 3000 rows sampled, 65603 estimated total rows VACUUM CHECKPOINT SET SET SET QUERY PLAN ------------------------------------------------------------------------------------------------------- Aggregate (cost=50844039917.94..50844039917.94 rows=1 width=0) -> Nested Loop (cost=0.00..50773112976.11 rows=28370776731 width=0) -> Nested Loop (cost=0.00..77053124.16 rows=43119685 width=8) -> Index Scan using test_index on test_data b (cost=0.00..2074.77 rows=65536 width=4) -> Index Scan using test_index on test_data c (cost=0.00..1167.59 rows=649 width=4) Index Cond: ("outer".f1 = c.f1) -> Index Scan using test_index on test_data a (cost=0.00..1167.59 rows=649 width=4) Index Cond: ("outer".f1 = a.f1) (8 rows) ^CCancel request sent psql:test.sql:38: ERROR: canceling query due to user request test=> --- d01.int> vmstat 5 procs memory page disks faults cpu r b w avm fre flt re pi po fr sr da0 fd0 in sy cs us sy id 0 0 0 196360 258572 2336 0 0 0 1687 1 0 0 722 0 424 5 3 91 0 2 3 208792 249976 9917 0 0 0 1440 0 2 0 720 0 3958 5 4 91 1 0 0 202056 254900 14728 0 0 0 2844 0 12 0 1128 0 5755 9 7 84 3 0 0 199960 252148 13975 0 1 0 3009 0 47 0 1126 0 13410 15 8 77 0 2 3 203568 247276 8469 0 0 0 1603 0 42 0 906 0 8774 26 5 69 0 10 7 225508 232968 10533 0 0 0 1023 0 13 0 710 0 11756 28 765 0 1 4 202724 248416 8847 0 0 0 2374 0 1 0 697 0 8671 30 5 65 0 15 14 245484 220092 11856 0 0 0 1176 0 9 0 917 0 13885 26 8 66 0 20 7 243836 222684 13190 0 0 0 2208 0 7 0 831 0 17925 24 10 67 0 4 7 217928 240192 10879 0 0 0 3026 0 10 0 954 0 14428 28 7 65 0 3 10 223192 236708 10570 0 0 0 2042 0 35 0 872 0 10530 29 6 66 3 12 1 225916 235372 11205 0 0 0 2048 0 6 0 809 0 12919 26 7 67 0 9 8 224092 236468 12562 0 0 0 2116 0 6 0 882 0 11334 26 7 67 4 2 0 208728 246880 9979 0 0 0 2224 0 3 0 758 0 9319 28 6 66 4 5 6 217832 241144 11436 0 0 0 1995 0 9 0 852 0 10787 26 6 67 5 3 4 216452 241404 10989 0 0 0 2256 0 11 0 831 0 12633 27 7 66 1 0 0 198896 255112 10958 0 0 0 2281 0 45 0 855 0 14768 20 7 73 0 0 0 197204 256160 9382 0 0 0 1701 0 6 0 727 0 3634 5 4 91 ^C
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html