[PERFORM] Keeping processes open for re-use
Dear All

Looking at the processes running on our server, it appears that each time a web server program makes a call to the database server, we start a new process on the database server, which obviously has a start-up cost. In Apache, for example, you can say at start-up time that you want the machine to reserve e.g. 8 processes and keep them open at all times, so you don't have this overhead until you exceed this minimum number.

Is there a way that we can achieve this in Postgres? We have a situation whereby we have lots of web-based users doing short, quick queries, and obviously the start-up time for a process must add to their perceived response time.

Thanks
Hilary

Hilary Forbes
DMR Limited (UK registration 01134804)
A DMR Information and Technology Group company (www.dmr.co.uk)
Direct tel 01689 889950  Fax 01689 860330
DMR is a UK registered trade mark of DMR Limited
Re: [PERFORM] Keeping processes open for re-use
On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:
> [snip] Is there a way that we can achieve this in Postgres? We have a
> situation whereby we have lots of web based users doing short quick
> queries and obviously the start up time for a process must add to
> their perceived response time.

Yes: google for "connection pooling". Note that different solutions exist for different programming languages, so you should look for connection pooling for the language you're using.

HTH,
Csaba.
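As an illustration of the application-side pooling Csaba suggests, here is a minimal sketch in Python using psycopg2's built-in pool. The connection string, pool sizes and query are assumptions for the example, not details from this thread:

    import psycopg2.pool

    # Keep between 4 and 16 server connections open and reuse them across
    # requests, instead of starting a new backend process for every query.
    pool = psycopg2.pool.ThreadedConnectionPool(
        4, 16, "dbname=mydb user=webapp host=dbhost")

    def run_short_query(sql, params=None):
        conn = pool.getconn()            # borrow an already-open connection
        try:
            with conn.cursor() as cur:
                cur.execute(sql, params)
                rows = cur.fetchall()
            conn.commit()                # end the transaction before reuse
            return rows
        finally:
            pool.putconn(conn)           # return it to the pool, don't close it

    rows = run_short_query("SELECT count(*) FROM sometable")

The point is that the backend start-up cost is paid at most maxconn times; after that, each short web query reuses a server process that is already running.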
Re: [PERFORM] Easy read-heavy benchmark kicking around?
On 11/8/06, Spiegelberg, Greg <[EMAIL PROTECTED]> wrote:
> Merlin, I'm kinda shocked you had such a bad exp. with the AMS200. We
> have a unit here hooked up to a 4-node Linux cluster with 4 databases
> banging on it and we get good, consistent performance out of it. All 4
> nodes can throw 25 to 75 MB/s simultaneously without a hiccup. I'm
> curious, what was your AMS, server and SAN config?

We had quad Opteron 870s in a Sun v40z, two trays of 400GB SATA drives, and the 4 15k FC drives they make you buy. The O/S was originally Gentoo with Emulex HBAs, but we switched to RedHat AS4/QLogic to get support from them.

The highest performance we ever got was around 120MB/sec writing to the 4 FC drives in RAID 10. However, the SATAs could not even do 100, and for some reason when we added a second RAID group the performance dropped 40%, for a reason that their performance group could not explain. Compounding the problem was that our assigned tech did not know Linux and there was a one-week turnaround to get support emails answered. Their sales and support staff were snotty and completely unhelpful. Also, we had to do a complex migration process which involved physically moving the unit to multiple racks for data transfer, which we were going to have to coordinate with Hitachi support because they do not allow you to rack/unrack your own unit.

Ultimately, we returned the unit and bought an ADTX SAN. For less than half the price of the Hitachi, we got a dual 4Gb controller, mixed SATA/SAS unit that supports 750GB SATA drives. It also has SAS ports in the back for direct attachment to a SAS HBA. In an active/active configuration, the unit can sustain 500MB/sec, and it has 50% more storage in 1/3 the rack space.

merlin
Re: [PERFORM] Keeping processes open for re-use
Yes, this is connection pooling. You can find plenty of examples of connection pooling on the internet, and even source code. Also keep in mind that connection pools can be maintained on the application side as well as on the database server side. Check which one suits you.

--Imad
www.EnterpriseDB.com

On 11/9/06, Hilary Forbes <[EMAIL PROTECTED]> wrote:
> Dear All
>
> Looking at the processes running on our server, it appears that each
> time a web server program makes a call to the database server, we start
> a new process on the database server which obviously has a start up
> cost. In Apache, for example, you can say at start up time that you
> want the machine to reserve e.g. 8 processes and keep them open at all
> times so you don't have this overhead until you exceed this minimum
> number.
>
> Is there a way that we can achieve this in Postgres? We have a
> situation whereby we have lots of web based users doing short quick
> queries and obviously the start up time for a process must add to
> their perceived response time.
>
> Thanks
> Hilary
Re: [PERFORM] Keeping processes open for re-use
Csaba Nagy wrote:
> On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:
>> [snip] Is there a way that we can achieve this in Postgres? We have a
>> situation whereby we have lots of web based users doing short quick
>> queries and obviously the start up time for a process must add to
>> their perceived response time.
>
> Yes: google for "connection pooling". Note that different solutions
> exist for different programming languages, so you should look for
> connection pooling for the language you're using.

If you are using PHP then persistent connections may be a simpler way, if it is enough for your needs.

Basically, replace pg_connect with pg_pconnect.

Other languages may have a similar option.

http://php.net/manual/en/features.persistent-connections.php

--
Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz
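For languages without an equivalent of pg_pconnect, the same effect is often achieved by caching one connection per worker process and reusing it across requests. A rough sketch in Python with psycopg2 (the connection string is an assumption for illustration):

    import psycopg2

    _conn = None   # one cached connection per worker process

    def get_connection():
        """Open the connection on first use, then keep reusing it
        (roughly what PHP's pg_pconnect does per Apache child)."""
        global _conn
        if _conn is None or _conn.closed:
            _conn = psycopg2.connect("dbname=mydb user=webapp host=dbhost")
        return _conn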
Re: [PERFORM] Keeping processes open for re-use
On Fri, 2006-11-10 at 12:39 +1030, Shane Ambler wrote:
> Csaba Nagy wrote:
> > On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:
> >> [snip] Is there a way that we can achieve this in Postgres? We have a
> >> situation whereby we have lots of web based users doing short quick
> >> queries and obviously the start up time for a process must add to
> >> their perceived response time.
> >
> > Yes: google for "connection pooling". Note that different solutions
> > exist for different programming languages, so you should look for
> > connection pooling for the language you're using.
>
> If you are using PHP then persistent connections may be a simpler way if
> it is enough for your needs.

I would actually suggest pg_pool over pg_pconnect.

Joshua D. Drake

> Basically replace pg_connect with pg_pconnect
>
> Other languages may have a similar option.
>
> http://php.net/manual/en/features.persistent-connections.php

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
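With an external pooler such as pgpool sitting between the web servers and PostgreSQL, the application side usually only changes where it connects to. A minimal sketch; the host, the port (9999 is assumed here as pgpool's default listen port) and the database/user names are assumptions, not details from this thread:

    import psycopg2

    # Connect to the pooler instead of PostgreSQL directly; the pooler hands
    # the session an already-established backend connection from its pool.
    conn = psycopg2.connect(host="127.0.0.1", port=9999,
                            dbname="mydb", user="webapp")
    cur = conn.cursor()
    cur.execute("SELECT 1")
    print(cur.fetchone())
    conn.close()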
[PERFORM] 10x rowcount mis-estimation favouring merge over nestloop
I'm executing the following query:

    select hf.mailbox, hf.uid, hf.position, hf.part, hf.field, hf.value,
           af.address, a.name, a.localpart, a.domain
    from header_fields hf
    left join address_fields af using ( mailbox, uid, position, part, field )
    left join addresses a on ( af.address = a.id )
    where hf.field <= 12 and ( hf.part != '' or hf.value ilike '%,%' );

The header_fields table contains 13.5M rows, of which only ~250K match the where condition. I created an index like this:

    create index hffpv on header_fields(field)
        where field <= 12 and ( part != '' or value ilike '%,%' )

By default, explain analyse shows me a plan like this:

 Hash Left Join  (cost=1225503.02..1506125.88 rows=201 width=143) (actual time=106467.431..117902.397 rows=1113355 loops=1)
   Hash Cond: ("outer".address = "inner".id)
   ->  Merge Left Join  (cost=1211354.65..1288896.97 rows=201 width=91) (actual time=104792.505..110648.253 rows=1113355 loops=1)
         Merge Cond: (("outer".field = "inner".field) AND ("outer".part = "inner".part) AND ("outer"."position" = "inner"."position") AND ("outer".uid = "inner".uid) AND ("outer".mailbox = "inner".mailbox))
         ->  Sort  (cost=665399.78..670594.21 rows=201 width=87) (actual time=39463.784..39724.772 rows=264180 loops=1)
               Sort Key: hf.field, hf.part, hf."position", hf.uid, hf.mailbox
               ->  Bitmap Heap Scan on header_fields hf  (cost=1505.63..325237.46 rows=201 width=87) (actual time=3495.308..33767.229 rows=264180 loops=1)
                     Recheck Cond: ((field <= 12) AND ((part <> ''::text) OR (value ~~* '%,%'::text)))
                     ->  Bitmap Index Scan on hffpv  (cost=0.00..1505.63 rows=201 width=0) (actual time=3410.069..3410.069 rows=264180 loops=1)
                           Index Cond: (field <= 12)
         ->  Sort  (cost=545954.87..553141.07 rows=2874480 width=24) (actual time=65328.437..67437.846 rows=2874230 loops=1)
               Sort Key: af.field, af.part, af."position", af.uid, af.mailbox
               ->  Seq Scan on address_fields af  (cost=0.00..163548.00 rows=2874480 width=24) (actual time=12.434..4076.694 rows=2874230 loops=1)
   ->  Hash  (cost=11714.35..11714.35 rows=190807 width=56) (actual time=1670.629..1670.629 rows=190807 loops=1)
         ->  Seq Scan on addresses a  (cost=0.00..11714.35 rows=190807 width=56) (actual time=39.944..1398.897 rows=190807 loops=1)
 Total runtime: 118381.608 ms

Note the 2M estimated rowcount in the bitmap index scan on header_fields vs. the actual number (264180). That mis-estimation also causes it to do a sequential scan of address_fields, though there's a usable index.
If I set both enable_mergejoin and enable_seqscan to false, I get a plan like the following:

 Hash Left Join  (cost=8796.82..72064677.06 rows=201 width=143) (actual time=4400.706..58110.697 rows=1113355 loops=1)
   Hash Cond: ("outer".address = "inner".id)
   ->  Nested Loop Left Join  (cost=1505.63..71937416.17 rows=201 width=91) (actual time=3486.238..52351.567 rows=1113355 loops=1)
         Join Filter: (("outer"."position" = "inner"."position") AND ("outer".part = "inner".part) AND ("outer".field = "inner".field))
         ->  Bitmap Heap Scan on header_fields hf  (cost=1505.63..242126.62 rows=201 width=87) (actual time=3478.202..39181.477 rows=264180 loops=1)
               Recheck Cond: ((field <= 12) AND ((part <> ''::text) OR (value ~~* '%,%'::text)))
               ->  Bitmap Index Scan on hffpv  (cost=0.00..1505.63 rows=201 width=0) (actual time=3393.949..3393.949 rows=264180 loops=1)
                     Index Cond: (field <= 12)
         ->  Index Scan using af_mu on address_fields af  (cost=0.00..34.26 rows=11 width=24) (actual time=0.028..0.040 rows=7 loops=264180)
               Index Cond: (("outer".mailbox = af.mailbox) AND ("outer".uid = af.uid))
   ->  Hash  (cost=4857.17..4857.17 rows=190807 width=56) (actual time=764.337..764.337 rows=190807 loops=1)
         ->  Index Scan using addresses_pkey on addresses a  (cost=0.00..4857.17 rows=190807 width=56) (actual time=29.381..484.826 rows=190807 loops=1)
 Total runtime: 58459.624 ms

This looks like a considerably nicer plan (but it still features the wild mis-estimation, even though the index has approximately the right rowcount).

I tried increasing the statistics target on header_fields.field, part, and value to 100, but the estimate always hovers around the 2M mark. Does anyone have any ideas about what's wrong, and how to fix it?

Thanks.
-- ams
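For reference, the statistics-target change mentioned above is applied per column and needs a fresh ANALYZE before the planner sees it. A minimal sketch of those commands driven from Python with psycopg2 (the connection string is assumed):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")    # connection details assumed
    conn.autocommit = True
    cur = conn.cursor()

    # Raise the per-column statistics target (the 8.x default was 10),
    # then re-gather statistics so the planner uses the new histograms.
    for col in ("field", "part", "value"):
        cur.execute("ALTER TABLE header_fields "
                    "ALTER COLUMN %s SET STATISTICS 100" % col)
    cur.execute("ANALYZE header_fields")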
Re: [PERFORM] 10x rowcount mis-estimation favouring merge over nestloop
Abhijit Menon-Sen <[EMAIL PROTECTED]> writes:
> The header_fields table contains 13.5M rows, of which only ~250K match
> the where condition. I created an index like this:
>
>     create index hffpv on header_fields(field)
>         where field<=12 and (part!='' or value ilike '%,%')
>
> Note the 2M estimated rowcount in the bitmap index scan on header_fields
> vs. the actual number (264180).

I think this is basically a lack-of-column-correlation-stats problem. The planner is estimating this on the basis of the overall selectivity of the "field<=12" condition, but it seems that "field<=12" is true for a much smaller fraction of the rows satisfying (part!='' or value ilike '%,%') than for the general population of rows in the header_fields table.

There's been some speculation about obtaining stats on partial indexes as a substitute for solving the general problem of correlation stats, but I for one don't have a very clear understanding of how it'd work.

regards, tom lane
Re: [PERFORM] 10x rowcount mis-estimation favouring merge over nestloop
At 2006-11-10 01:15:24 -0500, [EMAIL PROTECTED] wrote:
>
> it seems that "field<=12" is true for a much smaller fraction of the
> rows satisfying (part!='' or value ilike '%,%') than for the general
> population of rows in the header_fields table.

Indeed. One-sixth of the rows in the entire table match field<=12, but only one-fifteenth of the rows matching the part/value condition also match field<=12.

> There's been some speculation about obtaining stats on partial indexes
> as a substitute for solving the general problem of correlation stats,

Oh. So my partial index's rowcount isn't being considered at all? That explains a lot.

OK, I'll just run the query with mergejoin and seqscan disabled. (I can't think of much else to do to speed it up, anyway.)

Thanks.
-- ams
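For anyone who wants to reproduce the correlation check described above, the relevant fractions can be measured directly. A sketch using psycopg2 (connection details assumed) that compares the actual combined count with what the planner's cross-column independence assumption would predict:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")    # connection details assumed
    cur = conn.cursor()

    queries = {
        "total":      "SELECT count(*) FROM header_fields",
        "field<=12":  "SELECT count(*) FROM header_fields WHERE field <= 12",
        "part/value": "SELECT count(*) FROM header_fields "
                      "WHERE part != '' OR value ILIKE '%,%'",
        "both":       "SELECT count(*) FROM header_fields "
                      "WHERE field <= 12 AND (part != '' OR value ILIKE '%,%')",
    }

    counts = {}
    for name, sql in queries.items():
        cur.execute(sql)
        counts[name] = cur.fetchone()[0]

    # Without cross-column statistics the planner effectively assumes
    # independence: count(both) ~= count(field<=12) * count(part/value) / total.
    independent_estimate = (counts["field<=12"] * counts["part/value"]
                            / counts["total"])
    print(counts, round(independent_estimate))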