[PERFORM] Keeping processes open for re-use

2006-11-09 Thread Hilary Forbes

Dear All
Looking at the processes running on our server, it appears that each time
a web server program makes a call to the database server, we start a new
process on the database server, which obviously has a start up cost. In
Apache, for example, you can say at start up time that you want the
machine to reserve e.g. 8 processes and keep them open at all times, so you
don't have this overhead until you exceed this minimum number. Is there a
way that we can achieve this in Postgres? We have a situation whereby we
have lots of web based users doing short quick queries and obviously the
start up time for a process must add to their perceived response
time.
Thanks
Hilary


Hilary Forbes
DMR Limited (UK registration 01134804) 
A DMR Information and Technology Group company
(www.dmr.co.uk)

Direct tel 01689 889950 Fax 01689 860330 
DMR is a UK registered trade mark of DMR Limited



Re: [PERFORM] Keeping processes open for re-use

2006-11-09 Thread Csaba Nagy
On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:
> [snip] Is there a way that we can achieve this in Postgres? We have a
> situation whereby we have lots of web based users doing short quick
> queries and obviously the start up time for a process must add to
> their perceived response time.

Yes: google for "connection pooling". Note that different solutions
exist for different programming languages, so you should look for
connection pooling for the language you're using.
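
As a concrete illustration, here is a minimal sketch of client-side
pooling from Python, assuming the psycopg2 driver and a made-up DSN;
pool libraries in other languages follow the same borrow/return pattern:

    import psycopg2.pool

    # Keep a minimum of 8 backends open at all times, much like Apache's
    # StartServers, and never exceed 32 database processes in total.
    pool = psycopg2.pool.ThreadedConnectionPool(
        8, 32, "host=dbserver dbname=mydb user=web")  # hypothetical DSN

    conn = pool.getconn()        # borrow an already-started connection
    try:
        cur = conn.cursor()
        cur.execute("select 1")  # a short, quick query
        cur.fetchone()
        conn.commit()
    finally:
        pool.putconn(conn)       # return it to the pool instead of closing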

HTH,
Csaba.





Re: [PERFORM] Easy read-heavy benchmark kicking around?

2006-11-09 Thread Merlin Moncure

On 11/8/06, Spiegelberg, Greg <[EMAIL PROTECTED]> wrote:

Merlin,

I'm kinda shocked you had such a bad experience with the AMS200.  We have a
unit here hooked up to a 4-node Linux cluster with 4 databases banging
on it, and we get good, consistent performance out of it.  All 4 nodes can
throw 25 to 75 MB/s simultaneously without a hiccup.

I'm curious, what was your AMS, server and SAN config?


We had a quad Opteron 870 in a Sun v40z, with two trays of 400GB SATA
drives and the 4 15k FC drives they make you buy.  The OS was originally
Gentoo with an Emulex HBA, but we switched to Red Hat AS4/QLogic to get
support from them.

The highest performance we ever got was around 120MB/sec writing to
the 4 FC drives in RAID 10.  However, the SATA drives could not even do
100, and for some reason when we added a second RAID group the
performance dropped 40%, which their performance group could not
explain.  Compounding the problem, our assigned tech did not know Linux
and there was a one-week turnaround to get support emails answered.
Their sales and support staff were snotty and completely unhelpful.
Also, we had to do a complex migration process which involved physically
moving the unit to multiple racks for data transfer, which we were going
to have to coordinate with Hitachi support, because they do not allow
you to rack/unrack your own unit.

Ultimately, we returned the unit and bought an ADTX SAN.  For less than
half the price of the Hitachi, we got a dual 4Gb controller with mixed
SATA/SAS support that takes 750GB SATA drives.  It also has SAS ports in
the back for direct attachment to a SAS HBA.  In an active/active
configuration, the unit can sustain 500MB/sec, and it has 50% more
storage in 1/3 the rack space.

Merlin



Re: [PERFORM] Keeping processes open for re-use

2006-11-09 Thread imad

Yes, this is connection pooling. You can find plenty of examples of
connection pooling on the internet, including source code. Also keep in
mind that a connection pool can be maintained on the application side as
well as on the database server side. Check which one suits you.
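
To illustrate the difference, a hypothetical sketch (again with Python's
psycopg2 and made-up host names): an application-side pool lives inside
the web process, as in the earlier sketch, while with a server-side
pooler such as pgpool the application code barely changes; it simply
connects to the pooler's port instead of PostgreSQL's:

    import psycopg2

    # Direct connection: each new client starts its own backend process.
    direct = psycopg2.connect("host=dbserver port=5432 dbname=mydb")

    # Through a server-side pooler listening on its own port (9999 here,
    # purely illustrative): the pooler hands the query to one of its
    # already-open backend connections.
    pooled = psycopg2.connect("host=dbserver port=9999 dbname=mydb")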


--Imad
www.EnterpriseDB.com


On 11/9/06, Hilary Forbes <[EMAIL PROTECTED]> wrote:

> [snip] Is there a way that we can achieve this in Postgres? We have a
> situation whereby we have lots of web based users doing short quick
> queries and obviously the start up time for a process must add to
> their perceived response time.
>
> Thanks
> Hilary




Re: [PERFORM] Keeping processes open for re-use

2006-11-09 Thread Shane Ambler

Csaba Nagy wrote:

On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:

[snip] Is there a way that we can achieve this in Postgres? We have a
situation whereby we have lots of web based users doing short quick
queries and obviously the start up time for a process must add to
their perceived response time.


Yes: google for "connection pooling". Note that different solutions
exist for different programming languages, so you should look for
connection pooling for the language you're using.



If you are using PHP, then persistent connections may be a simpler way,
if they are enough for your needs.

Basically, replace pg_connect with pg_pconnect.

Other languages may have a similar option.

http://php.net/manual/en/features.persistent-connections.php


--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [PERFORM] Keeping processes open for re-use

2006-11-09 Thread Joshua D. Drake
On Fri, 2006-11-10 at 12:39 +1030, Shane Ambler wrote:
> Csaba Nagy wrote:
> > On Thu, 2006-11-09 at 13:35, Hilary Forbes wrote:
> >> [snip] Is there a way that we can achieve this in Postgres? We have a
> >> situation whereby we have lots of web based users doing short quick
> >> queries and obviously the start up time for a process must add to
> >> their perceived response time.
> > 
> > Yes: google for "connection pooling". Note that different solutions
> > exist for different programming languages, so you should look for
> > connection pooling for the language you're using.
> > 
> 
> If you are using PHP then persistent connections may be a simpler way if 
> it is enough for your needs.

I would actually suggest pg_pool over pg_pconnect.

Joshua D. Drake


> 
> Basically replace pg_connect with pg_pconnect
> 
> Other languages may have a similar option.
> 
> http://php.net/manual/en/features.persistent-connections.php
> 
> 
-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






[PERFORM] 10x rowcount mis-estimation favouring merge over nestloop

2006-11-09 Thread Abhijit Menon-Sen
I'm executing the following query:

select
    hf.mailbox, hf.uid, hf.position, hf.part, hf.field, hf.value,
    af.address, a.name, a.localpart, a.domain
from
    header_fields hf
    left join address_fields af
        using (mailbox, uid, position, part, field)
    left join addresses a
        on (af.address = a.id)
where
    hf.field <= 12 and (hf.part != '' or hf.value ilike '%,%');

The header_fields table contains 13.5M rows, of which only ~250K match
the where condition. I created an index like this:

create index hffpv on header_fields(field)
where field<=12 and (part!='' or value ilike '%,%')

By default, explain analyse shows me a plan like this:

 Hash Left Join  (cost=1225503.02..1506125.88 rows=201 width=143) (actual time=106467.431..117902.397 rows=1113355 loops=1)
   Hash Cond: ("outer".address = "inner".id)
   ->  Merge Left Join  (cost=1211354.65..1288896.97 rows=201 width=91) (actual time=104792.505..110648.253 rows=1113355 loops=1)
         Merge Cond: (("outer".field = "inner".field) AND ("outer".part = "inner".part) AND ("outer"."position" = "inner"."position") AND ("outer".uid = "inner".uid) AND ("outer".mailbox = "inner".mailbox))
         ->  Sort  (cost=665399.78..670594.21 rows=201 width=87) (actual time=39463.784..39724.772 rows=264180 loops=1)
               Sort Key: hf.field, hf.part, hf."position", hf.uid, hf.mailbox
               ->  Bitmap Heap Scan on header_fields hf  (cost=1505.63..325237.46 rows=201 width=87) (actual time=3495.308..33767.229 rows=264180 loops=1)
                     Recheck Cond: ((field <= 12) AND ((part <> ''::text) OR (value ~~* '%,%'::text)))
                     ->  Bitmap Index Scan on hffpv  (cost=0.00..1505.63 rows=201 width=0) (actual time=3410.069..3410.069 rows=264180 loops=1)
                           Index Cond: (field <= 12)
         ->  Sort  (cost=545954.87..553141.07 rows=2874480 width=24) (actual time=65328.437..67437.846 rows=2874230 loops=1)
               Sort Key: af.field, af.part, af."position", af.uid, af.mailbox
               ->  Seq Scan on address_fields af  (cost=0.00..163548.00 rows=2874480 width=24) (actual time=12.434..4076.694 rows=2874230 loops=1)
   ->  Hash  (cost=11714.35..11714.35 rows=190807 width=56) (actual time=1670.629..1670.629 rows=190807 loops=1)
         ->  Seq Scan on addresses a  (cost=0.00..11714.35 rows=190807 width=56) (actual time=39.944..1398.897 rows=190807 loops=1)
 Total runtime: 118381.608 ms

Note the 2M estimated rowcount in the bitmap index scan on header_fields
vs. the actual number (264180). That mis-estimation also causes it to do
a sequential scan of address_fields, though there's a usable index. If I
set both enable_mergejoin and enable_seqscan to false, I get a plan like
the following:

 Hash Left Join  (cost=8796.82..72064677.06 rows=201 width=143) (actual time=4400.706..58110.697 rows=1113355 loops=1)
   Hash Cond: ("outer".address = "inner".id)
   ->  Nested Loop Left Join  (cost=1505.63..71937416.17 rows=201 width=91) (actual time=3486.238..52351.567 rows=1113355 loops=1)
         Join Filter: (("outer"."position" = "inner"."position") AND ("outer".part = "inner".part) AND ("outer".field = "inner".field))
         ->  Bitmap Heap Scan on header_fields hf  (cost=1505.63..242126.62 rows=201 width=87) (actual time=3478.202..39181.477 rows=264180 loops=1)
               Recheck Cond: ((field <= 12) AND ((part <> ''::text) OR (value ~~* '%,%'::text)))
               ->  Bitmap Index Scan on hffpv  (cost=0.00..1505.63 rows=201 width=0) (actual time=3393.949..3393.949 rows=264180 loops=1)
                     Index Cond: (field <= 12)
         ->  Index Scan using af_mu on address_fields af  (cost=0.00..34.26 rows=11 width=24) (actual time=0.028..0.040 rows=7 loops=264180)
               Index Cond: (("outer".mailbox = af.mailbox) AND ("outer".uid = af.uid))
   ->  Hash  (cost=4857.17..4857.17 rows=190807 width=56) (actual time=764.337..764.337 rows=190807 loops=1)
         ->  Index Scan using addresses_pkey on addresses a  (cost=0.00..4857.17 rows=190807 width=56) (actual time=29.381..484.826 rows=190807 loops=1)
 Total runtime: 58459.624 ms

Which looks like a considerably nicer plan (but still features the wild
mis-estimation, though the index has approximately the right rowcount).
I tried increasing the statistics target on header_fields.field, part,
and value to 100, but the estimate always hovers around the 2M mark.
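
(For reference, a sketch of raising those targets, with the plain SQL
wrapped in a hypothetical Python/psycopg2 script; the DSN is made up:)

    import psycopg2

    conn = psycopg2.connect("dbname=mail")  # hypothetical DSN
    conn.autocommit = True
    cur = conn.cursor()
    for col in ("field", "part", "value"):
        # enlarge the per-column histogram/most-common-values sample
        # (the default statistics target in 8.x is 10)
        cur.execute(
            "ALTER TABLE header_fields "
            "ALTER COLUMN %s SET STATISTICS 100" % col)
    cur.execute("ANALYZE header_fields")  # re-collect stats at the new target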

Does anyone have any ideas about what's wrong, and how to fix it?

Thanks.

-- ams



Re: [PERFORM] 10x rowcount mis-estimation favouring merge over nestloop

2006-11-09 Thread Tom Lane
Abhijit Menon-Sen <[EMAIL PROTECTED]> writes:
> The header_fields table contains 13.5M rows, of which only ~250K match
> the where condition. I created an index like this:
> create index hffpv on header_fields(field)
> where field<=12 and (part!='' or value ilike '%,%')

> Note the 2M estimated rowcount in the bitmap index scan on header_fields
> vs. the actual number (264180).

I think this is basically a lack-of-column-correlation-stats problem.
The planner is estimating this on the basis of the overall selectivity
of the "field<=12" condition, but it seems that "field<=12" is true for
a much smaller fraction of the rows satisfying (part!='' or value ilike '%,%')
than for the general population of rows in the header_fields table.

There's been some speculation about obtaining stats on partial indexes
as a substitute for solving the general problem of correlation stats,
but I for one don't have a very clear understanding of how it'd work.

regards, tom lane



Re: [PERFORM] 10x rowcount mis-estimation favouring merge over nestloop

2006-11-09 Thread Abhijit Menon-Sen
At 2006-11-10 01:15:24 -0500, [EMAIL PROTECTED] wrote:
>
> it seems that "field<=12" is true for a much smaller fraction of the
> rows satisfying (part!='' or value ilike '%,%') than for the general
> population of rows in the header_fields table.

Indeed. One-sixth of the rows in the entire table match field<=12
(13.5M/6 is roughly 2.25M, which is about where the planner's estimate
lands), but only one-fifteenth of the rows matching the part/value
condition also match field<=12 (hence the actual 264180).

> There's been some speculation about obtaining stats on partial indexes
> as a substitute for solving the general problem of correlation stats,

Oh. So my partial index's rowcount isn't being considered at all? That
explains a lot. Ok, I'll just run the query with mergejoin and seqscan
disabled. (I can't think of much else to do to speed it up, anyway.)
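
(A sketch of confining those two overrides to this query alone, assuming
the psycopg2 driver and a made-up DSN; SET LOCAL lasts only until the
transaction commits, so the rest of the session keeps the default
planner settings:)

    import psycopg2

    query = """
        select hf.mailbox, hf.uid, hf.position, hf.part, hf.field,
               hf.value, af.address, a.name, a.localpart, a.domain
        from header_fields hf
        left join address_fields af
            using (mailbox, uid, position, part, field)
        left join addresses a on (af.address = a.id)
        where hf.field <= 12
          and (hf.part != '' or hf.value ilike '%,%')
    """

    conn = psycopg2.connect("dbname=mail")  # hypothetical DSN
    cur = conn.cursor()
    # psycopg2 has already opened a transaction here, so SET LOCAL
    # applies to everything up to the commit below.
    cur.execute("SET LOCAL enable_mergejoin = off")
    cur.execute("SET LOCAL enable_seqscan = off")
    cur.execute(query)
    rows = cur.fetchall()
    conn.commit()  # the enable_* settings revert to their defaults here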

Thanks.

-- ams
