[PERFORM] 158x query improvement when removing 2 (noop) WHERE conditions

2010-07-28 Thread Louis-David Mitterrand
Hi,

On a hunch I removed two (legacy) WHERE conditions from the following
query and obtained a 158x speed improvement. Yet these conditions do not
filter anything. Does that make any sense?

The EXPLAIN ANALYSE output is attached, with first the fast version and
then the slow one.

I'd like to understand what is at play here to explain such a dramatic
difference. This is with pg 8.4.4.

Thanks,

select p3.price as first_price,
p4.price as second_price,
p5.price as third_price,
t.id_cabin_category, t.id_currency, t.id_alert_cruise, t.id_cruise,
t.created_by, t.cabin_name, t.cabin_cat_code, t.cabin_type_name,
cr.login, cr.email, fx.currency_symbol, fx.currency_code,
c.saildate, ct.id_cruise_type, ct.package_name, s.id_ship, s.ship_name
from (select
first_value(max(p.id_price)) over w as first_id_price,
nth_value(max(p.id_price),2) over w as second_id_price,
p.id_cabin_category, p.id_currency,
p.created_on > ac.modified_on as is_new_price,
ac.id_alert_cruise, ac.id_cruise, ac.cabin_name, ac.created_by,
ac.cabin_cat_code, ac.cabin_type_name
from alert_to_category ac
join price p on (ac.id_cabin_category=p.id_cabin_category and
p.id_cruise=ac.id_cruise and (p.id_currency=ac.id_currency or
ac.id_currency is null))
-- XXX: removing these speeds up query by 158x !
-- where (ac.created_by=0 or nullif(0, 0) is null)
-- and (p.id_cruise=0 or nullif(0, 0) is null)
group by ac.id_cruise,ac.created_by,ac.id_alert_cruise,ac.cabin_name,
ac.cabin_cat_code, ac.cabin_type_name,
p.id_cabin_category,p.id_currency,p.id_cruise,
p.created_on > ac.modified_on
window w as (partition by
p.id_currency,p.id_cabin_category,p.id_cruise order by
p.created_on > ac.modified_on desc
rows between unbounded preceding and unbounded following)
order by p.id_cabin_category,p.id_currency) as t
join cruiser cr on (t.created_by=cr.id_cruiser)
join cruise c using (id_cruise)
join cruise_type ct using (id_cruise_type)
join ship s using (id_ship)
join currency fx using (id_currency)
join price p3 on (t.first_id_price=p3.id_price)
left join price p4 on (t.second_id_price=p4.id_price)
left join price p5 on (p5.id_price=(select id_price from price 
where id_cruise=p3.id_cruise and id_cabin_category=p3.id_cabin_category 
and id_currency=p3.id_currency and id_price < t.second_id_price 
order by id_price desc limit 1))
where t.is_new_price is true and p3.price < p4.price;
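
To see why the two commented-out predicates are no-ops: nullif(0, 0)
returns NULL, so the "is null" test on the right of each OR is constant
true and the whole predicate filters nothing. A minimal sketch (42
stands in for any created_by or id_cruise value):

select nullif(0, 0) is null as rhs_always_true;         -- t
select (42 = 0 or nullif(0, 0) is null) as whole_pred;  -- t, whatever the left side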

-- 
http://www.cruisefish.net

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 158x query improvement when removing 2 (noop) WHERE conditions

2010-07-28 Thread Andres Freund
On Wednesday 28 July 2010 12:27:44 Louis-David Mitterrand wrote:
 The EXPLAIN ANALYSE output is attached, with first the fast version and
 then the slow one.
I think you forgot to attach it.

Andres



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-28 Thread Vitalii Tymchyshyn

28.07.10 04:56, Tom Lane wrote:


I'm not asserting it's true, just suggesting it's entirely possible.
Other than the fork() cost itself and whatever authentication activity
there might be, practically all the startup cost of a new backend can be
seen as cache-populating activities.  You'd have to redo all of that,
*plus* pay the costs of getting rid of the previous cache entries.
Maybe the latter costs less than a fork(), or maybe not.  fork() is
pretty cheap on modern Unixen.

   
Actually, for me the problem is that one can't raise the number of 
database connections very high without overloading CPU/memory/disk, so 
external pooling is needed. If postgresql had something like a 
max_active_queries setting that limited the number of connections not 
in the IDLE [in transaction] state, one could set max_connections high 
(I don't think an idle process itself has much overhead) and limit 
max_active_queries to get maximum performance without external 
pooling. Of course this won't help if the numbers are really high, but 
it could cover the most common cases.
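
For what it's worth, one can already see how many open connections are
actually working at a given instant (a rough sketch against the 8.4-era
pg_stat_activity, where idle backends report '<IDLE>' as their
current_query):

select current_query in ('<IDLE>', '<IDLE> in transaction') as idle,
       count(*)
from pg_stat_activity
group by 1;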


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...

2010-07-28 Thread Vitalii Tymchyshyn

27.07.10 02:03, Lew wrote:

Piotr Gasidło wrote:

EXPLAIN ANALYZE SELECT ...
Total runtime: 4.782 ms
Time: 25,970 ms


Strangely, the runtime is shown with a period for the separator, though.

One value is calculated on the server by the EXPLAIN ANALYZE command; 
the other is measured by psql itself.
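
This is easy to see with any trivial statement; the first number is
measured inside the executor, the second by psql's \timing, which also
counts parse/plan time, the network round trip and formatting of the
output:

\timing
explain analyze select 1;
-- Total runtime: ...  (server-side, printed by EXPLAIN ANALYZE)
-- Time: ...           (client-side wall clock, printed by psql)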


Best regards, Vitalii Tymchyshyn




Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Michael Stone

On Mon, Jul 26, 2010 at 01:47:14PM -0600, Scott Marlowe wrote:

Note that SSDs aren't usually real fast at large sequential writes
though, so it might be worth putting pg_xlog on a spinning pair in a
mirror and seeing how much, if any, the SSD drive speeds up when not
having to do pg_xlog.


xlog is also where I use ext2; it does bench faster for me in that 
config, and the fsck issues don't really exist because you're not in a 
situation with a lot of files being created/removed.


Mike Stone



Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Michael Stone

On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:

I know I'm talking development now but is there a case for a pg_xlog block
device to remove the file system overhead and guaranteeing your data is
written sequentially every time?


If you dedicate a partition to xlog, you already get that in practice 
with no extra development.


Mike Stone



Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Yeb Havinga

Michael Stone wrote:

On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
I know I'm talking development now but is there a case for a pg_xlog block
device to remove the file system overhead and guaranteeing your data is
written sequentially every time?


If you dedicate a partition to xlog, you already get that in practice 
with no extra development.
Due to the LBA remapping of the SSD, I'm not sure whether putting files that 
are sequentially written in a different partition (together with e.g. 
tables) would make a difference: in the end the SSD will have a set of new 
blocks in its buffer and somehow arrange them into sets of 128KB or 
256KB writes for the flash chips. See also 
http://www.anandtech.com/show/2899/2


But I ran out of ideas to test, so I'm going to test it anyway.

regards,
Yeb Havinga




Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Yeb Havinga

Yeb Havinga wrote:

Michael Stone wrote:

On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
I know I'm talking development now but is there a case for a pg_xlog block
device to remove the file system overhead and guaranteeing your data is
written sequentially every time?


If you dedicate a partition to xlog, you already get that in practice 
with no extra development.
Due to the LBA remapping of the SSD, I'm not sure whether putting files 
that are sequentially written in a different partition (together with 
e.g. tables) would make a difference: in the end the SSD will have a 
set of new blocks in its buffer and somehow arrange them into sets of 
128KB or 256KB writes for the flash chips. See also 
http://www.anandtech.com/show/2899/2


But I ran out of ideas to test, so I'm going to test it anyway.
Same machine config as mentioned before, with data and xlog on separate 
partitions, ext3 with barrier off (safe on this SSD).


pgbench -c 10 -M prepared -T 3600 -l test
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 3600 s
number of transactions actually processed: 10856359
tps = 3015.560252 (including connections establishing)
tps = 3015.575739 (excluding connections establishing)

This is about 25% faster than data and xlog combined on the same filesystem.

Below is output from iostat -xk 1 -p /dev/sda, which shows per-partition 
statistics each second.

sda2 is data, sda3 is xlog. In the third second a checkpoint seems to start.

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          63.50    0.00   30.50    2.50    0.00    3.50

Device:  rrqm/s  wrqm/s    r/s      w/s   rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00 6518.00  36.00  2211.00  148.00  35524.00    31.75     0.28   0.12   0.11  25.00
sda1       0.00    2.00   0.00     5.00    0.00    636.00   254.40     0.03   6.00   2.00   1.00
sda2       0.00  218.00  36.00    40.00  148.00   1032.00    31.05     0.00   0.00   0.00   0.00
sda3       0.00 6298.00   0.00  2166.00    0.00  33856.00    31.26     0.25   0.12   0.12  25.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          60.50    0.00   37.50    0.50    0.00    1.50

Device:  rrqm/s  wrqm/s    r/s      w/s   rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00 6514.00  33.00  2283.00  140.00  35188.00    30.51     0.32   0.14   0.13  29.00
sda1       0.00    0.00   0.00     3.00    0.00     12.00     8.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00  33.00     2.00  140.00      8.00     8.46     0.03   0.86   0.29   1.00
sda3       0.00 6514.00   0.00  2278.00    0.00  35168.00    30.88     0.29   0.13   0.13  29.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          33.00    0.00   34.00   18.00    0.00   15.00

Device:  rrqm/s  wrqm/s    r/s      w/s   rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00 3782.00   7.00  7235.00   28.00  44068.00    12.18    69.52   9.46   0.09  62.00
sda1       0.00    0.00   0.00     1.00    0.00      4.00     8.00     0.00   0.00   0.00   0.00
sda2       0.00  322.00   7.00  6018.00   28.00  25360.00     8.43    69.22  11.33   0.08  47.00
sda3       0.00 3460.00   0.00  1222.00    0.00  18728.00    30.65     0.30   0.25   0.25  30.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.00    0.00   36.00   22.50    0.00   32.50

Device:  rrqm/s  wrqm/s    r/s      w/s   rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00 1079.00   3.00     0.00   12.00  49060.00     8.83   120.64  10.95   0.08  86.00
sda1       0.00    2.00   0.00     2.00    0.00    320.00   320.00     0.12  60.00  35.00   7.00
sda2       0.00   30.00   3.00 10739.00   12.00  43076.00     8.02   120.49  11.30   0.08  83.00
sda3       0.00 1047.00   0.00   363.00    0.00   5640.00    31.07     0.03   0.08   0.08   3.00


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          62.00    0.00   31.00    2.00    0.00    5.00

Device:  rrqm/s  wrqm/s    r/s      w/s   rkB/s     wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00 6267.00  51.00  2493.00  208.00  35040.00    27.71     1.80   0.71   0.12  31.00
sda1       0.00    0.00   0.00     3.00    0.00     12.00     8.00     0.00   0.00   0.00   0.00
sda2       0.00  123.00  51.00   344.00  208.00   1868.00    10.51     1.50   3.80   0.10   4.00
sda3       0.00 6144.00   0.00  2146.00    0.00  33160.00    30.90     0.30   0.14   0.14  30.00





Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-28 Thread Josh Berkus
On 7/27/10 6:56 PM, Tom Lane wrote:
 Yeah, if it weren't for that I'd say sure let's try it.  But I'm
 afraid we'd be introducing significant headaches in return for a gain
 that's quite speculative.

Well, the *gain* isn't speculative.  For example, I am once again
dealing with the issue that PG backend processes on Solaris never give
up their RAM, resulting in pathological swapping situations if you have
many idle connections.  This requires me to install pgpool, which is
overkill (since it has load balancing, replication, and more) just to
make sure that connections get recycled so that I don't have 300 idle
connections eating up 8GB of RAM.

Relative to switching databases, I'd tend to say that, like pgbouncer
and pgpool, we don't need to support that.  Each user/database combo can
have their own pool.  While not ideal, this would be good enough for
90% of users.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote:
 On 7/27/10 6:56 PM, Tom Lane wrote:
 Yeah, if it weren't for that I'd say sure let's try it.  But I'm
 afraid we'd be introducing significant headaches in return for a gain
 that's quite speculative.

 Well, the *gain* isn't speculative.  For example, I am once again
 dealing with the issue that PG backend processes on Solaris never give
 up their RAM, resulting in pathological swapping situations if you have
 many idle connections.  This requires me to install pgpool, which is
 overkill (since it has load balancing, replication, and more) just to
 make sure that connections get recycled so that I don't have 300 idle
 connections eating up 8GB of RAM.

 Relative to switching databases, I'd tend to say that, like pgbouncer
 and pgpool, we don't need to support that.  Each user/database combo can
 have their own pool.  While not ideal, this would be good enough for
 90% of users.

However, if we don't support that, we can't do any sort of pooling-ish
thing without the ability to pass file descriptors between processes;
and Tom seems fairly convinced there's no portable way to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 However, if we don't support that, we can't do any sort of pooling-ish
 thing without the ability to pass file descriptors between processes;
 and Tom seems fairly convinced there's no portable way to do that.

Well, what it would come down to is: are we prepared to not support
pooling on platforms without such a capability?  It's certainly possible
to do it on many modern platforms, but I don't believe we can make it
happen everywhere.  Generally we've tried to avoid having major features
that don't work everywhere ...

regards, tom lane



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-28 Thread Andres Freund
On Wed, Jul 28, 2010 at 04:10:08PM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  However, if we don't support that, we can't do any sort of pooling-ish
  thing without the ability to pass file descriptors between processes;
  and Tom seems fairly convinced there's no portable way to do that.

 Well, what it would come down to is: are we prepared to not support
 pooling on platforms without such a capability?  It's certainly possible
 to do it on many modern platforms, but I don't believe we can make it
 happen everywhere.  Generally we've tried to avoid having major features
 that don't work everywhere ...
Which platforms do you have in mind here? All of the platforms I found
documented to be supported seem to support at least one of SCM_RIGHTS,
WSADuplicateSocket or STREAMS/FD_INSERT.
Most if not all besides Windows support SCM_RIGHTS. The ones I am
dubious about support FD_INSERT...

Andres



[PERFORM] planner index choice

2010-07-28 Thread Chris

Hi there,

I have a simple query where I don't understand the planner's choice to 
use a particular index.


The main table looks like this:

# \d sq_ast_attr_val
          Table "public.sq_ast_attr_val"
   Column    |         Type          |          Modifiers
-------------+-----------------------+------------------------------
 assetid     | character varying(15) | not null
 attrid      | integer               | not null
 contextid   | integer               | not null default 0
 custom_val  | text                  |
 use_default | character(1)          | not null default '1'::bpchar
Indexes:
    "ast_attr_val_pk" PRIMARY KEY, btree (assetid, attrid, contextid)
    "sq_ast_attr_val_assetid" btree (assetid)
    "sq_ast_attr_val_attrid" btree (attrid)
    "sq_ast_attr_val_concat" btree (((assetid::text || '~'::text) || attrid))
    "sq_ast_attr_val_contextid" btree (contextid)


The query:

SELECT
  assetid, custom_val
FROM
  sq_ast_attr_val
WHERE
  attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = 
'is_contextable' AND (type_code = 'metadata_field_select' OR 
owning_type_code = 'metadata_field'))

  AND contextid = 0
INTERSECT
SELECT
  assetid, custom_val
FROM
  sq_ast_attr_val
WHERE
  assetid = '62321'
  AND contextid = 0;


The explain analyze plan:

http://explain.depesz.com/s/nWs

I'm not sure why it's picking the sq_ast_attr_val_contextid index to do 
the contextid = 0 check, when the other parts (attrid/assetid) are much 
more selective.
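
A rough way to compare the columns from the planner's point of view (a
sketch against the standard pg_stats view; n_distinct is what drives
the selectivity estimates):

select attname, n_distinct, null_frac
from pg_stats
where schemaname = 'public'
  and tablename = 'sq_ast_attr_val'
  and attname in ('assetid', 'attrid', 'contextid');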


If I drop that particular index:

http://explain.depesz.com/s/zp


All (I hope) relevant postgres info:

CentOS 5.5 x86_64 running pg 8.4.4.

Server has 8gig memory.

# select name, setting, source from pg_settings where name in 
('shared_buffers', 'effective_cache_size', 'work_mem');

         name         | setting
----------------------+---------
 shared_buffers       | 262144
 effective_cache_size | 655360
 work_mem              | 32768
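
The first two settings are in 8kB pages, i.e. shared_buffers = 2GB and
effective_cache_size = 5GB (work_mem is in kB, i.e. 32MB). A small
sketch to print the page-based ones readably:

select name, pg_size_pretty(setting::bigint * 8192) as size
from pg_settings
where name in ('shared_buffers', 'effective_cache_size');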

All planner options are enabled:

# select name, setting, source from pg_settings where name like 'enable_%';
       name        | setting | source
-------------------+---------+---------
 enable_bitmapscan | on      | default
 enable_hashagg    | on      | default
 enable_hashjoin   | on      | default
 enable_indexscan  | on      | default
 enable_mergejoin  | on      | default
 enable_nestloop   | on      | default
 enable_seqscan    | on      | default
 enable_sort       | on      | default
 enable_tidscan    | on      | default

Any insights welcome - thanks!

--
Postgresql & php tutorials
http://www.designmagick.com/




Re: [PERFORM] Testing Sandforce SSD

2010-07-28 Thread Greg Spiegelberg
On Wed, Jul 28, 2010 at 9:18 AM, Yeb Havinga yebhavi...@gmail.com wrote:

 Yeb Havinga wrote:

 Due to the LBA remapping of the SSD, I'm not sure whether putting files that
 are sequentially written in a different partition (together with e.g.
 tables) would make a difference: in the end the SSD will have a set of new
 blocks in its buffer and somehow arrange them into sets of 128KB or 256KB
 writes for the flash chips. See also http://www.anandtech.com/show/2899/2

 But I ran out of ideas to test, so I'm going to test it anyway.

 Same machine config as mentioned before, with data and xlog on separate
 partitions, ext3 with barrier off (safe on this SSD).

 pgbench -c 10 -M prepared -T 3600 -l test
 starting vacuum...end.
 transaction type: TPC-B (sort of)
 scaling factor: 300
 query mode: prepared
 number of clients: 10
 duration: 3600 s
 number of transactions actually processed: 10856359
 tps = 3015.560252 (including connections establishing)
 tps = 3015.575739 (excluding connections establishing)

 This is about 25% faster than data and xlog combined on the same
 filesystem.


The trick may be in kjournald, of which there is one for each journalled
ext3 file system.  I learned back in Red Hat 4 pre-U4 kernels that there
was a problem with kjournald that would either cause 30-second hangs or
lock up my server completely when pg_xlog and data were on the same file
system, plus a few other things going on at the same time.

Given the multicore world we have today, I think it makes sense that
multiple ext3 file systems, and the kjournald's that service them, are
faster than a single combined file system.


Greg


Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-28 Thread Robert Haas
On Wed, Jul 28, 2010 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 However, if we don't support that, we can't do any sort of pooling-ish
 thing without the ability to pass file descriptors between processes;
 and Tom seems fairly convinced there's no portable way to do that.

 Well, what it would come down to is: are we prepared to not support
 pooling on platforms without such a capability?  It's certainly possible
 to do it on many modern platforms, but I don't believe we can make it
 happen everywhere.  Generally we've tried to avoid having major features
 that don't work everywhere ...

I suppose it depends on the magnitude of the benefit.  And how many
platforms aren't covered.  And how much code is required.  In short,
until someone writes a patch, who knows?  I think the core question we
should be thinking about is what would be the cleanest method of
resetting a backend - either for the same database or for a different
one, whichever seems easier.  And by cleanest, I mean least likely to
introduce bugs.  If we can get to the point where we have something to
play around with, even if it's kind of kludgey or doesn't quite work,
it'll give us some idea of whether further effort is worthwhile and
how it should be directed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] planner index choice

2010-07-28 Thread Tom Lane
Chris dmag...@gmail.com writes:
 The query:

 SELECT
assetid, custom_val
 FROM
sq_ast_attr_val
 WHERE
attrid IN (SELECT attrid FROM sq_ast_attr WHERE name = 
 'is_contextable' AND (type_code = 'metadata_field_select' OR 
 owning_type_code = 'metadata_field'))
AND contextid = 0
 INTERSECT
 SELECT
assetid, custom_val
 FROM
sq_ast_attr_val
 WHERE
assetid = '62321'
AND contextid = 0;

 The explain analyze plan:
 http://explain.depesz.com/s/nWs

Hrm ... are you *certain* that's an 8.4 server?  Because the bit with

Index Cond: (sq_ast_attr_val.attrid = outer.attrid)

is a locution that EXPLAIN hasn't used since 8.1, according to a quick
check.  More recent versions don't say outer.

The actual problem seems to be that choose_bitmap_and() is choosing to
add an indexscan on sq_ast_attr_val_contextid, even though this index
is a lot less selective than the sq_ast_attr_val_attrid scan it had
already picked.  I've seen that behavior before, and there were a series
of patches back in 2006-2007 that seem to have pretty much fixed it.
So that's another reason for suspecting you've got an old server version
there...
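
An easy way to confirm what the server actually is (the psql banner
only reflects the client version, which can differ from the server):

select version();
show server_version;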

regards, tom lane
