[PERFORM] 158x query improvement when removing 2 (noop) WHERE conditions
Hi,

On a hunch I removed two (legacy) WHERE conditions from the following query and obtained a 158x speed improvement. Yet these conditions do not filter anything. Does that make any sense? The EXPLAIN ANALYSE output is attached, with first the fast version and then the slow one. I'd like to understand what is at play here to explain such a dramatic difference. This is with pg 8.4.4.

Thanks,

select p3.price as first_price,
       p4.price as second_price,
       p5.price as third_price,
       t.id_cabin_category, t.id_currency, t.id_alert_cruise,
       t.id_cruise, t.created_by, t.cabin_name, t.cabin_cat_code,
       t.cabin_type_name, cr.login, cr.email, fx.currency_symbol,
       fx.currency_code, c.saildate, ct.id_cruise_type, ct.package_name,
       s.id_ship, s.ship_name
from (select first_value(max(p.id_price)) over w as first_id_price,
             nth_value(max(p.id_price), 2) over w as second_id_price,
             p.id_cabin_category, p.id_currency,
             p.created_on > ac.modified_on as is_new_price,
             ac.id_alert_cruise, ac.id_cruise, ac.cabin_name, ac.created_by,
             ac.cabin_cat_code, ac.cabin_type_name
      from alert_to_category ac
      join price p on (ac.id_cabin_category = p.id_cabin_category
          and p.id_cruise = ac.id_cruise
          and (p.id_currency = ac.id_currency or ac.id_currency is null))
      -- XXX: removing these speeds up query by 158x !
      -- where (ac.created_by = 0 or nullif(0, 0) is null)
      -- and (p.id_cruise = 0 or nullif(0, 0) is null)
      group by ac.id_cruise, ac.created_by, ac.id_alert_cruise,
          ac.cabin_name, ac.cabin_cat_code, ac.cabin_type_name,
          p.id_cabin_category, p.id_currency, p.id_cruise,
          p.created_on > ac.modified_on
      window w as (partition by p.id_currency, p.id_cabin_category, p.id_cruise
          order by p.created_on > ac.modified_on desc
          rows between unbounded preceding and unbounded following)
      order by p.id_cabin_category, p.id_currency) as t
join cruiser cr on (t.created_by = cr.id_cruiser)
join cruise c using (id_cruise)
join cruise_type ct using (id_cruise_type)
join ship s using (id_ship)
join currency fx using (id_currency)
join price p3 on (t.first_id_price = p3.id_price)
left join price p4 on (t.second_id_price = p4.id_price)
left join price p5 on (p5.id_price =
    (select id_price from price
     where id_cruise = p3.id_cruise
       and id_cabin_category = p3.id_cabin_category
       and id_currency = p3.id_currency
       and id_price < t.second_id_price
     order by id_price desc limit 1))
where t.is_new_price is true
  and p3.price <> p4.price;

-- http://www.cruisefish.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
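A minimal illustration of why the commented-out conditions filter nothing: nullif(0, 0) compares 0 with 0 and returns NULL, so the "is null" branch of each OR is constant true and the whole condition is a tautology:

```sql
-- nullif(a, b) returns NULL when a = b, so nullif(0, 0) is NULL
-- and each OR condition is true for every row:
select nullif(0, 0) is null;              -- evaluates to true
select (42 = 0 or nullif(0, 0) is null);  -- true regardless of the left side
```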
Re: [PERFORM] 158x query improvement when removing 2 (noop) WHERE conditions
On Wednesday 28 July 2010 12:27:44 Louis-David Mitterrand wrote:
> The EXPLAIN ANALYSE output is attached, with first the fast version and
> then the slow one.

I think you forgot to attach it.

Andres
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
28.07.10 04:56, Tom Lane wrote:
> I'm not asserting it's true, just suggesting it's entirely possible.
> Other than the fork() cost itself and whatever authentication activity
> there might be, practically all the startup cost of a new backend can be
> seen as cache-populating activities. You'd have to redo all of that,
> *plus* pay the costs of getting rid of the previous cache entries.
> Maybe the latter costs less than a fork(), or maybe not. fork() is
> pretty cheap on modern Unixen.

Actually, for me the problem is that one can't raise the number of database connections very high without overloading CPU/memory/disk, so external pooling is needed. If PostgreSQL had something like a max_active_queries setting that limited the number of connections not in the IDLE [in transaction] state, one could set max_connections high (I don't think an idle process itself has much overhead) and limit max_active_queries to get maximum performance without using external pooling. Of course this won't help if the numbers are really high, but it could cover the most common cases.

Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Big difference in time returned by EXPLAIN ANALYZE SELECT ... AND SELECT ...
27.07.10 02:03, Lew wrote:
> Piotr Gasidło wrote:
>> EXPLAIN ANALYZE SELECT ...
>> Total runtime: 4.782 ms
>> Time: 25,970 ms
> Strangely, the runtime is shown with a period for the separator, though.

One value is calculated on the server by the EXPLAIN ANALYZE command; the other is calculated by psql itself, so it also includes the network round-trip and client overhead.

Best regards, Vitalii Tymchyshyn
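To see the two measurements side by side, one can enable psql's client-side timing; a sketch (the output values here are placeholders, not measurements from this thread):

```sql
-- \timing turns on psql's own stopwatch, which measures the full
-- round-trip from the client; EXPLAIN ANALYZE's "Total runtime"
-- is measured inside the server and excludes transfer time.
\timing on
EXPLAIN ANALYZE SELECT ...;
-- ...
-- Total runtime: ... ms    <- measured by the server
-- Time: ... ms             <- measured by psql
```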
Re: [PERFORM] Testing Sandforce SSD
On Mon, Jul 26, 2010 at 01:47:14PM -0600, Scott Marlowe wrote:
> Note that SSDs aren't usually real fast at large sequential writes
> though, so it might be worth putting pg_xlog on a spinning pair in a
> mirror and seeing how much, if any, the SSD drive speeds up when not
> having to do pg_xlog.

xlog is also where I use ext2; it does bench faster for me in that config, and the fsck issues don't really exist because you're not in a situation with a lot of files being created/removed.

Mike Stone
Re: [PERFORM] Testing Sandforce SSD
On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
> I know I'm talking development now but is there a case for a pg_xlog
> block device to remove the file system overhead and guarantee your
> data is written sequentially every time?

If you dedicate a partition to xlog, you already get that in practice with no extra development.

Mike Stone
Re: [PERFORM] Testing Sandforce SSD
Michael Stone wrote:
> On Mon, Jul 26, 2010 at 03:23:20PM -0600, Greg Spiegelberg wrote:
>> I know I'm talking development now but is there a case for a pg_xlog
>> block device to remove the file system overhead and guarantee your
>> data is written sequentially every time?
> If you dedicate a partition to xlog, you already get that in practice
> with no extra development.

Due to the LBA remapping of the SSD, I'm not sure if putting files that are sequentially written in a different partition (together with e.g. tables) would make a difference: in the end the SSD will have a set of new blocks in its buffer and somehow arrange them into sets of 128KB or 256KB writes for the flash chips. See also http://www.anandtech.com/show/2899/2

But I ran out of ideas to test, so I'm going to test it anyway.

regards, Yeb Havinga
Re: [PERFORM] Testing Sandforce SSD
Yeb Havinga wrote:
> Due to the LBA remapping of the SSD, I'm not sure if putting files that
> are sequentially written in a different partition (together with e.g.
> tables) would make a difference: in the end the SSD will have a set of
> new blocks in its buffer and somehow arrange them into sets of 128KB or
> 256KB writes for the flash chips. See also
> http://www.anandtech.com/show/2899/2
> But I ran out of ideas to test, so I'm going to test it anyway.

Same machine config as mentioned before, with data and xlog on separate partitions, ext3 with barrier off (safe on this SSD).

pgbench -c 10 -M prepared -T 3600 -l test

starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: prepared
number of clients: 10
duration: 3600 s
number of transactions actually processed: 10856359
tps = 3015.560252 (including connections establishing)
tps = 3015.575739 (excluding connections establishing)

This is about 25% faster than data and xlog combined on the same filesystem. Below is output from iostat -xk 1 -p /dev/sda, which shows per-partition statistics for each second. sda2 is data, sda3 is xlog. In the third second a checkpoint seems to start.
avg-cpu:  %user %nice %system %iowait %steal %idle
          63.50  0.00   30.50    2.50   0.00   3.50

Device: rrqm/s  wrqm/s   r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm %util
sda       0.00 6518.00 36.00 2211.00  148.00 35524.00    31.75     0.28   0.12  0.11 25.00
sda1      0.00    2.00  0.00    5.00    0.00   636.00   254.40     0.03   6.00  2.00  1.00
sda2      0.00  218.00 36.00   40.00  148.00  1032.00    31.05     0.00   0.00  0.00  0.00
sda3      0.00 6298.00  0.00 2166.00    0.00 33856.00    31.26     0.25   0.12  0.12 25.00

avg-cpu:  %user %nice %system %iowait %steal %idle
          60.50  0.00   37.50    0.50   0.00   1.50

Device: rrqm/s  wrqm/s   r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm %util
sda       0.00 6514.00 33.00 2283.00  140.00 35188.00    30.51     0.32   0.14  0.13 29.00
sda1      0.00    0.00  0.00    3.00    0.00    12.00     8.00     0.00   0.00  0.00  0.00
sda2      0.00    0.00 33.00    2.00  140.00     8.00     8.46     0.03   0.86  0.29  1.00
sda3      0.00 6514.00  0.00 2278.00    0.00 35168.00    30.88     0.29   0.13  0.13 29.00

avg-cpu:  %user %nice %system %iowait %steal %idle
          33.00  0.00   34.00   18.00   0.00  15.00

Device: rrqm/s  wrqm/s   r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm %util
sda       0.00 3782.00  7.00 7235.00   28.00 44068.00    12.18    69.52   9.46  0.09 62.00
sda1      0.00    0.00  0.00    1.00    0.00     4.00     8.00     0.00   0.00  0.00  0.00
sda2      0.00  322.00  7.00 6018.00   28.00 25360.00     8.43    69.22  11.33  0.08 47.00
sda3      0.00 3460.00  0.00 1222.00    0.00 18728.00    30.65     0.30   0.25  0.25 30.00

avg-cpu:  %user %nice %system %iowait %steal %idle
           9.00  0.00   36.00   22.50   0.00  32.50

Device: rrqm/s  wrqm/s   r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm %util
sda       0.00 1079.00  3.00    0.00   12.00 49060.00     8.83   120.64  10.95  0.08 86.00
sda1      0.00    2.00  0.00    2.00    0.00   320.00   320.00     0.12  60.00 35.00  7.00
sda2      0.00   30.00  3.00 10739.00  12.00 43076.00     8.02   120.49  11.30  0.08 83.00
sda3      0.00 1047.00  0.00  363.00    0.00  5640.00    31.07     0.03   0.08  0.08  3.00

avg-cpu:  %user %nice %system %iowait %steal %idle
          62.00  0.00   31.00    2.00   0.00   5.00

Device: rrqm/s  wrqm/s   r/s     w/s   rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm %util
sda       0.00 6267.00 51.00 2493.00  208.00 35040.00    27.71     1.80   0.71  0.12 31.00
sda1      0.00    0.00  0.00    3.00    0.00    12.00     8.00     0.00   0.00  0.00  0.00
sda2      0.00  123.00 51.00  344.00  208.00  1868.00    10.51     1.50   3.80  0.10  4.00
sda3      0.00 6144.00  0.00 2146.00    0.00 33160.00    30.90     0.30   0.14  0.14 30.00
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On 7/27/10 6:56 PM, Tom Lane wrote:
> Yeah, if it weren't for that I'd say sure let's try it. But I'm afraid
> we'd be introducing significant headaches in return for a gain that's
> quite speculative.

Well, the *gain* isn't speculative. For example, I am once again dealing with the issue that PG backend processes on Solaris never give up their RAM, resulting in pathological swapping situations if you have many idle connections. This requires me to install pgpool, which is overkill (since it has load balancing, replication, and more) just to make sure that connections get recycled so that I don't have 300 idle connections eating up 8GB of RAM.

Relative to switching databases, I'd tend to say that, like pgbouncer and pgpool, we don't need to support that. Each user/database combo can have their own pool. While not ideal, this would be good enough for 90% of users.

-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
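For the recycle-idle-connections case described above, a lighter-weight option than pgpool is pgbouncer; a minimal configuration sketch (the database name, addresses, and pool sizes below are illustrative assumptions, not values from this thread):

```ini
; pgbouncer.ini -- minimal sketch; names and sizes are assumptions
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = trust
; transaction pooling returns a server connection to the pool
; as soon as each transaction ends, so idle clients hold no backend
pool_mode = transaction
max_client_conn = 300
default_pool_size = 20
```

Clients then connect to port 6432 instead of 5432, and at most default_pool_size backends exist per user/database pair regardless of how many clients are idle.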
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote:
> On 7/27/10 6:56 PM, Tom Lane wrote:
>> Yeah, if it weren't for that I'd say sure let's try it. But I'm afraid
>> we'd be introducing significant headaches in return for a gain that's
>> quite speculative.
> Well, the *gain* isn't speculative. For example, I am once again dealing
> with the issue that PG backend processes on Solaris never give up their
> RAM, resulting in pathological swapping situations if you have many idle
> connections. This requires me to install pgpool, which is overkill
> (since it has load balancing, replication, and more) just to make sure
> that connections get recycled so that I don't have 300 idle connections
> eating up 8GB of RAM.
> Relative to switching databases, I'd tend to say that, like pgbouncer
> and pgpool, we don't need to support that. Each user/database combo can
> have their own pool. While not ideal, this would be good enough for 90%
> of users.

However, if we don't support that, we can't do any sort of pooling-ish thing without the ability to pass file descriptors between processes; and Tom seems fairly convinced there's no portable way to do that.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
Robert Haas robertmh...@gmail.com writes:
> However, if we don't support that, we can't do any sort of pooling-ish
> thing without the ability to pass file descriptors between processes;
> and Tom seems fairly convinced there's no portable way to do that.

Well, what it would come down to is: are we prepared to not support pooling on platforms without such a capability? It's certainly possible to do it on many modern platforms, but I don't believe we can make it happen everywhere. Generally we've tried to avoid having major features that don't work everywhere ...

regards, tom lane
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 04:10:08PM -0400, Tom Lane wrote:
> Robert Haas robertmh...@gmail.com writes:
>> However, if we don't support that, we can't do any sort of pooling-ish
>> thing without the ability to pass file descriptors between processes;
>> and Tom seems fairly convinced there's no portable way to do that.
> Well, what it would come down to is: are we prepared to not support
> pooling on platforms without such a capability? It's certainly possible
> to do it on many modern platforms, but I don't believe we can make it
> happen everywhere. Generally we've tried to avoid having major features
> that don't work everywhere ...

Which platforms do you have in mind here? All of the platforms I found documented to be supported seem to support at least one of SCM_RIGHTS, WSADuplicateSocket or STREAMS/FD_INSERT. Most if not all besides Windows support SCM_RIGHTS. The ones I am dubious about support FD_INSERT...

Andres
[PERFORM] planner index choice
Hi there,

I have a simple query where I don't understand the planner's choice to use a particular index. The main table looks like this:

# \d sq_ast_attr_val
         Table "public.sq_ast_attr_val"
   Column    |         Type          |          Modifiers
-------------+-----------------------+------------------------------
 assetid     | character varying(15) | not null
 attrid      | integer               | not null
 contextid   | integer               | not null default 0
 custom_val  | text                  |
 use_default | character(1)          | not null default '1'::bpchar
Indexes:
    "ast_attr_val_pk" PRIMARY KEY, btree (assetid, attrid, contextid)
    "sq_ast_attr_val_assetid" btree (assetid)
    "sq_ast_attr_val_attrid" btree (attrid)
    "sq_ast_attr_val_concat" btree (((assetid::text || '~'::text) || attrid))
    "sq_ast_attr_val_contextid" btree (contextid)

The query:

SELECT assetid, custom_val
FROM sq_ast_attr_val
WHERE attrid IN (SELECT attrid FROM sq_ast_attr
                 WHERE name = 'is_contextable'
                   AND (type_code = 'metadata_field_select'
                        OR owning_type_code = 'metadata_field'))
  AND contextid = 0
INTERSECT
SELECT assetid, custom_val
FROM sq_ast_attr_val
WHERE assetid = '62321'
  AND contextid = 0;

The explain analyze plan: http://explain.depesz.com/s/nWs

I'm not sure why it's picking the sq_ast_attr_val_contextid index to do the contextid = 0 check; the other parts (attrid/assetid) are much more selective. If I drop that particular index: http://explain.depesz.com/s/zp

All (I hope) relevant postgres info: Centos 5.5 x86_64 running pg8.4.4. Server has 8gig memory.
# select name, setting from pg_settings where name in ('shared_buffers', 'effective_cache_size', 'work_mem');
         name         | setting
----------------------+---------
 shared_buffers       | 262144
 effective_cache_size | 655360
 work_mem             | 32768

All planner options are enabled:

# select name, setting, source from pg_settings where name like 'enable_%';
       name        | setting | source
-------------------+---------+---------
 enable_bitmapscan | on      | default
 enable_hashagg    | on      | default
 enable_hashjoin   | on      | default
 enable_indexscan  | on      | default
 enable_mergejoin  | on      | default
 enable_nestloop   | on      | default
 enable_seqscan    | on      | default
 enable_sort       | on      | default
 enable_tidscan    | on      | default

Any insights welcome - thanks!

-- Postgresql php tutorials http://www.designmagick.com/
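One way to take the weak contextid index out of the planner's bitmap-AND consideration for this query (a sketch, not something tested in this thread) is a composite index that covers both columns in one scan:

```sql
-- Hypothetical composite index: attrid is the selective leading
-- column, and contextid is included so the "contextid = 0" filter
-- is satisfied by the same index scan rather than a separate,
-- unselective scan of sq_ast_attr_val_contextid:
CREATE INDEX sq_ast_attr_val_attrid_contextid
    ON sq_ast_attr_val (attrid, contextid);
```

If nothing else still needs sq_ast_attr_val_contextid on its own, dropping it (as the thread already experimented with) removes the temptation entirely.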
Re: [PERFORM] Testing Sandforce SSD
On Wed, Jul 28, 2010 at 9:18 AM, Yeb Havinga yebhavi...@gmail.com wrote:
> Same machine config as mentioned before, with data and xlog on separate
> partitions, ext3 with barrier off (safe on this SSD).
> pgbench -c 10 -M prepared -T 3600 -l test
> [...]
> tps = 3015.560252 (including connections establishing)
> tps = 3015.575739 (excluding connections establishing)
> This is about 25% faster than data and xlog combined on the same
> filesystem.

The trick may be in kjournald, of which there is one for each ext3 journalled file system. I learned back in Red Hat 4 pre-U4 kernels there was a problem with kjournald that would either cause 30-second hangs or lock up my server completely when pg_xlog and data were on the same file system, plus a few other things going on. Given the multicore world we have today, I think it makes sense that multiple ext3 file systems, and the kjournald's that service them, are faster than a single combined file system.

Greg
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Robert Haas robertmh...@gmail.com writes:
>> However, if we don't support that, we can't do any sort of pooling-ish
>> thing without the ability to pass file descriptors between processes;
>> and Tom seems fairly convinced there's no portable way to do that.
> Well, what it would come down to is: are we prepared to not support
> pooling on platforms without such a capability? It's certainly possible
> to do it on many modern platforms, but I don't believe we can make it
> happen everywhere. Generally we've tried to avoid having major features
> that don't work everywhere ...

I suppose it depends on the magnitude of the benefit. And how many platforms aren't covered. And how much code is required. In short, until someone writes a patch, who knows? I think the core question we should be thinking about is what would be the cleanest method of resetting a backend - either for the same database or for a different one, whichever seems easier. And by cleanest, I mean least likely to introduce bugs. If we can get to the point where we have something to play around with, even if it's kind of kludgey or doesn't quite work, it'll give us some idea of whether further effort is worthwhile and how it should be directed.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [PERFORM] planner index choice
Chris dmag...@gmail.com writes:
> The query:
> SELECT assetid, custom_val FROM sq_ast_attr_val
> WHERE attrid IN (SELECT attrid FROM sq_ast_attr
>                  WHERE name = 'is_contextable'
>                    AND (type_code = 'metadata_field_select'
>                         OR owning_type_code = 'metadata_field'))
>   AND contextid = 0
> INTERSECT
> SELECT assetid, custom_val FROM sq_ast_attr_val
> WHERE assetid = '62321' AND contextid = 0;
> The explain analyze plan: http://explain.depesz.com/s/nWs

Hrm ... are you *certain* that's an 8.4 server? Because the bit with

Index Cond: (sq_ast_attr_val.attrid = "outer".attrid)

is a locution that EXPLAIN hasn't used since 8.1, according to a quick check. More recent versions don't say "outer".

The actual problem seems to be that choose_bitmap_and() is choosing to add an indexscan on sq_ast_attr_val_contextid, even though this index is a lot less selective than the sq_ast_attr_val_attrid scan it had already picked. I've seen that behavior before, and there were a series of patches back in 2006-2007 that seem to have pretty much fixed it. So that's another reason for suspecting you've got an old server version there...

regards, tom lane
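A quick way to check which server version is actually being queried (the suspicion above being that it is older than the reported 8.4.4):

```sql
-- Full version string of the server psql is connected to;
-- psql's own version can differ from the server's:
SELECT version();
-- Or just the number:
SHOW server_version;
```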