[PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
Good day,

I have included a link to the result of EXPLAIN ANALYZE. It's this one: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

Here's a link to Depesz's explain (if links to the site are okay): http://explain.depesz.com/s/gCk

I have just tried setting geqo_threshold, join_collapse_limit and from_collapse_limit to 16, but it yielded no improvement. Changing those three parameters to 32 did speed up the query from about 3.3 seconds to about a second (give or take 50 ms), which is a pretty good improvement, but not quite there, as I'm looking to bring it down to about 300 ms if possible. Changing those three settings to 48 yielded no improvement over 32. Is there possibly something else to tweak there?

Here's the EXPLAIN ANALYZE output with the three settings set to 32: http://explain.depesz.com/s/cj2

Thank you.

Peter Slapansky

______________________________
From: Pavel Stehule pavel.steh...@gmail.com
To: sl...@centrum.sk
Date: 06.08.2013 21:01
Subject: Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
CC: pgsql-performance@postgresql.org

Hello,

please send the result of EXPLAIN ANALYZE, and please use http://explain.depesz.com/ for saving the plan.

There are more than 8 joins - so try setting geqo_threshold to 16, join_collapse_limit to 16, and from_collapse_limit to 16.

Regards

Pavel Stehule

2013/8/2 sl...@centrum.sk:

Good day,

I have a performance issue when JOINing a view within another view more than once. The query takes over three seconds to execute, which is too long in this case. It's not a problem if the tables are nearly empty, but that isn't the case on the production database.

I suspect the planner thinks it's better to first put together the v_address view and JOIN it to the parcel table later on, but the function fx_get_user_tree_subordinates_by_id should be JOINed to the parcel table first, as it reduces the number of rows to less than 200 and would make any following JOINs much faster.

I have also run VACUUM, REINDEX and ANALYZE on the whole database, but it seems to have had no effect.

Is there any way to nudge the planner toward that way of execution?

This is the query: https://app.box.com/s/jzxiuuxoyj28q4q8rzxr

This is the query plan: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h (plain text), https://app.box.com/s/jzxiuuxoyj28q4q8rzxr (graphical output)

These are the views: https://app.box.com/s/uibzidsazwv3eeauovuk (paginated view), https://app.box.com/s/v71vyexmdyl97m4f3m6u (used three times in the paginated view).

Thank you.

Peter Slapansky
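For reference, the settings experiment described above amounts to something like the following per-session SQL. This is only a sketch: the view name v_parcel_paginated is a hypothetical stand-in, since the real paginated query is only available behind the links.

    -- Raise the planner's join-reordering limits for this session only,
    -- so the exhaustive (non-GEQO) planner considers more join orders.
    SET geqo_threshold = 32;
    SET join_collapse_limit = 32;
    SET from_collapse_limit = 32;

    -- Then re-check the plan and timings for the paginated query.
    EXPLAIN ANALYZE
    SELECT * FROM v_parcel_paginated LIMIT 20 OFFSET 0;  -- stand-in for the actual query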
Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

Good day,

I have included a link to the result of EXPLAIN ANALYZE. It's this one: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

Here's a link to Depesz's explain (if links to the site are okay): http://explain.depesz.com/s/gCk

I have just tried setting geqo_threshold, join_collapse_limit and from_collapse_limit to 16, but it yielded no improvement. Changing those three parameters to 32 did speed up the query from about 3.3 seconds to about a second (give or take 50 ms), which is a pretty good improvement, but not quite there, as I'm looking to bring it down to about 300 ms if possible. Changing those three settings to 48 yielded no improvement over 32. Is there possibly something else to tweak there?

Here's the EXPLAIN ANALYZE output with the three settings set to 32: http://explain.depesz.com/s/cj2

Thank you.

Peter Slapansky

______________________________

Your last explain analyze (with the 3 settings set to 32) shows a query duration of 10 ms, not 1 sec. Am I wrong?

Regards,
Igor Neyman
[PERFORM] RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
You're right, it does... but it's quite odd, because I re-ran the EXPLAIN ANALYZE statement and got the same results. Still, the query now runs for about a second as mentioned before, so it's almost like something's missing from the explain, but I'm certain I copied it all.

I did this via pgAdmin, but that shouldn't matter, should it?

Thank you,

Peter Slapansky

______________________________
From: Igor Neyman iney...@perceptron.com
To: sl...@centrum.sk, Pavel Stehule pavel.steh...@gmail.com
Date: 07.08.2013 15:47
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
CC: pgsql-performance@postgresql.org

Your last explain analyze (with the 3 settings set to 32) shows a query duration of 10 ms, not 1 sec. Am I wrong?

Regards,
Igor Neyman

______________________________
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

Good day,

I have included a link to the result of EXPLAIN ANALYZE. It's this one: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

Here's a link to Depesz's explain (if links to the site are okay): http://explain.depesz.com/s/gCk

I have just tried setting geqo_threshold, join_collapse_limit and from_collapse_limit to 16, but it yielded no improvement. Changing those three parameters to 32 did speed up the query from about 3.3 seconds to about a second (give or take 50 ms), which is a pretty good improvement, but not quite there, as I'm looking to bring it down to about 300 ms if possible. Changing those three settings to 48 yielded no improvement over 32. Is there possibly something else to tweak there?

Here's the EXPLAIN ANALYZE output with the three settings set to 32: http://explain.depesz.com/s/cj2

Thank you.

Peter Slapansky
Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
From: sl...@centrum.sk [mailto:sl...@centrum.sk]
Sent: Wednesday, August 07, 2013 10:43 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

You're right, it does... but it's quite odd, because I re-ran the EXPLAIN ANALYZE statement and got the same results. Still, the query now runs for about a second as mentioned before, so it's almost like something's missing from the explain, but I'm certain I copied it all.

I did this via pgAdmin, but that shouldn't matter, should it?

Thank you,

Peter Slapansky

______________________________

At the very end of the EXPLAIN ANALYZE output there should be a line:

Total runtime:

What do you get there?

Regards,
Igor Neyman
Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
2013/8/7 Igor Neyman iney...@perceptron.com:

From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

Good day,

I have included a link to the result of EXPLAIN ANALYZE. It's this one: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

Here's a link to Depesz's explain (if links to the site are okay): http://explain.depesz.com/s/gCk

I have just tried setting geqo_threshold, join_collapse_limit and from_collapse_limit to 16, but it yielded no improvement. Changing those three parameters to 32 did speed up the query from about 3.3 seconds to about a second (give or take 50 ms), which is a pretty good improvement, but not quite there, as I'm looking to bring it down to about 300 ms if possible. Changing those three settings to 48 yielded no improvement over 32. Is there possibly something else to tweak there?

Here's the EXPLAIN ANALYZE output with the three settings set to 32: http://explain.depesz.com/s/cj2

Thank you.

Peter Slapansky

______________________________

Your last explain analyze (with the 3 settings set to 32) shows a query duration of 10 ms, not 1 sec. Am I wrong?

I'm afraid the 1 sec is planning time :( ... so execution is fast, but planning is expensive and relatively slow. Maybe prepared statements can help in this case.

Regards

Pavel

Regards,
Igor Neyman
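As a sketch of the prepared-statement idea: the statement name, parameters and view name below are hypothetical stand-ins (the real paginated query is only behind the links), but the mechanism is that planning happens once at PREPARE time and later executions can reuse the stored plan instead of repeating the expensive join-order search.

    -- Plan once; name, parameters and view are illustrative only.
    PREPARE paginated_parcels (int, int) AS
        SELECT * FROM v_parcel_paginated   -- stand-in for the actual paginated query
        LIMIT $1 OFFSET $2;

    -- Subsequent executions skip most of the planning cost
    -- (PostgreSQL can cache and reuse a generic plan for the statement).
    EXECUTE paginated_parcels (20, 0);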
Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
2013/8/7 sl...@centrum.sk:

You're right, it does... but it's quite odd, because I re-ran the EXPLAIN ANALYZE statement and got the same results. Still, the query now runs for about a second as mentioned before, so it's almost like something's missing from the explain, but I'm certain I copied it all.

What is the time of EXPLAIN only?

Pavel

I did this via pgAdmin, but that shouldn't matter, should it?

Thank you,

Peter Slapansky

______________________________
From: Igor Neyman iney...@perceptron.com
To: sl...@centrum.sk, Pavel Stehule pavel.steh...@gmail.com
Date: 07.08.2013 15:47
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
CC: pgsql-performance@postgresql.org

Your last explain analyze (with the 3 settings set to 32) shows a query duration of 10 ms, not 1 sec. Am I wrong?

Regards,
Igor Neyman

______________________________
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

Good day,

I have included a link to the result of EXPLAIN ANALYZE. It's this one: https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

Here's a link to Depesz's explain (if links to the site are okay): http://explain.depesz.com/s/gCk

I have just tried setting geqo_threshold, join_collapse_limit and from_collapse_limit to 16, but it yielded no improvement. Changing those three parameters to 32 did speed up the query from about 3.3 seconds to about a second (give or take 50 ms), which is a pretty good improvement, but not quite there, as I'm looking to bring it down to about 300 ms if possible. Changing those three settings to 48 yielded no improvement over 32. Is there possibly something else to tweak there?

Here's the EXPLAIN ANALYZE output with the three settings set to 32: http://explain.depesz.com/s/cj2

Thank you.

Peter Slapansky
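One way to measure what Pavel is asking for, as a sketch (the query is again a stand-in): run EXPLAIN without ANALYZE under psql's \timing. Since nothing is executed, the client-side time is dominated by parsing and planning.

    \timing on
    -- No execution happens here, so the reported round-trip time
    -- approximates the planning cost of the statement.
    EXPLAIN SELECT * FROM v_parcel_paginated LIMIT 20 OFFSET 0;  -- stand-in query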
[PERFORM] RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
I got:

Total runtime: 9.313 ms in pgAdmin
Total runtime: 9.363 ms in psql.

But timing after the query finished was 912.842 ms in psql.

Cheers,

Peter Slapansky

______________________________
From: Igor Neyman iney...@perceptron.com
To: sl...@centrum.sk, Pavel Stehule pavel.steh...@gmail.com
Date: 07.08.2013 16:48
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
CC: pgsql-performance@postgresql.org

From: sl...@centrum.sk [mailto:sl...@centrum.sk]
Sent: Wednesday, August 07, 2013 10:43 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

You're right, it does... but it's quite odd, because I re-ran the EXPLAIN ANALYZE statement and got the same results. Still, the query now runs for about a second as mentioned before, so it's almost like something's missing from the explain, but I'm certain I copied it all.

I did this via pgAdmin, but that shouldn't matter, should it?

Thank you,

Peter Slapansky

______________________________

At the very end of the EXPLAIN ANALYZE output there should be a line:

Total runtime:

What do you get there?

Regards,
Igor Neyman
[PERFORM] Better performance possible for a pathological query?
Hi,

I've seen a couple of bad queries go through one instance and I'm wondering whether there's something simple that can be done to help. Not running the query in the first place is what I am looking to do ultimately, but in the meantime I'm interested in understanding more about the plan below.

The query itself is very simple: a primary-key lookup on a table of 1.5x10^7 rows. The issue is that we are looking up over 11,000 primary keys at once, causing the db to consume a lot of CPU.

Where I'm not sure I follow is the discrepancy between the planned and actual rows:

Bitmap Heap Scan on dim_context c (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)

Would a sequential scan be more beneficial? The table itself is about 10 GB on a 64 GB box; about 30% of these 10 GB are buffered in RAM from what I can tell.

Thanks for your help,

Alexis

Here are the full details.

explain (analyze, buffers)
SELECT c.key, c.x_id, c.tags, c.source_type_id, x.api_key
FROM dim_context c
JOIN x ON c.x_id = x.id
WHERE c.key = ANY (ARRAY[15368196, (11,000 other keys)])
  AND ((c.x_id = 1 AND c.tags @> ARRAY[E'blah']))

Here is the plan, abridged:

Nested Loop  (cost=6923.33..11770.59 rows=1 width=362) (actual time=17128.188..22109.283 rows=10858 loops=1)
  Buffers: shared hit=83494
  ->  Bitmap Heap Scan on dim_context c  (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
        Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_id = 1))
        Filter: (key = ANY ('{15368196,(a lot more keys here)}'::integer[]))
        Buffers: shared hit=50919
        ->  BitmapAnd  (cost=6923.33..6923.33 rows=269 width=0) (actual time=132.910..132.910 rows=0 loops=1)
              Buffers: shared hit=1342
              ->  Bitmap Index Scan on dim_context_tags_idx  (cost=0.00..1149.61 rows=15891 width=0) (actual time=64.614..64.614 rows=264777 loops=1)
                    Index Cond: (tags @> '{blah}'::text[])
                    Buffers: shared hit=401
              ->  Bitmap Index Scan on dim_context_x_id_source_type_id_idx  (cost=0.00..5773.47 rows=268667 width=0) (actual time=54.648..54.648 rows=267659 loops=1)
                    Index Cond: (x_id = 1)
                    Buffers: shared hit=941
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=10858)
        Index Cond: (x.id = 1)
        Buffers: shared hit=32575
Total runtime: 22117.417 ms

And here are the stats:

    attname     | null_frac | avg_width | n_distinct | correlation
----------------+-----------+-----------+------------+-------------
 key            |         0 |         4 |         -1 |    0.999558
 x_id           |         0 |         4 |       1498 |    0.351316
 h_id           |   0.05632 |         4 |     116570 |    0.653092
 tags           | 0.0544567 |       284 |     454877 |   -0.169626
 source_type_id |         0 |         4 |         23 |     0.39552
 handle         |         0 |       248 |         -1 |    0.272456
 created        |         0 |         8 |  -0.645231 |    0.999559
 modified       |         0 |         8 |  -0.645231 |    0.999559
Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
From: sl...@centrum.sk [mailto:sl...@centrum.sk]
Sent: Wednesday, August 07, 2013 11:34 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

I got:

Total runtime: 9.313 ms in pgAdmin
Total runtime: 9.363 ms in psql.

But timing after the query finished was 912.842 ms in psql.

Cheers,

Peter Slapansky

______________________________

That proves what Pavel suggested regarding planning time.

Regards,
Igor Neyman
Re: [PERFORM] RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.
sl...@centrum.sk writes:

Total runtime: 9.313 ms in pgAdmin
Total runtime: 9.363 ms in psql.
But timing after the query finished was 912.842 ms in psql.

Well, that's the downside of increasing join_collapse_limit and from_collapse_limit: you might get a better plan, but it takes a lot longer to get it because the planner is considering many more options.

If you're sufficiently desperate, you could consider rewriting the query so that its JOIN structure matches the join order that the planner chooses at the high collapse_limit settings. Then you can reduce the limits back down and it'll still find the same plan. This tends to suck from a query readability/maintainability standpoint though :-(.

The prepared-query approach might offer a solution too, if the good plan isn't dependent on specific parameter values.

regards, tom lane
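A minimal sketch of that rewrite idea, with made-up table and column names since the real view definitions are only behind the links above (the function, parcel table and v_address view are mentioned in the thread; the argument 42, the column names, and the assumption that the function returns a set of user ids are all illustrative): once you know the join order the planner picks at the high limits, spell it out with explicit JOINs and pin it by lowering join_collapse_limit (at 1, the written JOIN order is taken literally).

    SET join_collapse_limit = 1;   -- planner keeps the JOIN order as written

    SELECT p.*
    FROM fx_get_user_tree_subordinates_by_id(42) AS s(subordinate_id)  -- most selective input first
    JOIN parcel    p ON p.user_id    = s.subordinate_id
    JOIN v_address a ON a.address_id = p.address_id;                   -- wide view joined last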
Re: [PERFORM] Better performance possible for a pathological query?
Alexis Lê-Quôc a...@datadoghq.com writes:

The query itself is very simple: a primary-key lookup on a table of 1.5x10^7 rows. The issue is that we are looking up over 11,000 primary keys at once, causing the db to consume a lot of CPU.

It looks like most of the runtime is probably going into checking the c.key = ANY (ARRAY[...]) construct. PG isn't especially smart about that if it fails to optimize the construct into an index operation --- I think it's just searching the array linearly for each row meeting the other restrictions on c.

You could try writing the test like this:

c.key = ANY (VALUES (1), (17), (42), ...)

to see if the sub-select code path gives better results than the array code path. In a quick check it looked like this might produce a hash join, which seemed promising anyway.

regards, tom lane
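Concretely, applied to the query shown earlier in the thread (table and column names taken from that post; the key list is truncated to a few illustrative values standing in for the ~11,000 real keys), the rewrite looks roughly like this:

    -- Before: the 11,000-element array is scanned linearly for every candidate row.
    SELECT c.key, c.x_id, c.tags, c.source_type_id, x.api_key
    FROM dim_context c
    JOIN x ON c.x_id = x.id
    WHERE c.key = ANY (ARRAY[15368196, 15368197, 15368198])   -- illustrative keys
      AND c.x_id = 1
      AND c.tags @> ARRAY['blah'];

    -- After: the VALUES list is a relation, so the planner can hash it
    -- or drive a per-key index lookup on the primary key instead.
    SELECT c.key, c.x_id, c.tags, c.source_type_id, x.api_key
    FROM dim_context c
    JOIN x ON c.x_id = x.id
    WHERE c.key = ANY (VALUES (15368196), (15368197), (15368198))   -- illustrative keys
      AND c.x_id = 1
      AND c.tags @> ARRAY['blah'];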
Re: [PERFORM] Better performance possible for a pathological query?
On Wed, Aug 7, 2013 at 12:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Alexis Lê-Quôc a...@datadoghq.com writes:

The query itself is very simple: a primary-key lookup on a table of 1.5x10^7 rows. The issue is that we are looking up over 11,000 primary keys at once, causing the db to consume a lot of CPU.

It looks like most of the runtime is probably going into checking the c.key = ANY (ARRAY[...]) construct. PG isn't especially smart about that if it fails to optimize the construct into an index operation --- I think it's just searching the array linearly for each row meeting the other restrictions on c.

You could try writing the test like this:

c.key = ANY (VALUES (1), (17), (42), ...)

to see if the sub-select code path gives better results than the array code path. In a quick check it looked like this might produce a hash join, which seemed promising anyway.

regards, tom lane

Thank you very much Tom, your suggestion is spot on. Runtime decreased 100-fold, from 20 s to 200 ms with a simple search-and-replace. Here's the updated plan for the record.

Nested Loop  (cost=168.22..2116.29 rows=148 width=362) (actual time=22.134..256.531 rows=10858 loops=1)
  Buffers: shared hit=44967
  ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.071..0.073 rows=1 loops=1)
        Index Cond: (id = 1)
        Buffers: shared hit=4
  ->  Nested Loop  (cost=168.22..2106.54 rows=148 width=329) (actual time=22.060..242.406 rows=10858 loops=1)
        Buffers: shared hit=44963
        ->  HashAggregate  (cost=168.22..170.22 rows=200 width=4) (actual time=21.529..32.820 rows=11215 loops=1)
              ->  Values Scan on *VALUES*  (cost=0.00..140.19 rows=11215 width=4) (actual time=0.005..9.527 rows=11215 loops=1)
        ->  Index Scan using dim_context_pkey on dim_context c  (cost=0.00..9.67 rows=1 width=329) (actual time=0.015..0.016 rows=1 loops=11215)
              Index Cond: (c.key = *VALUES*.column1)
              Filter: ((c.tags @> '{blah}'::text[]) AND (c.org_id = 1))
              Buffers: shared hit=44963
Total runtime: 263.639 ms
[PERFORM] Efficiently query for the most recent record for a given user
Let's say I have a table something like this:

create table call_activity (
    id int8 not null,
    called timestamp,
    user_id int8 not null,
    primary key (id),
    foreign key (user_id) references my_users
);

I want to get the last call_activity record for a single user. Is there ANY way to efficiently retrieve the last record for a specified user_id, or do I need to de-normalize and update a table with a single row for each user each time a new call_activity record is inserted?

I know how to do the query without the summary table (subquery or GROUP BY with MAX) but that seems like it will never perform well for large data sets. Or am I full of beans and it should perform just fine for a huge data set as long as I have an index on called?

Thanks in advance!
Re: [PERFORM] Efficiently query for the most recent record for a given user
On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com wrote:

Let's say I have a table something like this:

create table call_activity (
    id int8 not null,
    called timestamp,
    user_id int8 not null,
    primary key (id),
    foreign key (user_id) references my_users
);

I want to get the last call_activity record for a single user. Is there ANY way to efficiently retrieve the last record for a specified user_id, or do I need to de-normalize and update a table with a single row for each user each time a new call_activity record is inserted? I know how to do the query without the summary table (subquery or GROUP BY with MAX) but that seems like it will never perform well for large data sets. Or am I full of beans and it should perform just fine for a huge data set as long as I have an index on called?

Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1
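Spelled out as SQL, a sketch of that suggestion (the index name is arbitrary and 42 stands in for the user id being looked up):

    -- Composite index: the equality column first, then the sort column.
    CREATE INDEX call_activity_user_id_called_idx
        ON call_activity (user_id, called DESC);

    -- Walks the index for that user starting at the most recent entry
    -- and stops after a single row.
    SELECT *
    FROM call_activity
    WHERE user_id = 42        -- hypothetical user id
    ORDER BY called DESC
    LIMIT 1;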
Re: [PERFORM] Efficiently query for the most recent record for a given user
Claudio Freire klaussfre...@gmail.com writes:

On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com wrote:

I want to get the last call_activity record for a single user.

Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1

Note that there's no particular need to specify desc in the index definition. This same index can support searches in either direction on the called column.

regards, tom lane
Re: [PERFORM] Efficiently query for the most recent record for a given user
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Claudio Freire
Sent: Wednesday, August 07, 2013 2:20 PM
To: Robert DiFalco
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Efficiently query for the most recent record for a given user

On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com wrote:

Let's say I have a table something like this:

create table call_activity (
    id int8 not null,
    called timestamp,
    user_id int8 not null,
    primary key (id),
    foreign key (user_id) references my_users
);

I want to get the last call_activity record for a single user. Is there ANY way to efficiently retrieve the last record for a specified user_id, or do I need to de-normalize and update a table with a single row for each user each time a new call_activity record is inserted? I know how to do the query without the summary table (subquery or GROUP BY with MAX) but that seems like it will never perform well for large data sets. Or am I full of beans and it should perform just fine for a huge data set as long as I have an index on called?

Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1

And most recent call for every user:

SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM call_activity;

Regards,
Igor Neyman
Re: [PERFORM] Efficiently query for the most recent record for a given user
Thanks guys!

On Wed, Aug 7, 2013 at 11:35 AM, Igor Neyman iney...@perceptron.com wrote:

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Claudio Freire
Sent: Wednesday, August 07, 2013 2:20 PM
To: Robert DiFalco
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Efficiently query for the most recent record for a given user

On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com wrote:

Let's say I have a table something like this:

create table call_activity (
    id int8 not null,
    called timestamp,
    user_id int8 not null,
    primary key (id),
    foreign key (user_id) references my_users
);

I want to get the last call_activity record for a single user. Is there ANY way to efficiently retrieve the last record for a specified user_id, or do I need to de-normalize and update a table with a single row for each user each time a new call_activity record is inserted? I know how to do the query without the summary table (subquery or GROUP BY with MAX) but that seems like it will never perform well for large data sets. Or am I full of beans and it should perform just fine for a huge data set as long as I have an index on called?

Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1

And most recent call for every user:

SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM call_activity;

Regards,
Igor Neyman
Re: [PERFORM] Efficiently query for the most recent record for a given user
On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Claudio Freire klaussfre...@gmail.com writes:

On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com wrote:

I want to get the last call_activity record for a single user.

Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1

Note that there's no particular need to specify desc in the index definition. This same index can support searches in either direction on the called column.

Yeah, but it's faster if it's in the same direction, because the kernel read-ahead code detects sequential reads, whereas it doesn't when it goes backwards. The difference can be up to a factor of 10 for long index scans.

Though... true... for a limit 1... it wouldn't matter that much. But it's become habit to match index sort order by now.
Re: [PERFORM] Efficiently query for the most recent record for a given user
Claudio Freire klaussfre...@gmail.com writes:

On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Note that there's no particular need to specify desc in the index definition. This same index can support searches in either direction on the called column.

Yeah, but it's faster if it's in the same direction, because the kernel read-ahead code detects sequential reads, whereas it doesn't when it goes backwards. The difference can be up to a factor of 10 for long index scans.

Color me skeptical. Index searches are seldom purely sequential block accesses. Maybe if you had a freshly built index that'd never yet suffered any inserts/updates, but in practice any advantage would disappear very quickly after a few index page splits.

Though... true... for a limit 1... it wouldn't matter that much.

That's the other point.

regards, tom lane
Re: [PERFORM] Efficiently query for the most recent record for a given user
Claudio Freire wrote:

On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Note that there's no particular need to specify desc in the index definition. This same index can support searches in either direction on the called column.

Yeah, but it's faster if it's in the same direction, because the kernel read-ahead code detects sequential reads, whereas it doesn't when it goes backwards. The difference can be up to a factor of 10 for long index scans.

That might be true when an index is new, but as it grows, the leaf pages are not going to be sequential anymore. And this doesn't much apply for an equality lookup anyway, does it?

--
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [PERFORM] Efficiently query for the most recent record for a given user
On Wed, Aug 7, 2013 at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Yeah, but it's faster if it's in the same direction, because the kernel read-ahead code detects sequential reads, whereas it doesn't when it goes backwards. The difference can be up to a factor of 10 for long index scans.

Color me skeptical. Index searches are seldom purely sequential block accesses. Maybe if you had a freshly built index that'd never yet suffered any inserts/updates, but in practice any advantage would disappear very quickly after a few index page splits.

Maybe. I've tested on pgbench test databases, which I'm not sure whether they're freshly built indexes or incrementally built ones, and it applies there (in fact backward index-only scans were one of the workloads the read-ahead patch improved the most).