[PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread slapo

Good day,
 
I have included a link to the result of EXPLAIN ANALYZE. It's this one:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h
 
Here's a link to Depesz's explain (if links to the site are okay):
http://explain.depesz.com/s/gCk
 
I have just tried setting geqo_threshold, join_collapse_limit and 
from_collapse_limit to 16, but it yielded no improvement.
Changing those three parameters to 32 did speed up the query from about 3.3 
seconds to about a second (give or take 50 ms), which is a pretty good 
improvement, but not quite there, as I'm looking to bring it down to about 300 
ms if possible. Changing those three settings to 48 yielded no improvements 
over 32.
Is there possibly something else to tweak there?
 
Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
http://explain.depesz.com/s/cj2
 
Thank you.
 
Peter Slapansky
 
__

From: Pavel Stehule pavel.steh...@gmail.com
To: sl...@centrum.sk
Date: 06.08.2013 21:01
Subject: Re: [PERFORM] Sub-optimal plan for a paginated query on a view with 
another view inside of it.

CC: pgsql-performance@postgresql.org

Hello

please, send result of EXPLAIN ANALYZE

please, use http://explain.depesz.com/ for saving the plan

there are more than 8 joins - so try to set geqo_threshold to 16,
join_collapse_limit to 16, and from_collapse_limit to 16.

Regards

Pavel Stehule
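Pavel's three settings can be applied per session before running the query; a sketch (values as suggested here, the query itself elided):

```sql
-- Session-level planner settings: raise the limits so all 8+ relations
-- are planned together instead of being collapsed into fragments or
-- handed to the genetic optimizer (GEQO).
SET geqo_threshold = 16;        -- exhaustive search up to 16 relations
SET join_collapse_limit = 16;   -- flatten explicit JOIN lists up to 16 items
SET from_collapse_limit = 16;   -- flatten subqueries/FROM items up to 16

EXPLAIN ANALYZE
SELECT ...;   -- the paginated view query from the links above
```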

2013/8/2  sl...@centrum.sk:

Good day,

I have a performance issue when JOINing a view within another view more than 
once.
The query takes over three seconds to execute, which is too long in this case. 
It's not a problem if the tables are nearly empty, but that isn't the case on 
the production database.

I suspect the planner thinks it's better to first put together the v_address view and 
JOIN it to the parcel table later on, but the function 
fx_get_user_tree_subordinates_by_id should be JOINed to the parcel table 
first, as it reduces the number of rows to less than 200 and any following JOINs would be 
much faster.

I have also run VACUUM, REINDEX and ANALYZE on the whole database, but it seems 
to have had no effect.

Is there any way to nudge the planner toward that way of execution?

This is the query:
https://app.box.com/s/jzxiuuxoyj28q4q8rzxr

This is the query plan:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h (plain text)
https://app.box.com/s/jzxiuuxoyj28q4q8rzxr (graphical output)

These are the views:
https://app.box.com/s/uibzidsazwv3eeauovuk (paginated view)
https://app.box.com/s/v71vyexmdyl97m4f3m6u (used three times in the paginated 
view).


Thank you.

Peter Slapansky


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a 
view with another view inside of it.

Good day,
 
I have included a link to the result of EXPLAIN ANALYZE. It's this one:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h
 
Here's a link to Depesz's explain (if links to the site are okay):
http://explain.depesz.com/s/gCk
 
I have just tried setting geqo_threshold, join_collapse_limit and 
from_collapse_limit to 16, but it yielded no improvement.
Changing those three parameters to 32 did speed up the query from about 3.3 
seconds to about a second (give or take 50 ms), which is a pretty good 
improvement, but not quite there, as I'm looking to bring it down to about 300 
ms if possible. Changing those three settings to 48 yielded no improvements 
over 32.
Is there possibly something else to tweak there?
 
Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
http://explain.depesz.com/s/cj2
 
Thank you.
 
Peter Slapansky

-

Your last explain analyze (with the 3 settings set to 32) shows a query duration 
of ~10 ms, not 1 sec.
Am I wrong? 

Regards,
Igor Neyman




[PERFORM] RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread slapo

You're right, it does... but it's quite odd, because I re-ran the 
explain-analyze statement and got the same results.
Still, the query now runs for about a second as mentioned before, so it's 
almost like something's missing from the explain, but I'm certain I copied it 
all.
 
I did this via pgadmin, but that shouldn't matter, should it?
 
Thank you,
 
Peter Slapansky
__

From: Igor Neyman iney...@perceptron.com
To: sl...@centrum.sk, Pavel Stehule pavel.steh...@gmail.com
Date: 07.08.2013 15:47
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

CC: pgsql-performance@postgresql.org

Your last explain analyze (with the 3 settings set to 32) shows a query duration 
of ~10 ms, not 1 sec.
Am I wrong? 

Regards,
Igor Neyman
 
__
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
Sent: Wednesday, August 07, 2013 8:43 AM
To: Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a 
view with another view inside of it.

Good day,
 
I have included a link to the result of EXPLAIN ANALYZE. It's this one:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h
 
Here's a link to Depesz's explain (if links to the site are okay):
http://explain.depesz.com/s/gCk
 
I have just tried setting geqo_threshold, join_collapse_limit and 
from_collapse_limit to 16, but it yielded no improvement.
Changing those three parameters to 32 did speed up the query from about 3.3 
seconds to about a second (give or take 50 ms), which is a pretty good 
improvement, but not quite there, as I'm looking to bring it down to about 300 
ms if possible. Changing those three settings to 48 yielded no improvements 
over 32.
Is there possibly something else to tweak there?
 
Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
http://explain.depesz.com/s/cj2
 
Thank you.
 
Peter Slapansky





Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 10:43 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

You're right, it does... but it's quite odd, because I re-ran the 
explain-analyze statement and got the same results.
Still, the query now runs for about a second as mentioned before, so it's 
almost like something's missing from the explain, but I'm certain I copied it 
all.
 
I did this via pgadmin, but that shouldn't matter, should it?
 
Thank you,
 
Peter Slapansky
__

At the very end of the explain analyze output there should be a line:

Total runtime: 

What do you get there?

Regards,
Igor Neyman



Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Pavel Stehule
2013/8/7 Igor Neyman iney...@perceptron.com:


 From: pgsql-performance-ow...@postgresql.org 
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of sl...@centrum.sk
 Sent: Wednesday, August 07, 2013 8:43 AM
 To: Pavel Stehule
 Cc: pgsql-performance@postgresql.org
 Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a 
 view with another view inside of it.

 Good day,

 I have included a link to the result of EXPLAIN ANALYZE. It's this one:
 https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

 Here's a link to Depesz's explain (if links to the site are okay):
 http://explain.depesz.com/s/gCk

 I have just tried setting geqo_threshold, join_collapse_limit and 
 from_collapse_limit to 16, but it yielded no improvement.
 Changing those three parameters to 32 did speed up the query from about 3.3 
 seconds to about a second (give or take 50 ms), which is a pretty good 
 improvement, but not quite there, as I'm looking to bring it down to about 
 300 ms if possible. Changing those three settings to 48 yielded no 
 improvements over 32.
 Is there possibly something else to tweak there?

 Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
 http://explain.depesz.com/s/cj2

 Thank you.

 Peter Slapansky

 -

 Your last explain analyze (with the 3 settings set to 32) shows a query duration 
 of ~10 ms, not 1 sec.
 Am I wrong?

I'm afraid the ~1 sec is planning time :( .. So execution is fast, but
planning is expensive and relatively slow .. maybe prepared statements
can help in this case.

Regards

Pavel
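The prepared-statement idea can be sketched as follows (view and parameter names here are illustrative, not taken from the actual query):

```sql
-- Planning happens once at PREPARE time; each EXECUTE reuses the plan,
-- so the expensive join-order search is paid only on the first call.
PREPARE page_query (int, int) AS
  SELECT *
    FROM v_paginated          -- hypothetical name for the paginated view
   ORDER BY id
   LIMIT $1 OFFSET $2;

EXECUTE page_query(50, 0);    -- page 1
EXECUTE page_query(50, 50);   -- page 2, no renewed join-order search
```

This only helps if the good plan does not depend on the specific parameter values.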


 Regards,
 Igor Neyman





Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Pavel Stehule
2013/8/7  sl...@centrum.sk:
 You're right, it does... but it's quite odd, because I re-ran the
 explain-analyze statement and got the same results.

 Still, the query now runs for about a second as mentioned before, so it's
 almost like something's missing from the explain, but I'm certain I copied
 it all.

What is the time of EXPLAIN only?

Pavel




 I did this via pgadmin, but that shouldn't matter, should it?



 Thank you,



 Peter Slapansky

 __
 From: Igor Neyman iney...@perceptron.com
 To: sl...@centrum.sk, Pavel Stehule pavel.steh...@gmail.com
 Date: 07.08.2013 15:47
 Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated
 query on a view with another view inside of it.


 CC: pgsql-performance@postgresql.org

 Your last explain analyze (with the 3 settings set to 32) shows a query duration
 of ~10 ms, not 1 sec.
 Am I wrong?

 Regards,
 Igor Neyman



 __


 From: pgsql-performance-ow...@postgresql.org
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of
 sl...@centrum.sk
 Sent: Wednesday, August 07, 2013 8:43 AM
 To: Pavel Stehule
 Cc: pgsql-performance@postgresql.org
 Subject: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a
 view with another view inside of it.

 Good day,

 I have included a link to the result of EXPLAIN ANALYZE. It's this one:
 https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h

 Here's a link to Depesz's explain (if links to the site are okay):
 http://explain.depesz.com/s/gCk

 I have just tried setting geqo_threshold, join_collapse_limit and
 from_collapse_limit to 16, but it yielded no improvement.
 Changing those three parameters to 32 did speed up the query from about 3.3
 seconds to about a second (give or take 50 ms), which is a pretty good
 improvement, but not quite there, as I'm looking to bring it down to about
 300 ms if possible. Changing those three settings to 48 yielded no
 improvements over 32.
 Is there possibly something else to tweak there?

 Here's EXPLAIN ANALYZE output when the three settings have been set to 32:
 http://explain.depesz.com/s/cj2

 Thank you.

 Peter Slapansky





[PERFORM] RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread slapo

I got:
Total runtime: 9.313 ms in pgAdmin
Total runtime: 9.363 ms in psql.
But timing after the query finished was 912.842 ms in psql.
 
Cheers,
 
Peter Slapansky
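The discrepancy can be seen directly in psql: \timing measures the whole round trip (parse + plan + execute + transfer), while EXPLAIN ANALYZE's "Total runtime" covers execution only. A sketch:

```sql
-- psql session sketch; the query itself is elided.
\timing on
EXPLAIN ANALYZE SELECT ...;
-- EXPLAIN ANALYZE reports:  Total runtime: 9.313 ms   (execution only)
-- \timing reports:          Time: 912.842 ms          (planning dominates)
```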
__

From: Igor Neyman iney...@perceptron.com
To: sl...@centrum.sk, Pavel Stehule pavel.steh...@gmail.com
Date: 07.08.2013 16:48
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

CC: pgsql-performance@postgresql.org



From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 10:43 AM

To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

You're right, it does... but it's quite odd, because I re-ran the 
explain-analyze statement and got the same results.
Still, the query now runs for about a second as mentioned before, so it's 
almost like something's missing from the explain, but I'm certain I copied it 
all.
 
I did this via pgadmin, but that shouldn't matter, should it?
 
Thank you,
 
Peter Slapansky
__

At the very end of the explain analyze output there should be a line:

Total runtime: 

What do you get there?

Regards,
Igor Neyman



[PERFORM] Better performance possible for a pathological query?

2013-08-07 Thread Alexis Lê-Quôc
Hi,

I've seen a couple of bad queries go through one instance and I'm wondering
whether there's something simple that can be done to help.

Not running the query in the first place is what I am looking to do
ultimately but in the meantime, I'm interested in understanding more about
the plan below.

The query itself is very simple: a primary key lookup on a table of 1.5x10^7 rows.
The issue is that we are looking up over 11,000 primary keys at once,
causing the db to consume a lot of CPU.

Where I'm not sure I follow, is the discrepancy between the planned and
actual rows.

Bitmap Heap Scan on dim_context c  (cost=6923.33..11762.31 rows=1
width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)

Would a sequential scan be more beneficial? The table itself is about 10GB
on a 64GB box, 30% of these 10GB are buffered in RAM from what I can tell.

Thanks for your help,

Alexis

Here are the full details.

explain (analyze, buffers)
SELECT c.key,
   c.x_id,
   c.tags,
   c.source_type_id,
   x.api_key
  FROM dim_context c
  join x on c.x_id = x.id
 WHERE c.key = ANY (ARRAY[15368196, (11,000 other keys)])
   AND ((c.x_id = 1 AND c.tags @> ARRAY[E'blah']))

Here is the plan, abridged

 Nested Loop  (cost=6923.33..11770.59 rows=1 width=362) (actual time=17128.188..22109.283 rows=10858 loops=1)
   Buffers: shared hit=83494
   ->  Bitmap Heap Scan on dim_context c  (cost=6923.33..11762.31 rows=1 width=329) (actual time=17128.121..22031.783 rows=10858 loops=1)
         Recheck Cond: ((tags @> '{blah}'::text[]) AND (x_id = 1))
         Filter: (key = ANY ('{15368196,(a lot more keys here)}'::integer[]))
         Buffers: shared hit=50919
         ->  BitmapAnd  (cost=6923.33..6923.33 rows=269 width=0) (actual time=132.910..132.910 rows=0 loops=1)
               Buffers: shared hit=1342
               ->  Bitmap Index Scan on dim_context_tags_idx  (cost=0.00..1149.61 rows=15891 width=0) (actual time=64.614..64.614 rows=264777 loops=1)
                     Index Cond: (tags @> '{blah}'::text[])
                     Buffers: shared hit=401
               ->  Bitmap Index Scan on dim_context_x_id_source_type_id_idx  (cost=0.00..5773.47 rows=268667 width=0) (actual time=54.648..54.648 rows=267659 loops=1)
                     Index Cond: (x_id = 1)
                     Buffers: shared hit=941
   ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.003..0.004 rows=1 loops=10858)
         Index Cond: (x.id = 1)
         Buffers: shared hit=32575
 Total runtime: 22117.417 ms

And here are the stats

attname | null_frac | avg_width | n_distinct | correlation
----------------+-----------+-----------+------------+-------------
 key| 0 | 4 | -1 |0.999558
 x_id   | 0 | 4 |   1498 |0.351316
 h_id|   0.05632 | 4 | 116570 |0.653092
 tags   | 0.0544567 |   284 | 454877 |   -0.169626
 source_type_id | 0 | 4 | 23 | 0.39552
 handle | 0 |   248 | -1 |0.272456
 created| 0 | 8 |  -0.645231 |0.999559
 modified   | 0 | 8 |  -0.645231 |0.999559


Re: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Igor Neyman


From: sl...@centrum.sk [mailto:sl...@centrum.sk] 
Sent: Wednesday, August 07, 2013 11:34 AM
To: Igor Neyman; Pavel Stehule
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on 
a view with another view inside of it.

I got:
Total runtime: 9.313 ms in pgAdmin
Total runtime: 9.363 ms in psql.
But timing after the query finished was 912.842 ms in psql.
 
Cheers,
 
Peter Slapansky
__

That proves what Pavel suggested regarding planning time.

Regards,
Igor Neyman



Re: [PERFORM] RE: [PERFORM] Re: [PERFORM] Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-07 Thread Tom Lane
sl...@centrum.sk writes:
 Total runtime: 9.313 ms in pgAdmin
 Total runtime: 9.363 ms in psql.
 But timing after the query finished was 912.842 ms in psql.

Well, that's the downside of increasing join_collapse_limit and
from_collapse_limit: you might get a better plan, but it takes a lot
longer to get it because the planner is considering many more options.

If you're sufficiently desperate, you could consider rewriting the query
so that its JOIN structure matches the join order that the planner chooses
at the high collapse_limit settings.  Then you can reduce the limits back
down and it'll still find the same plan.  This tends to suck from a query
readability/maintainability standpoint though :-(.

The prepared-query approach might offer a solution too, if the good plan
isn't dependent on specific parameter values.

regards, tom lane
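Tom's join-order-freezing trick can be sketched like this (column names are illustrative; the idea is that with join_collapse_limit = 1 the planner follows the written JOIN nesting verbatim):

```sql
-- Write the JOINs in the order the planner picked at the high limits,
-- then forbid reordering so that order is used without an expensive search.
SET join_collapse_limit = 1;

SELECT ...
  FROM parcel p
  JOIN fx_get_user_tree_subordinates_by_id(...) s  -- most selective first
       ON s.id = p.user_id
  JOIN v_address a ON a.parcel_id = p.id           -- remaining joins follow
  ...;
```

The "..." placeholders stand for the rest of the original query.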




Re: [PERFORM] Better performance possible for a pathological query?

2013-08-07 Thread Tom Lane
Alexis Lê-Quôc a...@datadoghq.com writes:
 The query itself is very simple: a primary key lookup on a 1.5x10^7 rows.
 The issue is that we are looking up over 11,000 primary keys at once,
 causing the db to consume a lot of CPU.

It looks like most of the runtime is probably going into checking the
c.key = ANY (ARRAY[...]) construct.  PG isn't especially smart about that
if it fails to optimize the construct into an index operation --- I think
it's just searching the array linearly for each row meeting the other
restrictions on c.

You could try writing the test like this:
c.key = ANY (VALUES (1), (17), (42), ...)
to see if the sub-select code path gives better results than the array
code path.  In a quick check it looked like this might produce a hash
join, which seemed promising anyway.

regards, tom lane
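Applied to the query from this thread, the rewrite looks roughly like this (a sketch; only two keys shown of the ~11,000):

```sql
-- Before: WHERE c.key = ANY (ARRAY[15368196, ...])
--   the array is scanned linearly for each row surviving the other filters.
-- After: the VALUES list is a relation, so the planner can hash-join it.
SELECT c.key, c.x_id, c.tags, c.source_type_id, x.api_key
  FROM dim_context c
  JOIN x ON c.x_id = x.id
 WHERE c.key = ANY (VALUES (15368196), (15368197) /* ..., 11,000 keys */)
   AND c.x_id = 1
   AND c.tags @> ARRAY[E'blah'];
```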




Re: [PERFORM] Better performance possible for a pathological query?

2013-08-07 Thread Alexis Lê-Quôc
On Wed, Aug 7, 2013 at 12:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Alexis Lê-Quôc a...@datadoghq.com writes:
  The query itself is very simple: a primary key lookup on a 1.5x10^7 rows.
  The issue is that we are looking up over 11,000 primary keys at once,
  causing the db to consume a lot of CPU.

 It looks like most of the runtime is probably going into checking the
 c.key = ANY (ARRAY[...]) construct.  PG isn't especially smart about that
 if it fails to optimize the construct into an index operation --- I think
 it's just searching the array linearly for each row meeting the other
 restrictions on c.

 You could try writing the test like this:
 c.key = ANY (VALUES (1), (17), (42), ...)
 to see if the sub-select code path gives better results than the array
 code path.  In a quick check it looked like this might produce a hash
 join, which seemed promising anyway.

 regards, tom lane


Thank you very much Tom, your suggestion is spot on. Runtime decreased
100-fold, from 20s to 200ms with a simple search-and-replace.

Here's the updated plan for the record.

 Nested Loop  (cost=168.22..2116.29 rows=148 width=362) (actual time=22.134..256.531 rows=10858 loops=1)
   Buffers: shared hit=44967
   ->  Index Scan using x_pkey on x  (cost=0.00..8.27 rows=1 width=37) (actual time=0.071..0.073 rows=1 loops=1)
         Index Cond: (id = 1)
         Buffers: shared hit=4
   ->  Nested Loop  (cost=168.22..2106.54 rows=148 width=329) (actual time=22.060..242.406 rows=10858 loops=1)
         Buffers: shared hit=44963
         ->  HashAggregate  (cost=168.22..170.22 rows=200 width=4) (actual time=21.529..32.820 rows=11215 loops=1)
               ->  Values Scan on *VALUES*  (cost=0.00..140.19 rows=11215 width=4) (actual time=0.005..9.527 rows=11215 loops=1)
         ->  Index Scan using dim_context_pkey on dim_context c  (cost=0.00..9.67 rows=1 width=329) (actual time=0.015..0.016 rows=1 loops=11215)
               Index Cond: (c.key = *VALUES*.column1)
               Filter: ((c.tags @> '{blah}'::text[]) AND (c.org_id = 1))
               Buffers: shared hit=44963
 Total runtime: 263.639 ms


[PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Robert DiFalco
Let's say I have a table something like this:

   create table call_activity (
id int8 not null,
called timestamp,
user_id int8 not null,
primary key (id)
foreign key (user_id) references my_users
)


I want to get the last call_activity record for a single user.  Is there
ANY way to efficiently retrieve the last record for a specified user_id, or
do I need to de-normalize and update a table with a single row for each
user each time a new call_activity record is inserted? I know how to do
the query without the summary table (subquery or GROUP BY with MAX) but
that seems like it will never perform well for large data sets. Or am I
full of beans and it should perform just fine for a huge data set as long
as I have an index on called?

Thanks in advance!


Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Claudio Freire
On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com wrote:
 Let's say I have a table something like this:

create table call_activity (
 id int8 not null,
 called timestamp,
 user_id int8 not null,
 primary key (id)
 foreign key (user_id) references my_users
 )


 I want to get the last call_activity record for a single user.  Is there ANY
 way to efficiently retrieve the last record for a specified user_id, or do I
 need to de-normalize and update a table with a single row for each user each
 time a new call_activity record is inserted? I know how to do the query
 without the summary table (subquery or GROUP BY with MAX) but that seems
 like it will never perform well for large data sets. Or am I full of beans
 and it should perform just fine for a huge data set as long as I have an
 index on called?


Create an index over (user_id, called desc), and do

select * from call_activity where user_id = blarg order by called desc limit 1
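Spelled out with the index definition (a sketch using the thread's column names; the user id is a placeholder):

```sql
-- Composite index: equality prefix on user_id, then ordered by called,
-- so the newest row for a user is the first matching index entry.
CREATE INDEX call_activity_user_called_idx
    ON call_activity (user_id, called DESC);

SELECT *
  FROM call_activity
 WHERE user_id = 42          -- placeholder
 ORDER BY called DESC
 LIMIT 1;                    -- stops after a single index entry
```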




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com 
 wrote:
 I want to get the last call_activity record for a single user.

 Create an index over (user_id, called desc), and do
 select * from call_activity where user_id = blarg order by called desc limit 1

Note that there's no particular need to specify desc in the index
definition.  This same index can support searches in either direction
on the called column.

regards, tom lane




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Igor Neyman
 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
 performance-ow...@postgresql.org] On Behalf Of Claudio Freire
 Sent: Wednesday, August 07, 2013 2:20 PM
 To: Robert DiFalco
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Efficiently query for the most recent record for a
 given user
 
 On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com
 wrote:
  Let's say I have a table something like this:
 
 create table call_activity (
  id int8 not null,
  called timestamp,
  user_id int8 not null,
  primary key (id)
  foreign key (user_id) references my_users
  )
 
 
  I want to get the last call_activity record for a single user.  Is
  there ANY way to efficiently retrieve the last record for a specified
  user_id, or do I need to de-normalize and update a table with a single
  row for each user each time a new call_activity record is inserted? I
  know I how to do the query without the summary table (subquery or
  GROUP BY with MAX) but that seems like it will never perform well for
  large data sets. Or am I full of beans and it should perform just fine
  for a huge data set as long as I have an index on called?
 
 
 Create an index over (user_id, called desc), and do
 
 select * from call_activity where user_id = blarg order by called desc limit 1
 

And the most recent call for every user:

SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM call_activity;

Regards,
Igor Neyman
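Note that the window-function form annotates every row with the per-user maximum; if exactly one row per user is wanted, PostgreSQL's DISTINCT ON is a common alternative (a sketch, not from the thread):

```sql
-- First row of each user_id group, with groups ordered newest-first:
SELECT DISTINCT ON (user_id) id, user_id, called
  FROM call_activity
 ORDER BY user_id, called DESC;
```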





Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Robert DiFalco
Thanks guys!


On Wed, Aug 7, 2013 at 11:35 AM, Igor Neyman iney...@perceptron.com wrote:

  -Original Message-
  From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-
  performance-ow...@postgresql.org] On Behalf Of Claudio Freire
  Sent: Wednesday, August 07, 2013 2:20 PM
  To: Robert DiFalco
  Cc: pgsql-performance@postgresql.org
  Subject: Re: [PERFORM] Efficiently query for the most recent record for a
  given user
 
  On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com
 
  wrote:
   Let's say I have a table something like this:
  
  create table call_activity (
   id int8 not null,
   called timestamp,
   user_id int8 not null,
   primary key (id)
   foreign key (user_id) references my_users
   )
  
  
   I want to get the last call_activity record for a single user.  Is
   there ANY way to efficiently retrieve the last record for a specified
   user_id, or do I need to de-normalize and update a table with a single
   row for each user each time a new call_activity record is inserted? I
    know how to do the query without the summary table (subquery or
   GROUP BY with MAX) but that seems like it will never perform well for
   large data sets. Or am I full of beans and it should perform just fine
   for a huge data set as long as I have an index on called?
 
 
  Create an index over (user_id, called desc), and do
 
  select * from call_activity where user_id = blarg order by called desc
 limit 1
 

 And most recent call for every user:

 SELECT id, user_id, MAX(called) OVER (PARTITION BY user_id) FROM
 call_activity;

 Regards,
 Igor Neyman




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Claudio Freire
On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Claudio Freire klaussfre...@gmail.com writes:
 On Wed, Aug 7, 2013 at 3:12 PM, Robert DiFalco robert.difa...@gmail.com 
 wrote:
 I want to get the last call_activity record for a single user.

 Create an index over (user_id, called desc), and do
 select * from call_activity where user_id = blarg order by called desc limit 
 1

 Note that there's no particular need to specify desc in the index
 definition.  This same index can support searches in either direction
 on the called column.


Yeah, but it's faster if it's in the same direction, because the
kernel read-ahead code detects sequential reads, whereas it doesn't
when it goes backwards. The difference can be up to a factor of 10 for
long index scans.

Though... true... for a limit 1... it wouldn't matter that much. But
it's become habit to match index sort order by now.




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Tom Lane
Claudio Freire klaussfre...@gmail.com writes:
 On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Note that there's no particular need to specify desc in the index
 definition.  This same index can support searches in either direction
 on the called column.

 Yeah, but it's faster if it's in the same direction, because the
 kernel read-ahead code detects sequential reads, whereas it doesn't
 when it goes backwards. The difference can be up to a factor of 10 for
 long index scans.

Color me skeptical.  Index searches are seldom purely sequential block
accesses.  Maybe if you had a freshly built index that'd never yet
suffered any inserts/updates, but in practice any advantage would
disappear very quickly after a few index page splits.

 Though... true... for a limit 1... it wouldn't matter that much.

That's the other point.

regards, tom lane




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Alvaro Herrera
Claudio Freire wrote:
 On Wed, Aug 7, 2013 at 3:34 PM, Tom Lane t...@sss.pgh.pa.us wrote:

  Note that there's no particular need to specify desc in the index
  definition.  This same index can support searches in either direction
  on the called column.
 
 Yeah, but it's faster if it's in the same direction, because the
 kernel read-ahead code detects sequential reads, whereas it doesn't
 when it goes backwards. The difference can be up to a factor of 10 for
 long index scans.

That might be true when an index is new, but as it grows, the leaf pages
are not going to be sequential anymore.  And this doesn't much apply for
an equality lookup anyway, does it?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] Efficiently query for the most recent record for a given user

2013-08-07 Thread Claudio Freire
On Wed, Aug 7, 2013 at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, but it's faster if it's in the same direction, because the
 kernel read-ahead code detects sequential reads, whereas it doesn't
 when it goes backwards. The difference can be up to a factor of 10 for
 long index scans.

 Color me skeptical.  Index searches are seldom purely sequential block
 accesses.  Maybe if you had a freshly built index that'd never yet
 suffered any inserts/updates, but in practice any advantage would
 disappear very quickly after a few index page splits.

Maybe.

I've tested on pgbench test databases, which I'm not sure whether
they're freshly built indexes or incrementally built ones, and it
applies there (in fact backward index-only scans was one of the
workloads the read-ahead patch improved the most).


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance