Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-28 Thread Evgeniy Shishkin




On 28.05.2013, at 2:17, John Mudd johnbm...@gmail.com wrote:

 Thanks again.
 
 Well, I have two problems with using the CLUSTER option. It's only temporary,
 since any updates (depending on how much free space is reserved per page)
 require re-running the CLUSTER. And my primary concern is that it arbitrarily
 gives an unfair advantage to the primary key SELECT. Still, it's easy to test,
 so here are the results. The primary key still loses even with the CLUSTER.
 Granted, it is close, but considering this is now an unfair comparison, it
 still doesn't make sense to me. How can a search for a specific row, which
 should be fairly straightforward, take longer than a search that includes an
 ORDER BY clause?
 

Well, you get just a regular index scan because of the LIMIT 1.

And now it is just a matter of index size and table organization.

I also don't understand why you consider CLUSTER unfair - the way you populated
the table was already a natural cluster over my_key.

But it bothers me that my_key is always better. Can you please test it on
different values but the same row? Right now the two queries fetch two
different tuples, and with a cold cache every I/O counts.
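
(For instance, using the sample row with id = 5 shown downthread, so that both
plans have to fetch the same tuple; a sketch, assuming that key triple is
unique in the data:

test=# EXPLAIN ANALYZE SELECT * FROM test_select
       WHERE key1 = 105600 AND key2 = 257225 AND key3 = 710015;
-- stop postgres, drop the O/S cache, restart, then:
test=# EXPLAIN ANALYZE SELECT * FROM test_select WHERE id = 5;

If my_key still wins on an identical tuple, the difference is in the indexes
themselves rather than in where the two rows happen to live.)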

 
 test=# CLUSTER test_select USING test_select_pkey ;
 CLUSTER
 test=# VACUUM ANALYZE test_select ;
 VACUUM
 
 (stopped postgres; reset O/S cache; started postgres)
 
 
 test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
        ORDER BY key1, key2, key3, id LIMIT 1;
                                  QUERY PLAN
 ------------------------------------------------------------------------------
  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 loops=1)
    ->  Index Scan using my_key on test_select  (cost=0.00..41938.15 rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1)
          Index Cond: (key1 >= 50)
  Total runtime: 19.526 ms
 
 
 (stopped postgres; reset O/S cache; started postgres)
 
 
 test=# explain analyze SELECT * FROM test_select WHERE id = 50;
                                  QUERY PLAN
 ------------------------------------------------------------------------------
  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=21.070..21.072 rows=1 loops=1)
    Index Cond: (id = 50)
  Total runtime: 21.178 ms
 
 
 
 
 On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin itparan...@gmail.com 
 wrote:
 
 On May 27, 2013, at 6:35 PM, John Mudd johnbm...@gmail.com wrote:
 
 Thanks, that's easy enough to test. Didn't seem to help though.
 
 Ok. And what if you CLUSTER the table USING the PK?
 
 
 test=# REINDEX index test_select_pkey;
 REINDEX
 test=# VACUUM ANALYZE test_select ;
 VACUUM
 
 
 (stopped postgres; reset O/S cache; started postgres)
 
 test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
        ORDER BY key1, key2, key3, id LIMIT 1;
                                  QUERY PLAN
 ------------------------------------------------------------------------------
  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1)
    ->  Index Scan using my_key on test_select  (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
          Index Cond: (key1 >= 50)
  Total runtime: 16.444 ms
 
 
 (stopped postgres; reset O/S cache; started postgres)
 
 test=# explain analyze SELECT * FROM test_select WHERE id = 50;
                                  QUERY PLAN
 ------------------------------------------------------------------------------
  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
    Index Cond: (id = 50)
  Total runtime: 23.192 ms
 
 
 
 
 On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin itparan...@gmail.com 
 wrote:
 
 On May 27, 2013, at 6:02 PM, John Mudd johnbm...@gmail.com wrote:
 
  Postgres 9.1.2 on Ubuntu 12.04
 
  Any reason why a select by primary key would be slower than a select 
  that includes an ORDER BY? I was really hoping using the primary key 
  would give me a boost.
 
 
 You created my_key after data loading, and the PK was there all the time.
 If you REINDEX the PK, I bet it will be as fast.
 
  I stopped the server and cleared the O/S cache using sync; echo 3 >
  /proc/sys/vm/drop_caches between the runs.
 
 
 
  test=# VACUUM ANALYZE test_select;
  VACUUM
 
  (stopped postgres; reset O/S cache; started postgres)
 
  test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
  ORDER BY key1, key2, key3, id LIMIT 1;

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-28 Thread Roman Konoval
John,

And can you please also include BUFFERS with ANALYZE?
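
(For reference, a minimal sketch of that invocation against the test query
above; the BUFFERS option exists in 9.1:

test=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_select WHERE id = 50;

The extra Buffers: shared hit=... read=... figures on each node show how much
of a plan's time went to cold-cache reads rather than to the plan shape
itself.)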

Regards,
Roman Konoval


On Tue, May 28, 2013 at 9:48 AM, Evgeniy Shishkin itparan...@gmail.com wrote:

 [full quote of Evgeniy Shishkin's message above snipped]
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-28 Thread Matheus de Oliveira
On Mon, May 27, 2013 at 11:02 AM, John Mudd johnbm...@gmail.com wrote:

 Postgres 9.1.2 on Ubuntu 12.04

 Any reason why a select by primary key would be slower than a select that
 includes an ORDER BY? I was really hoping using the primary key would give
 me a boost.

  I stopped the server and cleared the O/S cache using sync; echo 3 >
  /proc/sys/vm/drop_caches between the runs.



  test=# VACUUM ANALYZE test_select;
 VACUUM

 (stopped postgres; reset O/S cache; started postgres)

 test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
        ORDER BY key1, key2, key3, id LIMIT 1;
                                  QUERY PLAN
 ------------------------------------------------------------------------------
  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
    ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
          Index Cond: (key1 >= 50)
  Total runtime: 12.678 ms

 (stopped postgres; reset O/S cache; started postgres)

 test=# explain analyze SELECT * FROM test_select WHERE id = 50;
                                  QUERY PLAN
 ------------------------------------------------------------------------------
  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
    Index Cond: (id = 50)
  Total runtime: 31.504 ms



 Schema:

 test=# \d test_select
                    Table "public.test_select"
  Column |     Type     |                        Modifiers
 --------+--------------+-----------------------------------------------------------
  id     | integer      | not null default nextval('test_select_id_seq'::regclass)
  key1   | integer      |
  key2   | integer      |
  key3   | integer      |
  data   | character(4) |
 Indexes:
     "test_select_pkey" PRIMARY KEY, btree (id)
     "my_key" btree (key1, key2, key3, id)

 test=#



 Sample data:

 test=# SELECT * FROM test_select LIMIT 10;
  id |  key1  |  key2  |  key3  | data
 ----+--------+--------+--------+------
   1 | 984966 | 283954 | 772063 | x
   2 | 817668 | 393533 | 924888 | x
   3 | 751039 | 798753 | 454309 | x
   4 | 128505 | 329643 | 280553 | x
   5 | 105600 | 257225 | 710015 | x
   6 | 323891 | 615614 |  83206 | x
   7 | 194054 |  63506 | 353171 | x
   8 | 212068 | 881225 | 271804 | x
   9 | 644180 |  26693 | 200738 | x
  10 | 136586 | 498699 | 554417 | x
 (10 rows)



To me it looks like the my_key index should be better than the PK in this
case, for a few reasons:

1. You are using an ORDER BY that has the same fields (in the same order) as
your index, so PG only needs to navigate the index.
2. You are using LIMIT 1, which means PG only needs to fetch the first
element with key1 >= 50 (and can stop the search right after it).

In the case of your PK, PG also needs to navigate through the index and
return only one value, but here the number of entries it needs to look at is
bigger, because id has more distinct values than key1.
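
(A sketch of one way to quantify that, using the catalog size functions
against the schema above; with a cold cache, descending a larger index costs
more random reads:

test=# SELECT c.relname, pg_size_pretty(pg_relation_size(c.oid))
       FROM pg_class c
       WHERE c.relname IN ('test_select_pkey', 'my_key');

Comparing the two sizes, together with the BUFFERS output suggested above,
makes the I/O difference between the two plans concrete.)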

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nĂ­vel F!
www.dextra.com.br/postgres


Re: [PERFORM] Performance of complicated query

2013-05-28 Thread Jonathan Morra
I have been working on this query, and I was able to modify it and get its
run time cut in half.  Here's where it is right now:

SELECT first_name, last_name, serial_number, latest_read, value,
       lifetime_value, lifetime.patient_id
FROM (
    SELECT DISTINCT patient_id, first_name, last_name,
           MAX(read_datetime) OVER (PARTITION BY patient_id) AS latest_read,
           SUM(value) OVER (PARTITION BY patient_id) AS value,
           first_value(serial_number) OVER (PARTITION BY patient_id
                                            ORDER BY read_datetime DESC) AS serial_number
    FROM (
        SELECT patient_devices.device_id, patient_id,
               MAX(value - issued_value) AS value,
               MAX(read_datetime) AS read_datetime
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        WHERE read_datetime BETWEEN '2012-01-01 10:30:01' AND '2013-05-18 03:03:42'
        GROUP BY patient_devices.id
    ) AS first
    INNER JOIN devices ON devices.id = device_id
    INNER JOIN patients ON patient_id = patients.id
) AS filtered
INNER JOIN (
    SELECT patient_id, SUM(value) AS lifetime_value
    FROM (
        SELECT patient_id, MAX(value - issued_value) AS value
        FROM read_reads
        INNER JOIN patient_devices ON patient_devices.device_id = read_reads.device_id
            AND read_datetime >= issuance_datetime
            AND read_datetime < COALESCE(unassignment_datetime, 'infinity'::timestamp)
        GROUP BY patient_devices.id
    ) AS first
    GROUP BY patient_id
) AS lifetime ON filtered.patient_id = lifetime.patient_id

I think the key to cutting it down was moving some of the joins up a level.
 Even though this is faster, I'd still like to cut it down a bunch more (as
this will be run a lot in my application).  Any more insight would be
greatly appreciated.  A summary of explain (analyze, buffers) can be found
at http://explain.depesz.com/s/qx7f.
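
For reference, here is a minimal sketch of the CTE restructuring suggested
downthread (james's 'WITH queries' idea), covering just the lifetime half. It
assumes issued_value lives on patient_devices and that patient_devices.id is
its primary key (so 9.1 allows grouping by it while selecting patient_id).
Note that on 9.1 a CTE is an optimization fence, so this mainly buys
readability, not speed:

WITH per_assignment AS (
    -- latest delta per device assignment, written once and reused
    SELECT pd.patient_id,
           MAX(r.value - pd.issued_value) AS value
    FROM read_reads r
    INNER JOIN patient_devices pd ON pd.device_id = r.device_id
        AND r.read_datetime >= pd.issuance_datetime
        AND r.read_datetime < COALESCE(pd.unassignment_datetime, 'infinity'::timestamp)
    GROUP BY pd.id
)
SELECT patient_id, SUM(value) AS lifetime_value
FROM per_assignment
GROUP BY patient_id;

The windowed ("filtered") half would need a second CTE carrying the
read_datetime BETWEEN filter, since its aggregates run over a different row
set.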

Thanks


On Thu, May 23, 2013 at 5:21 PM, Jonathan Morra jonmo...@gmail.com wrote:

 Sorry for the messy query, I'm very new to writing these complex queries.
  I'll try and make it easier to read by using WITH clauses.  However, just
 to clarify, the WITH clauses only increase readability and not performance
 in any way, right?


 On Thu, May 23, 2013 at 4:22 PM, james ja...@mansionfamily.plus.com wrote:

  On 23/05/2013 22:57, Jonathan Morra wrote:

  I'm not sure I understand your proposed solution.  There is also the
 case to consider where the same patient can be assigned the same device
 multiple times.  In this case, the value may be reset at each assignment
 (hence the line value - issued_value AS value from the original query).


 Perhaps you could use triggers to help somewhat?  At least for the
 lifetime part.

 For a given assignment of a device to a patient, only the last value is
 useful, so you can maintain that easily enough (a bit like a materialised
 view but before 9.3 I guess).
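
 (A minimal sketch of that trigger idea, assuming insert-only readings; the
 summary table and function names here are made up, keyed by
 patient_devices.id:

CREATE TABLE assignment_last_read (
    patient_device_id integer PRIMARY KEY,
    last_value        bigint,
    last_read         timestamp
);

CREATE FUNCTION maintain_last_read() RETURNS trigger AS $$
DECLARE
    pd_id integer;
BEGIN
    -- find the assignment window this reading falls into
    SELECT pd.id INTO pd_id
    FROM patient_devices pd
    WHERE pd.device_id = NEW.device_id
      AND NEW.read_datetime >= pd.issuance_datetime
      AND NEW.read_datetime < COALESCE(pd.unassignment_datetime,
                                       'infinity'::timestamp);
    IF pd_id IS NULL THEN
        RETURN NEW;  -- reading outside any assignment; nothing to maintain
    END IF;
    -- keep only the newest reading per assignment (no ON CONFLICT in 9.1)
    UPDATE assignment_last_read
       SET last_value = NEW.value, last_read = NEW.read_datetime
     WHERE patient_device_id = pd_id
       AND last_read <= NEW.read_datetime;
    IF NOT FOUND THEN
        BEGIN
            INSERT INTO assignment_last_read
            VALUES (pd_id, NEW.value, NEW.read_datetime);
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- a newer reading is already recorded; keep it
        END;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER read_reads_last_read
AFTER INSERT ON read_reads
FOR EACH ROW EXECUTE PROCEDURE maintain_last_read();

 The lifetime sum per patient then reduces to summing last_value -
 issued_value over this small table joined through patient_devices.)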

 But, that might fix 'lifetime' but not some arbitrary windowed view.  I
 can see why an 'as at' end time is useful, but not why a start time is so
 useful: if a device has readings before the window but not in the window,
 is that 'no reading' or should the last reading prior to the window apply?

 It also seems to me that the solution you have is hard to reason about.
 It's like a Haskell program done in one big inline fold rather than a bunch
 of 'where' clauses, and I find these cause significant brain overload.

 Perhaps you could break it out into identifiable chunks that work out
 (both for lifetime, if not using triggers, and for your date range
 otherwise) the readings that are not superseded (i.e. the last in the date
 bounds for a device assignment), and then work with those. Consider CTEs
 ('WITH queries') for doing this?

 It seems to me that if you can do this, then the problem might be easier
 to express.

 Failing that, I'd be looking at using temporary tables, and forcing a
 series of reduce steps using them, but then I'm a nasty old Sybase hacker
 at heart. ;-)





Re: [PERFORM] Performance of complicated query

2013-05-28 Thread Steve Crawford

On 05/23/2013 05:21 PM, Jonathan Morra wrote:
Sorry for the messy query, I'm very new to writing these complex 
queries.  I'll try and make it easier to read by using WITH clauses. 
 However, just to clarify, the WITH clauses only increase readability 
and not performance in any way, right?


It depends. The planner is a tricky beast, and sometimes rewriting a 
seemingly identical query will result in a much more (or less) efficient 
plan. A classic case was the difference between where foo in (select 
bar from...)... vs. where exists (select 1 from bar where...). In an 
ideal world the planner would figure out that both are the same and 
optimize accordingly, but there was a point where one was typically more 
efficient, then it switched to the other being better for the planner. I 
don't recall the current state.
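
Spelled out, the two classic forms (hypothetical tables foo and bar; both 
express the same semi-join, but the planner may treat them differently):

-- the IN form:
SELECT * FROM foo
WHERE id IN (SELECT foo_id FROM bar WHERE flag);

-- the EXISTS form:
SELECT * FROM foo f
WHERE EXISTS (SELECT 1 FROM bar b WHERE b.foo_id = f.id AND b.flag);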


Casting can be important - sometimes the planner needs a nudge to use 
an index on, say, a varchar column being compared to, perhaps, a text 
value or column, in which case casting to the exact data type being 
indexed can be a big win.
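
A variant of that nudge that shows up reliably uses numeric versus integer 
rather than the varchar/text pair above (hypothetical table t with a btree 
index on int_col):

-- the int4 index opclass has no cross-type operator for numeric,
-- so this comparison cannot use the index:
SELECT * FROM t WHERE int_col = 42::numeric;

-- casting the value to the indexed type restores the index scan:
SELECT * FROM t WHERE int_col = (42::numeric)::int;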


Cheers,
Steve


