Re: [PERFORM] Possible to improve query plan?

2011-01-25 Thread Cédric Villemain
2011/1/25 Kevin Grittner kevin.gritt...@wicourts.gov:
 Jeremy Palmer jpal...@linz.govt.nz wrote:

 Kevin I've now clustered the table. And the performance did
 increase quite a bit.

 Yeah, that's enough to notice the difference.

 My only question is how often will I need to re-cluster the table,
 because it comes at quite a cost. The setup I'm running will mean
 that 10,000 new rows will be inserted, and 2,500 rows will be
 updated on this table each day.

 You're going to see performance drop off as the data fragments.
 You'll need to balance the performance against maintenance
 down-time.  I would guess, though, that if you have a weekly
 maintenance window big enough to handle the CLUSTER, it might be
 worth doing it that often.

Was FILLFACTOR already suggested regarding the INSERT vs UPDATE per day ?

http://www.postgresql.org/docs/9.0/static/sql-altertable.html (and
index too, but they already have a default at 90% for btree)


 -Kevin

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Robert Haas
On Mon, Jan 17, 2011 at 11:48 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Jeremy Palmer  wrote:

   WHERE (
       (_revision_created = 16
        AND _revision_expired  16
        AND _revision_expired = 40)
    OR (_revision_created  16
        AND _revision_created = 40))

 - Bitmap Heap Scan on version_crs_coordinate_revision
      (actual time=70.925..13531.720 rows=149557 loops=1)

 - BitmapOr (actual time=53.650..53.650 rows=0 loops=1)

 This plan actually looks pretty good for what you're doing.  The
 Bitmap Index Scans and BitmapOr determine which tuples in the heap
 need to be visited.  The Bitmap Heap Scan then visits the heap pages
 in physical order (to avoid repeated fetches of the same page and to
 possibly edge toward sequential access speeds).  You don't seem to
 have a lot of bloat, which could be a killer on this type of query,
 since the rowcounts from the index scans aren't that much higher than
 the counts after you check the heap.

But isn't 13.5 seconds awfully slow to scan 149557 rows?  The sort is
sorting 23960kB.  Dividing that by 149557 rows gives ~169 bytes/per
row, or roughly 49 rows per block, which works out to 3k blows, or
about 24MB of data.  Clearly we must be hitting a LOT more data than
that, or this would be much faster than it is, I would think.

Any chance this is 9.0.X?  It'd be interesting to see the EXPLAIN
(ANALYZE, BUFFERS) output for this query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 But isn't 13.5 seconds awfully slow to scan 149557 rows?  The sort
 is sorting 23960kB.  Dividing that by 149557 rows gives ~169
 bytes/per row
 
You're right.  I would expect 9 ms as per tuple as a worst case if
it doesn't need to go to TOAST data.  Caching, multiple rows per
page, or adjacent pages should all tend to bring it down from there.
How does it get to 90 ms per row with rows that narrow?
 
Is the table perhaps horribly bloated?  Jeremy, did you try my
suggestion of using CLUSTER on the index which will tend to be more
selective?
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But isn't 13.5 seconds awfully slow to scan 149557 rows?

Depends on how many physical blocks they're scattered across, which
is hard to tell from this printout.  And on how many of the blocks
are already in cache, and what sort of disk hardware he's got, etc.

 Any chance this is 9.0.X?  It'd be interesting to see the EXPLAIN
 (ANALYZE, BUFFERS) output for this query.

Yeah.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Robert Haas robertmh...@gmail.com wrote:
 But isn't 13.5 seconds awfully slow to scan 149557 rows?  The sort
 is sorting 23960kB.  Dividing that by 149557 rows gives ~169
 bytes/per row
 
 You're right.  I would expect 9 ms as per tuple as a worst case if
 it doesn't need to go to TOAST data.  Caching, multiple rows per
 page, or adjacent pages should all tend to bring it down from there.
 How does it get to 90 ms per row with rows that narrow?

Um, that looks like 90 usec per row, not msec.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Um, that looks like 90 usec per row, not msec.
 
Oh, right.  Well, having to do a random heap access for 1% of the
rows would pretty much explain the run time, then.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Jeremy Palmer
Might be a chance on 9.0 in a couple of weeks, when I do an upgrade on one our 
dev boxes. 

Kevin I've now clustered the table. And the performance did increase quite a 
bit. My only question is how often will I need to re-cluster the table, because 
it comes at quite a cost. The setup I'm running will mean that 10,000 new rows 
will be inserted, and 2,500 rows will be updated on this table each day.

Here is the new explain output once I have clustered on the 
idx_crs_coordinate_revision_created index:


Subquery Scan t  (cost=168227.04..173053.88 rows=743 width=205) (actual 
time=392.586..946.879 rows=106299 loops=1)
  Output: t.row_number, t._revision_created, t._revision_expired, t.id, 
t.cos_id, t.nod_id, t.ort_type_1, t.ort_type_2, t.ort_type_3, t.status, 
t.sdc_status, t.source, t.value1, t.value2, t.value3, t.wrk_id_created, 
t.cor_id, t.audit_id
  Filter: (t.row_number = 1)
  -  WindowAgg  (cost=168227.04..171197.40 rows=148518 width=86) (actual 
time=392.577..834.477 rows=149557 loops=1)
Output: row_number() OVER (?), 
table_version_crs_coordinate_revision._revision_created, 
table_version_crs_coordinate_revision._revision_expired, 
table_version_crs_coordinate_revision.id, 
table_version_crs_coordinate_revision.cos_id, 
table_version_crs_coordinate_revision.nod_id, 
table_version_crs_coordinate_revision.ort_type_1, 
table_version_crs_coordinate_revision.ort_type_2, 
table_version_crs_coordinate_revision.ort_type_3, 
table_version_crs_coordinate_revision.status, 
table_version_crs_coordinate_revision.sdc_status, 
table_version_crs_coordinate_revision.source, 
table_version_crs_coordinate_revision.value1, 
table_version_crs_coordinate_revision.value2, 
table_version_crs_coordinate_revision.value3, 
table_version_crs_coordinate_revision.wrk_id_created, 
table_version_crs_coordinate_revision.cor_id, 
table_version_crs_coordinate_revision.audit_id
-  Sort  (cost=168227.04..168598.34 rows=148518 width=86) (actual 
time=392.550..457.460 rows=149557 loops=1)
  Output: table_version_crs_coordinate_revision._revision_created, 
table_version_crs_coordinate_revision._revision_expired, 
table_version_crs_coordinate_revision.id, 
table_version_crs_coordinate_revision.cos_id, 
table_version_crs_coordinate_revision.nod_id, 
table_version_crs_coordinate_revision.ort_type_1, 
table_version_crs_coordinate_revision.ort_type_2, 
table_version_crs_coordinate_revision.ort_type_3, 
table_version_crs_coordinate_revision.status, 
table_version_crs_coordinate_revision.sdc_status, 
table_version_crs_coordinate_revision.source, 
table_version_crs_coordinate_revision.value1, 
table_version_crs_coordinate_revision.value2, 
table_version_crs_coordinate_revision.value3, 
table_version_crs_coordinate_revision.wrk_id_created, 
table_version_crs_coordinate_revision.cor_id, 
table_version_crs_coordinate_revision.audit_id
  Sort Key: table_version_crs_coordinate_revision.id, 
table_version_crs_coordinate_revision._revision_created
  Sort Method:  quicksort  Memory: 23960kB
  -  Bitmap Heap Scan on table_version_crs_coordinate_revision  
(cost=3215.29..155469.14 rows=148518 width=86) (actual time=38.808..196.993 
rows=149557 loops=1)
Output: 
table_version_crs_coordinate_revision._revision_created, 
table_version_crs_coordinate_revision._revision_expired, 
table_version_crs_coordinate_revision.id, 
table_version_crs_coordinate_revision.cos_id, 
table_version_crs_coordinate_revision.nod_id, 
table_version_crs_coordinate_revision.ort_type_1, 
table_version_crs_coordinate_revision.ort_type_2, 
table_version_crs_coordinate_revision.ort_type_3, 
table_version_crs_coordinate_revision.status, 
table_version_crs_coordinate_revision.sdc_status, 
table_version_crs_coordinate_revision.source, 
table_version_crs_coordinate_revision.value1, 
table_version_crs_coordinate_revision.value2, 
table_version_crs_coordinate_revision.value3, 
table_version_crs_coordinate_revision.wrk_id_created, 
table_version_crs_coordinate_revision.cor_id, 
table_version_crs_coordinate_revision.audit_id
Recheck Cond: (((_revision_expired  16) AND 
(_revision_expired = 40)) OR ((_revision_created  16) AND (_revision_created 
= 40)))
Filter: (((_revision_created = 16) AND (_revision_expired 
 16) AND (_revision_expired = 40)) OR ((_revision_created  16) AND 
(_revision_created = 40)))
-  BitmapOr  (cost=3215.29..3215.29 rows=149432 width=0) 
(actual time=27.330..27.330 rows=0 loops=1)
  -  Bitmap Index Scan on 
idx_crs_coordinate_revision_expired  (cost=0.00..2225.36 rows=106001 width=0) 
(actual time=21.596..21.596 rows=110326 loops=1)
Index Cond: ((_revision_expired  16) AND 
(_revision_expired = 40))
  -  Bitmap Index Scan on 
idx_crs_coordinate_revision_created  (cost=0.00..915.67 rows=43432 width=0) 
(actual time=5.728..5.728 rows=43258 loops=1)

Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Kevin Grittner
Jeremy Palmer jpal...@linz.govt.nz wrote:
 
 Kevin I've now clustered the table. And the performance did
 increase quite a bit.
 
Yeah, that's enough to notice the difference.
 
 My only question is how often will I need to re-cluster the table,
 because it comes at quite a cost. The setup I'm running will mean
 that 10,000 new rows will be inserted, and 2,500 rows will be
 updated on this table each day.
 
You're going to see performance drop off as the data fragments. 
You'll need to balance the performance against maintenance
down-time.  I would guess, though, that if you have a weekly
maintenance window big enough to handle the CLUSTER, it might be
worth doing it that often.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Kevin Grittner
Jeremy Palmer jpal...@linz.govt.nz wrote:
 
 My only question is how often will I need to re-cluster the
 table, because it comes at quite a cost.
 
I probably should have mentioned that the CLUSTER will run faster if
the data is already mostly in the right sequence.  You'll be doing a
nearly sequential pass over the heap, which should minimize seek
time, especially if the OS notices the pattern and starts doing
sequential read-ahead.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-24 Thread Jeremy Palmer
Thanks heaps for the advice. I will do some benchmarks to see how long it takes 
to cluster all of the database tables.

Cheers,
Jeremy

-Original Message-
From: Kevin Grittner [mailto:kevin.gritt...@wicourts.gov] 
Sent: Tuesday, 25 January 2011 1:02 p.m.
To: Jeremy Palmer; Tom Lane
Cc: Robert Haas; pgsql-performance@postgresql.org; a...@squeakycode.net
Subject: RE: [PERFORM] Possible to improve query plan?

Jeremy Palmer jpal...@linz.govt.nz wrote:
 
 My only question is how often will I need to re-cluster the
 table, because it comes at quite a cost.
 
I probably should have mentioned that the CLUSTER will run faster if
the data is already mostly in the right sequence.  You'll be doing a
nearly sequential pass over the heap, which should minimize seek
time, especially if the OS notices the pattern and starts doing
sequential read-ahead.
 
-Kevin
__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Jayadevan M
Hello,

 
 The distribution of the data is that all but 120,000 rows have null 
 values in the _revision_expired column.
 

A shot in the dark - will a partial index on the above column help?
http://www.postgresql.org/docs/current/interactive/indexes-partial.html
http://en.wikipedia.org/wiki/Partial_index

One link with discussion about it...
http://www.devheads.net/database/postgresql/general/when-can-postgresql-use-partial-not-null-index-seems-depend-size-clause-even-enable-seqscan.htm

Regards,
Jayadevan





DISCLAIMER: 

The information in this e-mail and any attachment is intended only for 
the person to whom it is addressed and may contain confidential and/or 
privileged material. If you have received this e-mail in error, kindly 
contact the sender and destroy all copies of the original communication. 
IBS makes no warranty, express or implied, nor guarantees the accuracy, 
adequacy or completeness of the information contained in this email or any 
attachment and is not liable for any errors, defects, omissions, viruses 
or for resultant loss or damage, if any, direct or indirect.






Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Ing. Marcos Ortiz Valmaseda
Which is the type of your application? You can see it on the Performance 
Whackamole Presentation from Josh Berkus on the 
PgCon 2009:
- Web application
- Online Transaction Processing (OLTP)
- Data WareHousing (DW)

And based on the type of your application, you can configure the 
postgresql.conf to gain a better performance of your PostgreSQL server.
PostgreSQL postgresql.conf baseline:
 shared_buffers = 25% RAM
 work_mem = 512K[W] 2 MB[O] 128 MB[D]
 - but no more that RAM/no_connections
 maintenance_work_mem = 1/16 RAM
 checkpoint_segments = 8 [W], 16-64 [O], [D]
 wal_buffer = 1 MB [W], 8 MB [O], [D]
 effective_cache_size = 2/3 RAM

Regards
 

Ing. Marcos Luís Ortíz Valmaseda
Linux User # 418229  PostgreSQL DBA
Centro de Tecnologías Gestión de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice

- Mensaje original -
De: Jeremy Palmer jpal...@linz.govt.nz
Para: Andy Colson a...@squeakycode.net
CC: pgsql-performance@postgresql.org
Enviados: Lunes, 17 de Enero 2011 0:13:25 GMT -05:00 Región oriental EE. 
UU./Canadá
Asunto: Re: [PERFORM] Possible to improve query plan?

Hi Andy,

Yes important omissions:

Server version: 8.4.6
OS Windows Server 2003 Standard Ed :(
The work mem is 50mb.

I tried setting the work_mem to 500mb, but it didn't make a huge difference in 
query execution time. But then again the OS disk caching is probably taking 
over here.

Ok here's the new plan with work_mem = 50mb:

http://explain.depesz.com/s/xwv

And here another plan with work_mem = 500mb:

http://explain.depesz.com/s/VmO

Thanks,
Jeremy

-Original Message-
From: Andy Colson [mailto:a...@squeakycode.net] 
Sent: Monday, 17 January 2011 5:57 p.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Possible to improve query plan?


Hum.. yeah it looks like it takes no time at all to pull data from the 
individual indexes, and them bitmap them.  I'm not sure what the bitmap heap 
scan is, or why its slow.  Hopefully someone smarter will come along.

Also its weird that explain.depesz.com didnt parse and show your entire plan.  
Hum.. you seem to have ending quotes on some of the lines?

One other though: quicksort  Memory: 23960kB
It needs 20Meg to sort... It could be your sort is swapping to disk.

What sort of PG version is this?
What are you using for work_mem?  (you could try to bump it up a little (its 
possible to set for session only, no need for server restart) and see if that'd 
help.

And sorry, but its my bedtime, good luck though.

-Andy

__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Kevin Grittner
Jeremy Palmer  wrote:
 
   WHERE (
   (_revision_created = 16
AND _revision_expired  16
AND _revision_expired = 40)
OR (_revision_created  16
AND _revision_created = 40))
 
 - Bitmap Heap Scan on version_crs_coordinate_revision
  (actual time=70.925..13531.720 rows=149557 loops=1)
 
 - BitmapOr (actual time=53.650..53.650 rows=0 loops=1)
 
This plan actually looks pretty good for what you're doing.  The
Bitmap Index Scans and BitmapOr determine which tuples in the heap
need to be visited.  The Bitmap Heap Scan then visits the heap pages
in physical order (to avoid repeated fetches of the same page and to
possibly edge toward sequential access speeds).  You don't seem to
have a lot of bloat, which could be a killer on this type of query,
since the rowcounts from the index scans aren't that much higher than
the counts after you check the heap.
 
The only thing I can think of which might help is to CLUSTER the
table on whichever of the two indexes used in the plan which is
typically more selective for such queries.  (In the example query
that seems to be idx_crs_coordinate_revision_created.)  That might
reduce the number of heap pages which need to be accessed and/or put
place them close enough that you'll get some sequential readahead.
 
I guess you could also try adjusting effective_io_concurrency upward
to see if that helps.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Tom Lane
Jeremy Palmer jpal...@linz.govt.nz writes:
 I've come to a dead end in trying to get a commonly used query to
 perform better.

 EXPLAIN
 SELECT * FROM (
 SELECT
 row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) 
 as row_number,
 * 
 FROM
 version_crs_coordinate_revision
 WHERE (
 (_revision_created = 16 AND _revision_expired  16 AND 
 _revision_expired = 40) OR 
 (_revision_created  16 AND _revision_created = 40)
 )
 ) AS T 
 WHERE row_number = 1;

If I'm not mistaken, that's a DB2-ish locution for a query with DISTINCT
ON, ie, you're looking for the row with highest _revision_created for
each value of id.  It might perform well on DB2, but it's going to
mostly suck on Postgres --- we don't optimize window-function queries
very much at all at the moment.  Try writing it with DISTINCT ON instead
of a window function, like so:

SELECT DISTINCT ON (id)
* 
FROM
version_crs_coordinate_revision
WHERE (
(_revision_created = 16 AND _revision_expired  16 AND 
_revision_expired = 40) OR 
(_revision_created  16 AND _revision_created = 40)
)
ORDER BY id, _revision_created DESC;

You could also experiment with various forms of GROUP BY if you're loath
to use any Postgres-specific syntax.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Jeremy Palmer
It fits a Data Warehousing type application. 

Apart from work_mem, my other parameters are pretty close to these numbers. I 
had the work_mem down a little because a noticed some clients were getting out 
of memory errors with large queries which involved lots of sorting.

Thanks
Jeremy

-Original Message-
From: Ing. Marcos Ortiz Valmaseda [mailto:mlor...@uci.cu] 
Sent: Tuesday, 18 January 2011 2:38 a.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org; Andy Colson
Subject: Re: [PERFORM] Possible to improve query plan?

Which is the type of your application? You can see it on the Performance 
Whackamole Presentation from Josh Berkus on the 
PgCon 2009:
- Web application
- Online Transaction Processing (OLTP)
- Data WareHousing (DW)

And based on the type of your application, you can configure the 
postgresql.conf to gain a better performance of your PostgreSQL server.
PostgreSQL postgresql.conf baseline:
 shared_buffers = 25% RAM
 work_mem = 512K[W] 2 MB[O] 128 MB[D]
 - but no more that RAM/no_connections
 maintenance_work_mem = 1/16 RAM
 checkpoint_segments = 8 [W], 16-64 [O], [D]
 wal_buffer = 1 MB [W], 8 MB [O], [D]
 effective_cache_size = 2/3 RAM

Regards
 

Ing. Marcos Luís Ortíz Valmaseda
Linux User # 418229  PostgreSQL DBA
Centro de Tecnologías Gestión de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice

__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Jeremy Palmer
Thanks that seems to make the query 10-15% faster :)

Cheers
jeremy

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Tuesday, 18 January 2011 9:24 a.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Possible to improve query plan? 

Jeremy Palmer jpal...@linz.govt.nz writes:
 I've come to a dead end in trying to get a commonly used query to
 perform better.

 EXPLAIN
 SELECT * FROM (
 SELECT
 row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) 
 as row_number,
 * 
 FROM
 version_crs_coordinate_revision
 WHERE (
 (_revision_created = 16 AND _revision_expired  16 AND 
 _revision_expired = 40) OR 
 (_revision_created  16 AND _revision_created = 40)
 )
 ) AS T 
 WHERE row_number = 1;

If I'm not mistaken, that's a DB2-ish locution for a query with DISTINCT
ON, ie, you're looking for the row with highest _revision_created for
each value of id.  It might perform well on DB2, but it's going to
mostly suck on Postgres --- we don't optimize window-function queries
very much at all at the moment.  Try writing it with DISTINCT ON instead
of a window function, like so:

SELECT DISTINCT ON (id)
* 
FROM
version_crs_coordinate_revision
WHERE (
(_revision_created = 16 AND _revision_expired  16 AND 
_revision_expired = 40) OR 
(_revision_created  16 AND _revision_created = 40)
)
ORDER BY id, _revision_created DESC;

You could also experiment with various forms of GROUP BY if you're loath
to use any Postgres-specific syntax.

regards, tom lane
__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-17 Thread Mladen Gogala

Tom Lane wrote:
If I'm not mistaken, that's a DB2-ish locution 


It could also be a part of the Oracle vernacular. I've seen queries like 
that running against Oracle RDBMS, too.



for a query with DISTINCT
ON, ie, you're looking for the row with highest _revision_created for
each value of id.  It might perform well on DB2, but it's going to
mostly suck on Postgres --- we don't optimize window-function queries
very much at all at the moment.  
Hmmm, what optimizations do you have in mind? I thought that window 
functions are just clever tricks with memory? Anything that can be 
expected for 9.0x?




Try writing it with DISTINCT ON instead
of a window function, like so:
  

Wouldn't distinct necessarily bring about the sort/merge?

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Possible to improve query plan?

2011-01-16 Thread Jeremy Palmer
Hi all,

I've come to a dead end in trying to get a commonly used query to perform 
better. The query is against one table with 10 million rows. This table has 
been analysed. The table definition is:

CREATE TABLE version_crs_coordinate_revision
(
  _revision_created integer NOT NULL,
  _revision_expired integer,
  id integer NOT NULL,
  cos_id integer NOT NULL,
  nod_id integer NOT NULL,
  ort_type_1 character varying(4),
  ort_type_2 character varying(4),
  ort_type_3 character varying(4),
  status character varying(4) NOT NULL,
  sdc_status character(1) NOT NULL,
  source character varying(4),
  value1 numeric(22,12),
  value2 numeric(22,12),
  value3 numeric(22,12),
  wrk_id_created integer,
  cor_id integer,
  audit_id integer NOT NULL,
  CONSTRAINT pkey_version_crs_coordinate_revision PRIMARY KEY 
(_revision_created, id),
  CONSTRAINT version_crs_coordinate_revision_revision_created_fkey FOREIGN KEY 
(_revision_created)
  REFERENCES revision (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT version_crs_coordinate_revision_revision_expired_fkey FOREIGN KEY 
(_revision_expired)
  REFERENCES revision (id) MATCH SIMPLE
  ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
  OIDS=FALSE
);
ALTER TABLE version.version_crs_coordinate_revision ALTER COLUMN 
_revision_created SET STATISTICS 1000;
ALTER TABLE version.version_crs_coordinate_revision ALTER COLUMN 
_revision_expired SET STATISTICS 1000;
ALTER TABLE version.version_crs_coordinate_revision ALTER COLUMN id SET 
STATISTICS 1000;

CREATE INDEX idx_crs_coordinate_revision_created ON 
version.version_crs_coordinate_revision USING btree (_revision_created);
CREATE INDEX idx_crs_coordinate_revision_created_expired ON 
version.version_crs_coordinate_revision USING btree (_revision_created, 
_revision_expired);
CREATE INDEX idx_crs_coordinate_revision_expired ON 
version.version_crs_coordinate_revision USING btree (_revision_expired);
CREATE INDEX idx_crs_coordinate_revision_expired_created ON 
version.version_crs_coordinate_revision USING btree (_revision_expired, 
_revision_created);
CREATE INDEX idx_crs_coordinate_revision_expired_id ON 
version.version_crs_coordinate_revision USING btree (_revision_expired, id);
CREATE INDEX idx_crs_coordinate_revision_id ON 
version.version_crs_coordinate_revision USING btree (id);
CREATE INDEX idx_crs_coordinate_revision_id_created ON 
version.version_crs_coordinate_revision USING btree (id, _revision_created); 


The distribution of the data is that all but 120,000 rows have null values in 
the _revision_expired column.

The query itself that I'm trying to optimise is below:

EXPLAIN
SELECT * FROM (
SELECT
row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) as 
row_number,
* 
FROM
version_crs_coordinate_revision
WHERE (
(_revision_created = 16 AND _revision_expired  16 AND 
_revision_expired = 40) OR 
(_revision_created  16 AND _revision_created = 40)
)
) AS T 
WHERE row_number = 1;

Subquery Scan t  (cost=170692.25..175678.27 rows=767 width=205)
  Filter: (t.row_number = 1)
  -  WindowAgg  (cost=170692.25..173760.57 rows=153416 width=86)
-  Sort  (cost=170692.25..171075.79 rows=153416 width=86)
  Sort Key: version_crs_coordinate_revision.id, 
version_crs_coordinate_revision._revision_created
  -  Bitmap Heap Scan on version_crs_coordinate_revision  
(cost=3319.13..157477.69 rows=153416 width=86)
Recheck Cond: (((_revision_expired  16) AND 
(_revision_expired = 40)) OR ((_revision_created  16) AND (_revision_created 
= 40)))
Filter: (((_revision_created = 16) AND (_revision_expired 
 16) AND (_revision_expired = 40)) OR ((_revision_created  16) AND 
(_revision_created = 40)))
-  BitmapOr  (cost=3319.13..3319.13 rows=154372 width=0)
  -  Bitmap Index Scan on 
idx_crs_coordinate_revision_expired  (cost=0.00..2331.76 rows=111041 width=0)
Index Cond: ((_revision_expired  16) AND 
(_revision_expired = 40))
  -  Bitmap Index Scan on 
idx_crs_coordinate_revision_created  (cost=0.00..910.66 rows=43331 width=0)
Index Cond: ((_revision_created  16) AND 
(_revision_created = 40))


One thought I have is that maybe the 
idx_crs_coordinate_revision_expired_created index could be used instead of 
idx_crs_coordinate_revision_expired.

Does anyone have any suggestions what I could do to improve the plan? Or how I 
could force the use of the idx_crs_coordinate_revision_expired_created index to 
see if that is better.

Thanks
Jeremy


__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or 

Re: [PERFORM] Possible to improve query plan?

2011-01-16 Thread Andy Colson

On 01/16/2011 09:21 PM, Jeremy Palmer wrote:

Hi all,

I've come to a dead end in trying to get a commonly used query to perform 
better. The query is against one table with 10 million rows. This table has 
been analysed. The table definition is:

CREATE TABLE version_crs_coordinate_revision
(
   _revision_created integer NOT NULL,
   _revision_expired integer,
   id integer NOT NULL,
   cos_id integer NOT NULL,
   nod_id integer NOT NULL,
   ort_type_1 character varying(4),
   ort_type_2 character varying(4),
   ort_type_3 character varying(4),
   status character varying(4) NOT NULL,
   sdc_status character(1) NOT NULL,
   source character varying(4),
   value1 numeric(22,12),
   value2 numeric(22,12),
   value3 numeric(22,12),
   wrk_id_created integer,
   cor_id integer,
   audit_id integer NOT NULL,
   CONSTRAINT pkey_version_crs_coordinate_revision PRIMARY KEY 
(_revision_created, id),
   CONSTRAINT version_crs_coordinate_revision_revision_created_fkey FOREIGN KEY 
(_revision_created)
   REFERENCES revision (id) MATCH SIMPLE
   ON UPDATE NO ACTION ON DELETE NO ACTION,
   CONSTRAINT version_crs_coordinate_revision_revision_expired_fkey FOREIGN KEY 
(_revision_expired)
   REFERENCES revision (id) MATCH SIMPLE
   ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
   OIDS=FALSE
);
ALTER TABLE version.version_crs_coordinate_revision ALTER COLUMN 
_revision_created SET STATISTICS 1000;
ALTER TABLE version.version_crs_coordinate_revision ALTER COLUMN 
_revision_expired SET STATISTICS 1000;
ALTER TABLE version.version_crs_coordinate_revision ALTER COLUMN id SET 
STATISTICS 1000;

CREATE INDEX idx_crs_coordinate_revision_created ON 
version.version_crs_coordinate_revision USING btree (_revision_created);
CREATE INDEX idx_crs_coordinate_revision_created_expired ON 
version.version_crs_coordinate_revision USING btree (_revision_created, 
_revision_expired);
CREATE INDEX idx_crs_coordinate_revision_expired ON 
version.version_crs_coordinate_revision USING btree (_revision_expired);
CREATE INDEX idx_crs_coordinate_revision_expired_created ON 
version.version_crs_coordinate_revision USING btree (_revision_expired, 
_revision_created);
CREATE INDEX idx_crs_coordinate_revision_expired_id ON 
version.version_crs_coordinate_revision USING btree (_revision_expired, id);
CREATE INDEX idx_crs_coordinate_revision_id ON 
version.version_crs_coordinate_revision USING btree (id);
CREATE INDEX idx_crs_coordinate_revision_id_created ON 
version.version_crs_coordinate_revision USING btree (id, _revision_created);


The distribution of the data is that all but 120,000 rows have null values in 
the _revision_expired column.

The query itself that I'm trying to optimise is below:

EXPLAIN
SELECT * FROM (
 SELECT
 row_number() OVER (PARTITION BY id ORDER BY _revision_created DESC) as 
row_number,
 *
 FROM
 version_crs_coordinate_revision
 WHERE (
 (_revision_created= 16 AND _revision_expired  16 AND 
_revision_expired= 40) OR
 (_revision_created  16 AND _revision_created= 40)
 )
) AS T
WHERE row_number = 1;

Subquery Scan t  (cost=170692.25..175678.27 rows=767 width=205)
   Filter: (t.row_number = 1)
   -   WindowAgg  (cost=170692.25..173760.57 rows=153416 width=86)
 -   Sort  (cost=170692.25..171075.79 rows=153416 width=86)
   Sort Key: version_crs_coordinate_revision.id, 
version_crs_coordinate_revision._revision_created
   -   Bitmap Heap Scan on version_crs_coordinate_revision  
(cost=3319.13..157477.69 rows=153416 width=86)
 Recheck Cond: (((_revision_expired  16) AND 
(_revision_expired= 40)) OR ((_revision_created  16) AND (_revision_created= 
40)))
 Filter: (((_revision_created= 16) AND (_revision_expired  16) AND 
(_revision_expired= 40)) OR ((_revision_created  16) AND (_revision_created= 40)))
 -   BitmapOr  (cost=3319.13..3319.13 rows=154372 width=0)
   -   Bitmap Index Scan on 
idx_crs_coordinate_revision_expired  (cost=0.00..2331.76 rows=111041 width=0)
 Index Cond: ((_revision_expired  16) AND 
(_revision_expired= 40))
   -   Bitmap Index Scan on 
idx_crs_coordinate_revision_created  (cost=0.00..910.66 rows=43331 width=0)
 Index Cond: ((_revision_created  16) AND 
(_revision_created= 40))


One thought I have is that maybe the 
idx_crs_coordinate_revision_expired_created index could be used instead of 
idx_crs_coordinate_revision_expired.

Does anyone have any suggestions what I could do to improve the plan? Or how I 
could force the use of the idx_crs_coordinate_revision_expired_created index to 
see if that is better.

Thanks
Jeremy


First, wow, those are long names... I had a hard time keeping track.

Second: you have lots of duplicated indexes. I count _revision_created in 4 
indexes?  Not sure what other sql you are using, but have you 

[PERFORM] Possible to improve query plan?

2011-01-16 Thread Jeremy Palmer
Hi Andy,

Yeah sorry about the long name, there are all generated by function as part of 
a table versioning system. And yes I placed all possible indexes on the table 
to see which would be used by the planner. In production I will drop the unused 
indexes. 

Yes simple drop the extra index :P I have dropped the index and it made the 
query slower :(

Here is the explain analyse:

Subquery Scan t  (cost=170692.25..175678.27 rows=767 width=205) (actual 
time=13762.783..14322.315 rows=106299 loops=1)'
  Filter: (t.row_number = 1)'
  -  WindowAgg  (cost=170692.25..173760.57 rows=153416 width=86) (actual 
time=13762.774..14208.522 rows=149557 loops=1)'
-  Sort  (cost=170692.25..171075.79 rows=153416 width=86) (actual 
time=13762.745..13828.584 rows=149557 loops=1)'
  Sort Key: version_crs_coordinate_revision.id, 
version_crs_coordinate_revision._revision_created'
  Sort Method:  quicksort  Memory: 23960kB
  -  Bitmap Heap Scan on version_crs_coordinate_revision  
(cost=3319.13..157477.69 rows=153416 width=86) (actual time=70.925..13531.720 
rows=149557 loops=1)
Recheck Cond: (((_revision_expired  16) AND 
(_revision_expired = 40)) OR ((_revision_created  16) AND (_revision_created 
= 40)))
Filter: (((_revision_created = 16) AND (_revision_expired 
 16) AND (_revision_expired = 40)) OR ((_revision_created  16) AND 
(_revision_created = 40)))
-  BitmapOr  (cost=3319.13..3319.13 rows=154372 width=0) 
(actual time=53.650..53.650 rows=0 loops=1)
  -  Bitmap Index Scan on 
idx_crs_coordinate_revision_expired  (cost=0.00..2331.76 rows=111041 width=0) 
(actual time=37.773..37.773 rows=110326 loops=1)
Index Cond: ((_revision_expired  16) AND 
(_revision_expired = 40))
  -  Bitmap Index Scan on 
idx_crs_coordinate_revision_created  (cost=0.00..910.66 rows=43331 width=0) 
(actual time=15.872..15.872 rows=43258 loops=1)
Index Cond: ((_revision_created  16) AND 
(_revision_created = 40))
Total runtime: 14359.747 ms

http://explain.depesz.com/s/qpL says that the bitmap heap scan is bad. Not sure 
what to do about it.

Thanks,
Jeremy

-Original Message-
From: Andy Colson [mailto:a...@squeakycode.net] 
Sent: Monday, 17 January 2011 5:22 p.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Possible to improve query plan?


First, wow, those are long names... I had a hard time keeping track.

Second: you have lots of duplicated indexes. I count _revision_created in 4 
indexes?  Not sure what other sql you are using, but have you tried one index 
for one column?  PG will be able to Bitmap them together if it thinks it can 
use more than one.  Was that because you were testing?

Third: any chance we can get an explain analyze?  It give's more info.  
(Also, have you seen http://explain.depesz.com/)

Last: If you wanted to force the index usage, for a test, you could drop the 
other indexes.  I assume this is on a test box so it should be ok.  If its 
live, you could wrap it in a BEGIN ... ROLLBACK (in theory... never tried it 
myself)

-Andy
__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-16 Thread Andy Colson

-Original Message-
From: Andy Colson [mailto:a...@squeakycode.net]
Sent: Monday, 17 January 2011 5:22 p.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Possible to improve query plan?


First, wow, those are long names... I had a hard time keeping track.

Second: you have lots of duplicated indexes. I count _revision_created in 4 
indexes?  Not sure what other sql you are using, but have you tried one index 
for one column?  PG will be able to Bitmap them together if it thinks it can 
use more than one.  Was that because you were testing?

Third: any chance we can get an explain analyze?  It give's more info.  
(Also, have you seen http://explain.depesz.com/)

Last: If you wanted to force the index usage, for a test, you could drop the 
other indexes.  I assume this is on a test box so it should be ok.  If its 
live, you could wrap it in a BEGIN ... ROLLBACK (in theory... never tried it 
myself)

-Andy


On 01/16/2011 10:43 PM, Jeremy Palmer wrote:

Hi Andy,

Yeah sorry about the long name, there are all generated by function as part of 
a table versioning system. And yes I placed all possible indexes on the table 
to see which would be used by the planner. In production I will drop the unused 
indexes.

Yes simple drop the extra index :P I have dropped the index and it made the 
query slower :(

Here is the explain analyse:

Subquery Scan t  (cost=170692.25..175678.27 rows=767 width=205) (actual 
time=13762.783..14322.315 rows=106299 loops=1)'
   Filter: (t.row_number = 1)'
   -   WindowAgg  (cost=170692.25..173760.57 rows=153416 width=86) (actual 
time=13762.774..14208.522 rows=149557 loops=1)'
 -   Sort  (cost=170692.25..171075.79 rows=153416 width=86) (actual 
time=13762.745..13828.584 rows=149557 loops=1)'
   Sort Key: version_crs_coordinate_revision.id, 
version_crs_coordinate_revision._revision_created'
   Sort Method:  quicksort  Memory: 23960kB
   -   Bitmap Heap Scan on version_crs_coordinate_revision  
(cost=3319.13..157477.69 rows=153416 width=86) (actual time=70.925..13531.720 
rows=149557 loops=1)
 Recheck Cond: (((_revision_expired  16) AND 
(_revision_expired= 40)) OR ((_revision_created  16) AND (_revision_created= 
40)))
 Filter: (((_revision_created= 16) AND (_revision_expired  16) AND 
(_revision_expired= 40)) OR ((_revision_created  16) AND (_revision_created= 40)))
 -   BitmapOr  (cost=3319.13..3319.13 rows=154372 width=0) 
(actual time=53.650..53.650 rows=0 loops=1)
   -   Bitmap Index Scan on 
idx_crs_coordinate_revision_expired  (cost=0.00..2331.76 rows=111041 width=0) 
(actual time=37.773..37.773 rows=110326 loops=1)
 Index Cond: ((_revision_expired  16) AND 
(_revision_expired= 40))
   -   Bitmap Index Scan on 
idx_crs_coordinate_revision_created  (cost=0.00..910.66 rows=43331 width=0) 
(actual time=15.872..15.872 rows=43258 loops=1)
 Index Cond: ((_revision_created  16) AND 
(_revision_created= 40))
Total runtime: 14359.747 ms

http://explain.depesz.com/s/qpL says that the bitmap heap scan is bad. Not sure 
what to do about it.

Thanks,
Jeremy




Hum.. yeah it looks like it takes no time at all to pull data from the 
individual indexes, and them bitmap them.  I'm not sure what the bitmap heap 
scan is, or why its slow.  Hopefully someone smarter will come along.

Also its weird that explain.depesz.com didnt parse and show your entire plan.  
Hum.. you seem to have ending quotes on some of the lines?

One other though: quicksort  Memory: 23960kB
It needs 20Meg to sort... It could be your sort is swapping to disk.

What sort of PG version is this?
What are you using for work_mem?  (you could try to bump it up a little (its 
possible to set for session only, no need for server restart) and see if that'd 
help.

And sorry, but its my bedtime, good luck though.

-Andy


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Possible to improve query plan?

2011-01-16 Thread Jeremy Palmer
Hi Andy,

Yes important omissions:

Server version: 8.4.6
OS Windows Server 2003 Standard Ed :(
The work mem is 50mb.

I tried setting the work_mem to 500mb, but it didn't make a huge difference in 
query execution time. But then again the OS disk caching is probably taking 
over here.

Ok here's the new plan with work_mem = 50mb:

http://explain.depesz.com/s/xwv

And here another plan with work_mem = 500mb:

http://explain.depesz.com/s/VmO

Thanks,
Jeremy

-Original Message-
From: Andy Colson [mailto:a...@squeakycode.net] 
Sent: Monday, 17 January 2011 5:57 p.m.
To: Jeremy Palmer
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Possible to improve query plan?


Hum.. yeah it looks like it takes no time at all to pull data from the 
individual indexes, and them bitmap them.  I'm not sure what the bitmap heap 
scan is, or why its slow.  Hopefully someone smarter will come along.

Also its weird that explain.depesz.com didnt parse and show your entire plan.  
Hum.. you seem to have ending quotes on some of the lines?

One other though: quicksort  Memory: 23960kB
It needs 20Meg to sort... It could be your sort is swapping to disk.

What sort of PG version is this?
What are you using for work_mem?  (you could try to bump it up a little (its 
possible to set for session only, no need for server restart) and see if that'd 
help.

And sorry, but its my bedtime, good luck though.

-Andy

__

This message contains information, which is confidential and may be subject to 
legal privilege. 
If you are not the intended recipient, you must not peruse, use, disseminate, 
distribute or copy this message.
If you have received this message in error, please notify us immediately (Phone 
0800 665 463 or i...@linz.govt.nz) and destroy the original message.
LINZ accepts no responsibility for changes to this email, or for any 
attachments, after its transmission from LINZ.

Thank you.
__

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance