[GENERAL] partitioning for speed, but query planner ignores

2013-10-02 Thread David Rysdam
We have a by-our-standards large table (about 40e6 rows). Since it is
the bottleneck in some places, I thought I'd experiment with
partitioning. I'm following the instructions here:

http://www.postgresql.org/docs/current/static/ddl-partitioning.html

The table holds data about certain objects, each of which has an object
number and some number of historical entries (like account activity at a
bank, say). The typical usage pattern is: relatively rare inserts that
happen in the background via an automated process (meaning I don't care
if they take a little longer) and frequent querying, including some
where a human is sitting in front of it (i.e. I'd like it to be a lot
faster).

Our most frequent queries either select all history for object N or
most recent item for some subset of objects.

Because object number figure so prominently, I thought I'd partition on
that. To me, it makes the most sense from a load-balancing perspective
to partition on the mod of the object number (for this test, evens vs
odds, but planning to go up to mod 10 or even mod 100). Lower numbers
are going to be queried much less often than higher numbers. This scheme
also means I never have to add partitions in the future.

I set up my check constraints ((objnum % 2) = 0 and (objnum % 2) = 1 on
the relevant tables) and turned constraint_exclusion to 'partition' in
postgresql.conf. I also turned it to 'on' in my psql interface.

However, when I run an explain or an explain analyze, I still seeing it
checking both partitions. Is this because the query planner doesn't want
to do a mod? Should I go with simple ranges, even though this adds a
maintenance task? 


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] partitioning for speed, but query planner ignores

2013-10-02 Thread David Rysdam
On Wed, 2 Oct 2013 08:34:44 -0400, David Rysdam drys...@ll.mit.edu wrote:
 However, when I run an explain or an explain analyze, I still seeing it
 checking both partitions. Is this because the query planner doesn't want
 to do a mod? Should I go with simple ranges, even though this adds a
 maintenance task? 

I guess I should give some administrivia as well: Server is 9.2.1
running Linux. The configuration is otherwise pretty vanilla with only
minor, and poorly-understood, conf changes.


smime.p7s
Description: S/MIME cryptographic signature


Re: [GENERAL] partitioning for speed, but query planner ignores

2013-10-02 Thread Bill Moran
On Wed, 2 Oct 2013 08:34:44 -0400
David Rysdam drys...@ll.mit.edu wrote:

 We have a by-our-standards large table (about 40e6 rows). Since it is
 the bottleneck in some places, I thought I'd experiment with
 partitioning. I'm following the instructions here:
 
 http://www.postgresql.org/docs/current/static/ddl-partitioning.html
 
 The table holds data about certain objects, each of which has an object
 number and some number of historical entries (like account activity at a
 bank, say). The typical usage pattern is: relatively rare inserts that
 happen in the background via an automated process (meaning I don't care
 if they take a little longer) and frequent querying, including some
 where a human is sitting in front of it (i.e. I'd like it to be a lot
 faster).
 
 Our most frequent queries either select all history for object N or
 most recent item for some subset of objects.
 
 Because object number figure so prominently, I thought I'd partition on
 that. To me, it makes the most sense from a load-balancing perspective
 to partition on the mod of the object number (for this test, evens vs
 odds, but planning to go up to mod 10 or even mod 100). Lower numbers
 are going to be queried much less often than higher numbers. This scheme
 also means I never have to add partitions in the future.
 
 I set up my check constraints ((objnum % 2) = 0 and (objnum % 2) = 1 on
 the relevant tables) and turned constraint_exclusion to 'partition' in
 postgresql.conf. I also turned it to 'on' in my psql interface.
 
 However, when I run an explain or an explain analyze, I still seeing it
 checking both partitions. Is this because the query planner doesn't want
 to do a mod? Should I go with simple ranges, even though this adds a
 maintenance task? 

Last I looked, the partitioning mechanism isn't _quite_ as smart as could
be desired.  For example:
SELECT * FROM table WHERE objnum = 5; -- will not take advantage of partition
You have to give the planner a little more hint as to the fact that it can
take advantage of the partition:
SELECT * FROM table WHERE (objnum % 2) = 1 AND objnum = 5;
As silly as it seems, this is enough information for the planner to know
that it only needs to scan one partition.

If this doesn't answer your question, you should probably provide some
more details (actual query and actual explain output, for example) to
help people better help you.

-- 
Bill Moran wmo...@potentialtech.com


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] partitioning for speed, but query planner ignores

2013-10-02 Thread David Rysdam
On Wed, 2 Oct 2013 09:12:02 -0400, Bill Moran wmo...@potentialtech.com wrote:
 Last I looked, the partitioning mechanism isn't _quite_ as smart as could
 be desired.  For example:
 SELECT * FROM table WHERE objnum = 5; -- will not take advantage of partition
 You have to give the planner a little more hint as to the fact that it can
 take advantage of the partition:
 SELECT * FROM table WHERE (objnum % 2) = 1 AND objnum = 5;
 As silly as it seems, this is enough information for the planner to know
 that it only needs to scan one partition.

This seemed ridiculously silly until I thought about it. I guess it has
no way of unwrapping my constraint and figuring out what to do. Would
this also apply if I did ranges or is that a common enough constraint
that it *can* figure it out without me having to modify all my queries?


smime.p7s
Description: S/MIME cryptographic signature


Re: [GENERAL] partitioning for speed, but query planner ignores

2013-10-02 Thread Kevin Grittner
David Rysdam drys...@ll.mit.edu wrote:

 Would the planner be smart enough to figure out ranges without me
 having to hint my queries?

Yes, it handles ranges well.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] partitioning for speed, but query planner ignores

2013-10-02 Thread bricklen
On Wed, Oct 2, 2013 at 9:01 AM, David Rysdam drys...@ll.mit.edu wrote:

  I had some vague notion of tables doing work but really if it can load
 one
 partition into RAM and get most of my hits from there, it'd be a big
 win.


The same concept applies to the frequently-used indexes on that partition.