Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-26 Thread Francisco Olarte
Hi Daniel:

On Mon, May 25, 2015 at 10:03 PM, Daniel Begin jfd...@hotmail.com wrote:
...
 Even after doing all this, I did not find any improvement in execution times 
 between my original fat table and the partitioned version (sometime even 
 worst). If partitioning the table has improved significantly queries running 
 times, I could have partitioned the tables differently to accommodate other 
 query types I will have to run later in my research (I have the same problem 
 for half a dozen tables).
 Since it does not seem that partitioning will do the job, I will get back to 
 the original table to run my queries...

Well, at least you've learnt some things about it and you can expect
to repeat the measurements faster shoudl you need it.

 However, just in case someone knows a magical trick that can improve 
 significantly the speed of my queries (but haven't told me yet!-) here are 
 the details about the concerned table/indexes
.
 Table size: 369GB
 Indexes size: 425GB
 I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.

Supposing you can dedicate about 12 Gb to shared buffers / caches,
your caches are going to get trashed often with real work, that is why
we recommended repeating the queries.

Anyway, one last remark. Partition is not magic, it helps a lot
depending on the access records. For workloads like mine they help a
lot  ( call records, where I insert frequently ( so position
correlates strongly with indexes ), nearly never update ( and I work
with high fill factors ), and query frequently for unindexed
conditions plus partition-related ranges they work great ( a big table
forces index scans, which due to correlation are fast, but indexed
anyways, plus filters on extra conditions, partitions normally go to
sequential partition scans plus filters, and sequential scans are way
faster, plus the normal queries go normally to the last things
inserted, so partitions help to keep them cached ).

For queries like the ones you've timed/shown ( partition on an integer
PK of unknown origin, queries for single values ( IN queries are
normally just several single repeated ) or small ranges, big table is
normally gonna beat partition hands down ( except if you have
locality, like you are inserting serials and querying frequently in
the vicinity of the inserted ones, in this case partitions keeps used
tables small and cacheable and may give you a small edge ).

 I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB 
 external drives with write caching.

Well, from your last measurements it seems your disk systems is awful
for database work. I do not know what you mean by external drives (
eSata? firewire? Usb2? usb3? also, any kind of volume management ) but
in your fist query:


 db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id 
 IN(10005000,105000,205000,305000);
  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 
 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1)
Index Cond: (id = ANY 
 ('{10005000,105000,205000,305000}'::bigint[]))
  Total runtime: 288.732 ms
 --Second attempt;
 db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id 
 IN(10005000,105000,205000,305000);
  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 
 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1)
Index Cond: (id = ANY 
 ('{10005000,105000,205000,305000}'::bigint[]))
  Total runtime: 0.056 ms

288 ms for a query which should do 8-10 disk reads seems too slow to
me. And you get nearly the same on the second cache. I would normally
expect 100ms for any reasonable drive, and 50 for any db tuned
disks. I do not remember the exact parameters, but if your samples
reprensent your future workload you need to tune well for disk access
time. It's specially noticeable in the last example ( query with ID
from a subquery ), where you got the times:

 --Select 
 ids-
 --Explain analyse on original table for a query that will look into one 
 partition on the new table but list of ids provided through a select statement
 --First attempt;
  Total runtime: 2290.122 ms
 --Second attempt;
  Total runtime: 26.005 ms

Warm caches help you a lot here.

 --Explain analyse on partitioned table for a query that will look into one 
 partition but list of ids provided through a select statement
 --First attempt;
  Total runtime: 19142.983 ms
 --Second attempt;
  Total runtime: 1383.929 ms

And here too, and also, as parition means always hitting more blocks (
see it in reverse, if you collapse partitions maybe some data ends in
the same block and you save some reads, or not, but you will never be
worse ), slow disks hurt you more.

One last remark. I have not seen your data, but from what you've told
and the shown queries I would go for the 

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-25 Thread Daniel Begin
Thank for your patience :-)

- About using PgAdmin, anecdotal problems or not, I did the whole tests again 
in plain postgresql.

- About running queries once or not, Bill and Francisco both pointed out 
somehow that I should run each query multiple times to get appropriate 
statistics. I did it for all queries - First trial always longer, all other 
stabilise around the same values. The EXPLAIN ANALYSE for first and second 
trial on each query I ran on original table and on the partitioned one can be 
found below.

However, in my case, I will have to run most of my queries only once since I 
simply need to extract sample data for a research topic - there is no 
insert/update in the DB (that is why I thought looking at first trial was more 
appropriate). 

- About adding the exclusion check constraint, thank for remembering me such a 
simple thing that could have caused all this!-) but sadly, it was set to 
partition, as expected. However, I have decided to run all the queries after 
having set the parameter to ON and restarted the database, just in case.

Even after doing all this, I did not find any improvement in execution times 
between my original fat table and the partitioned version (sometime even 
worst). If partitioning the table has improved significantly queries running 
times, I could have partitioned the tables differently to accommodate other 
query types I will have to run later in my research (I have the same problem 
for half a dozen tables). 

Since it does not seem that partitioning will do the job, I will get back to 
the original table to run my queries...

However, just in case someone knows a magical trick that can improve 
significantly the speed of my queries (but haven't told me yet!-) here are the 
details about the concerned table/indexes 

(Using https://wiki.postgresql.org/wiki/Index_Maintenance query ...)
Number of records: 387013
Table size: 369GB
Indexes size: 425GB
 - nodes_idversion_pk: 125GB 
 - nodes_changesetid_idx: 86GB
 - nodes_geom_idx: 241GB

Each record has 3 bigint, 2 boolean, 1 timestamp and 1 geography type.

I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.
I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB 
external drives with write caching.

Best regards,
Daniel


Results/explain/analyse follow...

--Constant 
ids-
--Explain analyse on original table for a query that will look into different 
partitions on the new table
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id 
IN(10005000,105000,205000,305000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 
rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1)
   Index Cond: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
 Total runtime: 288.732 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id 
IN(10005000,105000,205000,305000);
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 
rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1)
   Index Cond: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
 Total runtime: 0.056 ms

-- Explain analyse on partitioned table for a query that will look into 
different partitions
--First attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id 
IN(10005000,105000,205000,305000); 
 Append  (cost=0.00..933.40 rows=223 width=66) (actual time=108.903..287.068 
rows=6 loops=1)
   -  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66) (actual 
time=0.001..0.001 rows=0 loops=1)
 Filter: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
   -  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78 
rows=156 width=66) (actual time=108.900..108.916 rows=1 loops=1)
 Index Cond: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
   -  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25 
rows=31 width=66) (actual time=89.523..89.543 rows=1 loops=1)
 Index Cond: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
   -  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01 
rows=26 width=66) (actual time=49.978..49.998 rows=3 loops=1)
 Index Cond: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
   -  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37 
rows=9 width=66) (actual time=38.600..38.603 rows=1 loops=1)
 Index Cond: (id = ANY 
('{10005000,105000,205000,305000}'::bigint[]))
 Total runtime: 287.144 ms
--Second attempt;
db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id 
IN(10005000,105000,205000,305000); 
 Append  (cost=0.00..933.40 rows=223 width=66) (actual time=0.012..0.065 rows=6 
loops=1)
   -  Seq 

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-24 Thread Francisco Olarte
Hi Daniel:

Bill Moran already pointed out some things, I more or less agree with him

On Sun, May 24, 2015 at 12:16 AM, Daniel Begin jfd...@hotmail.com wrote:
..
 I use pgadmin_III sql window.

I'll just point from my experience. At work, more than three fourths
of the problems I've had to help my pgadmin using colleagues solve
where due to their usage of pgadmin. Nearly every time they did dome
db administration task, they did it poorly and begun to do it
correctly when using plain postgresql. This may be anecdotal.


 I write the following query (I have changed
 the id to make sure it does not use previous results still in memory)...

That is exactly the opposite of what you should do, unless you are
able to thorougly clean the server caches ( not as easy as it sounds )
between runs. You want warm caches first, to account for the times in
planning, hashing and joining and so. Once you've got that part ready,
you can move on to real queries, busy server, etc for more fine
tuning.


 Now I select explain query from the menu and I get the following result...
...
 Now, I select run and I get one record as a result and the following
 message in history tab...
.

As I never use pgadmin, I cannot tell you too much, but one thing I
know is NONE of MY users was able to tell me what pgadmin does when
you hit run and/or explain query. OTOH, I know what psql does. And, if
you use EXPLAIN ANALYZE, as suggested by my previous message, you'll
be better of ( as in this case you just get a single query result from
the server and you do not have to care about what your tool of choice
does in the middle ).

lotta deletions,

 I really need to get much faster results with my queries on this large table
 and partitioning was my last option...

Make sure you are not having an XY-problem. Also, from the type of
queries you've shown us, I do not think partitioning is the way to go.
It will only give you marginal improvements when querying for single
or range of ids ( and small ones if your ranges of ids are somehow
clustered in your tables ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-24 Thread Francisco Olarte
Hi Daniel:


On Sat, May 23, 2015 at 8:37 PM, Daniel Begin jfd...@hotmail.com wrote:
 Following Francisco suggestion, I was able to do some tests earlier this 
 morning when the partitioning process completed and all the resulting tables 
 analyzed.

 Here is what I got on both the original table and its partitioned counterpart 
 while running the same queries. I tested them only for a couple of values but 
 in summary...
 Using a constant id:
 All the queries I tried took longer on the partitioned table! I got similar 
 results for multiple records using IN (id value1, id value2 ...)
 Using a range of ids:
 Surprisingly again, all the queries I tried took longer on the partitioned 
 table!
 Using a list of ids from a select clause:
 More surprisingly, the queries I tried took less time on the partitioned 
 table at least when using the primary key. Using an indexed field took so 
 long compared to the old table that I cancelled the execution for the new one!

 Guess what, I will get back to my old fat table unless someone tells me I 
 missed something obvious!

 Note: Tables/indexes description, queries and execution plans are below.






 Tables/indexes description 
 --
 The original table has 387013 records. Primary key/index on each 
 partition queries are
 ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
 version);
 CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

 The partitioned table has 387013 records distributed over 87 partitions. 
 Primary key/index on each partition queries are
 ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY 
 (id, version);
 CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
 Where xx is the partition's number suffix

It is missing here, but I supose you did not forget to add the
exclusion check constraint.


 constant id 
 ---
 select * from oldtable where id=123456789;
 Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 
 rows=1682 width=66)
   Index Cond: (id = 123456789::bigint)
 --Total query runtime: 62 ms. 1 rows retrieved
 select * from newtable where id=123456789;
 Append  (cost=0.00..20.19 rows=5 width=66)
   -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
 Filter: (id = 123456789::bigint)
   -  Index Scan using newtable72_idversion_pk on newtable_72  
 (cost=0.56..20.19 rows=4 width=66)
 Index Cond: (id = 123456789::bigint)
 --Total query runtime: 156 ms. 1 rows retrieved

This is not surprissing, partitions never help much against that type
of queries, with 87 partitions index are not going to be much
shallower and the optimizer has a lot more of work to do. Just a
couple points. When doing a lot of sequential queries on nearby ids (
on the same partition ) it will help ( better data locality,
possibility of having more relevant pages in the cache ). Also, how
did you do your timinngs? ( you should repeat each query 2 or 3 times,
to see if the times go down a lot due to caching and, if you can, time
a couple of explains ( plain, not analyze ) to see how much time the
planner takes. As an example, the explain I sent you in my previous
message takes between 20 and 30 milliseconds and I only have 17 live
partitions ( I routinely un-inherit and move to another schema
partitions older than a two years, to help the planner, and re-inherit
them if needed ) ). Also, your queries seem to be very, very slow for
a single indexed fetch, if I do this on my tables ( similar query to
what I sent, modified to hit just 1 partition ):
select count(*) from carrier_cdrs where setup between
'20150107T123456' and '20150107T22';
It takes 40 ms in the first go, drops down to 27 after that, and I
have ~15 ms RTT to the server ( which is more or less the time
reported when I do a plain 'select 1' ). I mean, I suspect your
measurements are not good, as they seem too slow. Also, when I use
explain analyze on the previous query the server reports 13 ms, which
is more or less one RTT less ( as the server starts measuring after
receiving the query and ends before sending the reply ).

Another thing, how are you getting the explain results ? I would urge
you to use explain analyze, as, apart of seeming too slow, the
stimations seem to be way off ( explain analyze sends you the
estimated and actual results, so it is easy to see, and executes
everything on the server, so result transmission time, which is not of
use as it has to be the same for every method of calculating the same
correct result, is not reported ). For my query I get this:

QUERY PLAN

 Aggregate  

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Daniel Begin
Hello Bill, 
You wrote that my testing methodology is flawed - I hope you are right!  

However, I am a bit confused about your comments. Yes, I did edited the name
of the tables for clarity but if I miss the point I, I will do it again as I
am writing without modifying anything. Here is the procedure I follow and
results...

I use pgadmin_III sql window. I write the following query (I have changed
the id to make sure it does not use previous results still in memory)...

Select * from nodes where id=345678912; -- nodes is the real partitioned
table name

Now I select explain query from the menu and I get the following result...
Append  (cost=0.00..384.08 rows=99 width=66)
  -  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = 345678912)
  -  Index Scan using nodes19_idversion_pk on nodes_19  (cost=0.56..384.08
rows=98 width=66)
Index Cond: (id = 345678912)

Now, I select run and I get one record as a result and the following
message in history tab...
-- Executing query:
Select * from nodes where id=345678912; 
Total query runtime: 62 ms.
1 row retrieved.

Now, if I use the same query on the original table using the same procedure,
here is what I get...
Select * from old_nodes where id=345678912; -- old_nodes is the real
original table name

Explain gives me the following
Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4437.15
rows=1682 width=66)
  Index Cond: (id = 345678912)

Running the query gives me the same record with the following message in
history tab...
-- Executing query:
select * from old_nodes where id=345678912; 
Total query runtime: 62 ms.
1 row retrieved.

This time, the history tab shows that both took the same time to run (an
improvement!?)
Let's try this one using the same procedure...

Select * from old_nodes where id IN
(1005,15,25,35)
Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18
rows=6726 width=66)
  Index Cond: (id = ANY
('{1005,15,25,35}'::bigint[]))
-- Executing query:
Select * from old_nodes where id IN
(1005,15,25,35)
Total query runtime: 171 ms.
5 rows retrieved.

Select * from nodes where id IN (1005,15,25,35)
Append  (cost=0.00..933.40 rows=223 width=66)
  -  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = ANY
('{1005,15,25,35}'::bigint[]))
  -  Index Scan using nodes01_idversion_pk on nodes_01  (cost=0.57..622.78
rows=156 width=66)
Index Cond: (id = ANY
('{1005,15,25,35}'::bigint[]))
  -  Index Scan using nodes38_idversion_pk on nodes_38  (cost=0.57..138.25
rows=31 width=66)
Index Cond: (id = ANY
('{1005,15,25,35}'::bigint[]))
  -  Index Scan using nodes63_idversion_pk on nodes_63  (cost=0.57..119.01
rows=26 width=66)
Index Cond: (id = ANY
('{1005,15,25,35}'::bigint[]))
  -  Index Scan using nodes85_idversion_pk on nodes_85  (cost=0.57..53.37
rows=9 width=66)
Index Cond: (id = ANY
('{1005,15,25,35}'::bigint[]))
-- Executing query:
Select * from nodes where id IN (1005,15,25,35)
Total query runtime: 140 ms.
5 rows retrieved.

This time the history tab shows that the query was shorter to run on
partitioned table (a real improvement!?)

I know, this is different from what I referred to in my original email (at
least both shows similar running time) but I swear, I did not change the
times when editing table names!-)

Do you see any glitch/flaw in the procedure I am using?
Someone has an idea about the problem behind this unexpected behavior? 

I really need to get much faster results with my queries on this large table
and partitioning was my last option...

Best regards,
Daniel

-Original Message-
From: pgsql-general-ow...@postgresql.org
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bill Moran
Sent: May-23-15 15:23
To: Daniel Begin
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] FW: Constraint exclusion in partitions


A large portion of why you describe below is the exact opposite of my own
testing (unfortunately, I don't have the actual test results any more
because I did the tests for a former employer).

In my tests, single lookups against the same column being used to partition
improved performance in direct proportion to the number of partitions. I.e.
if the tables are partitioned on id, and the lookup is for id, and the table
has 10 partitions, the query is 10x faster on the partitioned version than
the non-partitioned verison.

Queries against indexes not partitioned were slightly slower in my tests,
but nowhere near the degree that you're showing below.

I can't help but think that your testing methodology is flawed, but since
you're not showing us what you actually did, it's difficult to be sure. See
below for some specifics on what I'm

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Daniel Begin
I am working with postgresql 9.3 and I understand from the documentation that 
constraint_exclusion is set to “partition” by default. Looking at my 
postgres.conf file, the concerned line is “#constraint_exclusion = partition”. 

 

Furthermore, the execution plan shows that constraint_exclusion was used at 
least for constant id and range of ids

 

Daniel

 

 

 

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of melvin6925
Sent: May-23-15 15:15
To: Daniel Begin; pgsql-general@postgresql.org
Subject: Re: [GENERAL] FW: Constraint exclusion in partitions

 

Did you remember to set  constraint_exclusion = on and reload the .conf ?

 

 

 

 

Sent via the Samsung Galaxy S® 6, an ATT 4G LTE smartphone

 Original message 
From: Daniel Begin jfd...@hotmail.com 
Date: 05/23/2015 14:37 (GMT-05:00) 
To: pgsql-general@postgresql.org 
Subject: Re: [GENERAL] FW: Constraint exclusion in partitions 

Following Francisco suggestion, I was able to do some tests earlier this 
morning when the partitioning process completed and all the resulting tables 
analyzed.

Here is what I got on both the original table and its partitioned counterpart 
while running the same queries. I tested them only for a couple of values but 
in summary...

Using a constant id: 
All the queries I tried took longer on the partitioned table! I got similar 
results for multiple records using IN (id value1, id value2 ...)

Using a range of ids: 
Surprisingly again, all the queries I tried took longer on the partitioned 
table!

Using a list of ids from a select clause: 
More surprisingly, the queries I tried took less time on the partitioned table 
at least when using the primary key. Using an indexed field took so long 
compared to the old table that I cancelled the execution for the new one!

Guess what, I will get back to my old fat table unless someone tells me I 
missed something obvious!
Daniel

Note: Tables/indexes description, queries and execution plans are below.






Tables/indexes description 
--
The original table has 387013 records. Primary key/index on each partition 
queries are
ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
version); 
CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

The partitioned table has 387013 records distributed over 87 partitions. 
Primary key/index on each partition queries are
ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, 
version);
CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
Where xx is the partition's number suffix

constant id 
---
select * from oldtable where id=123456789; 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 
rows=1682 width=66)
  Index Cond: (id = 123456789::bigint)
--Total query runtime: 62 ms. 1 rows retrieved


select * from newtable where id=123456789; 
Append  (cost=0.00..20.19 rows=5 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = 123456789::bigint)
  -  Index Scan using newtable72_idversion_pk on newtable_72  
(cost=0.56..20.19 rows=4 width=66)
Index Cond: (id = 123456789::bigint)
--Total query runtime: 156 ms. 1 rows retrieved

I got similar results for multiple records...
select * from oldtable where id IN(1000,10,20,30); 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.71..17739.18 
rows=6726 width=66)
  Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 187 ms. 4 rows retrieved

select * from newtable where id IN(1000,10,20,30); 
Append  (cost=0.00..933.40 rows=223 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = ANY 
('{1000,10,20,30}'::bigint[]))
  -  Index Scan using newtable01_idversion_pk on newtable_01  
(cost=0.57..622.78 rows=156 width=66)
Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
...
  -  Index Scan using newtable85_idversion_pk on newtable_85  
(cost=0.57..53.37 rows=9 width=66)
Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 421 ms. 4 rows retrieved


range of ids 
---
select * from oldtable where id between 152249 and 152349;
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..383.51 
rows=144 width=66)
  Index Cond: ((id = 152249) AND (id = 152349))
Total query runtime: 47 ms. 53 rows retrieved.

select * from newtable where id between 152249 and 152349;
Append  (cost=0.00..408.16 rows=104

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Daniel Begin
Oops, I was responding to the email below from melvin6925

 

 

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David G. Johnston
Sent: May-23-15 19:32
To: Daniel Begin
Cc: melvin6925; pgsql-general@postgresql.org
Subject: Re: [GENERAL] [NOVICE] Constraint exclusion in partitions

 

On Saturday, May 23, 2015, Daniel Begin jfd...@hotmail.com wrote:

I am working with postgresql 9.3 and I understand from the documentation that 
constraint_exclusion is set to “partition” by default. Looking at my 
postgres.conf file, the concerned line is “#constraint_exclusion = partition”. 

Furthermore, the execution plan shows that constraint_exclusion was used at 
least for constant id and range of ids

What is your question/concern? 

 

 

Did you remember to set  constraint_exclusion = on and reload the .conf ?

 

 

 

From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of melvin6925
Sent: May-23-15 15:15
To: Daniel Begin; pgsql-general@postgresql.org
Subject: Re: [GENERAL] FW: Constraint exclusion in partitions

 

Did you remember to set  constraint_exclusion = on and reload the .conf ?

 

 

 

 

Sent via the Samsung Galaxy S® 6, an ATT 4G LTE smartphone

 Original message 
From: Daniel Begin jfd...@hotmail.com 
Date: 05/23/2015 14:37 (GMT-05:00) 
To: pgsql-general@postgresql.org 
Subject: Re: [GENERAL] FW: Constraint exclusion in partitions 

Following Francisco suggestion, I was able to do some tests earlier this 
morning when the partitioning process completed and all the resulting tables 
analyzed.

Here is what I got on both the original table and its partitioned counterpart 
while running the same queries. I tested them only for a couple of values but 
in summary...

Using a constant id: 
All the queries I tried took longer on the partitioned table! I got similar 
results for multiple records using IN (id value1, id value2 ...)

Using a range of ids: 
Surprisingly again, all the queries I tried took longer on the partitioned 
table!

Using a list of ids from a select clause: 
More surprisingly, the queries I tried took less time on the partitioned table 
at least when using the primary key. Using an indexed field took so long 
compared to the old table that I cancelled the execution for the new one!

Guess what, I will get back to my old fat table unless someone tells me I 
missed something obvious!
Daniel

Note: Tables/indexes description, queries and execution plans are below.






Tables/indexes description 
--
The original table has 387013 records. Primary key/index on each partition 
queries are
ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
version); 
CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

The partitioned table has 387013 records distributed over 87 partitions. 
Primary key/index on each partition queries are
ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, 
version);
CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
Where xx is the partition's number suffix

constant id 
---
select * from oldtable where id=123456789; 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 
rows=1682 width=66)
  Index Cond: (id = 123456789::bigint)
--Total query runtime: 62 ms. 1 rows retrieved


select * from newtable where id=123456789; 
Append  (cost=0.00..20.19 rows=5 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = 123456789::bigint)
  -  Index Scan using newtable72_idversion_pk on newtable_72  
(cost=0.56..20.19 rows=4 width=66)
Index Cond: (id = 123456789::bigint)
--Total query runtime: 156 ms. 1 rows retrieved

I got similar results for multiple records...
select * from oldtable where id IN(1000,10,20,30); 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.71..17739.18 
rows=6726 width=66)
  Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 187 ms. 4 rows retrieved

select * from newtable where id IN(1000,10,20,30); 
Append  (cost=0.00..933.40 rows=223 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = ANY 
('{1000,10,20,30}'::bigint[]))
  -  Index Scan using newtable01_idversion_pk on newtable_01  
(cost=0.57..622.78 rows=156 width=66)
Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
...
  -  Index Scan using newtable85_idversion_pk on newtable_85  
(cost=0.57..53.37 rows=9 width=66)
Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 421

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Bill Moran
On Sat, 23 May 2015 18:16:43 -0400
Daniel Begin jfd...@hotmail.com wrote:

 Hello Bill, 
 You wrote that my testing methodology is flawed - I hope you are right!  
 
 However, I am a bit confused about your comments. Yes, I did edited the name
 of the tables for clarity but if I miss the point I, I will do it again as I
 am writing without modifying anything. Here is the procedure I follow and
 results...
 
 I use pgadmin_III sql window. I write the following query (I have changed
 the id to make sure it does not use previous results still in memory)...

I didn't realize you were using PGAdmin ... that explains some of it ...
see below:

 Select * from nodes where id=345678912; -- nodes is the real partitioned
 table name
 
 Now I select explain query from the menu and I get the following result...
 Append  (cost=0.00..384.08 rows=99 width=66)
   -  Seq Scan on nodes  (cost=0.00..0.00 rows=1 width=66)
 Filter: (id = 345678912)
   -  Index Scan using nodes19_idversion_pk on nodes_19  (cost=0.56..384.08
 rows=98 width=66)
 Index Cond: (id = 345678912)
 
 Now, I select run and I get one record as a result and the following
 message in history tab...
 -- Executing query:
 Select * from nodes where id=345678912; 
 Total query runtime: 62 ms.
 1 row retrieved.
 
 Now, if I use the same query on the original table using the same procedure,
 here is what I get...
 Select * from old_nodes where id=345678912; -- old_nodes is the real
 original table name
 
 Explain gives me the following
 Index Scan using nodes_idversion_pk on old_nodes  (cost=0.70..4437.15
 rows=1682 width=66)
   Index Cond: (id = 345678912)
 
 Running the query gives me the same record with the following message in
 history tab...
 -- Executing query:
 select * from old_nodes where id=345678912; 
 Total query runtime: 62 ms.
 1 row retrieved.
 
 This time, the history tab shows that both took the same time to run (an
 improvement!?)

If your environment is providing such wildly variant results, then
you need to start running multiple tests instead of assuming that a single
run of a query is indicative of a pattern.

-- 
Bill Moran


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Francisco Olarte
Hi Daniel

On Fri, May 22, 2015 at 7:21 PM, Daniel Begin jfd...@hotmail.com wrote:
 I have split a large table (billions of records) into multiple partitions,
 hoping the access would be faster. I used an ID to make partitions check
 (check (id = 100 AND id  200)…) and created over 80 tables (children) that
 are now filled with data.
...
 I understand that the following query will use constraint exclusion and will
 run faster…
 a-  Select * from parent_table where id =; -- using a constant
 But how constraint exclusion would react with the following queries …
 b-  Select * from parent_table where id between 2345 and 6789; -- using
 a range of ids
 c-   Select * from parent_table where id in(select ids from
 anothertable); -- using a list of ids from a select

Given you have already partitioned it, why don't you just use explain
[ analyze ] on the queries? I.e., in one of my tables, partitioned
monthly by a timestamp ( with time zone ) field I get ( even if I
never use between, a closed interval, for a continuous like type like
timestamp, the optimizer clearly shows it's transfroaming it to the
equivalent AND condition ):


explain select * from carrier_cdrs where setup between
'20150107T123456' and '20150322T22';


QUERY PLAN
---
 Append  (cost=0.00..82202.41 rows=2346599 width=74)
   -  Seq Scan on carrier_cdrs  (cost=0.00..0.00 rows=1 width=184)
 Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time
zone))
   -  Seq Scan on carrier_cdrs_201501  (cost=0.00..30191.10
rows=816551 width=74)
 Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time
zone))
   -  Seq Scan on carrier_cdrs_201502  (cost=0.00..25277.45
rows=872830 width=74)
 Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time
zone))
   -  Seq Scan on carrier_cdrs_201503  (cost=0.00..26733.85
rows=657217 width=74)
 Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with
time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time
zone))
(9 rows)

 Since I mostly use queries of type b and c, I am wondering if partitioning
 the large table was appropriate and if the queries are going to be longer to
 run…

The problem is gonna be all the extra conditions, so I'll check real
queries. My bet is in a query EXACTLY like b it will use constraint
exclusion, and can be potentially faster if your interval are for just
100 ids, but why speculate when YOU can measure?

Also think if you touch 67-23+1=45 partitions and the DB has to check
other indexed fields it is a lot of work. As I said, it depend on your
actual data, actual indexes, and actual queries, just measure it. In
my excample table I partition the data by the TS, which is the single
indexed field and my tests showed it was faster that way ( but my
queries are normally big scans of date ranges or more selective ones
with narrow TS conditions, and I measured them ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Bill Moran

A large portion of why you describe below is the exact opposite of my
own testing (unfortunately, I don't have the actual test results any
more because I did the tests for a former employer).

In my tests, single lookups against the same column being used to
partition improved performance in direct proportion to the number of
partitions. I.e. if the tables are partitioned on id, and the lookup
is for id, and the table has 10 partitions, the query is 10x faster
on the partitioned version than the non-partitioned verison.

Queries against indexes not partitioned were slightly slower in my
tests, but nowhere near the degree that you're showing below.

I can't help but think that your testing methodology is flawed, but
since you're not showing us what you actually did, it's difficult to
be sure. See below for some specifics on what I'm concerned that you
might be doing wrong ...

On Sat, 23 May 2015 14:37:25 -0400
Daniel Begin jfd...@hotmail.com wrote:

 Following Francisco suggestion, I was able to do some tests earlier this 
 morning when the partitioning process completed and all the resulting tables 
 analyzed.
 
 Here is what I got on both the original table and its partitioned counterpart 
 while running the same queries. I tested them only for a couple of values but 
 in summary...
 
 Using a constant id: 
 All the queries I tried took longer on the partitioned table! I got similar 
 results for multiple records using IN (id value1, id value2 ...)
 
 Using a range of ids: 
 Surprisingly again, all the queries I tried took longer on the partitioned 
 table!
 
 Using a list of ids from a select clause: 
 More surprisingly, the queries I tried took less time on the partitioned 
 table at least when using the primary key. Using an indexed field took so 
 long compared to the old table that I cancelled the execution for the new one!
 
 Guess what, I will get back to my old fat table unless someone tells me I 
 missed something obvious!
 Daniel
 
 Note: Tables/indexes description, queries and execution plans are below.
 
 
 
 
 
 
 Tables/indexes description 
 --
 The original table has 387013 records. Primary key/index on each 
 partition queries are
 ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
 version); 
 CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);
 
 The partitioned table has 387013 records distributed over 87 partitions. 
 Primary key/index on each partition queries are
 ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY 
 (id, version);
 CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
 Where xx is the partition's number suffix
 
 constant id 
 ---
 select * from oldtable where id=123456789; 
 Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 
 rows=1682 width=66)
   Index Cond: (id = 123456789::bigint)
 --Total query runtime: 62 ms. 1 rows retrieved
 
 
 select * from newtable where id=123456789; 
 Append  (cost=0.00..20.19 rows=5 width=66)
   -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
 Filter: (id = 123456789::bigint)
   -  Index Scan using newtable72_idversion_pk on newtable_72  
 (cost=0.56..20.19 rows=4 width=66)
 Index Cond: (id = 123456789::bigint)
 --Total query runtime: 156 ms. 1 rows retrieved

Notice in these results that you're not showing the command that was executed.
The output is mostly likely from the command
ANALYZE select * from newtable where id=123456789; but that's not the command
you claim that you ran. In any event, the analyze output doesn't line up with
the times you claim: i.e. Analyze is showing that the first query should take
about 4437 time units to complete, and the second one should take about 20
time units, yet you claim the second one is slower. The other queries below
exhibit a similar pattern.

Are you sure you're not timing ANALYZE itself instead of the query? Because
timing ANALYZE select * from newtable where id=123456789; is not going to
be timing the actual time the query took to run. I would certainly expect the
_planning_ of a query against partitioned tables to take longer than non-
partitioned, but I would also expect the execution time to be the opposite.
Hence my theory that you've accidentally timed the ANALYZE instead of the
actual running of the query. Naturally, the total query time is planning +
execution, and my experience shows that the loss in planning speed is more
than made up for by the gain in execution speed.

Perhaps you should show us the exact output of one of your tests, without
editorializing.

 I got similar results for multiple records...
 select * from oldtable where id 
 IN(1000,10,20,30); 
 Index Scan using oldtable_idversion_pk on oldtable  

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread Daniel Begin
Following Francisco suggestion, I was able to do some tests earlier this 
morning when the partitioning process completed and all the resulting tables 
analyzed.

Here is what I got on both the original table and its partitioned counterpart 
while running the same queries. I tested them only for a couple of values but 
in summary...

Using a constant id: 
All the queries I tried took longer on the partitioned table! I got similar 
results for multiple records using IN (id value1, id value2 ...)

Using a range of ids: 
Surprisingly again, all the queries I tried took longer on the partitioned 
table!

Using a list of ids from a select clause: 
More surprisingly, the queries I tried took less time on the partitioned table 
at least when using the primary key. Using an indexed field took so long 
compared to the old table that I cancelled the execution for the new one!

Guess what, I will get back to my old fat table unless someone tells me I 
missed something obvious!
Daniel

Note: Tables/indexes description, queries and execution plans are below.






Tables/indexes description 
--
The original table has 387013 records. Primary key/index on each partition 
queries are
ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
version); 
CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

The partitioned table has 387013 records distributed over 87 partitions. 
Primary key/index on each partition queries are
ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, 
version);
CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
Where xx is the partition's number suffix

constant id 
---
select * from oldtable where id=123456789; 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 
rows=1682 width=66)
  Index Cond: (id = 123456789::bigint)
--Total query runtime: 62 ms. 1 rows retrieved


select * from newtable where id=123456789; 
Append  (cost=0.00..20.19 rows=5 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = 123456789::bigint)
  -  Index Scan using newtable72_idversion_pk on newtable_72  
(cost=0.56..20.19 rows=4 width=66)
Index Cond: (id = 123456789::bigint)
--Total query runtime: 156 ms. 1 rows retrieved

I got similar results for multiple records...
select * from oldtable where id IN(1000,10,20,30); 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.71..17739.18 
rows=6726 width=66)
  Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 187 ms. 4 rows retrieved

select * from newtable where id IN(1000,10,20,30); 
Append  (cost=0.00..933.40 rows=223 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: (id = ANY 
('{1000,10,20,30}'::bigint[]))
  -  Index Scan using newtable01_idversion_pk on newtable_01  
(cost=0.57..622.78 rows=156 width=66)
Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
...
  -  Index Scan using newtable85_idversion_pk on newtable_85  
(cost=0.57..53.37 rows=9 width=66)
Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 421 ms. 4 rows retrieved


range of ids 
---
select * from oldtable where id between 152249 and 152349;
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..383.51 
rows=144 width=66)
  Index Cond: ((id = 152249) AND (id = 152349))
Total query runtime: 47 ms. 53 rows retrieved.

select * from newtable where id between 152249 and 152349;
Append  (cost=0.00..408.16 rows=104 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
Filter: ((id = 152249) AND (id = 152349))
  -  Index Scan using newtable51_idversion_pk on newtable_51  
(cost=0.56..183.52 rows=46 width=66)
Index Cond: ((id = 152249) AND (id = 152349))
  -  Index Scan using newtable52_idversion_pk on newtable_52  
(cost=0.56..224.64 rows=57 width=66)
Index Cond: ((id = 152249) AND (id = 152349))
Total query runtime: 78 ms. 53 rows retrieved.


list of ids from a select clause 
---
--Subset provides 4 ids similar but not identical to the previous query
select * from oldtable where id IN (select * from subset); 
Nested Loop  (cost=37.45..886298.00 rows=2028512050 width=66)
  -  HashAggregate  (cost=36.75..38.75 rows=200 width=8)
-  Seq Scan on subset  (cost=0.00..31.40 rows=2140 width=8)
  -  Index 

Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-23 Thread melvin6925
Did you remember to set  constraint_exclusion = on and reload the .conf ?



Sent via the Samsung Galaxy S® 6, an ATT 4G LTE smartphone Original 
message 
From: Daniel Begin jfd...@hotmail.com 
Date: 05/23/2015  14:37  (GMT-05:00) 
To: pgsql-general@postgresql.org 
Subject: Re: [GENERAL] FW: Constraint exclusion in partitions 

Following Francisco suggestion, I was able to do some tests earlier this 
morning when the partitioning process completed and all the resulting tables 
analyzed.

Here is what I got on both the original table and its partitioned counterpart 
while running the same queries. I tested them only for a couple of values but 
in summary...

Using a constant id: 
All the queries I tried took longer on the partitioned table! I got similar 
results for multiple records using IN (id value1, id value2 ...)

Using a range of ids: 
Surprisingly again, all the queries I tried took longer on the partitioned 
table!

Using a list of ids from a select clause: 
More surprisingly, the queries I tried took less time on the partitioned table 
at least when using the primary key. Using an indexed field took so long 
compared to the old table that I cancelled the execution for the new one!

Guess what, I will get back to my old fat table unless someone tells me I 
missed something obvious!
Daniel

Note: Tables/indexes description, queries and execution plans are below.






Tables/indexes description 
--
The original table has 387013 records. Primary key/index on each partition 
queries are
ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, 
version); 
CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id);

The partitioned table has 387013 records distributed over 87 partitions. 
Primary key/index on each partition queries are
ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, 
version);
CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id);
Where xx is the partition's number suffix

constant id 
---
select * from oldtable where id=123456789; 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..4437.15 
rows=1682 width=66)
  Index Cond: (id = 123456789::bigint)
--Total query runtime: 62 ms. 1 rows retrieved


select * from newtable where id=123456789; 
Append  (cost=0.00..20.19 rows=5 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
    Filter: (id = 123456789::bigint)
  -  Index Scan using newtable72_idversion_pk on newtable_72  
(cost=0.56..20.19 rows=4 width=66)
    Index Cond: (id = 123456789::bigint)
--Total query runtime: 156 ms. 1 rows retrieved

I got similar results for multiple records...
select * from oldtable where id IN(1000,10,20,30); 
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.71..17739.18 
rows=6726 width=66)
  Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 187 ms. 4 rows retrieved

select * from newtable where id IN(1000,10,20,30); 
Append  (cost=0.00..933.40 rows=223 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
    Filter: (id = ANY 
('{1000,10,20,30}'::bigint[]))
  -  Index Scan using newtable01_idversion_pk on newtable_01  
(cost=0.57..622.78 rows=156 width=66)
    Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
...
  -  Index Scan using newtable85_idversion_pk on newtable_85  
(cost=0.57..53.37 rows=9 width=66)
    Index Cond: (id = ANY 
('{1000,10,20,30}'::bigint[]))
--Total query runtime: 421 ms. 4 rows retrieved


range of ids 
---
select * from oldtable where id between 152249 and 152349;
Index Scan using oldtable_idversion_pk on oldtable  (cost=0.70..383.51 
rows=144 width=66)
  Index Cond: ((id = 152249) AND (id = 152349))
Total query runtime: 47 ms. 53 rows retrieved.

select * from newtable where id between 152249 and 152349;
Append  (cost=0.00..408.16 rows=104 width=66)
  -  Seq Scan on newtable  (cost=0.00..0.00 rows=1 width=66)
    Filter: ((id = 152249) AND (id = 152349))
  -  Index Scan using newtable51_idversion_pk on newtable_51  
(cost=0.56..183.52 rows=46 width=66)
    Index Cond: ((id = 152249) AND (id = 152349))
  -  Index Scan using newtable52_idversion_pk on newtable_52  
(cost=0.56..224.64 rows=57 width=66)
    Index Cond: ((id = 152249) AND (id = 152349))
Total query runtime: 78 ms. 53 rows retrieved.


list of ids from a select clause

[GENERAL] FW: Constraint exclusion in partitions

2015-05-22 Thread Daniel Begin
Sent that on pgsql-novice list but did not get any answers yet. 

Maybe someone could help me understand here J

 

 

Hi all, 

 

I have split a large table (billions of records) into multiple partitions,
hoping the access would be faster. I used an ID to make partitions check
(check (id = 100 AND id  200).) and created over 80 tables (children) that
are now filled with data.  

 

However, after I did it, I read a second time the following sentence in the
documentation and started wondering what it actually means .  Constraint
exclusion only works when the query's WHERE clause contains constants (or
externally supplied parameters)

 

I understand that the following query will use constraint exclusion and will
run faster.  

a-  Select * from parent_table where id =; -- using a constant

 

But how constraint exclusion would react with the following queries .

b-  Select * from parent_table where id between 2345 and 6789; -- using
a range of ids

c-   Select * from parent_table where id in(select ids from
anothertable); -- using a list of ids from a select

 

Since I mostly use queries of type b and c, I am wondering if partitioning
the large table was appropriate and if the queries are going to be longer to
run. 

Thank in advance

 

Daniel

 

Doc: http://www.postgresql.org/docs/9.3/static/ddl-partitioning.html



Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-22 Thread David G. Johnston
On Fri, May 22, 2015 at 10:21 AM, Daniel Begin jfd...@hotmail.com wrote:

 But how constraint exclusion would react with the following queries …

 b-  Select * from parent_table where id between 2345 and 6789; --
 using a range of ids

​Not sure...
​

These are constants but I'm not sure how smart the planner is about
figuring out inequalities of this form.

I would guess yes but it should be easy enough to confirm this yourself
using explain...you already have the tables.

 c-   Select * from parent_table where id in(select ids from
 anothertable); -- using a list of ids from a select

Definitely no...

Constraint exclusion is done by the planner before data is read so there is
no possible way for data in a table to be used.​


As for performance the only way to know for sure is to test your usage
patterns and data.  Even without constraint exclusion partitioning can
provide benefits.

David J.