Re: [GENERAL] FW: Constraint exclusion in partitions
Hi Daniel: On Mon, May 25, 2015 at 10:03 PM, Daniel Begin jfd...@hotmail.com wrote: ... Even after doing all this, I did not find any improvement in execution times between my original fat table and the partitioned version (sometime even worst). If partitioning the table has improved significantly queries running times, I could have partitioned the tables differently to accommodate other query types I will have to run later in my research (I have the same problem for half a dozen tables). Since it does not seem that partitioning will do the job, I will get back to the original table to run my queries... Well, at least you've learnt some things about it and you can expect to repeat the measurements faster shoudl you need it. However, just in case someone knows a magical trick that can improve significantly the speed of my queries (but haven't told me yet!-) here are the details about the concerned table/indexes . Table size: 369GB Indexes size: 425GB I am running all this on my personal PC: Windows 64b, i7 chip, 16GB ram. Supposing you can dedicate about 12 Gb to shared buffers / caches, your caches are going to get trashed often with real work, that is why we recommended repeating the queries. Anyway, one last remark. Partition is not magic, it helps a lot depending on the access records. For workloads like mine they help a lot ( call records, where I insert frequently ( so position correlates strongly with indexes ), nearly never update ( and I work with high fill factors ), and query frequently for unindexed conditions plus partition-related ranges they work great ( a big table forces index scans, which due to correlation are fast, but indexed anyways, plus filters on extra conditions, partitions normally go to sequential partition scans plus filters, and sequential scans are way faster, plus the normal queries go normally to the last things inserted, so partitions help to keep them cached ). For queries like the ones you've timed/shown ( partition on an integer PK of unknown origin, queries for single values ( IN queries are normally just several single repeated ) or small ranges, big table is normally gonna beat partition hands down ( except if you have locality, like you are inserting serials and querying frequently in the vicinity of the inserted ones, in this case partitions keeps used tables small and cacheable and may give you a small edge ). I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB external drives with write caching. Well, from your last measurements it seems your disk systems is awful for database work. I do not know what you mean by external drives ( eSata? firewire? Usb2? usb3? also, any kind of volume management ) but in your fist query: db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,105000,205000,305000); Index Scan using nodes_idversion_pk on old_nodes (cost=0.71..17739.18 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) Total runtime: 288.732 ms --Second attempt; db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,105000,205000,305000); Index Scan using nodes_idversion_pk on old_nodes (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) Total runtime: 0.056 ms 288 ms for a query which should do 8-10 disk reads seems too slow to me. And you get nearly the same on the second cache. I would normally expect 100ms for any reasonable drive, and 50 for any db tuned disks. I do not remember the exact parameters, but if your samples reprensent your future workload you need to tune well for disk access time. It's specially noticeable in the last example ( query with ID from a subquery ), where you got the times: --Select ids- --Explain analyse on original table for a query that will look into one partition on the new table but list of ids provided through a select statement --First attempt; Total runtime: 2290.122 ms --Second attempt; Total runtime: 26.005 ms Warm caches help you a lot here. --Explain analyse on partitioned table for a query that will look into one partition but list of ids provided through a select statement --First attempt; Total runtime: 19142.983 ms --Second attempt; Total runtime: 1383.929 ms And here too, and also, as parition means always hitting more blocks ( see it in reverse, if you collapse partitions maybe some data ends in the same block and you save some reads, or not, but you will never be worse ), slow disks hurt you more. One last remark. I have not seen your data, but from what you've told and the shown queries I would go for the
Re: [GENERAL] FW: Constraint exclusion in partitions
Thank for your patience :-) - About using PgAdmin, anecdotal problems or not, I did the whole tests again in plain postgresql. - About running queries once or not, Bill and Francisco both pointed out somehow that I should run each query multiple times to get appropriate statistics. I did it for all queries - First trial always longer, all other stabilise around the same values. The EXPLAIN ANALYSE for first and second trial on each query I ran on original table and on the partitioned one can be found below. However, in my case, I will have to run most of my queries only once since I simply need to extract sample data for a research topic - there is no insert/update in the DB (that is why I thought looking at first trial was more appropriate). - About adding the exclusion check constraint, thank for remembering me such a simple thing that could have caused all this!-) but sadly, it was set to partition, as expected. However, I have decided to run all the queries after having set the parameter to ON and restarted the database, just in case. Even after doing all this, I did not find any improvement in execution times between my original fat table and the partitioned version (sometime even worst). If partitioning the table has improved significantly queries running times, I could have partitioned the tables differently to accommodate other query types I will have to run later in my research (I have the same problem for half a dozen tables). Since it does not seem that partitioning will do the job, I will get back to the original table to run my queries... However, just in case someone knows a magical trick that can improve significantly the speed of my queries (but haven't told me yet!-) here are the details about the concerned table/indexes (Using https://wiki.postgresql.org/wiki/Index_Maintenance query ...) Number of records: 387013 Table size: 369GB Indexes size: 425GB - nodes_idversion_pk: 125GB - nodes_changesetid_idx: 86GB - nodes_geom_idx: 241GB Each record has 3 bigint, 2 boolean, 1 timestamp and 1 geography type. I am running all this on my personal PC: Windows 64b, i7 chip, 16GB ram. I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB external drives with write caching. Best regards, Daniel Results/explain/analyse follow... --Constant ids- --Explain analyse on original table for a query that will look into different partitions on the new table --First attempt; db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,105000,205000,305000); Index Scan using nodes_idversion_pk on old_nodes (cost=0.71..17739.18 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) Total runtime: 288.732 ms --Second attempt; db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id IN(10005000,105000,205000,305000); Index Scan using nodes_idversion_pk on old_nodes (cost=0.71..17739.18 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) Total runtime: 0.056 ms -- Explain analyse on partitioned table for a query that will look into different partitions --First attempt; db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(10005000,105000,205000,305000); Append (cost=0.00..933.40 rows=223 width=66) (actual time=108.903..287.068 rows=6 loops=1) - Seq Scan on nodes (cost=0.00..0.00 rows=1 width=66) (actual time=0.001..0.001 rows=0 loops=1) Filter: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) - Index Scan using nodes01_idversion_pk on nodes_01 (cost=0.57..622.78 rows=156 width=66) (actual time=108.900..108.916 rows=1 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) - Index Scan using nodes38_idversion_pk on nodes_38 (cost=0.57..138.25 rows=31 width=66) (actual time=89.523..89.543 rows=1 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) - Index Scan using nodes63_idversion_pk on nodes_63 (cost=0.57..119.01 rows=26 width=66) (actual time=49.978..49.998 rows=3 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) - Index Scan using nodes85_idversion_pk on nodes_85 (cost=0.57..53.37 rows=9 width=66) (actual time=38.600..38.603 rows=1 loops=1) Index Cond: (id = ANY ('{10005000,105000,205000,305000}'::bigint[])) Total runtime: 287.144 ms --Second attempt; db=# EXPLAIN ANALYSE SELECT * FROM nodes WHERE id IN(10005000,105000,205000,305000); Append (cost=0.00..933.40 rows=223 width=66) (actual time=0.012..0.065 rows=6 loops=1) - Seq
Re: [GENERAL] FW: Constraint exclusion in partitions
Hi Daniel: Bill Moran already pointed out some things, I more or less agree with him On Sun, May 24, 2015 at 12:16 AM, Daniel Begin jfd...@hotmail.com wrote: .. I use pgadmin_III sql window. I'll just point from my experience. At work, more than three fourths of the problems I've had to help my pgadmin using colleagues solve where due to their usage of pgadmin. Nearly every time they did dome db administration task, they did it poorly and begun to do it correctly when using plain postgresql. This may be anecdotal. I write the following query (I have changed the id to make sure it does not use previous results still in memory)... That is exactly the opposite of what you should do, unless you are able to thorougly clean the server caches ( not as easy as it sounds ) between runs. You want warm caches first, to account for the times in planning, hashing and joining and so. Once you've got that part ready, you can move on to real queries, busy server, etc for more fine tuning. Now I select explain query from the menu and I get the following result... ... Now, I select run and I get one record as a result and the following message in history tab... . As I never use pgadmin, I cannot tell you too much, but one thing I know is NONE of MY users was able to tell me what pgadmin does when you hit run and/or explain query. OTOH, I know what psql does. And, if you use EXPLAIN ANALYZE, as suggested by my previous message, you'll be better of ( as in this case you just get a single query result from the server and you do not have to care about what your tool of choice does in the middle ). lotta deletions, I really need to get much faster results with my queries on this large table and partitioning was my last option... Make sure you are not having an XY-problem. Also, from the type of queries you've shown us, I do not think partitioning is the way to go. It will only give you marginal improvements when querying for single or range of ids ( and small ones if your ranges of ids are somehow clustered in your tables ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] FW: Constraint exclusion in partitions
Hi Daniel: On Sat, May 23, 2015 at 8:37 PM, Daniel Begin jfd...@hotmail.com wrote: Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...) Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index on each partition queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index on each partition queries are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix It is missing here, but I supose you did not forget to add the exclusion check constraint. constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved This is not surprissing, partitions never help much against that type of queries, with 87 partitions index are not going to be much shallower and the optimizer has a lot more of work to do. Just a couple points. When doing a lot of sequential queries on nearby ids ( on the same partition ) it will help ( better data locality, possibility of having more relevant pages in the cache ). Also, how did you do your timinngs? ( you should repeat each query 2 or 3 times, to see if the times go down a lot due to caching and, if you can, time a couple of explains ( plain, not analyze ) to see how much time the planner takes. As an example, the explain I sent you in my previous message takes between 20 and 30 milliseconds and I only have 17 live partitions ( I routinely un-inherit and move to another schema partitions older than a two years, to help the planner, and re-inherit them if needed ) ). Also, your queries seem to be very, very slow for a single indexed fetch, if I do this on my tables ( similar query to what I sent, modified to hit just 1 partition ): select count(*) from carrier_cdrs where setup between '20150107T123456' and '20150107T22'; It takes 40 ms in the first go, drops down to 27 after that, and I have ~15 ms RTT to the server ( which is more or less the time reported when I do a plain 'select 1' ). I mean, I suspect your measurements are not good, as they seem too slow. Also, when I use explain analyze on the previous query the server reports 13 ms, which is more or less one RTT less ( as the server starts measuring after receiving the query and ends before sending the reply ). Another thing, how are you getting the explain results ? I would urge you to use explain analyze, as, apart of seeming too slow, the stimations seem to be way off ( explain analyze sends you the estimated and actual results, so it is easy to see, and executes everything on the server, so result transmission time, which is not of use as it has to be the same for every method of calculating the same correct result, is not reported ). For my query I get this: QUERY PLAN Aggregate
Re: [GENERAL] FW: Constraint exclusion in partitions
Hello Bill, You wrote that my testing methodology is flawed - I hope you are right! However, I am a bit confused about your comments. Yes, I did edited the name of the tables for clarity but if I miss the point I, I will do it again as I am writing without modifying anything. Here is the procedure I follow and results... I use pgadmin_III sql window. I write the following query (I have changed the id to make sure it does not use previous results still in memory)... Select * from nodes where id=345678912; -- nodes is the real partitioned table name Now I select explain query from the menu and I get the following result... Append (cost=0.00..384.08 rows=99 width=66) - Seq Scan on nodes (cost=0.00..0.00 rows=1 width=66) Filter: (id = 345678912) - Index Scan using nodes19_idversion_pk on nodes_19 (cost=0.56..384.08 rows=98 width=66) Index Cond: (id = 345678912) Now, I select run and I get one record as a result and the following message in history tab... -- Executing query: Select * from nodes where id=345678912; Total query runtime: 62 ms. 1 row retrieved. Now, if I use the same query on the original table using the same procedure, here is what I get... Select * from old_nodes where id=345678912; -- old_nodes is the real original table name Explain gives me the following Index Scan using nodes_idversion_pk on old_nodes (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 345678912) Running the query gives me the same record with the following message in history tab... -- Executing query: select * from old_nodes where id=345678912; Total query runtime: 62 ms. 1 row retrieved. This time, the history tab shows that both took the same time to run (an improvement!?) Let's try this one using the same procedure... Select * from old_nodes where id IN (1005,15,25,35) Index Scan using nodes_idversion_pk on old_nodes (cost=0.71..17739.18 rows=6726 width=66) Index Cond: (id = ANY ('{1005,15,25,35}'::bigint[])) -- Executing query: Select * from old_nodes where id IN (1005,15,25,35) Total query runtime: 171 ms. 5 rows retrieved. Select * from nodes where id IN (1005,15,25,35) Append (cost=0.00..933.40 rows=223 width=66) - Seq Scan on nodes (cost=0.00..0.00 rows=1 width=66) Filter: (id = ANY ('{1005,15,25,35}'::bigint[])) - Index Scan using nodes01_idversion_pk on nodes_01 (cost=0.57..622.78 rows=156 width=66) Index Cond: (id = ANY ('{1005,15,25,35}'::bigint[])) - Index Scan using nodes38_idversion_pk on nodes_38 (cost=0.57..138.25 rows=31 width=66) Index Cond: (id = ANY ('{1005,15,25,35}'::bigint[])) - Index Scan using nodes63_idversion_pk on nodes_63 (cost=0.57..119.01 rows=26 width=66) Index Cond: (id = ANY ('{1005,15,25,35}'::bigint[])) - Index Scan using nodes85_idversion_pk on nodes_85 (cost=0.57..53.37 rows=9 width=66) Index Cond: (id = ANY ('{1005,15,25,35}'::bigint[])) -- Executing query: Select * from nodes where id IN (1005,15,25,35) Total query runtime: 140 ms. 5 rows retrieved. This time the history tab shows that the query was shorter to run on partitioned table (a real improvement!?) I know, this is different from what I referred to in my original email (at least both shows similar running time) but I swear, I did not change the times when editing table names!-) Do you see any glitch/flaw in the procedure I am using? Someone has an idea about the problem behind this unexpected behavior? I really need to get much faster results with my queries on this large table and partitioning was my last option... Best regards, Daniel -Original Message- From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Bill Moran Sent: May-23-15 15:23 To: Daniel Begin Cc: pgsql-general@postgresql.org Subject: Re: [GENERAL] FW: Constraint exclusion in partitions A large portion of why you describe below is the exact opposite of my own testing (unfortunately, I don't have the actual test results any more because I did the tests for a former employer). In my tests, single lookups against the same column being used to partition improved performance in direct proportion to the number of partitions. I.e. if the tables are partitioned on id, and the lookup is for id, and the table has 10 partitions, the query is 10x faster on the partitioned version than the non-partitioned verison. Queries against indexes not partitioned were slightly slower in my tests, but nowhere near the degree that you're showing below. I can't help but think that your testing methodology is flawed, but since you're not showing us what you actually did, it's difficult to be sure. See below for some specifics on what I'm
Re: [GENERAL] FW: Constraint exclusion in partitions
I am working with postgresql 9.3 and I understand from the documentation that constraint_exclusion is set to “partition” by default. Looking at my postgres.conf file, the concerned line is “#constraint_exclusion = partition”. Furthermore, the execution plan shows that constraint_exclusion was used at least for constant id and range of ids Daniel From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of melvin6925 Sent: May-23-15 15:15 To: Daniel Begin; pgsql-general@postgresql.org Subject: Re: [GENERAL] FW: Constraint exclusion in partitions Did you remember to set constraint_exclusion = on and reload the .conf ? Sent via the Samsung Galaxy S® 6, an ATT 4G LTE smartphone Original message From: Daniel Begin jfd...@hotmail.com Date: 05/23/2015 14:37 (GMT-05:00) To: pgsql-general@postgresql.org Subject: Re: [GENERAL] FW: Constraint exclusion in partitions Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...) Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Daniel Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index on each partition queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index on each partition queries are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved I got similar results for multiple records... select * from oldtable where id IN(1000,10,20,30); Index Scan using oldtable_idversion_pk on oldtable (cost=0.71..17739.18 rows=6726 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 187 ms. 4 rows retrieved select * from newtable where id IN(1000,10,20,30); Append (cost=0.00..933.40 rows=223 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = ANY ('{1000,10,20,30}'::bigint[])) - Index Scan using newtable01_idversion_pk on newtable_01 (cost=0.57..622.78 rows=156 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) ... - Index Scan using newtable85_idversion_pk on newtable_85 (cost=0.57..53.37 rows=9 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 421 ms. 4 rows retrieved range of ids --- select * from oldtable where id between 152249 and 152349; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..383.51 rows=144 width=66) Index Cond: ((id = 152249) AND (id = 152349)) Total query runtime: 47 ms. 53 rows retrieved. select * from newtable where id between 152249 and 152349; Append (cost=0.00..408.16 rows=104
Re: [GENERAL] FW: Constraint exclusion in partitions
Oops, I was responding to the email below from melvin6925 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of David G. Johnston Sent: May-23-15 19:32 To: Daniel Begin Cc: melvin6925; pgsql-general@postgresql.org Subject: Re: [GENERAL] [NOVICE] Constraint exclusion in partitions On Saturday, May 23, 2015, Daniel Begin jfd...@hotmail.com wrote: I am working with postgresql 9.3 and I understand from the documentation that constraint_exclusion is set to “partition” by default. Looking at my postgres.conf file, the concerned line is “#constraint_exclusion = partition”. Furthermore, the execution plan shows that constraint_exclusion was used at least for constant id and range of ids What is your question/concern? Did you remember to set constraint_exclusion = on and reload the .conf ? From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of melvin6925 Sent: May-23-15 15:15 To: Daniel Begin; pgsql-general@postgresql.org Subject: Re: [GENERAL] FW: Constraint exclusion in partitions Did you remember to set constraint_exclusion = on and reload the .conf ? Sent via the Samsung Galaxy S® 6, an ATT 4G LTE smartphone Original message From: Daniel Begin jfd...@hotmail.com Date: 05/23/2015 14:37 (GMT-05:00) To: pgsql-general@postgresql.org Subject: Re: [GENERAL] FW: Constraint exclusion in partitions Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...) Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Daniel Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index on each partition queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index on each partition queries are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved I got similar results for multiple records... select * from oldtable where id IN(1000,10,20,30); Index Scan using oldtable_idversion_pk on oldtable (cost=0.71..17739.18 rows=6726 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 187 ms. 4 rows retrieved select * from newtable where id IN(1000,10,20,30); Append (cost=0.00..933.40 rows=223 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = ANY ('{1000,10,20,30}'::bigint[])) - Index Scan using newtable01_idversion_pk on newtable_01 (cost=0.57..622.78 rows=156 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) ... - Index Scan using newtable85_idversion_pk on newtable_85 (cost=0.57..53.37 rows=9 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 421
Re: [GENERAL] FW: Constraint exclusion in partitions
On Sat, 23 May 2015 18:16:43 -0400 Daniel Begin jfd...@hotmail.com wrote: Hello Bill, You wrote that my testing methodology is flawed - I hope you are right! However, I am a bit confused about your comments. Yes, I did edited the name of the tables for clarity but if I miss the point I, I will do it again as I am writing without modifying anything. Here is the procedure I follow and results... I use pgadmin_III sql window. I write the following query (I have changed the id to make sure it does not use previous results still in memory)... I didn't realize you were using PGAdmin ... that explains some of it ... see below: Select * from nodes where id=345678912; -- nodes is the real partitioned table name Now I select explain query from the menu and I get the following result... Append (cost=0.00..384.08 rows=99 width=66) - Seq Scan on nodes (cost=0.00..0.00 rows=1 width=66) Filter: (id = 345678912) - Index Scan using nodes19_idversion_pk on nodes_19 (cost=0.56..384.08 rows=98 width=66) Index Cond: (id = 345678912) Now, I select run and I get one record as a result and the following message in history tab... -- Executing query: Select * from nodes where id=345678912; Total query runtime: 62 ms. 1 row retrieved. Now, if I use the same query on the original table using the same procedure, here is what I get... Select * from old_nodes where id=345678912; -- old_nodes is the real original table name Explain gives me the following Index Scan using nodes_idversion_pk on old_nodes (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 345678912) Running the query gives me the same record with the following message in history tab... -- Executing query: select * from old_nodes where id=345678912; Total query runtime: 62 ms. 1 row retrieved. This time, the history tab shows that both took the same time to run (an improvement!?) If your environment is providing such wildly variant results, then you need to start running multiple tests instead of assuming that a single run of a query is indicative of a pattern. -- Bill Moran -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] FW: Constraint exclusion in partitions
Hi Daniel On Fri, May 22, 2015 at 7:21 PM, Daniel Begin jfd...@hotmail.com wrote: I have split a large table (billions of records) into multiple partitions, hoping the access would be faster. I used an ID to make partitions check (check (id = 100 AND id 200)…) and created over 80 tables (children) that are now filled with data. ... I understand that the following query will use constraint exclusion and will run faster… a- Select * from parent_table where id =; -- using a constant But how constraint exclusion would react with the following queries … b- Select * from parent_table where id between 2345 and 6789; -- using a range of ids c- Select * from parent_table where id in(select ids from anothertable); -- using a list of ids from a select Given you have already partitioned it, why don't you just use explain [ analyze ] on the queries? I.e., in one of my tables, partitioned monthly by a timestamp ( with time zone ) field I get ( even if I never use between, a closed interval, for a continuous like type like timestamp, the optimizer clearly shows it's transfroaming it to the equivalent AND condition ): explain select * from carrier_cdrs where setup between '20150107T123456' and '20150322T22'; QUERY PLAN --- Append (cost=0.00..82202.41 rows=2346599 width=74) - Seq Scan on carrier_cdrs (cost=0.00..0.00 rows=1 width=184) Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time zone)) - Seq Scan on carrier_cdrs_201501 (cost=0.00..30191.10 rows=816551 width=74) Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time zone)) - Seq Scan on carrier_cdrs_201502 (cost=0.00..25277.45 rows=872830 width=74) Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time zone)) - Seq Scan on carrier_cdrs_201503 (cost=0.00..26733.85 rows=657217 width=74) Filter: ((setup = '2015-01-07 12:34:56+01'::timestamp with time zone) AND (setup = '2015-03-22 22:22:22+01'::timestamp with time zone)) (9 rows) Since I mostly use queries of type b and c, I am wondering if partitioning the large table was appropriate and if the queries are going to be longer to run… The problem is gonna be all the extra conditions, so I'll check real queries. My bet is in a query EXACTLY like b it will use constraint exclusion, and can be potentially faster if your interval are for just 100 ids, but why speculate when YOU can measure? Also think if you touch 67-23+1=45 partitions and the DB has to check other indexed fields it is a lot of work. As I said, it depend on your actual data, actual indexes, and actual queries, just measure it. In my excample table I partition the data by the TS, which is the single indexed field and my tests showed it was faster that way ( but my queries are normally big scans of date ranges or more selective ones with narrow TS conditions, and I measured them ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] FW: Constraint exclusion in partitions
A large portion of why you describe below is the exact opposite of my own testing (unfortunately, I don't have the actual test results any more because I did the tests for a former employer). In my tests, single lookups against the same column being used to partition improved performance in direct proportion to the number of partitions. I.e. if the tables are partitioned on id, and the lookup is for id, and the table has 10 partitions, the query is 10x faster on the partitioned version than the non-partitioned verison. Queries against indexes not partitioned were slightly slower in my tests, but nowhere near the degree that you're showing below. I can't help but think that your testing methodology is flawed, but since you're not showing us what you actually did, it's difficult to be sure. See below for some specifics on what I'm concerned that you might be doing wrong ... On Sat, 23 May 2015 14:37:25 -0400 Daniel Begin jfd...@hotmail.com wrote: Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...) Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Daniel Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index on each partition queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index on each partition queries are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved Notice in these results that you're not showing the command that was executed. The output is mostly likely from the command ANALYZE select * from newtable where id=123456789; but that's not the command you claim that you ran. In any event, the analyze output doesn't line up with the times you claim: i.e. Analyze is showing that the first query should take about 4437 time units to complete, and the second one should take about 20 time units, yet you claim the second one is slower. The other queries below exhibit a similar pattern. Are you sure you're not timing ANALYZE itself instead of the query? Because timing ANALYZE select * from newtable where id=123456789; is not going to be timing the actual time the query took to run. I would certainly expect the _planning_ of a query against partitioned tables to take longer than non- partitioned, but I would also expect the execution time to be the opposite. Hence my theory that you've accidentally timed the ANALYZE instead of the actual running of the query. Naturally, the total query time is planning + execution, and my experience shows that the loss in planning speed is more than made up for by the gain in execution speed. Perhaps you should show us the exact output of one of your tests, without editorializing. I got similar results for multiple records... select * from oldtable where id IN(1000,10,20,30); Index Scan using oldtable_idversion_pk on oldtable
Re: [GENERAL] FW: Constraint exclusion in partitions
Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...) Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Daniel Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index on each partition queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index on each partition queries are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved I got similar results for multiple records... select * from oldtable where id IN(1000,10,20,30); Index Scan using oldtable_idversion_pk on oldtable (cost=0.71..17739.18 rows=6726 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 187 ms. 4 rows retrieved select * from newtable where id IN(1000,10,20,30); Append (cost=0.00..933.40 rows=223 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = ANY ('{1000,10,20,30}'::bigint[])) - Index Scan using newtable01_idversion_pk on newtable_01 (cost=0.57..622.78 rows=156 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) ... - Index Scan using newtable85_idversion_pk on newtable_85 (cost=0.57..53.37 rows=9 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 421 ms. 4 rows retrieved range of ids --- select * from oldtable where id between 152249 and 152349; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..383.51 rows=144 width=66) Index Cond: ((id = 152249) AND (id = 152349)) Total query runtime: 47 ms. 53 rows retrieved. select * from newtable where id between 152249 and 152349; Append (cost=0.00..408.16 rows=104 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: ((id = 152249) AND (id = 152349)) - Index Scan using newtable51_idversion_pk on newtable_51 (cost=0.56..183.52 rows=46 width=66) Index Cond: ((id = 152249) AND (id = 152349)) - Index Scan using newtable52_idversion_pk on newtable_52 (cost=0.56..224.64 rows=57 width=66) Index Cond: ((id = 152249) AND (id = 152349)) Total query runtime: 78 ms. 53 rows retrieved. list of ids from a select clause --- --Subset provides 4 ids similar but not identical to the previous query select * from oldtable where id IN (select * from subset); Nested Loop (cost=37.45..886298.00 rows=2028512050 width=66) - HashAggregate (cost=36.75..38.75 rows=200 width=8) - Seq Scan on subset (cost=0.00..31.40 rows=2140 width=8) - Index
Re: [GENERAL] FW: Constraint exclusion in partitions
Did you remember to set constraint_exclusion = on and reload the .conf ? Sent via the Samsung Galaxy S® 6, an ATT 4G LTE smartphone Original message From: Daniel Begin jfd...@hotmail.com Date: 05/23/2015 14:37 (GMT-05:00) To: pgsql-general@postgresql.org Subject: Re: [GENERAL] FW: Constraint exclusion in partitions Following Francisco suggestion, I was able to do some tests earlier this morning when the partitioning process completed and all the resulting tables analyzed. Here is what I got on both the original table and its partitioned counterpart while running the same queries. I tested them only for a couple of values but in summary... Using a constant id: All the queries I tried took longer on the partitioned table! I got similar results for multiple records using IN (id value1, id value2 ...) Using a range of ids: Surprisingly again, all the queries I tried took longer on the partitioned table! Using a list of ids from a select clause: More surprisingly, the queries I tried took less time on the partitioned table at least when using the primary key. Using an indexed field took so long compared to the old table that I cancelled the execution for the new one! Guess what, I will get back to my old fat table unless someone tells me I missed something obvious! Daniel Note: Tables/indexes description, queries and execution plans are below. Tables/indexes description -- The original table has 387013 records. Primary key/index on each partition queries are ALTER TABLE oldtable ADD CONSTRAINT oldtable_idversion_pk PRIMARY KEY (id, version); CREATE INDEX oldtable_groupid_idx ON oldtable USING btree (group_id); The partitioned table has 387013 records distributed over 87 partitions. Primary key/index on each partition queries are ALTER TABLE newtable_xx ADD CONSTRAINT newtablexx_idversion_pk PRIMARY KEY (id, version); CREATE INDEX newtablexx_groupid_idx ON newtable_xx USING btree (group_id); Where xx is the partition's number suffix constant id --- select * from oldtable where id=123456789; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..4437.15 rows=1682 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 62 ms. 1 rows retrieved select * from newtable where id=123456789; Append (cost=0.00..20.19 rows=5 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = 123456789::bigint) - Index Scan using newtable72_idversion_pk on newtable_72 (cost=0.56..20.19 rows=4 width=66) Index Cond: (id = 123456789::bigint) --Total query runtime: 156 ms. 1 rows retrieved I got similar results for multiple records... select * from oldtable where id IN(1000,10,20,30); Index Scan using oldtable_idversion_pk on oldtable (cost=0.71..17739.18 rows=6726 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 187 ms. 4 rows retrieved select * from newtable where id IN(1000,10,20,30); Append (cost=0.00..933.40 rows=223 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: (id = ANY ('{1000,10,20,30}'::bigint[])) - Index Scan using newtable01_idversion_pk on newtable_01 (cost=0.57..622.78 rows=156 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) ... - Index Scan using newtable85_idversion_pk on newtable_85 (cost=0.57..53.37 rows=9 width=66) Index Cond: (id = ANY ('{1000,10,20,30}'::bigint[])) --Total query runtime: 421 ms. 4 rows retrieved range of ids --- select * from oldtable where id between 152249 and 152349; Index Scan using oldtable_idversion_pk on oldtable (cost=0.70..383.51 rows=144 width=66) Index Cond: ((id = 152249) AND (id = 152349)) Total query runtime: 47 ms. 53 rows retrieved. select * from newtable where id between 152249 and 152349; Append (cost=0.00..408.16 rows=104 width=66) - Seq Scan on newtable (cost=0.00..0.00 rows=1 width=66) Filter: ((id = 152249) AND (id = 152349)) - Index Scan using newtable51_idversion_pk on newtable_51 (cost=0.56..183.52 rows=46 width=66) Index Cond: ((id = 152249) AND (id = 152349)) - Index Scan using newtable52_idversion_pk on newtable_52 (cost=0.56..224.64 rows=57 width=66) Index Cond: ((id = 152249) AND (id = 152349)) Total query runtime: 78 ms. 53 rows retrieved. list of ids from a select clause
[GENERAL] FW: Constraint exclusion in partitions
Sent that on pgsql-novice list but did not get any answers yet. Maybe someone could help me understand here J Hi all, I have split a large table (billions of records) into multiple partitions, hoping the access would be faster. I used an ID to make partitions check (check (id = 100 AND id 200).) and created over 80 tables (children) that are now filled with data. However, after I did it, I read a second time the following sentence in the documentation and started wondering what it actually means . Constraint exclusion only works when the query's WHERE clause contains constants (or externally supplied parameters) I understand that the following query will use constraint exclusion and will run faster. a- Select * from parent_table where id =; -- using a constant But how constraint exclusion would react with the following queries . b- Select * from parent_table where id between 2345 and 6789; -- using a range of ids c- Select * from parent_table where id in(select ids from anothertable); -- using a list of ids from a select Since I mostly use queries of type b and c, I am wondering if partitioning the large table was appropriate and if the queries are going to be longer to run. Thank in advance Daniel Doc: http://www.postgresql.org/docs/9.3/static/ddl-partitioning.html
Re: [GENERAL] FW: Constraint exclusion in partitions
On Fri, May 22, 2015 at 10:21 AM, Daniel Begin jfd...@hotmail.com wrote: But how constraint exclusion would react with the following queries … b- Select * from parent_table where id between 2345 and 6789; -- using a range of ids Not sure... These are constants but I'm not sure how smart the planner is about figuring out inequalities of this form. I would guess yes but it should be easy enough to confirm this yourself using explain...you already have the tables. c- Select * from parent_table where id in(select ids from anothertable); -- using a list of ids from a select Definitely no... Constraint exclusion is done by the planner before data is read so there is no possible way for data in a table to be used. As for performance the only way to know for sure is to test your usage patterns and data. Even without constraint exclusion partitioning can provide benefits. David J.