Re: [HACKERS] Auto creation of Partitions
On Saturday 10 March 2007 00:13, NikhilS wrote: Hi, Given that Simon wants to do away with having the master table APPENDed in the planning phase, this would be better. ISTM you're trading appending the master table for appending the DUMP partition, which afaict would give you no gain. If there are entries in the master table, I think it would get appended for all queries regardless of whether we need to examine its contents or not. Segregating dump data into a partition will avoid that. I have seen examples in some other databases wherein a partition specifies a range of someval - MAXINT for instance, to catch such cases. That again means that the onus is on the partition creator most of the times.. *shrug*... we can do that now in pgsql -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Auto creation of Partitions
Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION. But DROP PARTITION usually moves the data from this partition to other partitions, so it is something different. Andreas ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Auto creation of Partitions
On Fri, 2007-03-09 at 11:48 +0530, NikhilS wrote: Hi, On 3/9/07, Shane Ambler [EMAIL PROTECTED] wrote: Note to Nikhil: Make sure the new syntax doesn't prevent partitions from being placed upon multiple tablespaces in some manner, at CREATE TABLE time. What if the syntax was something like - CREATE TABLE tabname ( ... ... ) PARTITION BY HASH(expr) | RANGE(expr) | LIST(expr) [PARTITIONS num_partitions] /* will apply to HASH only for now*/ [PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname], PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname] ... ]; And (if we use the ALTER TABLE to add partitions) ALTER TABLE tabname ADD PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]; We could as well drop the USING part. Why would we support HASH partitions? If you did, the full syntax for hash clusters should be supported. If we do the CHECK clauses like that then we still have don't have a guaranteed non-overlap between partitions. It would be easier to use Oracle syntax and then construct the CHECK clauses from that. Also, the syntax needs to be fairly complex to allow for a mixture of modes, e.g. range and list partitioning. That is currently possible today and the syntax for doing that is IMHO much simpler than the Oracle simple way of specifying it. An alternative is to provide a partitioning function which decides which partition each values goes into. PARTITION FUNCTION which_partition(date_col) The partition function must return an unsigned integer 0, which would correspond to particular partitions. Partitions would be numbered 1..N, and named tablename_partM where 1 = M = N. The input and contents of the partition function would be up to the user. e.g. CREATE FUNCTION range_partition(date date_col) { if (date_col D1) return 1; else if (date_col D2) return 2; else if (date_col D3) return 3; return 4; } Doing it this way would allow us to easily join two tables based upon a common partition function. In time, I would suggest we support both ways: declarative and functional. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
Hi, Why would we support HASH partitions? If you did, the full syntax for hash clusters should be supported. In MySQL, Oracle, the syntax for HASH partitions seems to be similar to the one mentioned. I do not know much about hash clusters though. If we do the CHECK clauses like that then we still have don't have a guaranteed non-overlap between partitions. It would be easier to use Oracle syntax and then construct the CHECK clauses from that. Again Oracle, MySQL use VALUES LESS THAN (expr) format for RANGE partitions. So you mean that they end up creating ranges like MININT - Range1, Range1+1 - Range2 etc for each of the partitions? I think Postgres users are used to the CHECK clauses and I still feel that the onus of distinct partitions lies on the partition creator. Also, the syntax needs to be fairly complex to allow for a mixture of modes, e.g. range and list partitioning. That is currently possible today and the syntax for doing that is IMHO much simpler than the Oracle simple way of specifying it. Subpartitioning is not being targeted right now, but could be put on the TODO list for further enhancements. An alternative is to provide a partitioning function which decides which partition each values goes into. PARTITION FUNCTION which_partition(date_col) The partition function must return an unsigned integer 0, which would correspond to particular partitions. Partitions would be numbered 1..N, and named tablename_partM where 1 = M = N. The input and contents of the partition function would be up to the user. e.g. CREATE FUNCTION range_partition(date date_col) { if (date_col D1) return 1; else if (date_col D2) return 2; else if (date_col D3) return 3; return 4; } Doing it this way would allow us to easily join two tables based upon a common partition function. In time, I would suggest we support both ways: declarative and functional. Till now, we are going the declarative way. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Simon, What happens to the data when the function is dropped or replaced? - Luke Msg is shrt cuz m on ma treo -Original Message- From: Simon Riggs [mailto:[EMAIL PROTECTED] Sent: Friday, March 09, 2007 06:20 AM Eastern Standard Time To: NikhilS Cc: Shane Ambler; Luke Lonergan; Zeugswetter Andreas ADI SD; Peter Eisentraut; pgsql-hackers@postgresql.org Subject:Re: [HACKERS] Auto creation of Partitions On Fri, 2007-03-09 at 11:48 +0530, NikhilS wrote: Hi, On 3/9/07, Shane Ambler [EMAIL PROTECTED] wrote: Note to Nikhil: Make sure the new syntax doesn't prevent partitions from being placed upon multiple tablespaces in some manner, at CREATE TABLE time. What if the syntax was something like - CREATE TABLE tabname ( ... ... ) PARTITION BY HASH(expr) | RANGE(expr) | LIST(expr) [PARTITIONS num_partitions] /* will apply to HASH only for now*/ [PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname], PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname] ... ]; And (if we use the ALTER TABLE to add partitions) ALTER TABLE tabname ADD PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]; We could as well drop the USING part. Why would we support HASH partitions? If you did, the full syntax for hash clusters should be supported. If we do the CHECK clauses like that then we still have don't have a guaranteed non-overlap between partitions. It would be easier to use Oracle syntax and then construct the CHECK clauses from that. Also, the syntax needs to be fairly complex to allow for a mixture of modes, e.g. range and list partitioning. That is currently possible today and the syntax for doing that is IMHO much simpler than the Oracle simple way of specifying it. An alternative is to provide a partitioning function which decides which partition each values goes into. PARTITION FUNCTION which_partition(date_col) The partition function must return an unsigned integer 0, which would correspond to particular partitions. Partitions would be numbered 1..N, and named tablename_partM where 1 = M = N. The input and contents of the partition function would be up to the user. e.g. CREATE FUNCTION range_partition(date date_col) { if (date_col D1) return 1; else if (date_col D2) return 2; else if (date_col D3) return 3; return 4; } Doing it this way would allow us to easily join two tables based upon a common partition function. In time, I would suggest we support both ways: declarative and functional. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
On Fri, 2007-03-09 at 07:40 -0500, Luke Lonergan wrote: What happens to the data when the function is dropped or replaced? Well, that wouldn't happen because you build in a dependency. I'm not working on this, so don't expect lots of detail. The idea is essentially to implement things the way SQLServer does it. The function would need some care. It might even need a function that writes a function. e.g. CreateRangePartitionFunction('{date1,date2,date3}', funcname); ReplaceRangePartitionFunction('{date1,date2,date3}', funcname); which would then give a properly designed function called funcname to be used for partitioning, which would include tests to make sure a partition wasn't inadvertently excluded from the list of existing partitions of any table that used it. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote: Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION. But DROP PARTITION usually moves the data from this partition to other partitions, so it is something different. It does? IIRC every partitioning system I've seen DROP PARTITION drops the data as well. It's up to you to move it somewhere else if you want to keep it. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
Ühel kenal päeval, R, 2007-03-09 kell 15:41, kirjutas Jim Nasby: On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote: Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION. But DROP PARTITION usually moves the data from this partition to other partitions, so it is something different. It does? IIRC every partitioning system I've seen DROP PARTITION drops the data as well. It's up to you to move it somewhere else if you want to keep it. Will this proposed DROP PARTITION just disassociate the table from the master, or will it actually drop the partitions table from the whole database ? -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Auto creation of Partitions
On Friday 09 March 2007 01:23, NikhilS wrote: Hi, This follows on from the suggestion I made - taken along the lines of the subject auto creation of partitions where I suggested the syntax of partition check(month of mydatecol) and have a new partition created as data was entered. With this scenario dropping the partition when it was empty would complement the creation of a new partition as needed. Given that there seems to be no real support of going with auto maintenance were new partitions are added as needed, then the auto dropping of empty partitions would also not apply. Leaving us with only specific add partition / drop partition commands. And have the parent table pick up rows not matching any partition check criteria. I was thinking along the lines of what Jim had suggested earlier regarding overflow partition. Instead of dumping unmatched rows to the master table, we could put them into a default DUMP/DUMB partition. I'm quite content dumping unmatched rows into the master table. This makes it very easy to scan partitioned tables for busted partition setups. Having a DUMP tables seems only different semantically, so why learn new semantics? Given that Simon wants to do away with having the master table APPENDed in the planning phase, this would be better. ISTM you're trading appending the master table for appending the DUMP partition, which afaict would give you no gain. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Auto creation of Partitions
Hi, On 3/10/07, Hannu Krosing [EMAIL PROTECTED] wrote: Ühel kenal päeval, R, 2007-03-09 kell 15:41, kirjutas Jim Nasby: On Mar 9, 2007, at 3:31 AM, Zeugswetter Andreas ADI SD wrote: Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION. But DROP PARTITION usually moves the data from this partition to other partitions, so it is something different. It does? IIRC every partitioning system I've seen DROP PARTITION drops the data as well. It's up to you to move it somewhere else if you want to keep it. Will this proposed DROP PARTITION just disassociate the table from the master, or will it actually drop the partitions table from the whole database ? Thats why I would prefer the existing mechanism, there a DROP on the child removes it and a NO INHERIT disassociates it. There might be situations where we would want to just disassociate and not drop. Regards, Nikhils -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi, Given that Simon wants to do away with having the master table APPENDed in the planning phase, this would be better. ISTM you're trading appending the master table for appending the DUMP partition, which afaict would give you no gain. If there are entries in the master table, I think it would get appended for all queries regardless of whether we need to examine its contents or not. Segregating dump data into a partition will avoid that. I have seen examples in some other databases wherein a partition specifies a range of someval - MAXINT for instance, to catch such cases. That again means that the onus is on the partition creator most of the times.. Regards, Nikhils -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi, Maybe, the dummy entry could be extended to contain the bounds (max/min) for each of the other involved partitions and they could be updated each time a DML happens across the partitions. That ways, an update to a particular partition needs to lock out the others, examine the dummy entries in its own index and follow it up with dummy entries update into other partitions if the need be. Ofcourse as you have mentioned all of this so needs to be done after a careful think on the locking/deadlocking etc issues. Regards, Nikhils On 3/7/07, Alvaro Herrera [EMAIL PROTECTED] wrote: I am wondering if we can implement unique indexes across several tables (inheritance hierarchy) not by using a single, big index covering all the tables, but rather by inserting a dummy entry into each partition's unique index. This dummy entry would have an expanded CTID which would include the tableoid, so it's possible to check it (albeit there is a problem in that we may require the opening of another heap to do the actual checking). These dummy entries could be removed by bulkcleanup as soon as the inserting transaction is no longer running, to avoid bloating the index too much. All said dummy index entries would be located at either the rightmost or the leftmost leaf, or close to it, so another idea is to have future inserters reuse the entry for a different key. The obvious problem with this is, naturally, the excess I/O that extra index traversing causes. The not so obvious ones are locking, deadlocking and the opening of other heaps and indexes while you do the insertion, which may be too expensive. On the other hand, maybe this idea is easier to implement than full-fledged cross-table indexes, so we could have richer partitioning earlier than when somebody finally bites the bullet and implements cross-table indexes. Or maybe this is just a dumb idea, but I had to let it out anyway :-) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi, There are 2 other reasons to favor triggers though: 1) People (Josh Drake comes to mind) have found that if you get over a tiny number of partitions, the performance of rules is abysmal. 2) I believe it should be possible to construct an update trigger that allows you to perform updates that will place the row in question into a new partition. While I can see cases for simply disallowing updates to the partitioning key, I think there are also times when being able to do that would be very useful. The consensus seems to be veering towards triggers. I think it'd be great to make adding and removing partitions as simple as ALTER TABLE. I don't think that DELETE should be the mechanism to drop a partition, though. Again, DML statements shouldn't be performing DDL. Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
On Wed, 2007-03-07 at 22:32 -0500, Luke Lonergan wrote: Andreas, On 3/7/07 11:45 AM, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote: Whoa, do you have anything to back that up ? Sure - when we start to consider designs that implement advanced data management features, we run into problems with the architecture of tables-tables-tables Here are some examples: 1 - people think of partitions as a logical building block for tables, they would like to move partitions around underneath a table without the table definition being involved. In the current implementation, there are explicit linkages between the table definition and the child tables - imagine an ALTER TABLE foo_parent ADD COLUMN and how it would need to cascade to 1,000 child tables and you get the beginning of it - this connection should not exist. The inheritance can work at multiple levels, so its up to you how you manage things. If you want to add a column to only the newest tables, you just add a new mid-level table, add the new column only to that and then make all new partitions inherit from that table rather than the main table. So I don't see the objection here, I see a benefit. 2 - INSERT/UPDATE/DELETE processing through the SQL rewrite layer (rules) is terribly slow and gets slower as you add more partitions. If done closer to the storage layer, this can be done in ways that use access methods shared with other storage entities, e.g. Indices, and the code path would flow more naturally. That bit is just syntactic sugar and unnecessary, for most applications, in my experience - especially because it doesn't work with COPY. People do seem to want it, so as I said upthread, we need a way of speeding up the selection of the appropriate partition, so we can get this to work for individual INSERTs. This needs substantial improvement, but I don't see this challenging the partition == table assumption. 3 - Parallel query can be accomplished more easily by separating scans across relations split among tablespaces. This is more natural than trying to parallelize APPEND nodes within existing plans Tables- Partitions - Tablespaces Changing Partitions from Tables to SomethingElse won't increase the current capability to define the disk layout for concurrent I/O. Parallel Query needs to be planner-aware, so if we change partitions from being tables, then we'd need to reintroduce them to the planner. APPEND is clearly not the only thing that's needed for parallel query. Note to Nikhil: Make sure the new syntax doesn't prevent partitions from being placed upon multiple tablespaces in some manner, at CREATE TABLE time. You would need to elaborate what you actually mean, but I think it is moot. Sure, the constraint technique can be further extended (e.g. during runtime), but imho the approach is very good. Well, it's being used and that's good, but it needs to be better IMO and I think that before we go too far down the current path we should consider the alternatives more carefully. I'm happy to reconsider things, but we need to do that with some clear analysis of what doesn't work yet and how best to implement that. None of what's been mentioned requires us to reconsider the Partition == Table assumption. There were other ways considered, but they didn't win out in the analysis, for documented reasons. If there are benefits to having partitions act like tables, then maybe we can make them behave differently in just those circumstances. Pack animals behave differently in a group, so why not tables, if need be? But when is that exactly? There is a ton of work to make partitioning the elegant beast we'd like it to be, but that seems like extension only, not rip and replace. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Auto creation of Partitions
On Wed, 2007-03-07 at 21:27 -0700, Jim Nasby wrote: On Mar 7, 2007, at 3:26 AM, Simon Riggs wrote: If you know that the constraints on each of the tables is distinct, then building a UNIQUE index on each of the partitions is sufficient to prove that all rows in the combined partitioned table are distinct also. The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate testing logic to compare the to-be-added partition's constraint against each of the already added constraints. That becomes an O(N) problem. What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. Directing data to child tables with triggers pretty much necessitates having some way to codify what partition a particular row belongs in. IE: for partitioning by month, you'll see things like naming the partition tables parent_table_name_$YEAR_$MONTH, so the 'partitioning function' takes a date or timestamp and then returns what partition it belongs to. Perhaps there is some way to use that mapping to drive the selection of what partitions could contain a given value? One possibility would be to require 3 functions for a partitioned table: one accepts the partitioning key and tells you what partition it's in, one that tells you what the minimum partitioning key for a partition would be, and one that tells you what the maximum would be. If the user supplied those 3 functions, I think it would be possibly to automatically generate code for the triggers and check constraints. The min/max partition key functions might allow you to more efficiently do partition elimination, too. ISTM this is a good idea. SQLServer uses partitioning functions and I like that approach. It makes it much easier to do partition-wise joins between tables that share partitioning functions. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
Jim Nasby [EMAIL PROTECTED] writes: One possibility would be to require 3 functions for a partitioned table: one accepts the partitioning key and tells you what partition it's in, one that tells you what the minimum partitioning key for a partition would be, and one that tells you what the maximum would be. If the user supplied those 3 functions, I think it would be possibly to automatically generate code for the triggers and check constraints. The min/max partition key functions might allow you to more efficiently do partition elimination, too. But then it would be harder to tell whether a clause implied a given partition. That is, if you have a partition constraint of col OP const then we can test whether a query clause of col OP2 const2 implies that constraint when planning (or actually whether it implies it's false to exclude the partition). If you have a constraint like P1(const) it'll be pretty hard to do much with that. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate testing logic to compare the to-be-added partition's constraint against each of the already added constraints. That becomes an O(N) problem. Yes, we could preevaluate that check (per index) in the DDL phase and keep the info in a flag. Also interesting info is if there is an order the partitions can be read in to satisfy a particular order by. What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. I do not really see a problem with O(N) since typical N currently range from 10 to 200. N = 1000 is already good for a = 10 TB table. If a 10 GB partition were too large we should imho invest more in the advanced indexing methods that are currently beeing developed. Directing data to child tables with triggers pretty much necessitates having some way to codify what partition a particular row belongs in. IE: for partitioning by month, you'll see things like naming the partition tables parent_table_name_$YEAR_$MONTH, so the 'partitioning function' takes a date or timestamp and then returns what partition it belongs to. Perhaps there is some way to use that mapping to drive the selection of what partitions could contain a given value? You put it in the first partition that has matching constraints. One possibility would be to require 3 functions for a partitioned table: one accepts the partitioning key and tells you what partition it's in, one that tells you what the minimum partitioning key for a partition would be, and one that tells you what the maximum would be. If the user supplied those 3 functions, I think it would be possibly to automatically generate code for the triggers and check constraints. The min/max partition key functions might allow you to more efficiently do partition elimination, too. I can see this as a good optional addition, but it can only be optional else it would pretty much limit the methods that can be used for partitioning. e.g. hash, modulo do not have a min,max per partition. Andreas ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Auto creation of Partitions
On Thu, Mar 08, 2007 at 01:52:27PM +0530, NikhilS wrote: I think it'd be great to make adding and removing partitions as simple as ALTER TABLE. I don't think that DELETE should be the mechanism to drop a partition, though. Again, DML statements shouldn't be performing DDL. Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Auto creation of Partitions
On Thu, Mar 08, 2007 at 10:12:30AM +, Gregory Stark wrote: Jim Nasby [EMAIL PROTECTED] writes: One possibility would be to require 3 functions for a partitioned table: one accepts the partitioning key and tells you what partition it's in, one that tells you what the minimum partitioning key for a partition would be, and one that tells you what the maximum would be. If the user supplied those 3 functions, I think it would be possibly to automatically generate code for the triggers and check constraints. The min/max partition key functions might allow you to more efficiently do partition elimination, too. But then it would be harder to tell whether a clause implied a given partition. That is, if you have a partition constraint of col OP const then we can test whether a query clause of col OP2 const2 implies that constraint when planning (or actually whether it implies it's false to exclude the partition). If you have a constraint like P1(const) it'll be pretty hard to do much with that. Well, you could tell what partition 'const' was in; I would think that plus knowledge about OP2 would allow you to decide what partitions you need to look at. There's also nothing to prevent us from also adding the constraints and using constraint exclusion as well. In fact, I think we'd want to have the constraints just so we know that a given partition only contains the data we want it to. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Auto creation of Partitions
One other thought... a lot of this discussion seems to parallel the debate from a while ago about whether SERIAL should act like a macro (ie: it just sets everything up and users are free to monkey under the hood afterwards), or whether it should be it's own 'closed-box' construct. Currently, we seem to be leaning towards partition management being a 'macro', with child tables very exposed, etc. I don't know if that's good or bad, but it's probably worth some thought. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Auto creation of Partitions
Note to Nikhil: Make sure the new syntax doesn't prevent partitions from being placed upon multiple tablespaces in some manner, at CREATE TABLE time. What if the syntax was something like - CREATE TABLE tabname ( ... ... ) PARTITION BY HASH(expr) | RANGE(expr) | LIST(expr) [PARTITIONS num_partitions] /* will apply to HASH only for now*/ [PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname], PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname] ... ]; And (if we use the ALTER TABLE to add partitions) ALTER TABLE tabname ADD PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]; Of course ALTER TABLE childtable SET TABLESPACE tblspcname; should not cause any probs. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Auto creation of Partitions
Jim C. Nasby wrote: On Thu, Mar 08, 2007 at 01:52:27PM +0530, NikhilS wrote: I think it'd be great to make adding and removing partitions as simple as ALTER TABLE. I don't think that DELETE should be the mechanism to drop a partition, though. Again, DML statements shouldn't be performing DDL. Since partition is inheritance-based, a simple DROP or NO INHERIT will do the job to deal with the partition. Do we want to reinvent additional syntax when these are around and are documented? Well, if the syntax for adding a new partition eventually ends up as ALTER TABLE ADD PARTITION, then it would make more sense that you remove a partition via ALTER TABLE DROP PARTITION. This follows on from the suggestion I made - taken along the lines of the subject auto creation of partitions where I suggested the syntax of partition check(month of mydatecol) and have a new partition created as data was entered. With this scenario dropping the partition when it was empty would complement the creation of a new partition as needed. Given that there seems to be no real support of going with auto maintenance were new partitions are added as needed, then the auto dropping of empty partitions would also not apply. Leaving us with only specific add partition / drop partition commands. And have the parent table pick up rows not matching any partition check criteria. -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
Hi, On 3/9/07, Shane Ambler [EMAIL PROTECTED] wrote: Note to Nikhil: Make sure the new syntax doesn't prevent partitions from being placed upon multiple tablespaces in some manner, at CREATE TABLE time. What if the syntax was something like - CREATE TABLE tabname ( ... ... ) PARTITION BY HASH(expr) | RANGE(expr) | LIST(expr) [PARTITIONS num_partitions] /* will apply to HASH only for now*/ [PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname], PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname] ... ]; And (if we use the ALTER TABLE to add partitions) ALTER TABLE tabname ADD PARTITION partition_name CHECK(...) [USING TABLESPACE tblspcname]; We could as well drop the USING part. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi, This follows on from the suggestion I made - taken along the lines of the subject auto creation of partitions where I suggested the syntax of partition check(month of mydatecol) and have a new partition created as data was entered. With this scenario dropping the partition when it was empty would complement the creation of a new partition as needed. Given that there seems to be no real support of going with auto maintenance were new partitions are added as needed, then the auto dropping of empty partitions would also not apply. Leaving us with only specific add partition / drop partition commands. And have the parent table pick up rows not matching any partition check criteria. I was thinking along the lines of what Jim had suggested earlier regarding overflow partition. Instead of dumping unmatched rows to the master table, we could put them into a default DUMP/DUMB partition. Given that Simon wants to do away with having the master table APPENDed in the planning phase, this would be better. Regards, Nikhils -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Am Mittwoch, 7. März 2007 07:44 schrieb NikhilS: Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain uniqueness on a partition-by-partition basis too? Many things might be useful, but the aim of the table partitioning venture is believed to be the provision of a transparent interface to the existing do-it-yourself partitioning facilities. Therefore, the logical definition of a table must continue to work unchanged (or alternatively, the use of the feature must be prohibited if that cannot be guaranteed in a particular case). Other features such as uniqueness on a partition basis might also be useful but you can do that today and you don't need partitioning for that. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Auto creation of Partitions
On Tue, 2007-03-06 at 18:31 +0530, NikhilS wrote: On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote: NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? We will not (I know its a hard thing to do :) ), the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. If you know that the constraints on each of the tables is distinct, then building a UNIQUE index on each of the partitions is sufficient to prove that all rows in the combined partitioned table are distinct also. The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate testing logic to compare the to-be-added partition's constraint against each of the already added constraints. That becomes an O(N) problem. What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
Hi, If you know that the constraints on each of the tables is distinct, then building a UNIQUE index on each of the partitions is sufficient to prove that all rows in the combined partitioned table are distinct also. The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate testing logic to compare the to-be-added partition's constraint against each of the already added constraints. That becomes an O(N) problem. What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. If the unique constraint is supposed to be on a column which is NOT being used for the partitioning, then all the above becomes much more difficult. While partitioning, the additional onus on the user is to specify non-conflicting CHECKs for the range/list partitions. Regards, Nikhils -- Simon Riggs EnterpriseDB http://www.enterprisedb.com -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? Many partitioning schemes have (or can be made to have) only one possible target partition for the primary key. Thus if you create separate unique indexes on each partition the problem is solved. For a first version I opt, that it is sufficient to disallow creation of a unique index on the master, when the constraints (on columns of this index) do not nail down a specific partition for each row (e.g. a hash or a range on one of the index columns that does not overlap). Later, global index, or indexes with separate partitioning rules can be implemented, that cover the other cases. Andreas ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Auto creation of Partitions
Hi, On 3/7/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? Many partitioning schemes have (or can be made to have) only one possible target partition for the primary key. Thus if you create separate unique indexes on each partition the problem is solved. For a first version I opt, that it is sufficient to disallow creation of a unique index on the master, when the constraints (on columns of this index) do not nail down a specific partition for each row (e.g. a hash or a range on one of the index columns that does not overlap). Later, global index, or indexes with separate partitioning rules can be implemented, that cover the other cases. Andreas Yes, I agree. For version 1, UNIQUE/PRIMARY indexes will cascade down to the child table, only if the indexed column is present as part of the partitioning rule. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
I am wondering if we can implement unique indexes across several tables (inheritance hierarchy) not by using a single, big index covering all the tables, but rather by inserting a dummy entry into each partition's unique index. This dummy entry would have an expanded CTID which would include the tableoid, so it's possible to check it (albeit there is a problem in that we may require the opening of another heap to do the actual checking). These dummy entries could be removed by bulkcleanup as soon as the inserting transaction is no longer running, to avoid bloating the index too much. All said dummy index entries would be located at either the rightmost or the leftmost leaf, or close to it, so another idea is to have future inserters reuse the entry for a different key. The obvious problem with this is, naturally, the excess I/O that extra index traversing causes. The not so obvious ones are locking, deadlocking and the opening of other heaps and indexes while you do the insertion, which may be too expensive. On the other hand, maybe this idea is easier to implement than full-fledged cross-table indexes, so we could have richer partitioning earlier than when somebody finally bites the bullet and implements cross-table indexes. Or maybe this is just a dumb idea, but I had to let it out anyway :-) -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Auto creation of Partitions
Simon, On 3/7/07 5:26 AM, Simon Riggs [EMAIL PROTECTED] wrote: What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. I think we need to re-evaluate the inheritance mechanism for partitioning and implement something much closer to the storage layer, similar to Oracle. Having the constraint exclusion occur in the planner is not flexible enough to allow more advanced solutions. - Luke ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Auto creation of Partitions
On Wed, 2007-03-07 at 10:23 -0500, Luke Lonergan wrote: Simon, On 3/7/07 5:26 AM, Simon Riggs [EMAIL PROTECTED] wrote: What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. I think we need to re-evaluate the inheritance mechanism for partitioning and implement something much closer to the storage layer, similar to Oracle. Oracle's implementation is fairly close to ours, actually. Each partition is a segment. SQLServer's is fairly low level. Having the constraint exclusion occur in the planner is not flexible enough to allow more advanced solutions. It depends what those advanced solutions are. I definitely want to be able to do run-time exclusion, push down merge joins and parallelism, but I also think that being able to index only the first 3 partitions is a useful feature too. ISTM that making the Append node responsible for exclusion might be a way to go with this, but various ways are possible, I'm sure. The trick is to find one that does everything you need and that will take some deeper analysis. However you do it, you'll still need a way of deciding which partitions to include/exclude that doesn't involve a sequential scan of all partition constraints. So my comments about a different index structure are still going to be relevant, wherever that lives/executes. I'm not doing anything in this area personally for 8.3 though. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. I think we need to re-evaluate the inheritance mechanism for partitioning and implement something much closer to the storage layer, similar to Oracle. Having the constraint exclusion occur in the planner is not flexible enough to allow more advanced solutions. Whoa, do you have anything to back that up ? You would need to elaborate what you actually mean, but I think it is moot. Sure, the constraint technique can be further extended (e.g. during runtime), but imho the approach is very good. Andreas ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Alvaro Herrera Sent: woensdag 7 maart 2007 15:59 To: NikhilS Cc: Zeugswetter Andreas ADI SD; Peter Eisentraut; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Auto creation of Partitions I am wondering if we can implement unique indexes across several tables (inheritance hierarchy) not by using a single, big index covering all the tables, but rather by inserting a dummy entry into each partition's unique index. I think that one big index is much better in this case. You are already replicating the data and need a lot of work when updating the partition structure (e.g. removing one). [snip] Rather: If we have the partition relations (aka partition), then we can define a unique index on it. This guarentees that there is at most one tuple with the same value (for the specified columns) for every individual partition. Now for the inserts. We already guarentee that is unique within the partition it lives. So we must insert it first. Next its to check the other partitions (in order!) for existence of a row with a similar context. Of course we require full visibility of the data. We do the insert first, as this ensures other will find it. The order is also important, otherwise there can be situations where we can't guarentee the constraint. Updates are exactly the same. Deletes are trivial. Perhaps you can do it without an index, but this is nothing different than for a normal table. - Joris Dobbelsteen ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
Andreas, On 3/7/07 11:45 AM, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote: Whoa, do you have anything to back that up ? Sure - when we start to consider designs that implement advanced data management features, we run into problems with the architecture of tables-tables-tables Here are some examples: 1 - people think of partitions as a logical building block for tables, they would like to move partitions around underneath a table without the table definition being involved. In the current implementation, there are explicit linkages between the table definition and the child tables - imagine an ALTER TABLE foo_parent ADD COLUMN and how it would need to cascade to 1,000 child tables and you get the beginning of it - this connection should not exist. 2 - INSERT/UPDATE/DELETE processing through the SQL rewrite layer (rules) is terribly slow and gets slower as you add more partitions. If done closer to the storage layer, this can be done in ways that use access methods shared with other storage entities, e.g. Indices, and the code path would flow more naturally. 3 - Parallel query can be accomplished more easily by separating scans across relations split among tablespaces. This is more natural than trying to parallelize APPEND nodes within existing plans You would need to elaborate what you actually mean, but I think it is moot. Sure, the constraint technique can be further extended (e.g. during runtime), but imho the approach is very good. Well, it's being used and that's good, but it needs to be better IMO and I think that before we go too far down the current path we should consider the alternatives more carefully. - Luke ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Auto creation of Partitions
On Mar 6, 2007, at 9:13 AM, Shane Ambler wrote: NikhilS wrote: On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote: 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. Since this would basically be a case of the updatable rules problem, you should review those discussions in the past to check whether the issues mentioned there don't interfere with that plan. The rules mentioned here will be to specify that all the inserts/updates/deletes should go into proper children tables instead of the parent. I do not see the updateable rules problem with regards to this, but will check out the archives for discussion on this related to partitioning. I would think that a trigger would be a better choice as I see the need (or at least the possibility) for more than just a rewrite. When a value is inserted that is outside of a condition currently covered by an existing child table then a new child will need to be spawned to contain the new data. There's no reason a new child has to be spawned, and I don't think that having a DML statement 'automagically' generating DDL is such a hot idea, either. Also, there's nothing inherently wrong with having an 'overflow partition' (to use Oracle syntax) that holds values that don't fall in the range of any other tables. The obvious place to do that with our partitioning is in the parent table. There are 2 other reasons to favor triggers though: 1) People (Josh Drake comes to mind) have found that if you get over a tiny number of partitions, the performance of rules is abysmal. 2) I believe it should be possible to construct an update trigger that allows you to perform updates that will place the row in question into a new partition. While I can see cases for simply disallowing updates to the partitioning key, I think there are also times when being able to do that would be very useful. Will ALTER TABLE be extended to handle partitions? This will allow partitioning existing tables (maybe without blocking access?) and allow things like ALTER TABLE mytable ADD PARTITION (mydate within 200703) and ALTER TABLE mytable DROP PARTITION (mydate within 199912) or would dropping be covered by DELETE FROM mytable where mydate = 199912 ? I think it'd be great to make adding and removing partitions as simple as ALTER TABLE. I don't think that DELETE should be the mechanism to drop a partition, though. Again, DML statements shouldn't be performing DDL. Could such a syntax be devised for date columns? (month of mydate) or similar to auto create partitions based on the year and month of a date column? or will we just do CHECK(mydatecol = 1/3/07 and mydatecol = 31/3/07) for each month of data? Also (day of mydatecol) to partition based on year and day of year. Another syntax possibility - range(myserialcol of 50) where new child tables are created every 50 rows? Maybe I'm looking at auto-maintenance which is beyond any current planning? I don't think it's worth it to burden the database with auto-creating time partitions; it's easy enough to setup a cron job to handle it. It might be more useful to have the database handle things like partitioning on a SERIAL column, though I agree with Nikhils that this should wait. Does any other database support 'automagic' partition creation? I know Oracle 9i doesn't... not sure about 10g or DB2 or MSSQL... -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
On Mar 6, 2007, at 4:57 AM, NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. Since we want to eventually support 'global' indexes, I think we need to be really careful here. There's actually 3 possible cases: 1) Index* should be global (spanning multiple tables) 2) Index* should be inherited by all partitions as they're created 3) Index* should exist only on the parent table * Note that there's really no reason this has to be limit to indexes; it could certainly apply to constraints, or even triggers. IIRC, Oracle has a special syntax for global indexes; any other index defined on a partitioned table is picked up *by newly created partitions*. If you want to add indexes to existing partitions, you have to explicitly add it to each partition. I'd like to eventually see us supporting all 3 options. I'm not sure how much we want to clutter the grammar, though. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Auto creation of Partitions
On Mar 7, 2007, at 3:26 AM, Simon Riggs wrote: If you know that the constraints on each of the tables is distinct, then building a UNIQUE index on each of the partitions is sufficient to prove that all rows in the combined partitioned table are distinct also. The hard part there is checking that the partition constraints are distinct. If the partition constraints are added one at a time, you can use the predicate testing logic to compare the to-be-added partition's constraint against each of the already added constraints. That becomes an O(N) problem. What is really needed is a data structure that allows range partitions to be accessed more efficiently. This could make adding partitions and deciding in which partition a specific value goes an O(logN) operation. Directing data to child tables with triggers pretty much necessitates having some way to codify what partition a particular row belongs in. IE: for partitioning by month, you'll see things like naming the partition tables parent_table_name_$YEAR_$MONTH, so the 'partitioning function' takes a date or timestamp and then returns what partition it belongs to. Perhaps there is some way to use that mapping to drive the selection of what partitions could contain a given value? One possibility would be to require 3 functions for a partitioned table: one accepts the partitioning key and tells you what partition it's in, one that tells you what the minimum partitioning key for a partition would be, and one that tells you what the maximum would be. If the user supplied those 3 functions, I think it would be possibly to automatically generate code for the triggers and check constraints. The min/max partition key functions might allow you to more efficiently do partition elimination, too. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Auto creation of Partitions
Hi, This is to get feedback to meet the following TODO: - Simplify ability to create partitioned tables This would allow creation of partitioned tables without requiring creation of rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection. Options could include range and hash partition selection. There was some discussion on the pgsql mailing lists, which lead to the above TODO: http://archives.postgresql.org/pgsql-hackers/2006-09/msg00189.php http://archives.postgresql.org/pgsql-hackers/2006-08/msg01874.php We can have the following syntax to support auto creation of partitions in Postgresql: CREATE TABLE tabname ( ... ... ) PARTITION BY HASH(expr) | RANGE(expr) | LIST(expr) [PARTITIONS num_partitions] /* will apply to HASH only for now*/ [PARTITION partition_name CHECK(...), PARTITION partition_name CHECK(...) ... ]; Here expr will be one of the column names as specified for the master table. Once we finalize the syntax, the above statement would end up carrying out the following activities (disclaimer: we might add or remove some activities based on the discussion here). i ) Create master table. ii) Create children tables based on the number of partitions specified and make them inherit from the master table. iii) Auto generate rules (or triggers?) using the checks mentioned for the partitions, to handle INSERTs/DELETEs/UPDATEs to navigate them to the appropriate child. Note that checks specified directly on the master table will get inherited automatically. iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. v) If possible add CHECK (false) to the master table to avoid any activity on it. Some questions remain as to: 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. 2) What other attributes (access permissions e.g.) of the master along with the ones specified in (iv) should be passed on to the children. 3) Some implementation specific issue e.g. whether SPI_execute would be a good way of creating these rules. Comments appreciated, Regards, Nikhils EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. Since this would basically be a case of the updatable rules problem, you should review those discussions in the past to check whether the issues mentioned there don't interfere with that plan. 2) What other attributes (access permissions e.g.) of the master along with the ones specified in (iv) should be passed on to the children. Moreover, how are later changes of those attributes propagated? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Auto creation of Partitions
Hi, On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote: NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? We will not (I know its a hard thing to do :) ), the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. Since this would basically be a case of the updatable rules problem, you should review those discussions in the past to check whether the issues mentioned there don't interfere with that plan. The rules mentioned here will be to specify that all the inserts/updates/deletes should go into proper children tables instead of the parent. I do not see the updateable rules problem with regards to this, but will check out the archives for discussion on this related to partitioning. 2) What other attributes (access permissions e.g.) of the master along with the ones specified in (iv) should be passed on to the children. Moreover, how are later changes of those attributes propagated? Once created, this will be a normal inheritance relationship between the tables and all the existing commands will apply to both the parent and the child. The basic idea here is to automate as many things as possible at partition creation time. The user is free to make additional changes to the involved tables later too. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
NikhilS [EMAIL PROTECTED] writes: the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. Missed the start of this thread. A while back I had intended to add WITH INDEXES to CREATE TABLE LIKE. That would let you create a tale LIKE parent WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table ready for adding to the inheritance structure. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Auto creation of Partitions
On 3/6/07, NikhilS [EMAIL PROTECTED] wrote: Hi, On 3/6/07, Gregory Stark [EMAIL PROTECTED] wrote: NikhilS [EMAIL PROTECTED] writes: the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. Missed the start of this thread. A while back I had intended to add WITH INDEXES to CREATE TABLE LIKE. That would let you create a tale LIKE parent WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table ready for adding to the inheritance structure. Yeah, this one aims to do pretty much the above as part of the auto creation of the inheritance-based partitions. And to add, maybe if there is consensus/demand for the WITH INDEXES idea mentioned above too, I could work on it as well. Regards, Nikhils Regards, Nikhils -- Gregory Stark EnterpriseDB http://www.enterprisedb.com -- EnterpriseDB http://www.enterprisedb.com -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi, On 3/6/07, Gregory Stark [EMAIL PROTECTED] wrote: NikhilS [EMAIL PROTECTED] writes: the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. Missed the start of this thread. A while back I had intended to add WITH INDEXES to CREATE TABLE LIKE. That would let you create a tale LIKE parent WITH CONSTRAINTS WITH INDEXES and get pretty much a perfect table ready for adding to the inheritance structure. Yeah, this one aims to do pretty much the above as part of the auto creation of the inheritance-based partitions. Regards, Nikhils -- Gregory Stark EnterpriseDB http://www.enterprisedb.com -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
NikhilS wrote: Hi, On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote: NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? We will not (I know its a hard thing to do :) ), the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. I think a way can be devised to maintain the primary key and unique constraints. If a search is done on the parent table, the planner knows to rewrite the query as a union (or union all) of all child tables that relate to the where clause, or all child tables if the where clause is not on the column/s used to partition, then this concept should be able to be converted to indexes as well, so that when a primary or unique index from a child table is inserted to, then each of the related child indexes is consulted to ensure uniqueness. This would only apply if the partitioning was not done by the primary or unique column. 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. Since this would basically be a case of the updatable rules problem, you should review those discussions in the past to check whether the issues mentioned there don't interfere with that plan. The rules mentioned here will be to specify that all the inserts/updates/deletes should go into proper children tables instead of the parent. I do not see the updateable rules problem with regards to this, but will check out the archives for discussion on this related to partitioning. I would think that a trigger would be a better choice as I see the need (or at least the possibility) for more than just a rewrite. When a value is inserted that is outside of a condition currently covered by an existing child table then a new child will need to be spawned to contain the new data. So say we partition by year and month of a date column? As new dates are added new child tables would be created each month. Or is this beyond the current plan and left to manual creation? Will ALTER TABLE be extended to handle partitions? This will allow partitioning existing tables (maybe without blocking access?) and allow things like ALTER TABLE mytable ADD PARTITION (mydate within 200703) and ALTER TABLE mytable DROP PARTITION (mydate within 199912) or would dropping be covered by DELETE FROM mytable where mydate = 199912 ? Could such a syntax be devised for date columns? (month of mydate) or similar to auto create partitions based on the year and month of a date column? or will we just do CHECK(mydatecol = 1/3/07 and mydatecol = 31/3/07) for each month of data? Also (day of mydatecol) to partition based on year and day of year. Another syntax possibility - range(myserialcol of 50) where new child tables are created every 50 rows? Maybe I'm looking at auto-maintenance which is beyond any current planning? -- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
[EMAIL PROTECTED] (Peter Eisentraut) writes: NikhilS wrote: iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, pass it on to the children tables. How will you maintain a primary key in such a table, considering that indexes can't span multiple tables? On the one hand, I seem to recall seeing multiple-table-spanning indices being on the TODO list. On the other, it may be that this sort of partitioning is only usable for scenarios where it is only needful to maintain uniqueness on a partition-by-partition basis. 1) Whether we should use triggers/rules for step number (iii) above. Maybe rules is the way to go. Since this would basically be a case of the updatable rules problem, you should review those discussions in the past to check whether the issues mentioned there don't interfere with that plan. 2) What other attributes (access permissions e.g.) of the master along with the ones specified in (iv) should be passed on to the children. Moreover, how are later changes of those attributes propagated? I hear rumour of there being a more comprehensive proposal on this in the works... -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://linuxfinances.info/info/finances.html Rules of the Evil Overlord #189. I will never tell the hero Yes I was the one who did it, but you'll never be able to prove it to that incompetent old fool. Chances are, that incompetent old fool is standing behind the curtain. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
On Wed, Mar 07, 2007 at 02:43:39AM +1030, Shane Ambler wrote: I think a way can be devised to maintain the primary key and unique constraints. If a search is done on the parent table, the planner knows to rewrite the query as a union (or union all) of all child tables that relate to the where clause, or all child tables if the where clause is not on the column/s used to partition, then this concept should be able to be converted to indexes as well, so that when a primary or unique index from a child table is inserted to, then each of the related child indexes is consulted to ensure uniqueness. But that's where it breaks down: you not only need to check that the row you're inserting is unique, you need to make sure that other people trying to insert the same value see it. After all, a unique index does contain multiple instances of any particular value, it's just that the DB has verified that none of these are visible to the same transaction. If there is a collision a lock is taken that waits until the colliding insert commits or rejects. With b-tree it's easy, every value can only appear in one place, so it's a very simple lock. The reason why GiST doesn't support unique indexes is due to the fact that any particular value could appear in many places, so to stop concurrent inserts you need to lock *every* page in the tree that another backend might want to insert the record. I fear this will also be a problem for indexes over multiple tables, you will need to lock the page in every index that some other DB might want to insert the value into. All while avoiding deadlocks. I think the problems are related. If you can solve it for multiple tables, you can solve it for GiST indexes also. This would only apply if the partitioning was not done by the primary or unique column. That's the easy case, no problem there. Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] Auto creation of Partitions
Martijn van Oosterhout wrote: On Wed, Mar 07, 2007 at 02:43:39AM +1030, Shane Ambler wrote: I think a way can be devised to maintain the primary key and unique constraints. If a search is done on the parent table, the planner knows to rewrite the query as a union (or union all) of all child tables that relate to the where clause, or all child tables if the where clause is not on the column/s used to partition, then this concept should be able to be converted to indexes as well, so that when a primary or unique index from a child table is inserted to, then each of the related child indexes is consulted to ensure uniqueness. But that's where it breaks down: you not only need to check that the row you're inserting is unique, you need to make sure that other people trying to insert the same value see it. This sounds like what is really needed is a way to lock a certain condition, namely the existance or non-existance of a record with certain values in certain fields. This would not only help this case, it would also help RI triggers, because those wouldn't have to acquire a share lock on the referenced rows anymore. As you pointed out, this would also make unique GiST indices possible No real idea how to do this, though :-( greetings, Florian Pfluge ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
NikhilS wrote: We will not (I know its a hard thing to do :) ), the intention is to use this information from the parent and make it a property of the child table. This will avoid the step for the user having to manually specify CREATE INDEX and the likes on all the children tables one-by-one. But when I say CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ... then I expect that the primary key will be enforced across all partitions. We currently sidestep that issue by not offering seemingly transparent partitioning. But if you are planning to offer that, the unique index issue needs to be solved, and I see nothing in your plan about that. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Auto creation of Partitions
Florian, This sounds like what is really needed is a way to lock a certain condition, namely the existance or non-existance of a record with certain values in certain fields. This would not only help this case, it would also help RI triggers, because those wouldn't have to acquire a share lock on the referenced rows anymore. That's called predicate locking and it's very, very hard to do. -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Auto creation of Partitions
Peter Eisentraut [EMAIL PROTECTED] writes: But when I say CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ... then I expect that the primary key will be enforced across all partitions. We currently sidestep that issue by not offering seemingly transparent partitioning. But if you are planning to offer that, the unique index issue needs to be solved, and I see nothing in your plan about that. Agreed, it needs to Just Work. I think it'd still be useful though if we only support auto-partitioning on the primary key, and that restriction avoids the indexing problem. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Auto creation of Partitions
Agreed, it needs to Just Work. I think it'd still be useful though if we only support auto-partitioning on the primary key, and that restriction avoids the indexing problem. +1 -- Josh Berkus PostgreSQL @ Sun San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Auto creation of Partitions
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: But when I say CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ... then I expect that the primary key will be enforced across all partitions. We currently sidestep that issue by not offering seemingly transparent partitioning. But if you are planning to offer that, the unique index issue needs to be solved, and I see nothing in your plan about that. Agreed, it needs to Just Work. I think it'd still be useful though if we only support auto-partitioning on the primary key, and that restriction avoids the indexing problem. Maybe. The most obvious use for automatic partitioning that I can think of would be based in the value of a timestamptz field rather than any PK. Of course I tend to work more in the OLTP field than in DW type apps, where other considerations might apply. cheers andrew ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Auto creation of Partitions
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Josh Berkus Sent: dinsdag 6 maart 2007 19:45 To: pgsql-hackers@postgresql.org Cc: Florian G. Pflug; Martijn van Oosterhout; Shane Ambler; NikhilS; Peter Eisentraut Subject: Re: [HACKERS] Auto creation of Partitions Florian, This sounds like what is really needed is a way to lock a certain condition, namely the existance or non-existance of a record with certain values in certain fields. This would not only help this case, it would also help RI triggers, because those wouldn't have to acquire a share lock on the referenced rows anymore. That's called predicate locking and it's very, very hard to do. That's definitely not needed. Rather something good such that we can finally enforce RI ourselves in the general case. This is currently not possible to do easily, except in C code. This means we need to look at all the rows that exists, but are normally be invisible to our view of the database. Still I'm not sure about all cases, as the MVCC model is quite tricky and I'm not sure whether my idea's about it are valid. The basic idea is that you need to guarentee the constraint for the 'single underlaying model' (with everything visible) and for your view (under your visibility rules). I believe, but are not certain, that under these conditions any (valid) snapshot will obey the desired constraints. - Joris Dobbelsteen ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
Hi Shane, Maybe I'm looking at auto-maintenance which is beyond any current planning? Many of your suggestions are useful, but auto-maintenance will be beyond the current plan. Regards, Nikhils EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Hi, On 3/7/07, Tom Lane [EMAIL PROTECTED] wrote: Peter Eisentraut [EMAIL PROTECTED] writes: But when I say CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ... then I expect that the primary key will be enforced across all partitions. We currently sidestep that issue by not offering seemingly transparent partitioning. But if you are planning to offer that, the unique index issue needs to be solved, and I see nothing in your plan about that. Agreed, it needs to Just Work. I think it'd still be useful though if we only support auto-partitioning on the primary key, and that restriction avoids the indexing problem. regards, tom lane Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain uniqueness on a partition-by-partition basis too? Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Auto creation of Partitions
Andrew Dunstan wrote: Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: But when I say CREATE TABLE ( a int PRIMARY KEY, ... ) PARTITION blah ... then I expect that the primary key will be enforced across all partitions. We currently sidestep that issue by not offering seemingly transparent partitioning. But if you are planning to offer that, the unique index issue needs to be solved, and I see nothing in your plan about that. Agreed, it needs to Just Work. I think it'd still be useful though if we only support auto-partitioning on the primary key, and that restriction avoids the indexing problem. Maybe. The most obvious use for automatic partitioning that I can think of would be based in the value of a timestamptz field rather than any PK. Of course I tend to work more in the OLTP field than in DW type apps, where other considerations might apply. I second that - partitioning on some kind of timestamp field is a common usecase here too ... Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Auto creation of Partitions
Maybe. The most obvious use for automatic partitioning that I can think of would be based in the value of a timestamptz field rather than any PK. Of course I tend to work more in the OLTP field than in DW type apps, where other considerations might apply. I second that - partitioning on some kind of timestamp field is a common usecase here too ... Partitioning period needs to work. It doesn't matter what the user chooses as their partition key. Timestamp is an obvious choice but there are others such as serial where you just partition every million rows (for example) to keep things manageable. Joshua D. Drake Stefan ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 6: explain analyze is your friend