Re: [HACKERS] Partitioning WIP patch
On Fri, Feb 27, 2015 at 09:09:35AM +0900, Amit Langote wrote: > On 27-02-2015 AM 03:24, Andres Freund wrote: > > On 2015-02-26 12:15:17 +0900, Amit Langote wrote: > >> On 26-02-2015 AM 05:15, Josh Berkus wrote: > >>> I would love to have it for 9.5, but I guess the > >>> patch isn't nearly baked enough for that? > > > >> I'm not quite sure what would qualify as baked enough for 9.5 though we > >> can surely try to reach some consensus on various implementation aspects > >> and perhaps even get it ready in time for 9.5. > > > > I think it's absolutely unrealistic to get this into 9.5. There's barely > > been any progress on the current (last!) commitfest - where on earth > > should the energy come to make this patch ready? And why would that be > > fair against all the others that have submitted in time? > > > > I realize and I apologize that it was irresponsible of me to have said > that; maybe got a bit too excited. I do not want to unduly draw people's > time on something that's not quite ready while there are other things > people have worked hard on to get in time. In all earnestness, I say we > spend time perfecting those things. > > I'll add this into CF-JUN'15. I will keep posting updates meanwhile so > that when that commitfest finally starts, we will have something worth > considering. I am _very_ glad you have started on this. There is a huge need for this, and I am certainly excited about it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 27-02-2015 AM 03:01, Jim Nasby wrote: > On 2/26/15 3:09 AM, Amit Langote wrote: >> Unless I'm missing something again, isn't allowing partitions to have >> heterogeneous rowtypes a problem in the long run? I'm afraid I'm >> confused as to your stand regarding inheritance vs. new partitioning. To >> be specific, children with heterogeneous schemas sounds much like what >> inheritance would be good for as you say. But then isn't that why we >> have to plan children individually which you said new partitioning >> should get away from? > > Apologies if I haven't been clear enough. What I'd like to see is the > best of both worlds; fast partitioning when not using inheritance, and > perhaps somewhat slower when using inheritance, but still with the > features partitioning gives you. > I get the distinction, thanks. Actually I wasn't quite thinking of altering the way any part of the current partitioning based on inheritance works nor am I proposing to get rid of it. It all stays as is. Not sure how we could say if it will support features of the new partitioning before those features actually begin to materialize. > But my bigger concern from a project standpoint is that we not put > ourselves in a position of supporting something that we really don't > want to support (a partitioning system that's got inheritance mixed in). > As much as I'd personally like to have both features together, I think > it would be bad for the community to go down that road without careful > thought. > >> Yes, it does. In fact, I do intend to keep them separate the first >> attempt of which is to choose to NOT transform a PARTITION OF parent >> clause into INHERITS parent. Any code that may look like it's trying to >> do that is because the patch is not fully baked yet. > > Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN > on a partition. If we release something without that being restricted > it'll probably cause trouble later on. Yes, I agree. More generally, I think the patch/approach is in need of a clear separation of internal implementation concerns and user-facing notions even at this point. This may be one of them. For example, with the patch, a partition is defined as: CREATE "TABLE" name PARTITION OF parent ... Unless that turns into something like: CREATE PARTITION name OF parent ... we may not be able to put all the restrictions we'd want to put on a partition for the sake of what would be partitioning internals. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On Fri, Feb 27, 2015 at 3:24 AM, Andres Freund wrote: > On 2015-02-26 12:15:17 +0900, Amit Langote wrote: >> On 26-02-2015 AM 05:15, Josh Berkus wrote: >> > On 02/24/2015 12:13 AM, Amit Langote wrote: >> >> Here is an experimental patch that attempts to implement this. > >> > I would love to have it for 9.5, but I guess the >> > patch isn't nearly baked enough for that? > >> I'm not quite sure what would qualify as baked enough for 9.5 though we >> can surely try to reach some consensus on various implementation aspects >> and perhaps even get it ready in time for 9.5. > > I think it's absolutely unrealistic to get this into 9.5. There's barely > been any progress on the current (last!) commitfest - where on earth > should the energy come to make this patch ready? And why would that be > fair against all the others that have submitted in time? +1. There are many other patches pending the in CF app waiting for feedback, while this one showed up after the last CF deadline for 9.5 and needs design and spec decisions that should not be taken lightly at the end of a major release development cycle. Please let's not rush into something we may regret. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 27-02-2015 AM 03:18, Josh Berkus wrote: > On 02/25/2015 07:15 PM, Amit Langote wrote: >> I'm not quite sure what would qualify as baked enough for 9.5 though we >> can surely try to reach some consensus on various implementation aspects >> and perhaps even get it ready in time for 9.5. > > Well, we don't have long at all to do that. I guess I'm asking what > kind of completeness of code we have; is this basically done pending API > changes and bugs, or are there major bits (like, say, pg_dump and > EXPLAIN support) which are completely unimplemented? > I would say I am not entirely sure/satisfied about some decisions I have made (or not) when writing even the basic patch. Yes, pg_dump/EXPLAIN/psql, etc. are not touched. So, it seems it might not be fair to claim it's actually something for 9.5. Let me just call it WIP for a while while keep I working on it and receive feedback. >> Only one concern I can remember someone had raised is that having to >> evaluate an expression for every row during bulk-inserts may end up >> being pretty expensive. Though, we might have to live with that. > > Well, it's not more expensive than having to materialize the value from > a trigger and store it on disk. The leading one here would be functions > over timestamp; for example, the data has a timestamptz, but you want to > partition by week. > >> >> I think one idea is to learn from ability to use expressions in indexes. > > Sure. So a feature to implement for the 2nd release. Actually, I'm trying to add that and see how it works. I will post an updated patch soon if it looks good enough. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 27-02-2015 AM 03:24, Andres Freund wrote: > On 2015-02-26 12:15:17 +0900, Amit Langote wrote: >> On 26-02-2015 AM 05:15, Josh Berkus wrote: >>> I would love to have it for 9.5, but I guess the >>> patch isn't nearly baked enough for that? > >> I'm not quite sure what would qualify as baked enough for 9.5 though we >> can surely try to reach some consensus on various implementation aspects >> and perhaps even get it ready in time for 9.5. > > I think it's absolutely unrealistic to get this into 9.5. There's barely > been any progress on the current (last!) commitfest - where on earth > should the energy come to make this patch ready? And why would that be > fair against all the others that have submitted in time? > I realize and I apologize that it was irresponsible of me to have said that; maybe got a bit too excited. I do not want to unduly draw people's time on something that's not quite ready while there are other things people have worked hard on to get in time. In all earnestness, I say we spend time perfecting those things. I'll add this into CF-JUN'15. I will keep posting updates meanwhile so that when that commitfest finally starts, we will have something worth considering. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 2015-02-26 12:15:17 +0900, Amit Langote wrote: > On 26-02-2015 AM 05:15, Josh Berkus wrote: > > On 02/24/2015 12:13 AM, Amit Langote wrote: > >> Here is an experimental patch that attempts to implement this. > > I would love to have it for 9.5, but I guess the > > patch isn't nearly baked enough for that? > I'm not quite sure what would qualify as baked enough for 9.5 though we > can surely try to reach some consensus on various implementation aspects > and perhaps even get it ready in time for 9.5. I think it's absolutely unrealistic to get this into 9.5. There's barely been any progress on the current (last!) commitfest - where on earth should the energy come to make this patch ready? And why would that be fair against all the others that have submitted in time? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 02/25/2015 07:15 PM, Amit Langote wrote: > On 26-02-2015 AM 05:15, Josh Berkus wrote: >> On 02/24/2015 12:13 AM, Amit Langote wrote: >>> Here is an experimental patch that attempts to implement this. >> >> This looks awesome. > > Thanks! > >> I would love to have it for 9.5, but I guess the >> patch isn't nearly baked enough for that? >> > > I'm not quite sure what would qualify as baked enough for 9.5 though we > can surely try to reach some consensus on various implementation aspects > and perhaps even get it ready in time for 9.5. Well, we don't have long at all to do that. I guess I'm asking what kind of completeness of code we have; is this basically done pending API changes and bugs, or are there major bits (like, say, pg_dump and EXPLAIN support) which are completely unimplemented? >>> where key_spec consists of partition key column names and optional >>> operator class per column. Currently, there are restrictions on the >>> key_spec such as allowing only column names (not arbitrary expressions >>> of them), only one column for list strategy, etc. >> >> What's the obstacle to supporting expressions and/or IMMUTABLE >> functions? I think it's fine to add this feature without them >> initially, I'm just asking about the roadmap for eventually supporting >> expressions in the key spec. >> > > Only one concern I can remember someone had raised is that having to > evaluate an expression for every row during bulk-inserts may end up > being pretty expensive. Though, we might have to live with that. Well, it's not more expensive than having to materialize the value from a trigger and store it on disk. The leading one here would be functions over timestamp; for example, the data has a timestamptz, but you want to partition by week. > > I think one idea is to learn from ability to use expressions in indexes. Sure. So a feature to implement for the 2nd release. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 2/26/15 3:22 AM, Andres Freund wrote: On 2015-02-26 02:20:21 -0600, Jim Nasby wrote: The reason I'd like to do this with partitioning vs plain inheritance is presumably as we build out partitioning we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. The problem there imo isn't so much inheritance, but lack of working unique checks across partitions. That's something we can implement independent of this, it's just not trivial. There's been discussion of allowing for uniqueness when we can guarantee no overlap between partitions, and the partition key is part of the unique constraint. That's the particular use case I was thinking of. I suspect there's other partitioning features that would be useful in a generic inheritance setup as well; that's why I'd love to see both features work together... but I fear there's enough work to get there that it may not happen, and I don't want us to accidentally start mixing the two and have users start relying on it. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 2/26/15 3:09 AM, Amit Langote wrote: Yes. If it helps, the exact use-case I have in mind is using list-based >partitioning + additional columns in some/all children (different >between children). For example, if you need to track different types of >customer payment methods, you'd have a payment parent table, a list >partition for credit & debit cards, a different list partition for bank >accounts, etc. > >The reason I'd like to do this with partitioning vs plain inheritance is >presumably as we build out partitioning we'll get very useful things >like the ability to have FKs to properly partitioned tables. Insert >tuple routing could also be useful. > Unless I'm missing something again, isn't allowing partitions to have heterogeneous rowtypes a problem in the long run? I'm afraid I'm confused as to your stand regarding inheritance vs. new partitioning. To be specific, children with heterogeneous schemas sounds much like what inheritance would be good for as you say. But then isn't that why we have to plan children individually which you said new partitioning should get away from? Apologies if I haven't been clear enough. What I'd like to see is the best of both worlds; fast partitioning when not using inheritance, and perhaps somewhat slower when using inheritance, but still with the features partitioning gives you. But my bigger concern from a project standpoint is that we not put ourselves in a position of supporting something that we really don't want to support (a partitioning system that's got inheritance mixed in). As much as I'd personally like to have both features together, I think it would be bad for the community to go down that road without careful thought. >>With explicit partitioning, shouldn't we go in direction where we remove >>some restrictions imposed by inheritance (think multiple inheritance)? I >>recall a link Alvaro had started the discussion with think link to a >>Tom's remark about something very related: >> >>http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us > >That post looks like Tom figured out a way to eliminate a problem that >hurts inheritance, so that's good. > >My fear is that at some point we'll hit a problem with partitioning that >we can't solve in the inheritance model. If we allow inheritance >features into partitioning now we'll painted into a corner. If we >disallow those features now we can always re-enable them if we get to >the point where we're in the clear. > >Does that make sense? Yes, it does. In fact, I do intend to keep them separate the first attempt of which is to choose to NOT transform a PARTITION OF parent clause into INHERITS parent. Any code that may look like it's trying to do that is because the patch is not fully baked yet. Ok, good to know. That's why I was asking about ALTER TABLE ADD COLUMN on a partition. If we release something without that being restricted it'll probably cause trouble later on. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 2015-02-26 02:20:21 -0600, Jim Nasby wrote: > The reason I'd like to do this with partitioning vs plain inheritance is > presumably as we build out partitioning we'll get very useful things like > the ability to have FKs to properly partitioned tables. Insert tuple routing > could also be useful. The problem there imo isn't so much inheritance, but lack of working unique checks across partitions. That's something we can implement independent of this, it's just not trivial. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 26-02-2015 PM 05:20, Jim Nasby wrote: > On 2/25/15 7:57 PM, Amit Langote wrote: >> AIUI, as far as we stay with a design where partitions (children) are >> individually planned, that might be OK. But, I guess things will get >> more complicated. I think the role of a parent in planning would remain >> limited to drive partition-pruning. Am I missing something? > > Isn't the point of adding explicit partitioning to make it faster than > plain inheritance? Presumably as part of that we'll eventually want to > NOT plan children individually. > Yes, we'd definitely want to get to a point where planning children individually is not necessary. But I am afraid we will have to get there a step at a time. IMHO, solving one problem of partition-pruning would be a good start. And that will definitely be part of parent's planning using partition bounds list (not pruning children one-by-one with relation_excluded_by_constraints()). >>> I would certainly prefer that we support the capabilities of inheritance >>> along with partitioning (because in some cases you want both). But it's >>> going to limit what we can do internally. >> >> Just to clarify are you referring to inheritance relationship between a >> partitioned table and partitions? > > Yes. If it helps, the exact use-case I have in mind is using list-based > partitioning + additional columns in some/all children (different > between children). For example, if you need to track different types of > customer payment methods, you'd have a payment parent table, a list > partition for credit & debit cards, a different list partition for bank > accounts, etc. > > The reason I'd like to do this with partitioning vs plain inheritance is > presumably as we build out partitioning we'll get very useful things > like the ability to have FKs to properly partitioned tables. Insert > tuple routing could also be useful. > Unless I'm missing something again, isn't allowing partitions to have heterogeneous rowtypes a problem in the long run? I'm afraid I'm confused as to your stand regarding inheritance vs. new partitioning. To be specific, children with heterogeneous schemas sounds much like what inheritance would be good for as you say. But then isn't that why we have to plan children individually which you said new partitioning should get away from? >> With explicit partitioning, shouldn't we go in direction where we remove >> some restrictions imposed by inheritance (think multiple inheritance)? I >> recall a link Alvaro had started the discussion with think link to a >> Tom's remark about something very related: >> >> http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us > > That post looks like Tom figured out a way to eliminate a problem that > hurts inheritance, so that's good. > > My fear is that at some point we'll hit a problem with partitioning that > we can't solve in the inheritance model. If we allow inheritance > features into partitioning now we'll painted into a corner. If we > disallow those features now we can always re-enable them if we get to > the point where we're in the clear. > > Does that make sense? Yes, it does. In fact, I do intend to keep them separate the first attempt of which is to choose to NOT transform a PARTITION OF parent clause into INHERITS parent. Any code that may look like it's trying to do that is because the patch is not fully baked yet. Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 2/25/15 7:57 PM, Amit Langote wrote: On 26-02-2015 AM 10:31, Jim Nasby wrote: On 2/25/15 7:24 PM, Amit Langote wrote: Does ALTER TABLE parent_monthly_x_201401 ADD COLUMN foo still operate the same as today? I'd like to see us continue to support that, but perhaps it would be wise to not paint ourselves into that corner just yet. Nothing prevents that from working, at least at the moment. Ok, but is that what we really want? If we release it that way we'll be stuck with it forever. AIUI, as far as we stay with a design where partitions (children) are individually planned, that might be OK. But, I guess things will get more complicated. I think the role of a parent in planning would remain limited to drive partition-pruning. Am I missing something? Isn't the point of adding explicit partitioning to make it faster than plain inheritance? Presumably as part of that we'll eventually want to NOT plan children individually. I would certainly prefer that we support the capabilities of inheritance along with partitioning (because in some cases you want both). But it's going to limit what we can do internally. Just to clarify are you referring to inheritance relationship between a partitioned table and partitions? Yes. If it helps, the exact use-case I have in mind is using list-based partitioning + additional columns in some/all children (different between children). For example, if you need to track different types of customer payment methods, you'd have a payment parent table, a list partition for credit & debit cards, a different list partition for bank accounts, etc. The reason I'd like to do this with partitioning vs plain inheritance is presumably as we build out partitioning we'll get very useful things like the ability to have FKs to properly partitioned tables. Insert tuple routing could also be useful. With explicit partitioning, shouldn't we go in direction where we remove some restrictions imposed by inheritance (think multiple inheritance)? I recall a link Alvaro had started the discussion with think link to a Tom's remark about something very related: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us That post looks like Tom figured out a way to eliminate a problem that hurts inheritance, so that's good. My fear is that at some point we'll hit a problem with partitioning that we can't solve in the inheritance model. If we allow inheritance features into partitioning now we'll painted into a corner. If we disallow those features now we can always re-enable them if we get to the point where we're in the clear. Does that make sense? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 26-02-2015 AM 05:15, Josh Berkus wrote: > On 02/24/2015 12:13 AM, Amit Langote wrote: >> Here is an experimental patch that attempts to implement this. > > This looks awesome. Thanks! > I would love to have it for 9.5, but I guess the > patch isn't nearly baked enough for that? > I'm not quite sure what would qualify as baked enough for 9.5 though we can surely try to reach some consensus on various implementation aspects and perhaps even get it ready in time for 9.5. >> where key_spec consists of partition key column names and optional >> operator class per column. Currently, there are restrictions on the >> key_spec such as allowing only column names (not arbitrary expressions >> of them), only one column for list strategy, etc. > > What's the obstacle to supporting expressions and/or IMMUTABLE > functions? I think it's fine to add this feature without them > initially, I'm just asking about the roadmap for eventually supporting > expressions in the key spec. > Only one concern I can remember someone had raised is that having to evaluate an expression for every row during bulk-inserts may end up being pretty expensive. Though, we might have to live with that. I think one idea is to learn from ability to use expressions in indexes. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 26-02-2015 AM 10:31, Jim Nasby wrote: > On 2/25/15 7:24 PM, Amit Langote wrote: >>> >Does ALTER TABLE parent_monthly_x_201401 ADD COLUMN foo still >>> >operate the same as today? I'd like to see us continue to support that, >>> >but perhaps it would be wise to not paint ourselves into that corner >>> >just yet. >> Nothing prevents that from working, at least at the moment. > > Ok, but is that what we really want? If we release it that way we'll be > stuck with it forever. > AIUI, as far as we stay with a design where partitions (children) are individually planned, that might be OK. But, I guess things will get more complicated. I think the role of a parent in planning would remain limited to drive partition-pruning. Am I missing something? > I would certainly prefer that we support the capabilities of inheritance > along with partitioning (because in some cases you want both). But it's > going to limit what we can do internally. Just to clarify are you referring to inheritance relationship between a partitioned table and partitions? With explicit partitioning, shouldn't we go in direction where we remove some restrictions imposed by inheritance (think multiple inheritance)? I recall a link Alvaro had started the discussion with think link to a Tom's remark about something very related: http://www.postgresql.org/message-id/1598.1399826...@sss.pgh.pa.us Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 2/25/15 7:24 PM, Amit Langote wrote: >Does ALTER TABLE parent_monthly_x_201401 ADD COLUMN foo still >operate the same as today? I'd like to see us continue to support that, >but perhaps it would be wise to not paint ourselves into that corner >just yet. Nothing prevents that from working, at least at the moment. Ok, but is that what we really want? If we release it that way we'll be stuck with it forever. I would certainly prefer that we support the capabilities of inheritance along with partitioning (because in some cases you want both). But it's going to limit what we can do internally. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 26-02-2015 AM 10:24, Amit Langote wrote: > To clarify things a bit more, transformCreateStmt() transforms PARTITION > OF into LIKE INCLUDING ALL, not INHERITS. And as mentioned, for Append > to work, I have made ATExecAddInherit() to do some of the things > ATExecAttachPartition() does. Again, that is a temporary arrangement. > I misspoke. Should have said: ... for Append to work, I have made ATExecAttachPartition() to do some of the things ATExecAddInherit() does. Again, that is a temporary arrangement. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 26-02-2015 AM 09:28, Jim Nasby wrote: > On 2/24/15 2:13 AM, Amit Langote wrote: >> -- a plain table >> CREATE TABLE parent_monthly(year int, month int, day int); >> >> -- a partitioned table >> -- x: number of partitions >> CREATE TABLE parent_monthly_x(LIKE parent_monthly) PARTITION BY >> RANGE ON(year, month); > > To be clear, in this example parent_table_x is in no way related to > parent_monthly, just like a normal CREATE TABLE (LIKE table), right? > Yes, there is no relation at all. I was maybe just trying to save few keystrokes. Sorry, that may be confusing. parent_monthly is just a regular table, part of the example. >> -- partitions >> CREATE TABLE parent_monthly_x_201401 PARTITION OF >> parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2); > > And the partitions are still inheritance children? > At this point, they *almost* are; more for the sake of Append. Though sooner than later, we will have to invent a version of Append for partitioned tables that does more than just append the outputs of underlying plans. For example, it would use partitioninfo cached in relation descriptor of the parent to drive partition-pruning for starters. Pruning child relations individually by way of constraint_exclusion doesn't scale as is well known. To clarify things a bit more, transformCreateStmt() transforms PARTITION OF into LIKE INCLUDING ALL, not INHERITS. And as mentioned, for Append to work, I have made ATExecAddInherit() to do some of the things ATExecAttachPartition() does. Again, that is a temporary arrangement. > Does ALTER TABLE parent_monthly_x_201401 ADD COLUMN foo still > operate the same as today? I'd like to see us continue to support that, > but perhaps it would be wise to not paint ourselves into that corner > just yet. Nothing prevents that from working, at least at the moment. CREATE TABLE parent_monthly_00012(LIKE parent_monthly) PARTITION BY RANGE ON(year, month); CREATE TABLE parent_monthly_00012_201401 PARTITION OF parent_monthly_00012 FOR VALUES BETWEEN (2014, 1) AND (2014, 2); CREATE TABLE parent_monthly_00012_201412 PARTITION OF parent_monthly_00012 FOR VALUES BETWEEN (2014, 12) AND (2015, 1); # INSERT INTO parent_monthly_00012 VALUES (2014, 07, 01); INSERT 0 1 # SELECT * FROM parent_monthly_00012; year | month | day --+---+- 2014 | 7 | 1 (1 row) # INSERT INTO parent_monthly_00012 VALUES (2014, 08, 01); INSERT 0 1 # ALTER TABLE parent_monthly_00012_201408 ADD COLUMN hour int; ALTER TABLE # INSERT INTO parent_monthly_00012_201408 VALUES (2014, 08, 01, 10); INSERT 0 1 # SELECT * FROM parent_monthly_00012; year | month | day --+---+- 2014 | 7 | 1 2014 | 8 | 1 2014 | 8 | 1 (3 rows) # SELECT * FROM parent_monthly_00012_201408; year | month | day | hour --+---+-+-- 2014 | 8 | 1 | 2014 | 8 | 1 | 10 (2 rows) Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
On 2/24/15 2:13 AM, Amit Langote wrote: -- a plain table CREATE TABLE parent_monthly(year int, month int, day int); -- a partitioned table -- x: number of partitions CREATE TABLE parent_monthly_x(LIKE parent_monthly) PARTITION BY RANGE ON(year, month); To be clear, in this example parent_table_x is in no way related to parent_monthly, just like a normal CREATE TABLE (LIKE table), right? -- partitions CREATE TABLE parent_monthly_x_201401 PARTITION OF parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2); And the partitions are still inheritance children? Does ALTER TABLE parent_monthly_x_201401 ADD COLUMN foo still operate the same as today? I'd like to see us continue to support that, but perhaps it would be wise to not paint ourselves into that corner just yet. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch
On 02/24/2015 12:13 AM, Amit Langote wrote: > Here is an experimental patch that attempts to implement this. This looks awesome. I would love to have it for 9.5, but I guess the patch isn't nearly baked enough for that? > It implements the following syntax: > > * Syntax for defining partition key: > CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec); > > where key_spec consists of partition key column names and optional > operator class per column. Currently, there are restrictions on the > key_spec such as allowing only column names (not arbitrary expressions > of them), only one column for list strategy, etc. What's the obstacle to supporting expressions and/or IMMUTABLE functions? I think it's fine to add this feature without them initially, I'm just asking about the roadmap for eventually supporting expressions in the key spec. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
On 24-02-2015 PM 05:13, Amit Langote wrote: > On 21-01-2015 PM 07:26, Amit Langote wrote: >> >> Ok, I will limit myself to focusing on following things at the moment: >> >> * Provide syntax in CREATE TABLE to declare partition key >> * Provide syntax in CREATE TABLE to declare a table as partition of a >> partitioned table and values it contains >> * Arrange to have partition key and values stored in appropriate >> catalogs (existing or new) >> * Arrange to cache partitioning info of partitioned tables in relcache >> > > Here is an experimental patch that attempts to implement this. I divided the patch into two for convenience: 1) 0001_partition_syntax_catalog - adds commands, catalog and partitioned table relation descriptor related WIP code 2) 0002_tuple-routing-poc - an experimental patch to test how well binary search approach works for tuple routing in ExecInsert(). Please take a look. Thanks, Amit diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index a403c64..bf02730 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -14,7 +14,7 @@ OBJS = catalog.o dependency.o heap.o index.o indexing.o namespace.o aclchk.o \ objectaccess.o objectaddress.o pg_aggregate.o pg_collation.o \ pg_constraint.o pg_conversion.o \ pg_depend.o pg_enum.o pg_inherits.o pg_largeobject.o pg_namespace.o \ - pg_operator.o pg_proc.o pg_range.o pg_db_role_setting.o pg_shdepend.o \ + pg_operator.o pg_partition.o pg_proc.o pg_range.o pg_db_role_setting.o pg_shdepend.o \ pg_type.o storage.o toasting.o BKIFILES = postgres.bki postgres.description postgres.shdescription @@ -41,6 +41,7 @@ POSTGRES_BKI_SRCS = $(addprefix $(top_srcdir)/src/include/catalog/,\ pg_foreign_data_wrapper.h pg_foreign_server.h pg_user_mapping.h \ pg_foreign_table.h pg_policy.h \ pg_default_acl.h pg_seclabel.h pg_shseclabel.h pg_collation.h pg_range.h \ + pg_partitioned_rel.h pg_partition.h\ toasting.h indexing.h \ ) diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index bacb242..8bdb34b 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -43,6 +43,8 @@ #include "catalog/pg_opclass.h" #include "catalog/pg_operator.h" #include "catalog/pg_opfamily.h" +#include "catalog/pg_partition.h" +#include "catalog/pg_partitioned_rel.h" #include "catalog/pg_policy.h" #include "catalog/pg_proc.h" #include "catalog/pg_rewrite.h" @@ -157,7 +159,9 @@ static const Oid object_classes[MAX_OCLASS] = { DefaultAclRelationId, /* OCLASS_DEFACL */ ExtensionRelationId, /* OCLASS_EXTENSION */ EventTriggerRelationId, /* OCLASS_EVENT_TRIGGER */ - PolicyRelationId /* OCLASS_POLICY */ + PolicyRelationId, /* OCLASS_POLICY */ + PartitionedRelRelationId, /* OCLASS_PARTITIONED_REL */ + PartitionRelationId /* OCLASS_PARTITION */ }; @@ -1265,6 +1269,14 @@ doDeletion(const ObjectAddress *object, int flags) RemovePolicyById(object->objectId); break; + case OCLASS_PARTITIONED_REL: + RemovePartitionKeyByRelId(object->objectId); + break; + + case OCLASS_PARTITION: + RemovePartitionDefByRelId(object->objectId); + break; + default: elog(ERROR, "unrecognized object class: %u", object->classId); @@ -2373,6 +2385,12 @@ getObjectClass(const ObjectAddress *object) case PolicyRelationId: return OCLASS_POLICY; + + case PartitionedRelRelationId: + return OCLASS_PARTITIONED_REL; + + case PartitionRelationId: + return OCLASS_PARTITION; } /* shouldn't get here */ diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 17f7266..0308c0b 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -47,6 +47,8 @@ #include "catalog/pg_foreign_table.h" #include "catalog/pg_inherits.h" #include "catalog/pg_namespace.h" +#include "catalog/pg_partition.h" +#include "catalog/pg_partitioned_rel.h" #include "catalog/pg_statistic.h" #include "catalog/pg_tablespace.h" #include "catalog/pg_type.h" @@ -110,6 +112,7 @@ static Node *cookConstraint(ParseState *pstate, Node *raw_constraint, char *relname); static List *insert_ordered_unique_oid(List *list, Oid datum); +static void SetRelationIsPartitioned(Oid relationId, bool relispartitioned); /* @@ -2968,3 +2971,306 @@ insert_ordered_unique_oid(List *list, Oid datum) lappend_cell_oid(list, prev, datum); return list; } + +/* + * StorePartitionKey + * + * Store the partition key of relation rel into system catalog + * pg_partitioned_rel + */ +void +StorePartitionKey(Relation rel, int nattrs, +AttrNumber *partKeyAttrNumbers, +Oid *partClassOids, +char strategy) +{ + int i; + int2vector *partkey; + oidvector *partclass; + Datum values[Natts_pg_partitioned_rel]; + bool nulls[Natts_pg_partitioned_rel]; + Relation pg_partitioned_rel; + HeapTuple tuple; + ObjectAddress myself; + ObjectAddr
Re: [HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
On 25-02-2015 AM 01:13, Corey Huinker wrote: > I think it's confusing to use BETWEEN to mean [low,high) when it already > means [low,high] in WHERE clauses. > Yeah, I'm not really attached to that syntax. > Why not leverage range notation instead? > > CREATE TABLE parent_monthly_x_201401 PARTITION OF > parent_monthly_x FOR VALUES IN RANGE '[2014-04-01,2014-05-01)' > > "IN RANGE" could easily be "WITHIN RANGE" or "WITHIN" or something else. > > Clearly, this example above assumes that the partitioning is on a single > column. > > For partitioning on a set of columns you're essentially creating a custom > composite type with major-minor collation, could that custom type be > created at table creation time? Could an existing composite type be > declared as the partition key? > The answer to the latter is yes as long as there is an operator class that supports a strategy compatible with the chosen partitioning strategy. For example, record/composite type has built-in support for various btree strategies via record_ops. As for the former, I tend to think creating new user-space types/operators transparently might not be a very good idea. One concern would be pg_dump related. Though I also agree there is some duplication of major-minor collation logic in case of multi-column keys. > CREATE TYPE year_month( year int, month int ); > > (CREATE OPERATOR... for < = > ) > > CREATE TABLE parent_monthly(year int, month int, day int) PARTITION BY > RANGE ON year_month(year, month); > Regards, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
I think it's confusing to use BETWEEN to mean [low,high) when it already means [low,high] in WHERE clauses. Why not leverage range notation instead? CREATE TABLE parent_monthly_x_201401 PARTITION OF parent_monthly_x FOR VALUES IN RANGE '[2014-04-01,2014-05-01)' "IN RANGE" could easily be "WITHIN RANGE" or "WITHIN" or something else. Clearly, this example above assumes that the partitioning is on a single column. For partitioning on a set of columns you're essentially creating a custom composite type with major-minor collation, could that custom type be created at table creation time? Could an existing composite type be declared as the partition key? CREATE TYPE year_month( year int, month int ); (CREATE OPERATOR... for < = > ) CREATE TABLE parent_monthly(year int, month int, day int) PARTITION BY RANGE ON year_month(year, month); On Tue, Feb 24, 2015 at 5:53 AM, Amit Langote wrote: > On 24-02-2015 PM 05:13, Amit Langote wrote: > > Additionally, a partition can itself be further partitioned (though I > > have not worked on the implementation details of multilevel partitioning > > yet): > > > > CREATE TABLE table_name PARTITION OF parent_name PARTITION BY > > {RANGE|LIST} ON(key_columns) FOR VALUES values_clause; > > One more blunder, supposed to be: > > CREATE TABLE table_name PARTITION OF parent_name FOR VALUES > values_clause PARTITION BY {RANGE|LIST} ON(key_columns); > > Thanks, > Amit > > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
On 24-02-2015 PM 05:13, Amit Langote wrote: > Additionally, a partition can itself be further partitioned (though I > have not worked on the implementation details of multilevel partitioning > yet): > > CREATE TABLE table_name PARTITION OF parent_name PARTITION BY > {RANGE|LIST} ON(key_columns) FOR VALUES values_clause; One more blunder, supposed to be: CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause PARTITION BY {RANGE|LIST} ON(key_columns); Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
On 24-02-2015 PM 05:13, Amit Langote wrote: > -- partitions > CREATE TABLE parent_monthly_x_201401 PARTITION OF > parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2); > > CREATE TABLE parent_monthly_x_201402 PARTITION OF > parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3); > > CREATE TABLE parent_monthly_x_201403 PARTITION OF > parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4); > > > > CREATE TABLE parent_monthly_x_mm PARTITION OF > parent_monthly_00100_mm FOR VALUES BETWEEN (, mm AND (, mm); > Oops, hand-edited "PARTITION OF parent_monthly_" wrongly. Supposed to be - CREATE TABLE parent_monthly_x_201401 PARTITION OF parent_monthly_x FOR VALUES BETWEEN (2014, 1) AND (2014, 2); and so on. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Partitioning WIP patch (was: Partitioning: issues/ideas)
On 21-01-2015 PM 07:26, Amit Langote wrote: > > Ok, I will limit myself to focusing on following things at the moment: > > * Provide syntax in CREATE TABLE to declare partition key > * Provide syntax in CREATE TABLE to declare a table as partition of a > partitioned table and values it contains > * Arrange to have partition key and values stored in appropriate > catalogs (existing or new) > * Arrange to cache partitioning info of partitioned tables in relcache > Here is an experimental patch that attempts to implement this. It implements the following syntax: * Syntax for defining partition key: CREATE TABLE table_name(columns)PARTITION BY {RANGE|LIST} ON (key_spec); where key_spec consists of partition key column names and optional operator class per column. Currently, there are restrictions on the key_spec such as allowing only column names (not arbitrary expressions of them), only one column for list strategy, etc. * Syntax for declaring a table as partition of a partitioned table: CREATE TABLE table_name PARTITION OF parent_name FOR VALUES values_clause; where values_clause can be: IN (list_of_values), or BETWEEN (range_lower_bounds) AND (range_upper_bounds); The semantics for a range is [range_lower_bounds,range_upper_bounds), that is, lower inclusive, upper exclusive. (this might later change subject to choice regarding preferred/desired syntax) Additionally, a partition can itself be further partitioned (though I have not worked on the implementation details of multilevel partitioning yet): CREATE TABLE table_name PARTITION OF parent_name PARTITION BY {RANGE|LIST} ON(key_columns) FOR VALUES values_clause; There are two system catalogs pg_partitioned_rel and pg_partition which store partition key spec and partition value spec, respectively. (remember to initdb if interested in trying) Please note that the above syntax and/or catalog may not be very appealing nor that they won't change/disappear. I am posting the patch more for examining the approach of internal representation of the metadata for partitioning and get some general comments. The approach I have implemented in this patch consists of loading the partition key info and a list of partitions into the relation descriptor for a partitioned table. In case of range partitioning, this list is sorted on the range max bound. To see if that works any good, I hacked ExecInsert() to make it find a partition for a tuple by adding a ExecFindPartition() just before heap_insert(). It accepts resultRelInfo of the parent and a tuple slot. It binary-searches for and returns the descriptor of the chosen partition which ExecInsert() then uses to perform heap_insert() and inserting index tuples. If no partition is found, parent relation itself is used. heap_insert() and ExecInsertIndexTuples() are the only things for which partition relation is used. All of this is just experimental and most probably wrong in details; but is done just to see what kind of performance to expect from the chosen internal representation. Another thing is the approach that tuple-routing (& partition-pruning) is a function of partitioned relation and the tuple (or restrict quals). It will be more significant when we'll get to implementing a partition-pruning function. See below an example to show that having an extra ExecFindPartition() does not degrade the performance of inserting a tuple much: -- a plain table CREATE TABLE parent_monthly(year int, month int, day int); -- a partitioned table -- x: number of partitions CREATE TABLE parent_monthly_x(LIKE parent_monthly) PARTITION BY RANGE ON(year, month); -- partitions CREATE TABLE parent_monthly_x_201401 PARTITION OF parent_monthly_00100_201401 FOR VALUES BETWEEN (2014, 1) AND (2014, 2); CREATE TABLE parent_monthly_x_201402 PARTITION OF parent_monthly_00100_201402 FOR VALUES BETWEEN (2014, 2) AND (2014, 3); CREATE TABLE parent_monthly_x_201403 PARTITION OF parent_monthly_00100_201403 FOR VALUES BETWEEN (2014, 3) AND (2014, 4); CREATE TABLE parent_monthly_x_mm PARTITION OF parent_monthly_00100_mm FOR VALUES BETWEEN (, mm AND (, mm); -- insert 1 tuple into the plain table INSERT INTO parent_monthly VALUES (2013, 12, 01); INSERT 0 1 Time: 3.303 ms -- insert 1 tuple into the partitioned table -- #part: number of partitions -- case 1: find no valid partition -- case 2: find a valid partition #parts case 1 case 2 10 3.248 ms3.509 ms 100 3.546 ms3.269 ms 500 3.497 ms3.048 ms 10003.364 ms5.379 ms 1 4.943 ms5.076 ms Thoughts? Thanks, Amit diff --git a/src/backend/catalog/Makefile b/src/backend/catalog/Makefile index a403c64..bf02730 100644 --- a/src/backend/catalog/Makefile +++ b/src/backend/catalog/Makefile @@ -14,7 +14,7 @@ OBJS = catalog.o dependency.o heap.o index.o indexing.o namespace.o aclchk.o \ objec