Re: [HACKERS] Partitioning option for COPY
Hannu Krosing wrote: After triggers can't change tuple, thus cant change routing. An after trigger can always issue an update of the tuple but that should be trapped by the regular mechanism that will deal with updates (when we have it available). Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement modifies routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables? What statement triggers do you mean ? I don't think we have ON COPY triggers ? I mean CREATE TRIGGER /name/ { BEFORE | AFTER } /event/ ON /table/ FOR EACH STATEMENT Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com
Re: [HACKERS] Partitioning option for COPY
On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote: Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. A simple update to the row can cause it to move between partitions, no ? I am not sure what the semantic should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? I'd propose that triggers on both parent table and selected child are executed. 1. first you execute before triggers on parent table, which may change which partition the row belongs to 2. then you execute before triggers on selected child table 2.1 if this changes the child table selection repeat from 2. 3. save the tuple in child table 4. execute after triggers of the final selected child table 5. execute after triggers of parent table order of 4. and 5. is selected arbitrarily, others are determined by flow. If the new implementation hides the child tables, If you hide child tables, you suddenly need a lot of new syntax to unhide them, so that partitions can be manipulated. Currently it is easy to do it with INHERIT / NO INHERIT. it might be safer to not allow triggers on child tables altogether and use the parent table as the single point of entry to access the partition (and define triggers). With the current proposed implementation, would it be possible to define a view using child tables? the child tables are there, and they _are_ defined, either implicitly (using constraints, which constraint exclusion resolves to a set of child tables) or explicitly, using child table names directly. -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hannu Krosing wrote: On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote: Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. A simple update to the row can cause it to move between partitions, no ? Yes. I am not sure what the semantic should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? I'd propose that triggers on both parent table and selected child are executed. 1. first you execute before triggers on parent table, which may change which partition the row belongs to 2. then you execute before triggers on selected child table 2.1 if this changes the child table selection repeat from 2. 3. save the tuple in child table 4. execute after triggers of the final selected child table What if that trigger changes again the child table selection? 5. execute after triggers of parent table Same here, what if the trigger changes the child table selection. Do we re-execute triggers on the new child table? Also it is debatable whether we should execute an after trigger on a table where nothing was really inserted. order of 4. and 5. is selected arbitrarily, others are determined by flow. Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement modifies routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables? You cannot know in advance where the tuples are going to go and fire the before statement triggers. If you had to fire after statement triggers, in which order would you fire them? If the new implementation hides the child tables, If you hide child tables, you suddenly need a lot of new syntax to unhide them, so that partitions can be manipulated. Currently it is easy to do it with INHERIT / NO INHERIT. Agreed, but I think that we will discover some restrictions that will apply to child tables. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote: I'd propose that triggers on both parent table and selected child are executed. I was thinking we should make the partitioning decision FIRST, before any triggers are fired, and then fire only those triggers relevant to the selected partition. If the BEFORE triggers on the partition modify the tuple in a way that makes it incompatible with the table constraints on that partition, the insert (or update) fails. Firing triggers on more than one table is pretty substantially incompatible with what we do elsewhere and I'm not clear what we get in exchange. What is the use case for this? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Robert Haas wrote: On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote: I'd propose that triggers on both parent table and selected child are executed. I was thinking we should make the partitioning decision FIRST, before any triggers are fired, and then fire only those triggers relevant to the selected partition. If the BEFORE triggers on the partition modify the tuple in a way that makes it incompatible with the table constraints on that partition, the insert (or update) fails. Firing triggers on more than one table is pretty substantially incompatible with what we do elsewhere and I'm not clear what we get in exchange. What is the use case for this? I don't have a use case for this but I was puzzled with that when I had to implement trigger support in COPY with partitioning. I came to the same conclusion as you and made the operation fail if the trigger was trying to move the tuple to another partition. However, I had a problem with after row triggers that I had to call synchronously to be able to detect the change. We will need something to tell us that an after row trigger did not mess with the routing decision. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Wed, Nov 25, 2009 at 9:21 AM, Emmanuel Cecchet m...@asterdata.com wrote: Robert Haas wrote: On Wed, Nov 25, 2009 at 5:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote: I'd propose that triggers on both parent table and selected child are executed. I was thinking we should make the partitioning decision FIRST, before any triggers are fired, and then fire only those triggers relevant to the selected partition. If the BEFORE triggers on the partition modify the tuple in a way that makes it incompatible with the table constraints on that partition, the insert (or update) fails. Firing triggers on more than one table is pretty substantially incompatible with what we do elsewhere and I'm not clear what we get in exchange. What is the use case for this? I don't have a use case for this but I was puzzled with that when I had to implement trigger support in COPY with partitioning. I came to the same conclusion as you and made the operation fail if the trigger was trying to move the tuple to another partition. However, I had a problem with after row triggers that I had to call synchronously to be able to detect the change. We will need something to tell us that an after row trigger did not mess with the routing decision. *scratches head* I'm confused. Only a BEFORE ROW trigger can possibly change anything... the return value of an AFTER ROW trigger is ignored. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
It seems like the easiest way to resolve this without weird corner cases is to say that we fire triggers belonging to the parent table. The individual partition child tables either shouldn't have triggers at all, or we should restrict the cases in which those are considered applicable. As an example, what are you going to do with statement-level triggers? Fire them for *every* child whether it receives a row or not? Doesn't seem like the right thing. Again, this solution presupposes an explicit concept of partitioned tables within the system... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Wed, 2009-11-25 at 08:39 -0500, Emmanuel Cecchet wrote: Hannu Krosing wrote: On Tue, 2009-11-24 at 10:08 -0500, Emmanuel Cecchet wrote: Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. A simple update to the row can cause it to move between partitions, no ? Yes. I am not sure what the semantic should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? I'd propose that triggers on both parent table and selected child are executed. 1. first you execute before triggers on parent table, which may change which partition the row belongs to 2. then you execute before triggers on selected child table 2.1 if this changes the child table selection repeat from 2. 3. save the tuple in child table 4. execute after triggers of the final selected child table What if that trigger changes again the child table selection? 5. execute after triggers of parent table Same here, what if the trigger changes the child table selection. Do we re-execute triggers on the new child table? After triggers can't change tuple, thus cant change routing. Also it is debatable whether we should execute an after trigger on a table where nothing was really inserted. order of 4. and 5. is selected arbitrarily, others are determined by flow. Also the description omits the execution of before and after statement triggers. While those can apply to the parent table (but the same question about what happens if the after statement modifies routing decision still applies), what does it mean in the case of COPY to have statement triggers on the child tables? What statement triggers do you mean ? I don't think we have ON COPY triggers ? You cannot know in advance where the tuples are going to go and fire the before statement triggers. If you had to fire after statement triggers, in which order would you fire them? If the new implementation hides the child tables, If you hide child tables, you suddenly need a lot of new syntax to unhide them, so that partitions can be manipulated. Currently it is easy to do it with INHERIT / NO INHERIT. Agreed, but I think that we will discover some restrictions that will apply to child tables. I think we should keep the possibility to populate partitions offline and then plug then into table as partitions (current INHERIT) and also to extract partition into separate table (NO INHERIT). -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Wed, 2009-11-25 at 11:30 -0500, Tom Lane wrote: It seems like the easiest way to resolve this without weird corner cases is to say that we fire triggers belonging to the parent table. The individual partition child tables either shouldn't have triggers at all, or we should restrict the cases in which those are considered applicable. Agreed. maybe allow only ROW-level AFTER triggers (for logging late arrivals and updates on tables partitioned on time for example ) As an example, what are you going to do with statement-level triggers? Fire them for *every* child whether it receives a row or not? Doesn't seem like the right thing. Again, this solution presupposes an explicit concept of partitioned tables within the system... For explicit partitioned tables with hidden partitions it is of course best to not add extra effort for allowing triggers to be defined on those (hidden) partitions. If the partition tables are visible, some trigger support would be good. regards, tom lane -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Wed, Nov 25, 2009 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: It seems like the easiest way to resolve this without weird corner cases is to say that we fire triggers belonging to the parent table. The individual partition child tables either shouldn't have triggers at all, or we should restrict the cases in which those are considered applicable. As an example, what are you going to do with statement-level triggers? Fire them for *every* child whether it receives a row or not? Doesn't seem like the right thing. Just the tables that get a row? I don't know, your way may be best, but it seems like tables on individual partitions might be useful in some situations. Again, this solution presupposes an explicit concept of partitioned tables within the system... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote: But it looks like it is a waste of everybody's time to continue this discussion further. Just move the patch to the rejected patches and let's wait for Itagaki's implementation. Emmanuel, please try to work together with Itagaki san on getting the bigger vision implemented, as this is a thing that can benefit a lot from more people who have taken the time to learn about the parts of code involved. Even though this patch will not get in, most of the effort in developing it is not actual coding, but familiarizing yourself with the other code involved. Coding actual patches should be easy once you know the code _and_ the desired result. You probably already know a lot of what is required to help us to common goal of a clean implementation of partitioning. -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Hannu Krosing http://www.2ndQuadrant.com PostgreSQL Scalability and Availability Services, Consulting and Training -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hannu Krosing ha...@2ndquadrant.com wrote: Even though this patch will not get in, most of the effort in developing it is not actual coding, but familiarizing yourself with the other code involved. I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Tue, 2009-11-24 at 17:30 +0900, Itagaki Takahiro wrote: Hannu Krosing ha...@2ndquadrant.com wrote: Even though this patch will not get in, most of the effort in developing it is not actual coding, but familiarizing yourself with the other code involved. I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning Good job. Looks like a clear path forwards to me. I've made a couple of minor clarifications. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Itagaki Takahiro wrote: I just edited a wiki page for this discussion. I hope it can be a help. http://wiki.postgresql.org/wiki/Table_partitioning I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. In the case of COPY, an after statement trigger could change all the routing decisions taken for each row. I am not sure what the semantic should be if you have triggers defined on the parent and child tables. Which triggers do you fire if the insert is on the parent table but the tuple ends up in a child table? If the new implementation hides the child tables, it might be safer to not allow triggers on child tables altogether and use the parent table as the single point of entry to access the partition (and define triggers). With the current proposed implementation, would it be possible to define a view using child tables? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet m...@asterdata.com wrote: I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. Infinite loops are not a partition-related problem, no? We can also find infinite loops in user defined functions, recursive queries, etc. I think the only thing we can do for it is to *stop* loops instead of prevention, like max_stack_depth. With the current proposed implementation, would it be possible to define a view using child tables? No, if you mean using a partition-view. I'm thinking we are moving our implementation of partitioning from view-based to built-in feature. Do you have any use-cases that requires view-based partitioning? Was the inheritance-based partitioning not enough for it? Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Itagaki Takahiro wrote: Emmanuel Cecchet m...@asterdata.com wrote: I guess the problem of handling user triggers is still open. If we allow triggers on partitions, badly written logic could lead to infinite loops in routing. Infinite loops are not a partition-related problem, no? We can also find infinite loops in user defined functions, recursive queries, etc. I think the only thing we can do for it is to *stop* loops instead of prevention, like max_stack_depth. I was thinking a trigger on child1 updating the partition key forcing the tuple to move to child2. And then a trigger on child2 updating the key again to move the tuple back to child1. You end up with an infinite loop. With the current proposed implementation, would it be possible to define a view using child tables? No, if you mean using a partition-view. I'm thinking we are moving our implementation of partitioning from view-based to built-in feature. Do you have any use-cases that requires view-based partitioning? Was the inheritance-based partitioning not enough for it? Nevermind, I was thinking about the implications of materialized views but Postgres does not have materialized views! I have other questions related to create table but I will post them in the 'syntax for partitioning' thread. Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote: Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. Also, it would be nicer that the router can works not only in COPY but also in INSERT. As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY since we will not have it anywhere otherwise. BTW, I'm working on meta data of partitioning now. Your partitioning option in COPY could be replaced with the catalog. This implementation is only for the current 8.5 and it will not be needed anymore once we get a fully functional partitioning in Postgres which seems to be for a future version. Yes, the trigger way of doing this is a bad way. I regret to say that the way proposed here isn't much better, AFAICS. Let me explain why I think that, but -1 to anyone applying this patch. This patch proposes keeping a cache of last visited partitions to reduce the overhead of data routing. What I've requested is that partitioning work by using a data structure held in relcache for inheritance parents. This differs in 3 ways from this patch a) it has a clearly defined location for the cached metadata, with clearly identified and well worked out mechanisms for cache invalidation b) the cache can be built once when it is first needed, not slowly grown as parts of the metadata are used c) it would be available for all parts of the server, not just COPY. The easiest way to build that metadata is when structured partitioning info is available. i.e. the best next action is to complete and commit Itagaki's partitioning syntax patch. Then we can easily build the metadata for partitioning, which can then be used in COPY for data routing. Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. ISTM that with the right metadata in the right place, a cleaner and easier solution is still possible for 8.5. The code within COPY should really just reduce to a small piece of code to derive the correct relation for the desired row and then use that during heap_insert(). I have just discussed partitioning with Itagaki-san at JPUG, so I know his plans. Itagaki-san and Manu, please can you work together to make this work for 8.5? --- A more detailed explanation of Partitioning Metadata: Partitioning Metadata is information held on the relcache for a table that has child partitions. Currently, a table does not cache info about its children, which prevents various optimisations. We would have an extra pointer on the Relation struct that points to a PartitioningMetadata struct. We can fill in this information when we construct the relcache for a relation, or we can populate it on demand the first time we attempt to use that information (if it exists). We want to hold an array of partition boundary values. This will then allow us to use bsearch to find the partition that a specific value applies to. Thus it can be used for routing data from INSERTs or COPY, can be used for identifying which partitions need to be included/excluded from an APPEND node. Using this will be O(logN) rather than O(N), so allowing us to have much larger number of partitions when required. Note that it can also be used within the executor to perform dynamic partition elimination, thus allowing us to easily implement partition aware joins etc. To construct the array we must sort the partition boundary values and prove that the partition definitions do not overlap. That is much easier to do when the partitions are explicitly defined. (Plus, there is no requirement to have, or mechanism to specify, unique partitions currently, although most users assume this in their usage). I imagine we would have an API called something like RelationIdentifyPartition() where we provide value(s) for the PartitioningKey column(s) and we then return the Oid of the partition that holds that value. That function would build the metadata, if not already cached, then bsearch it to provide the Oid. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Simon, I think you should read the thread and the patch before making any false statements like you did in your email. 1. The patch does not use any trigger for routing. 2. This is just an option for COPY that is useful for loading operations in the datawarehouse world. It is not meant to implement full partitioning as explained many times already in this thread. 3. This patch elaborates on existing mechanisms and cannot rely on a meta-data representation of partitions which does not exist yet and will probably not exist in 8.5 You should justify your statements when you say 'potentially buggy in its approach to developing a cache and using trigger-like stuff'. I understand that you don't like it because this is not what you want but this is not my fault. This is not an implementation of partitioning like COPY does not do update/delete/alter/... And yes the use case is 'narrow' like any option in COPY. It is like complaining that the CSV option is not useful because you want to load binary dumps. If Itagaki gets the support of the community to get his implementation accepted, I will gladly use it. Contributing? If Aster is willing to contribute a code monkey to implement your specs, why not but you will have to convince them. You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers. Emmanuel On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote: Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. Also, it would be nicer that the router can works not only in COPY but also in INSERT. As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY since we will not have it anywhere otherwise. BTW, I'm working on meta data of partitioning now. Your partitioning option in COPY could be replaced with the catalog. This implementation is only for the current 8.5 and it will not be needed anymore once we get a fully functional partitioning in Postgres which seems to be for a future version. Yes, the trigger way of doing this is a bad way. I regret to say that the way proposed here isn't much better, AFAICS. Let me explain why I think that, but -1 to anyone applying this patch. This patch proposes keeping a cache of last visited partitions to reduce the overhead of data routing. What I've requested is that partitioning work by using a data structure held in relcache for inheritance parents. This differs in 3 ways from this patch a) it has a clearly defined location for the cached metadata, with clearly identified and well worked out mechanisms for cache invalidation b) the cache can be built once when it is first needed, not slowly grown as parts of the metadata are used c) it would be available for all parts of the server, not just COPY. The easiest way to build that metadata is when structured partitioning info is available. i.e. the best next action is to complete and commit Itagaki's partitioning syntax patch. Then we can easily build the metadata for partitioning, which can then be used in COPY for data routing. Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. ISTM that with the right metadata in the right place, a cleaner and easier solution is still possible for 8.5. The code within COPY should really just reduce to a small piece of code to derive the correct relation for the desired row and then use that during heap_insert(). I have just discussed partitioning with Itagaki-san at JPUG, so I know his plans. Itagaki-san and Manu, please can you work together to make this work for 8.5? --- A more detailed explanation of Partitioning Metadata: Partitioning Metadata is information held on the relcache for a table that has child partitions. Currently, a table does not cache info about its children, which prevents various optimisations. We would have an extra pointer on the Relation struct that points to a PartitioningMetadata struct. We can fill in this information when we construct the relcache for a relation, or we can populate it on demand the first time we attempt to use that information (if it exists). We want to hold an array of partition boundary values. This will then allow us to use bsearch to find the partition
Re: [HACKERS] Partitioning option for COPY
On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet m...@asterdata.com wrote: I think you should read the thread and the patch before making any false statements like you did in your email. 1. The patch does not use any trigger for routing. Whoa, whoa! I don't think that Simon said that it did. But even if I am wrong and he did... You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers. ...I certainly don't think this comment is justified. This is a technical discussion about the best way of solving a certain problem, and I don't believe that any of the discussion up to this point has been anything other than civil. I can tell that you are frustrated that your patch is not getting the support you would like to see it get, but launching ad hominem attacks on Simon or anyone else is not going to help. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Robert Haas wrote: On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet m...@asterdata.com wrote: I think you should read the thread and the patch before making any false statements like you did in your email. 1. The patch does not use any trigger for routing. Whoa, whoa! I don't think that Simon said that it did. But even if I am wrong and he did... Quote from Simon's email: It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. You should really think twice about the style of your emails that cast a detestable tone to discussions on pg-hackers. ...I certainly don't think this comment is justified. This is a technical discussion about the best way of solving a certain problem, and I don't believe that any of the discussion up to this point has been anything other than civil. I can tell that you are frustrated that your patch is not getting the support you would like to see it get, but launching ad hominem attacks on Simon or anyone else is not going to help We certainly don't live in the same civilization then. I am not frustrated if my patch does not get in because of technical considerations and I am happy so far with Jan's feedback that helped a lot. I think there is a misunderstanding between what Simon wants ('Anyway, I want data routing, as is the intention of this patch.') and what this patch is about. This patch is just supposed to load tuples in a hierarchy of tables as this is a recurrent use case in datawarehouse scenarios. It is not supposed to solve data routing in general (otherwise that would be integrated standard in COPY and not as an option). But it looks like it is a waste of everybody's time to continue this discussion further. Just move the patch to the rejected patches and let's wait for Itagaki's implementation. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote: I think you should read the thread and the patch I did read the thread and patch in full before posting. My opinions are given to help you and the community towards a desirable common goal. I was unaware you were developing these ideas and so was unable to provide comments until now. My review of Kedar's patch in July did lay out in general terms a specific implementation route for future work on partitioning. I had thought I might not have made those comments clearly enough, so gave a more specific description of what I consider to be a more workable and general solution for cacheing and using partitioning metadata. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote: I think you should read the thread and the patch I did read the thread and patch in full before posting. My opinions are given to help you and the community towards a desirable common goal. I was unaware you were developing these ideas and so was unable to provide comments until now. My review of Kedar's patch in July did lay out in general terms a specific implementation route for future work on partitioning. I had thought I might not have made those comments clearly enough, so gave a more specific description of what I consider to be a more workable and general solution for cacheing and using partitioning metadata. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Simon Riggs wrote: I was unaware you were developing these ideas and so was unable to provide comments until now. The first patch was published to this list on September 10 (almost 2.5 months ago) along with the wiki page describing the problem and the solution. What should I have done to raise awareness further? /E -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote: I think there is a misunderstanding between what Simon wants ('Anyway, I want data routing, as is the intention of this patch.') and what this patch is about. This patch is just supposed to load tuples in a hierarchy of tables as this is a recurrent use case in datawarehouse scenarios. It is not supposed to solve data routing in general (otherwise that would be integrated standard in COPY and not as an option). I have not misunderstood. You wish to solve a very specific problem, with very specific code. I've done that myself on occasion. My opinion is that we should solve many of the partitioning problems with one set of central, common code. If we do not do this we will need 3-4 times as much code, most of which will be similar and yet must be exactly the same. That alone is enough to block the patch's proposed method (IMHO). But it looks like it is a waste of everybody's time to continue this discussion further. Just move the patch to the rejected patches and let's wait for Itagaki's implementation. The lack of discussion and design in this area has held back the last few patches, by various authors; we should learn from that. Also, working in isolation on narrow problems will not move us forwards as fast as if we all work together on pieces of the whole vision for partitioning. My piece was to think through how to link each of the different aspects of partitioning and to propose a solution. Please join with Itagaki to move this forwards - your further contributions will be valuable. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Mon, 2009-11-23 at 10:43 -0500, Emmanuel Cecchet wrote: Simon Riggs wrote: I was unaware you were developing these ideas and so was unable to provide comments until now. The first patch was published to this list on September 10 (almost 2.5 months ago) along with the wiki page describing the problem and the solution. What should I have done to raise awareness further? ...Read my detailed comments in response to Kedar's patch and post comments on that thread to say you didn't agree with that proposal and that you were thinking of another way entirely. ~14 July. 4 months ago. ...Contact me personally when you saw that I hadn't responded to your later posts, knowing that I have recent track record as a reviewer of partitioning patches. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Simon Riggs si...@2ndquadrant.com writes: Anyway, I want data routing, as is the intention of this patch. I just don't think this patch is a useful way to do it. It is too narrow in its scope and potentially buggy in its approach to developing a cache and using trigger-like stuff. FWIW, I agree --- there are two really fundamental problems with this patch: * It only applies to COPY. You'd certainly want routing for INSERT as well. And it shouldn't be necessary to specify an option. * Building this type of infrastructure on top of independent, not guaranteed consistent table constraints is just throwing more work into a dead end. The patch is already full of special-case errors for possible inconsistency of the constraints, and I don't think it's bulletproof even so (what if someone is altering the constraints concurrently? What if there's more than one legal destination?) And the performance necessarily sucks. What we need first is an explicit representation of partitioning, and then to build routing code on top of that. I haven't looked at Itagaki-san's syntax patch at all, but I think it's at least starting in a sensible place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Simon Riggs wrote: ...Read my detailed comments in response to Kedar's patch and post comments on that thread to say you didn't agree with that proposal and that you were thinking of another way entirely. Useful background here is: http://wiki.postgresql.org/wiki/Table_partitioning http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com http://archives.postgresql.org/message-id/1247564358.11347.1308.ca...@ebony.2ndquadrant The basic problem here is that Emmanuel and Aster developed a useful answer to one of the more pressing implementation details needed here, but did so without being involved in the much larger discussion of how to implement general, more automated partitioning in PostgreSQL that (as you can see from the date of the first links there) has been going on for years already. What we did wrong as a community is not more explicitly tell Emmanuel the above when he first submitted code a few months ago, before he'd invested more time on a subset implementation that was unlikely to be committed. As I already commented upthread, I was just happy to see coding progress being made on part of the design that nobody had hacked on before to my knowledge; I didn't consider then how Emmanuel was going to be disappointed by the slow rate that code would be assimilated into the design going on in this area. What would probably be helpful here is to take the mess of raw data above and turn it into a simpler partitioning roadmap. There's a stack of useful patches here, multiple contributors who have gotten familiar with the implementation details required, and enough time left that it's possible to pull something together in time for 8.5--but only if everyone is clear on exactly what direction to push toward. I'm going to reread the history here myself and see if I can write something helpful here. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Mon, 2009-11-23 at 12:23 -0500, Greg Smith wrote: What would probably be helpful here is to take the mess of raw data above and turn it into a simpler partitioning roadmap. Thanks for summarising. I briefly tried to do that on the thread for Itagaki-san's patch. That's a first stab at things, at least. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane t...@sss.pgh.pa.us wrote: What we need first is an explicit representation of partitioning, and then to build routing code on top of that. I haven't looked at Itagaki-san's syntax patch at all, but I think it's at least starting in a sensible place. I have the following development plan for partitioning. I'll continue to use inherits-based partitioning... at least in 8.5. 8.5 Alpha 3: Syntax and catalog changes (on-disk structure). I think pg_dump is the biggest stopper in the phase. 8.5 Alpha 4: Internal representation (on-memory structure), that will replace insert-triggers first, and also replace CHECK constraints if possible (but probably non-INSERT optimizations will slide to 8.6). The internal representation of RANGE partitions will be an array of pairs of { upper-value, parition-relid } for each parent table. An insert target partition are determined using binary search on insert. It will be faster than sequential checks of CHECK constraint especially in large number of child tables. The array will be kept in CacheMemoryContext or query context to reduce access to the system catalog. RelationData or TupleDesc will have an additional field for it. * It only applies to COPY. You'd certainly want routing for INSERT as well. And it shouldn't be necessary to specify an option. Sure. We need the routingin both INSERT and COPY. Even if Emmanuel-san's patch will be committed in Alpha 3, the code would be discarded in Alpha 4. * Building this type of infrastructure on top of independent, not guaranteed consistent table constraints is just throwing more work into a dead end. I think the current approach is not necessarily wrong for CHECK-based partitioning, but I'd like to have more specialized or generalized functionality for the replacement of triggers. If we will take specialized approach, triggers will be replaced with built-in feature. We can only use RANGE and LIST partitions. On the other hand, it might be interesting to take some generalized approach; For example, spliting BEFORE INSERT triggers into 3 phases: 1. Can cancel the insert and modify the new tuple. 2. Can cancel the insert, but cannot modify tuple. 3. Neigher can cancel nor modify. We call triggers in the number order. INSERT TRIGGERs are implemented in 2nd phase, so we're not afraid of modifing partition keys. (3rd phase will be used for replication trigger.) However, I think generalized one is overkill. A specialized approach would be enough. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hi, What would probably be helpful here is to take the mess of raw data above and turn it into a simpler partitioning roadmap. Thanks for summarising. Yeah, excellent summary Greg. As you rightly pointed out, partitioning needs a broad roadmap so that the community can contribute in unison. That ways we can in future avoid decent efforts like Manu's which might not bear any fruit because of the prevailing confusion today.. I briefly tried to do that on the thread for Itagaki-san's patch. That's a first stab at things, at least. +1. Itagaki-san's patch seems like a firm foot forward. Regards, Nikhils -- http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet wrote: Jan, A couple of nitpicks first: o) the route_tuple_to_child recurses to child tables of child tables, which is undocumented and requires a check_stack_depth() call if it's really desirable The recursive call is as deep as the inheritance hierarchy. I am not sure what we are supposed to do if check_stack_depth() fails. I think that check_stack_depth() just throws an elog(ERROR) when the stack depth is exceeded, so you can just add a check_stack_depth() call somewhere at the beginning of route_tuple_to_child and that's it. o) the error messages given when a trigger modifies the tuple should be one sentence, I suggest dropping the Aborting insert part Where are those rules about error messages specified? http://www.postgresql.org/docs/current/static/error-style-guide.html Dropping Aborting insert is just a suggestion, it's possible the error message will sound OK to a native English speaker. o) there are two places with Close the relation but keep the lock comments. Why is in necessary to keep the locks? I confess I don't know why *wouldn't* it be necessary, but maybe the comment could explain that? Or is it just my lack of understanding and it should be obvious that the lock needs to be kept? As we did write to the table, we must maintain the lock on it until the operation or transaction is complete. OK, understood. o) the result of relation_open is explicitly cast to Relation, the result of try_relation_open is not (a minor gripe) The first cast was unnecessary, I removed it. OK. o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted from just above, I guess there was a reason why you needed that code, but I also suspect that's a string indication that something's wrong with the abstractions in your patch. As I explained to Tom, if the after row trigger is called asynchronously I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger) it works fine. Also calling the after row trigger synchronously allows me to detect any potential problem between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution. OK, my competence ends here :-) Someone with a better knowledge of the code should comment on that, I certainly don't have a better proposal. Oh, actually, the cache is outright *wrong*, as the attached test6.sql shows. Ugh, let's just forget about that LRU cache for now. Point taken, I have removed the cache from the GUC variables and it is now only used for the duration of the COPY operation. OK, that looks better. o) the patch could use some more docs, especially about descending into child tables. Do you mean an overall comment explaining the design? Otherwise there is a comment for every single 'if' and block of code in the patch. Be more specific if you have a special location where you think comments are missing or too vague. I was thinking more about SGML docs. They could mention that BEFORE triggers are fired both for the parent table and for the child table, while AFTER triggers will only be called on the target table. I'd add a sentence or two explaining what happens if you have a three-level inheritance hierarchy (that the tuple will be inserted in the bottommost table of the hierarchy). o) my main concern is still valid: the design was never agreed upon. I already commented on that part in another message and this is not related to that patch but to the politics of implementing partitioning in Postgres. Now if the rejection of the patch is based on political stances rather than technical once, I can understand that too. Sure, sorry if it sounded harsh. As I said I have virtually no field experience with PG, so I might have a wrong perspective. I also don't feel particulary eligible to judge your approach of handling automatic partitioning designwise. Except for the really minor things like checking stack depth and adding a few sentences to the SGML docs, I think it's time someone more qualified looked at the patch. If you'd like to send a new version, I'll wait for it and mark it as ready for committer review. Thanks for persisting with the patch and sorry for nitpicking so much :-) Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Sun, 22 Nov 2009, Emmanuel Cecchet wrote: As I explained to Tom, if the after row trigger is called asynchronously I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger) it works fine. Also calling the after row trigger synchronously allows me to detect any potential problem between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution. Well, I think there are still some issues there that at least need to be better documented. For example, create or replace function fi() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from i where i.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; create or replace function fc() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from c where c.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; create or replace function fp() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from p where p.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; drop table i; drop table c; drop table p cascade; create table i(i int, p int); create trigger tri after insert on i for each row execute procedure fi(); create table c(i int, p int); create trigger trc after insert on c for each row execute procedure fc(); create table p(i int, p int); create table p1 (check (i 0 and i = 10)) inherits (p); create table p2 (check (i 10 and i = 20)) inherits (p); create table p3 (check (i 20 and i = 30)) inherits (p); create trigger trp1 after insert on p1 for each row execute procedure fp(); create trigger trp2 after insert on p2 for each row execute procedure fp(); create trigger trp3 after insert on p3 for each row execute procedure fp(); insert into i values (1,3),(2,1),(3,NULL); copy c from stdin; 1 3 2 1 3 \N \. copy p from stdin with (partitioning); 1 3 2 1 3 \N \. gives me a successful load into i and c, but not into p with the current patch AFAICS while a load where the 3 row is first does load. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Stephan Szabo wrote: On Sun, 22 Nov 2009, Emmanuel Cecchet wrote: As I explained to Tom, if the after row trigger is called asynchronously I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger) it works fine. Also calling the after row trigger synchronously allows me to detect any potential problem between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution. Well, I think there are still some issues there that at least need to be better documented. For example, create or replace function fi() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from i where i.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; create or replace function fc() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from c where c.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; create or replace function fp() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from p where p.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; drop table i; drop table c; drop table p cascade; create table i(i int, p int); create trigger tri after insert on i for each row execute procedure fi(); create table c(i int, p int); create trigger trc after insert on c for each row execute procedure fc(); create table p(i int, p int); create table p1 (check (i 0 and i = 10)) inherits (p); create table p2 (check (i 10 and i = 20)) inherits (p); create table p3 (check (i 20 and i = 30)) inherits (p); create trigger trp1 after insert on p1 for each row execute procedure fp(); create trigger trp2 after insert on p2 for each row execute procedure fp(); create trigger trp3 after insert on p3 for each row execute procedure fp(); insert into i values (1,3),(2,1),(3,NULL); copy c from stdin; 1 3 2 1 3 \N \. copy p from stdin with (partitioning); 1 3 2 1 3 \N \. gives me a successful load into i and c, but not into p with the current patch AFAICS while a load where the 3 row is first does load. Well, if you don't insert anything in p (the table, try to avoid using the same name for the table and the column in an example), copy will insert (1,3) in p1 and then the trigger will evaluate select count(*) from p where p.i = NEW.p = NEW.p is 3 and the only p.i available is 1. This should return 0 rows and raise the exception. This seems normal to me. The only reason it works for i is because you inserted the values before the copy. Am I missing something? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Sun, 22 Nov 2009, Emmanuel Cecchet wrote: Stephan Szabo wrote: On Sun, 22 Nov 2009, Emmanuel Cecchet wrote: As I explained to Tom, if the after row trigger is called asynchronously I get a relcache leak on the child table at the end of the copy operation. If the trigger is called synchronously (like a before row trigger) it works fine. Also calling the after row trigger synchronously allows me to detect any potential problem between the actions of the trigger and the routing decision. I am open to any suggestion for a more elegant solution. Well, I think there are still some issues there that at least need to be better documented. For example, create or replace function fi() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from i where i.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; create or replace function fc() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from c where c.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; create or replace function fp() returns trigger as ' begin if (NEW.p is not null) then if (select count(*) from p where p.i = NEW.p) = 0 then raise exception ''No parent''; end if; end if; return NEW; end; ' language 'plpgsql'; drop table i; drop table c; drop table p cascade; create table i(i int, p int); create trigger tri after insert on i for each row execute procedure fi(); create table c(i int, p int); create trigger trc after insert on c for each row execute procedure fc(); create table p(i int, p int); create table p1 (check (i 0 and i = 10)) inherits (p); create table p2 (check (i 10 and i = 20)) inherits (p); create table p3 (check (i 20 and i = 30)) inherits (p); create trigger trp1 after insert on p1 for each row execute procedure fp(); create trigger trp2 after insert on p2 for each row execute procedure fp(); create trigger trp3 after insert on p3 for each row execute procedure fp(); insert into i values (1,3),(2,1),(3,NULL); copy c from stdin; 1 3 2 1 3 \N \. copy p from stdin with (partitioning); 1 3 2 1 3 \N \. gives me a successful load into i and c, but not into p with the current patch AFAICS while a load where the 3 row is first does load. Well, if you don't insert anything in p (the table, try to avoid using the same name for the table and the column in an example), copy will insert (1,3) in p1 and then the trigger will evaluate select count(*) from p where p.i = NEW.p = NEW.p is 3 and the only p.i available is 1. This should return 0 rows and raise the exception. This seems normal to me. The only reason it works for i is because you inserted the values before the copy. Am I missing something? I believe so unless I am. There are three separate cases being run for comparison purposes. Multi-row insert on i where an after trigger on i checks the parents within i, a copy on c where an after trigger on c checks the parents within c, a copy on p (with inheritance) where an after trigger on p* checks the parents within the p hierarchy. So, in the case of the multi-row insert, it's inserting (1,3), but it doesn't immediately check, it inserts (2,1) and (3,NULL) before running the checks. The same seems to happen for the base copy. Copy with inheritance seems to be working differently. That may or may not be okay, but if it's different it needs to be prominently mentioned in documentation. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
On Sun, Nov 22, 2009 at 12:35 PM, Jan Urbański wulc...@wulczer.org wrote: I was thinking more about SGML docs. They could mention that BEFORE triggers are fired both for the parent table and for the child table, while AFTER triggers will only be called on the target table. I'd add a sentence or two explaining what happens if you have a three-level inheritance hierarchy (that the tuple will be inserted in the bottommost table of the hierarchy). I have a hard time believing this is OK, even with documentation. While it might be OK in some (many?) particular use cases to fire triggers in this way, making COPY with the partitioning option fire different triggers at different times than COPY without the partitioning option - and in fact every other method of getting data into a table, all of which are consistent with each other and with COPY without the partitioning option - seems like a bad idea to me. I don't think the behavior described above is OK, and I also don't think that the changes in the timing of AFTER-trigger firing are OK. I understand that without that change there was a relcache leak, but I think that just means that bug needs to be found and fixed. I would also like to see some more discussion of the basic mechanism of this patch. Essentially, what it's trying to do is traverse the inheritance hierarchy looking for a table whose constraints match the current tuple, and then inserting the tuple there. First - is this a good idea? It's embeds some assumptions about how inheritance hierarchies are set up which don't seem totally unreasonable, but even so I'm not sure we want to go that route. Second - in lieu of accepting this approach, do we want to wait for Itagaki Takahiro's partitioning syntax patch to go in (as I am hoping that it will) and then do something more structured based on the notation introduced there? One thing that biases me toward thinking that maybe we should wait is the fact that this patch relies on an MRU list to determine into which child table a particular tuple should be inserted. If the constraints on the child tables are not mutually exclusive, the tuple routing won't be deterministic, which seems undesirable to me. On the other hand, if we got rid of the MRU cache and made the order deterministic (say, alphabetical by partition name) then I'm guessing this would be quite slow for large numbers of partitions when most of the tuples need to go into the later partitions. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet wrote: Hi Jan, Here is the updated patch. Note that the new code in trigger is a copy/paste of the before row insert trigger code modified to use the pointers of the after row trigger functions. Hi, ok, this version applied, compiled and ran the regression tests fine. I tried a few things and was not able to break it this time. A couple of nitpicks first: o) the route_tuple_to_child recurses to child tables of child tables, which is undocumented and requires a check_stack_depth() call if it's really desirable o) the error messages given when a trigger modifies the tuple should be one sentence, I suggest dropping the Aborting insert part o) there are two places with Close the relation but keep the lock comments. Why is in necessary to keep the locks? I confess I don't know why *wouldn't* it be necessary, but maybe the comment could explain that? Or is it just my lack of understanding and it should be obvious that the lock needs to be kept? o) the result of relation_open is explicitly cast to Relation, the result of try_relation_open is not (a minor gripe) And a couple of more important things: o) the code added in trigger.c (ExecARInsertTriggersNow) is copy/pasted from just above, I guess there was a reason why you needed that code, but I also suspect that's a string indication that something's wrong with the abstractions in your patch. Again I don't really know how else you could achieve what you want. It just looks fishy if you need to modify trigger.c to add an option to COPY. o) publicizing ExecRelCheck might also indicate a problem, but I guess that can be defended, as the patch is basically based on using that function for each incoming tuple o) the LRU OID cache is a separate optimisation that could be separated from the patch. I didn't do any performance tests, and I trust that a cache like that helps with some workloads, but I think we could do a better effort that a simplistic cache like that. Also, I'm not 100% sure it's OK to just stick it into CacheMemoryContext... Maybe it could go into the COPY statement context? You said you don't want to start with a cold cache always, but OTOH if you're loading into different tables in the same backend, the cache will actually hurt... [thinks of something really bad... types up a quick test...] Oh, actually, the cache is outright *wrong*, as the attached test6.sql shows. Ugh, let's just forget about that LRU cache for now. o) the patch could use some more docs, especially about descending into child tables. o) my main concern is still valid: the design was never agreed upon. The approach of using inheritance info for automatic partitioning is, at least IMHO, too restricted. Regular INSERTs won't get routed to child tables. Data from writable CTEs won't get routed. People wanting to do partitioning on something else that constraints are stuffed. I strongly suspect the patch will get rejected on the grounds of lack of community agreement on partitioning, but I'd hate to see your work wasted. It's not too late to open a discussion on how automatic partitioning could work (or start working out a common proposal with the people discussing in the Syntax for partitioning thread). Marking as Waiting on Author, although I'd really like to see a solid design being agreed upon, and then the code. Cheers, Jan drop table parent cascade; drop table parent2 cascade; create table parent(i int); create table c1 (check (i 0 and i = 1)) inherits (parent); create table parent2(i int); create table c12 (check (i 0 and i = 1)) inherits (parent2); set copy_partitioning_cache_size = 1; copy parent from stdin with (partitioning); 1 \. copy parent2 from stdin with (partitioning); 1 \. -- all tuples went to parent ! select * from parent; -- is empty select * from parent2; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Jan Urbański wrote: o) my main concern is still valid: the design was never agreed upon. The approach of using inheritance info for automatic partitioning is, at least IMHO, too restricted. Regular INSERTs won't get routed to child tables. Data from writable CTEs won't get routed. People wanting to do partitioning on something else that constraints are stuffed. Whether or not the other paths to load data are supported, COPY is the one you have to get right before this sort of feature is useful to the sort of use-cases that need partitioning the most. While your concerns are valid, I hope Emmanuel doesn't take your feedback the wrong way. I'll take a working prototype that needs improvement over a paper design with no implementation anytime. He's coming at this bottom-up starting with the little details that needs to be right, the partitioning syntax patch is starting at the top and working downward, there seems to be clear progress being made toward the eventual goal somewhere in the middle of all that here. Considering how long this area has been bogged down in discussion without action, I'm rather glad we're seeing working proof of concepts bits rather than just talking about things. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Jan Urbański wrote: o) my main concern is still valid: the design was never agreed upon. The approach of using inheritance info for automatic partitioning is, at least IMHO, too restricted. Regular INSERTs won't get routed to child tables. Data from writable CTEs won't get routed. People wanting to do partitioning on something else that constraints are stuffed. Well, this patch does not claim to implement partitioning for Postgres, it just offers partitioning as an option for COPY (and COPY only) based on the existing mechanism in Postgres. I have already participated in lengthy and relatively sterile discussions on how to implement a full-blown partitioning but we never reached the beginning of an agreement and it was decided that a step-by-step approach would be better. I will propose another implementation of partitioning in COPY once Postgres has another representation than constraints on child tables to implement it. I strongly suspect the patch will get rejected on the grounds of lack of community agreement on partitioning, but I'd hate to see your work wasted. It's not too late to open a discussion on how automatic partitioning could work (or start working out a common proposal with the people discussing in the Syntax for partitioning thread). This is not my call. Right now the syntax for partitioning does not change anything to Postgres, it just adds syntactic sugar on top of the existing implementation. It will not route anything or answer any of the needs you mentioned in your previous point. Marking as Waiting on Author, although I'd really like to see a solid design being agreed upon, and then the code. You are asking the wrong person if you want me to lead the partitioning design discussions. I already tried once and I was unsuccessful. As nothing as changed I don't see why I would be more successful this time. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet wrote: Hi Jan, Here is a new version of the patch with the following modifications: - used oid list from pg_list.h - properly handles triggers and generate an error if needed (updated doc as well) - added your test cases + extra bad trigger cases Hi, that got broken by the WHEN triggers patch (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the TriggerEnabled function signature, the code currently does not compile. I'll continue reading, in the meantime could you send a updated patch? Thanks, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: that got broken by the WHEN triggers patch (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the TriggerEnabled function signature, the code currently does not compile. [ squint... ] What is that patch doing touching the innards of trigger.c in the first place? I can't see any reason for trigger.c to be associated with partitioning. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: that got broken by the WHEN triggers patch (c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the TriggerEnabled function signature, the code currently does not compile. [ squint... ] What is that patch doing touching the innards of trigger.c in the first place? I can't see any reason for trigger.c to be associated with partitioning. The problem I had is that if I used the standard trigger mechanism for after row inserts on a child table where the trigger is called asynchronously, I had a relcache leak on the child table. I tried to ask for help on that earlier on but it got lost with other discussions on the patch. So I tried to call the after trigger synchronously on the child table and it worked. So the patch is just adding a synchronous call to after row insert triggers that is called when the tuple is moved to a child table (also allows to detect for triggers that are messing with the routing). I would be happy to follow any recommendation for a more elegant solution to the problem. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: =?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: Program received signal SIGSEGV, Segmentation fault. 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 1821child_relation_id = child_oid_cell-oid_value; (gdb) p child_oid_cell $1 = (OidCell *) 0x7f7f7f7f This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Thanks Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet m...@asterdata.com writes: Tom Lane wrote: This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Well, (a) I doubt you really were creating the list in CacheMemoryContext, else it'd have not gotten clobbered; (b) creating statement-local data structures in CacheMemoryContext is entirely unacceptable anyway, because then they represent a permanent memory leak. The right context for statement-lifetime data structures is generally the CurrentMemoryContext the statement code is called with. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Emmanuel Cecchet m...@asterdata.com writes: Tom Lane wrote: This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. I was using the CacheMemoryContext. Could someone tell me why this is wrong and what should have been the appropriate context to use? Well, (a) I doubt you really were creating the list in CacheMemoryContext, else it'd have not gotten clobbered; (b) creating statement-local data structures in CacheMemoryContext is entirely unacceptable anyway, because then they represent a permanent memory leak. Well I thought that this code would do it: child_table_lru = (OidLinkedList *)MemoryContextAlloc( + CacheMemoryContext, sizeof(OidLinkedList)); ... + /* Add the new entry in head of the list */ + new_head = (OidCell *) MemoryContextAlloc( + CacheMemoryContext, sizeof(OidCell)); The right context for statement-lifetime data structures is generally the CurrentMemoryContext the statement code is called with. Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet m...@asterdata.com writes: Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Cache? Why do you need a cache for COPY? Repeated bulk loads into the same table within a single session doesn't seem to me to be a case that is common enough to justify a cache. (BTW, the quoted code seems to be busily reinventing OID Lists. Don't do that.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: Emmanuel Cecchet m...@asterdata.com writes: Actually the list is supposed to stay around between statement executions. You don't want to restart with a cold cache at every statement so I really want this structure to stay in memory at a more global level. Cache? Why do you need a cache for COPY? Repeated bulk loads into the same table within a single session doesn't seem to me to be a case that is common enough to justify a cache. Actually the cache is only activated if you use the partitioning option. It is just a list of oids of child tables where tuples were inserted. It is common to have multiple COPY operations in the same session when you are doing bulk loading in a warehouse. (BTW, the quoted code seems to be busily reinventing OID Lists. Don't do that.) Yes, I understood that I should use an OidList instead. But I was trying to understand what I did wrong here (besides reinventing the oid list ;-)). Why do I get this segfault if I use memory from CacheMemoryContext? Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet m...@asterdata.com writes: Tom Lane wrote: Cache? Why do you need a cache for COPY? Actually the cache is only activated if you use the partitioning option. It is just a list of oids of child tables where tuples were inserted. Umm ... why is that useful enough to be cached? Why do I get this segfault if I use memory from CacheMemoryContext? Well, CacheMemoryContext will never be reset, so either you freed the data structure yourself or there's something wrong with the pointer you think is pointing at the data structure ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hi, I'll hopefully look at the next version of the patch tommorrow. Emmanuel Cecchet wrote: o test1.sql always segfaults for me, poking around with gdb suggests it's a case of an uninitialised cache list (another reason to use the builtin one). I was never able to reproduce that problem. I don't know where this comes from. I have integrated your tests in the regression test suite and I was never able to reproduce the segfault you mentioned. What platform are you using? In the meantime I tried the test1.sql file again and it still segfaulted for me. I'm using 32bit Linux, PG compiled with: $ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl --without-python --without-openssl --without-tcl and then I start postmaster, fire up psql, attach gdb to the backend, do \i test1.sql and get: Program received signal SIGSEGV, Segmentation fault. 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 1821child_relation_id = child_oid_cell-oid_value; (gdb) bt #0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 #1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480 #2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c copy parent from stdin with (partitioning);) at copy.c:1227 (gdb) p child_oid_cell $1 = (OidCell *) 0x7f7f7f7f (gdb) p child_oid_cell-oid_value Cannot access memory at address 0x7f7f7f7f That 0x7f7f7f7f looks like clobbered memory, the memory management funcs do that when cassert is enabled, IIRC. Cheers, Jan -- Jan Urbanski GPG key ID: E583D7D2 ouden estin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= wulc...@wulczer.org writes: Program received signal SIGSEGV, Segmentation fault. 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040, tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821 1821child_relation_id = child_oid_cell-oid_value; (gdb) p child_oid_cell $1 = (OidCell *) 0x7f7f7f7f This looks like the patch is trying to create a data structure in a memory context that's not sufficiently long-lived for the use of the structure. If you do this in a non-cassert build, it will seem to work, some of the time, if the memory in question happens to not get reallocated to something else. A good rule of thumb is to never do code development in a non-cassert build. You're just setting yourself up for failure. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Tom Lane wrote: A good rule of thumb is to never do code development in a non-cassert build. And the same rule goes for review, too; I'll update the review guidelines to spell that out more clearly. Basically, if you're doing any work on new code, you should have cassert turned on, *except* if you're doing performance testing. The asserts slow things down enough (particularly with large shared_buffers values) to skew performance tests, but in all other coding situations you should have them enabled. -- Greg Smith2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet wrote: Hi all, Hi!, partitioning option for COPY Here's the review: == Submission == The patch is contextual, applies cleanly to current HEAD, compiles fine. The docs build cleanly. == Docs == They're reasonably clear, although they still mention ERROR_LOGGING, which was taken out of this patch. They could use some wordsmithing, but I didn't go into details, as there were more severe issues with the patch. One thing that made me cautious was the mention that triggers modifying tuples will make random errors appear. As is demonstrated later, triggers are a big issue. == Regression tests == They ran fine, there's one additional regression test that exercises the new option. == Style/nitpicks == Minor gripes include: o instead of using an ad-hoc data structure for the LRU cache list, I'd suggest an OidList from pg_list.h. o some mentions of method in comments should be changed to function o trailing whitespace in the patch (it's not the end of the world, of course) == Issues == Attached are 3 files that demonstrate problems the patch has. o test1.sql always segfaults for me, poking around with gdb suggests it's a case of an uninitialised cache list (another reason to use the builtin one). o test2.sql demonstrates, that indices on child tables are not being updated, probably because after resultRelInfo in check_tuple_constraints() gets created is never has ri_NumIndices set, and so the code that was supposed to take care of indices is never called. Looks like a copy-paste error. o test3.sql demonstrates, that some triggers that I would expect to be fired are in fact not fired. I guess it's the same reason as mentioned: ri_TrigDesc never gets set, so the code that calls triggers is dead. I stopped there, because unfortunately, apart from all that there's one fundamental problem with this patch, namely we probably don't want it. As it stands it's more of a proof of concept than a really usable solution, it feels like built from spare (copied from around copy.c) parts. IMHO it's much too narrow for a general partitioning solution, even if the design it's based upon would be accepted. It's assuming a lot of things about the presence of child tables (with proper constraints), the absence of triggers, and so on. Granted, it solves a particular problem (bulk loading into a partitioned table, with not extra features like triggers and with standard inheritance/exclusive check constraints setup), but that's not good enough in my opinion, even if all other issues would be addressed. Now I'm not a real Postgres user, it's been a while since I worked in a PG shop (or a DB shop for that matter), but from what I understand from following this community for a while, a patch like that doesn't have a lot of chances to be committed. That said, my puny experience with real PG installations and their needs must be taken into account here. I'll mark this patch as Waiting on Author, but I have little doubt that even after fixing those probably trivial segfaults etc. the patch would be promptly rejected by a committer. I suggest withdrawing it from this commitfest and trying to work out a more complete design first that would address the needs of a bigger variety of users, or joining some of the already underway efforts to bring full-featured partitioning into Postgres. Best, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Jan Urbański wrote: Emmanuel Cecchet wrote: Hi all, Hi!, partitioning option for COPY Attached are 3 files that demonstrate problems the patch has. And the click-before-you-think prize winner is... me. Test cases attached, see the comments for expected/actual results. Jan -- segfaults, probably uninitialised cache oid list -- disabling cache fixes it -- set copy_partitioning_cache_size = 0; drop table parent cascade; create table parent(i int); create table c1 (check (i 0 and i = 1)) inherits (parent); copy parent from stdin with (partitioning); 1 \. drop table parent cascade; create table parent(i int); create table c1 (check (i 0 and i = 1)) inherits (parent); copy parent from stdin with (partitioning); 1 \. set copy_partitioning_cache_size = 0; drop table parent cascade; create table parent(i int, j int); create table c1 (check (i 0 and i = 1)) inherits (parent); create table c2 (check (i 1 and i = 2)) inherits (parent); create table c3 (check (i 2 and i = 3)) inherits (parent); create index c1_idx on c1(j); copy (select i % 3 + 1, i from generate_series(1, 1000) s(i)) to '/tmp/parent'; copy parent from '/tmp/parent' with (partitioning); analyse; set enable_seqscan to false; -- no rows, index was not updated select * from c1 where j = 3; set enable_seqscan to true; set enable_indexscan to false; -- some rows select * from c1 where j = 3; set copy_partitioning_cache_size = 0; drop table parent cascade; drop table audit cascade; drop function audit(); create table parent(i int); create table c1 (check (i 0 and i = 1)) inherits (parent); create table c2 (check (i 1 and i = 2)) inherits (parent); create table c3 (check (i 2 and i = 3)) inherits (parent); create table audit(i int); create function audit() returns trigger as $$ begin insert into audit(i) values (new.i); return new; end; $$ language plpgsql; create trigger parent_a after insert on parent for each row execute procedure audit(); -- the before trigger on the parent would get fired -- create trigger parent_a2 before insert on parent for each row execute procedure audit(); create trigger c1_a before insert on c1 for each row execute procedure audit(); create trigger c1_a2 after insert on c1 for each row execute procedure audit(); copy parent from stdin with (partitioning); 1 2 3 \. -- no rows select * from audit; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet m...@asterdata.com wrote: I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Also, it would be nicer that the router can works not only in COPY but also in INSERT. BTW, I'm working on meta data of partitioning now. Your partitioning option in COPY could be replaced with the catalog. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Hi, I have extracted the partitioning option for COPY (removed the error logging part) from the previous patch. We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Tom has already explained on the list why using a trigger was a bad idea (and I know we can use a trigger since I am the one who wrote it). If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. Also, it would be nicer that the router can works not only in COPY but also in INSERT. As 8.5 will at best provide a syntactic hack on top of the existing constraint implementation, I think that it will not hurt to have routing in COPY since we will not have it anywhere otherwise. BTW, I'm working on meta data of partitioning now. Your partitioning option in COPY could be replaced with the catalog. This implementation is only for the current 8.5 and it will not be needed anymore once we get a fully functional partitioning in Postgres which seems to be for a future version. Best regards, Emmanuel -- Emmanuel Cecchet Aster Data Web: http://www.asterdata.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
Emmanuel Cecchet m...@asterdata.com wrote: If you look at the code you will see that you can do optimizations in the COPY code that you cannot do in the trigger. Since the optimizations is nice, I hope it will work not only in COPY but also in INSERT. An idea is moving the partitioning cache into Relation cache, and also moving the routing routines into heap_insert(). My concern is just in the modified position; I think you don't have to change your logic itself. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Partitioning option for COPY
We can use an INSERT trigger to route tuples into partitions even now. Why do you need an additional router for COPY? Also, it would be nicer that the router can works not only in COPY but also in INSERT. Yeah, but performance on an insert trigger is impractical for large volumes of data. --Josh Berkus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers