Re: [HACKERS] referential Integrity and SHARE locks
Added to TODO: * Allow UPDATEs on only non-referential integrity columns not to conflict with referential integrity locks http://archives.postgresql.org/pgsql-hackers/2007-02/msg00073.php --- Jan Wieck wrote: On 2/8/2007 2:46 PM, Marc Munro wrote: On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. No, the difference is there would have been no error at all before; if the record were deleted before T1 got to it then it wouldn't have attempted to update it. I really don't think you can make it work to perform updates or deletes on a record you have not yet locked. The record would be locked before the update or delete is attempted, however it would not be locked until the referential integrity constraints have succeeded in acquiring their locks. It is becoming clear to me that I am missing something but I still don't know what it is. If anyone can see it and explain it I'd really appreciate it. I think you are missing the fact that the exclusive row lock on UPDATE is taken before any triggers are fired at all, even BEFORE ROW triggers. This is necessary in order to prevent the row being updated or removed concurrently while the triggers are executing. Since BEFORE ROW triggers can modify the content of the row (including the foreign key), the RI check and lock of the referenced row cannot happen before other BR triggers are completed. In order to make your idea fly, the RI check trigger on INSERT or UPDATE would have to be fired before taking the row lock considering the NEW values for referencing columns as they are thus far. Since the row isn't locked at this time, it can change or disappear while the RI trigger is executing, so the check and lock has to be redone later with the actual row that got locked and after all BR triggers are done with it. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] referential Integrity and SHARE locks
On 2/8/2007 2:46 PM, Marc Munro wrote: On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. No, the difference is there would have been no error at all before; if the record were deleted before T1 got to it then it wouldn't have attempted to update it. I really don't think you can make it work to perform updates or deletes on a record you have not yet locked. The record would be locked before the update or delete is attempted, however it would not be locked until the referential integrity constraints have succeeded in acquiring their locks. It is becoming clear to me that I am missing something but I still don't know what it is. If anyone can see it and explain it I'd really appreciate it. I think you are missing the fact that the exclusive row lock on UPDATE is taken before any triggers are fired at all, even BEFORE ROW triggers. This is necessary in order to prevent the row being updated or removed concurrently while the triggers are executing. Since BEFORE ROW triggers can modify the content of the row (including the foreign key), the RI check and lock of the referenced row cannot happen before other BR triggers are completed. In order to make your idea fly, the RI check trigger on INSERT or UPDATE would have to be fired before taking the row lock considering the NEW values for referencing columns as they are thus far. Since the row isn't locked at this time, it can change or disappear while the RI trigger is executing, so the check and lock has to be redone later with the actual row that got locked and after all BR triggers are done with it. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] referential Integrity and SHARE locks
Oops, forgot to include pgsql-hackers when I responded to this the first time. On Tue, 2007-06-02 at 20:53 -0500, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: The RI triggers currently fire when a record is updated. Under my proposal they would fire in the same way but before the record is locked rather than after. Or am I missing your point? IOW, some other transaction could update or delete the tuple meanwhile? Doesn't seem very promising. That other transaction, T1, would have run the same RI triggers and so would have the same parent records locked. The blocked transaction, T2, once T1 has committed, would fail. I don't see this as being much different from the current case, where T1 locks and deletes or updates a row, and T2 then tries to manipulate the same row. In both cases, locks manage the race for the row, and MVCC ensures that T2 fails. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 8 Feb 2007, Marc Munro wrote: Oops, forgot to include pgsql-hackers when I responded to this the first time. On Tue, 2007-06-02 at 20:53 -0500, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: The RI triggers currently fire when a record is updated. Under my proposal they would fire in the same way but before the record is locked rather than after. Or am I missing your point? IOW, some other transaction could update or delete the tuple meanwhile? Doesn't seem very promising. That other transaction, T1, would have run the same RI triggers and so would have the same parent records locked. That's not true in the case of delete, since the referencing table triggers are on insert and update. Second, the parent record locks are not exclusive which means that both can be granted, so I don't see how this stops the second from continuing before the first. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 2007-08-02 at 10:06 -0800, Stephan Szabo wrote: On Thu, 8 Feb 2007, Marc Munro wrote: . . . That other transaction, T1, would have run the same RI triggers and so would have the same parent records locked. That's not true in the case of delete, since the referencing table triggers are on insert and update. . . . Let me see if I have this scenario right: Transaction T1 updates child record C1, with RI causing the parent P1 to be locked before the child. In the meantime transaction T2, successfully deletes C1 as it has not yet been locked. (Please tell me if I have misunderstood what you are saying) Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. . . . Second, the parent record locks are not exclusive which means that both can be granted, so I don't see how this stops the second from continuing before the first. I don't think this does stop the second from continuing before the first. What will stop it, is the eventual lock that is taken on the child (triggering) record. I am not proposing reducing the number of locks taken, but rather changing the order in which the locks are taken. concerned frown What am I missing? /concerned frown __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] referential Integrity and SHARE locks
Marc Munro [EMAIL PROTECTED] writes: Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. No, the difference is there would have been no error at all before; if the record were deleted before T1 got to it then it wouldn't have attempted to update it. I really don't think you can make it work to perform updates or deletes on a record you have not yet locked. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote: Marc Munro [EMAIL PROTECTED] writes: Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. No, the difference is there would have been no error at all before; if the record were deleted before T1 got to it then it wouldn't have attempted to update it. I really don't think you can make it work to perform updates or deletes on a record you have not yet locked. The record would be locked before the update or delete is attempted, however it would not be locked until the referential integrity constraints have succeeded in acquiring their locks. It is becoming clear to me that I am missing something but I still don't know what it is. If anyone can see it and explain it I'd really appreciate it. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 8 Feb 2007, Marc Munro wrote: On Thu, 2007-08-02 at 10:06 -0800, Stephan Szabo wrote: On Thu, 8 Feb 2007, Marc Munro wrote: . . . That other transaction, T1, would have run the same RI triggers and so would have the same parent records locked. That's not true in the case of delete, since the referencing table triggers are on insert and update. . . . Let me see if I have this scenario right: Transaction T1 updates child record C1, with RI causing the parent P1 to be locked before the child. In the meantime transaction T2, successfully deletes C1 as it has not yet been locked. (Please tell me if I have misunderstood what you are saying) Yes in this case, T1 must abort because the record it was going to update has disappeared from underneath it. I don't see how this is significantly different from the same race for the record if the table had no RI constraints. The only difference that I can see, is that T1 now has some locks that it must relinquish as the transaction aborts. . . . Second, the parent record locks are not exclusive which means that both can be granted, so I don't see how this stops the second from continuing before the first. I don't think this does stop the second from continuing before the first. What will stop it, is the eventual lock that is taken on the child (triggering) record. But at that point, you've already had to compose the new row in order to call the trigger for the ri check, right? So, one of those new rows will be out of date by the time it actually gets the lock. Presumably that means that you need to recalculate the new row, but you've already done a check and gotten a lock based on the old new row. Also, another big problem is the fact that SQL requires that the action already have happened before the check in cases where the constraint references the same table. The row being updated or inserted might reference a row that will be updated or inserted by a later action of the same statement. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] referential Integrity and SHARE locks
On Thu, 2007-08-02 at 12:24 -0800, Stephan Szabo wrote: On Thu, 8 Feb 2007, Marc Munro wrote: I don't think this does stop the second from continuing before the first. What will stop it, is the eventual lock that is taken on the child (triggering) record. But at that point, you've already had to compose the new row in order to call the trigger for the ri check, right? So, one of those new rows will be out of date by the time it actually gets the lock. Presumably that means that you need to recalculate the new row, but you've already done a check and gotten a lock based on the old new row. Yes. That is tricky. For my proposed scheme to work, I guess we'd have to be able to drop those locks which were just acquired by the RI triggers. Not too simple, I guess. Also, another big problem is the fact that SQL requires that the action already have happened before the check in cases where the constraint references the same table. The row being updated or inserted might reference a row that will be updated or inserted by a later action of the same statement. Hmmm. That does seem to be the final nail in the coffin. Consider the proposal withdrawn, and thanks for explaining it all to me. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] Referential Integrity and SHARE locks
On Mon, 2007-02-05 at 23:25 +, Gregory Stark wrote: Gregory Stark [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: OK, please propose some wording so at least we can get agreement on that. How about something open-ended like arrange for updates that do not update columns referenced by foreign keys from other tables to avoid being blocked by locks from concurrent RI checks Hum. Reading back in the thread it seems what I wrote is basically equivalent to the wording Simon originally proposed. I like your wording. It's clearer and includes Stephan's clarification. Some minor mods... TODO avoid blocking of updates because of concurrent RI checks when those updates do not alter columns referenced by foreign keys from other tables -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Referential Integrity and SHARE locks
Simon Riggs wrote: On Sat, 2007-02-03 at 09:43 -0800, Stephan Szabo wrote: On Sat, 3 Feb 2007, Simon Riggs wrote: On Fri, 2007-02-02 at 16:50 -0500, Tom Lane wrote: No, I don't. I think knowledge of which columns are in a PK is quite a few levels away from the semantics of row locking. To point out just one problem, what happens when you add or drop a PK? Or drop and replace with a different column set? Yes, I know dropping one requires exclusive lock on the table, but the transaction doing it could hold row locks within the table, and now it's very unclear what they mean. There are issues, yes. Dropping PKs is a very irregular occurrence nor is it likely to be part of a complex transaction. It wouldn't bother me to say that if a transaction already holds a RowExclusiveLock or a RowShareLock it cannot upgrade to an AccessExclusiveLock. The lock check seems like a strange constraint, given that it's not necessarily going to be anything that conflicts with the row locks. I'm not sure there'd be a better idea given this sort of scheme, but it still seems strange. The TODO I was requesting you consider was this: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. That is, IMHO, a general statement of an important unresolved issue with our Referential Integrity implementation. That is in no way intended as any form of negative commentary on the excellent detailed work that has got us so far already. Well, if we really want to solve that completely then we really need column locking, or at least locking at the level of arbitrary (possibly overlapping) unique constraints, not just the PK because foreign keys don't necessarily reference the primary key. But the PK case is certainly the most common and it'd certainly be nice to cover that case. IMHO generic column level locking would hardly ever be used. Locking for RI seems to be 99% of the use case, which means we'd be OK if we found a way of only locking an arbitary number of unique col groups. By definition, each of these column groups is covered by a unique index. Not saying this'll gain us anything but... It has ocurred to me that the lock could be reduced in another way. If we had an immutable constraint that could be applied to pkey-columns then we wouldn't have to worry about updates at all. If a pkey value was there before an update, it would be there after too. The only thing you'd need to prevent would be deletes. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] referential Integrity and SHARE locks
Simon Riggs started this thread with the question: . . . Why do we need a SHARE lock at all, on the **referenc(ed)** table? . . . The root problem addressed by this thread seems to be that using share locks in this way increases the likelihood of deadlock, and causes blocking when no blocking is actually needed. I would like to make a few observations leading to two alternative proposals for dealing with this issue. Deadlocks arise because of differences in the order in which locks are taken. If we have a parent table P, and a child C, and we modify two children of the same P, locks will be taken in the order C1, P, C2. Another process modifying only C2, will cause locks to be taken in the order C2, P, leading to the possibility of deadlock. With the current system of RI, this sort of deadlock arises far too easily with the result that RI is often disabled. It is solely the order in which the locks are taken that causes the problem. If the RI constraints could lock the parent records before locking the child, the possibility of deadlock would be much reduced. Proposal 1: Alter the way RI triggers fire, so that they complete before locking the row against which they fire. Having a background in Oracle, I found myself considering how this is not usually a problem with Oracle databases. If I understand it correctly, in Oracle the referential integrity constraints are implemented by locking the index associated with the constraint, rather than the records themselves. Proposal 2: Lock the index associated with the parent record, rather than the parent record itself. Updates to indexed fields, and deletions of records would need to also take such locks, but this should be enough to allow non-referenced fields to be updated in a parent, even while transactions are modifying its children. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] referential Integrity and SHARE locks
Marc Munro [EMAIL PROTECTED] writes: Proposal 1: Alter the way RI triggers fire, so that they complete before locking the row against which they fire. It's kind of hard to know what records the user will choose to update before he actually does the update... Proposal 2: Lock the index associated with the parent record, rather than the parent record itself. That doesn't help in our case because each version of a record has an index entry. So even updates to unrelated fields imply index modifications. Worse, deleting and updating don't remove the old index entries so even if you've locked them you won't prevent people from doing exactly those operations you're trying to avoid. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] referential Integrity and SHARE locks
On Tue, 2007-06-02 at 23:47 +, Gregory Stark wrote: Marc Munro [EMAIL PROTECTED] writes: Proposal 1: Alter the way RI triggers fire, so that they complete before locking the row against which they fire. It's kind of hard to know what records the user will choose to update before he actually does the update... The RI triggers currently fire when a record is updated. Under my proposal they would fire in the same way but before the record is locked rather than after. Or am I missing your point? Proposal 2: Lock the index associated with the parent record, rather than the parent record itself. That doesn't help in our case because each version of a record has an index entry. So even updates to unrelated fields imply index modifications. Worse, deleting and updating don't remove the old index entries so even if you've locked them you won't prevent people from doing exactly those operations you're trying to avoid. I guess my proposal was incomplete. Obviously, before deleting, or updating an indexed column, a lock would have to be taken on the index. I believe this would suffice to guarantee referential integrity without blocking updates that leave the referred indexes unchanged. What you say about each version of a record having an index entry confuses me. I thought there was one index entry that lead to a chain of tuples. If this is not the case, I don't see how the current exclusive locks on indexes work to enforce uniqueness. Could you point me to somewhere in the code or the documentation that explains this? It still seems to me that if we can lock an index entry to guarantee uniqueness, we can also lock it to implement RI constraints. __ Marc signature.asc Description: This is a digitally signed message part
Re: [HACKERS] referential Integrity and SHARE locks
Marc Munro [EMAIL PROTECTED] writes: The RI triggers currently fire when a record is updated. Under my proposal they would fire in the same way but before the record is locked rather than after. Or am I missing your point? IOW, some other transaction could update or delete the tuple meanwhile? Doesn't seem very promising. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Referential Integrity and SHARE locks
On Sat, 3 Feb 2007, Simon Riggs wrote: There are issues, yes. Dropping PKs is a very irregular occurrence nor is it likely to be part of a complex transaction. It wouldn't bother me to say that if a transaction already holds a RowExclusiveLock or a RowShareLock it cannot upgrade to an AccessExclusiveLock. Actually, since rearranging PKs is such a drastic change I would expect them only to be part of a large complex transaction. I know for apps I work on it would be part of a single transaction script that updated whole chunks of data and schema. Kris Jurka ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Referential Integrity and SHARE locks
On Sun, 2007-02-04 at 09:38 +, Simon Riggs wrote: The TODO I was requesting you consider was this: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. That is, IMHO, a general statement of an important unresolved issue with our Referential Integrity implementation. That is in no way intended as any form of negative commentary on the excellent detailed work that has got us so far already. Well, if we really want to solve that completely then we really need column locking, or at least locking at the level of arbitrary (possibly overlapping) unique constraints, not just the PK because foreign keys don't necessarily reference the primary key. But the PK case is certainly the most common and it'd certainly be nice to cover that case. ... It occurs to me that if we had visibility in unique indexes, this would allow the index rows to be separately lockable to the main row. That's exactly what we need here. I've implemented a work-around using this principle, utilising RULEs and a duplicated PK column-only table. This still allows FK checks to work correctly, yet doesn't require the backend hack Csaba mentioned. My feeling is that more work in this area is required, even if we can't yet agree a TODO item. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Referential Integrity and SHARE locks
Simon Riggs wrote: It occurs to me that if we had visibility in unique indexes, this would allow the index rows to be separately lockable to the main row. That's exactly what we need here. I've implemented a work-around using this principle, utilising RULEs and a duplicated PK column-only table. This still allows FK checks to work correctly, yet doesn't require the backend hack Csaba mentioned. My feeling is that more work in this area is required, even if we can't yet agree a TODO item. OK, please propose some wording so at least we can get agreement on that. -- Bruce Momjian [EMAIL PROTECTED] EnterpriseDBhttp://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Referential Integrity and SHARE locks
On Mon, 5 Feb 2007, Simon Riggs wrote: On Sun, 2007-02-04 at 09:38 +, Simon Riggs wrote: The TODO I was requesting you consider was this: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. That is, IMHO, a general statement of an important unresolved issue with our Referential Integrity implementation. That is in no way intended as any form of negative commentary on the excellent detailed work that has got us so far already. Well, if we really want to solve that completely then we really need column locking, or at least locking at the level of arbitrary (possibly overlapping) unique constraints, not just the PK because foreign keys don't necessarily reference the primary key. But the PK case is certainly the most common and it'd certainly be nice to cover that case. ... It occurs to me that if we had visibility in unique indexes, this would allow the index rows to be separately lockable to the main row. That's exactly what we need here. I've implemented a work-around using this principle, utilising RULEs and a duplicated PK column-only table. This still allows FK checks to work correctly, yet doesn't require the backend hack Csaba mentioned. My feeling is that more work in this area is required, even if we can't yet agree a TODO item. I actually like the general idea your TODO item had, although I would say non-referenced column update rather than non-PK update. Even if we put it far out due to questions about what would be acceptable implementation. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Referential Integrity and SHARE locks
Bruce Momjian [EMAIL PROTECTED] writes: OK, please propose some wording so at least we can get agreement on that. How about something open-ended like arrange for updates that do not update columns referenced by foreign keys from other tables to avoid being blocked by locks from concurrent RI checks -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Referential Integrity and SHARE locks
Gregory Stark [EMAIL PROTECTED] writes: Bruce Momjian [EMAIL PROTECTED] writes: OK, please propose some wording so at least we can get agreement on that. How about something open-ended like arrange for updates that do not update columns referenced by foreign keys from other tables to avoid being blocked by locks from concurrent RI checks Hum. Reading back in the thread it seems what I wrote is basically equivalent to the wording Simon originally proposed. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Referential Integrity and SHARE locks
On Sat, 2007-02-03 at 09:43 -0800, Stephan Szabo wrote: On Sat, 3 Feb 2007, Simon Riggs wrote: On Fri, 2007-02-02 at 16:50 -0500, Tom Lane wrote: No, I don't. I think knowledge of which columns are in a PK is quite a few levels away from the semantics of row locking. To point out just one problem, what happens when you add or drop a PK? Or drop and replace with a different column set? Yes, I know dropping one requires exclusive lock on the table, but the transaction doing it could hold row locks within the table, and now it's very unclear what they mean. There are issues, yes. Dropping PKs is a very irregular occurrence nor is it likely to be part of a complex transaction. It wouldn't bother me to say that if a transaction already holds a RowExclusiveLock or a RowShareLock it cannot upgrade to an AccessExclusiveLock. The lock check seems like a strange constraint, given that it's not necessarily going to be anything that conflicts with the row locks. I'm not sure there'd be a better idea given this sort of scheme, but it still seems strange. The TODO I was requesting you consider was this: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. That is, IMHO, a general statement of an important unresolved issue with our Referential Integrity implementation. That is in no way intended as any form of negative commentary on the excellent detailed work that has got us so far already. Well, if we really want to solve that completely then we really need column locking, or at least locking at the level of arbitrary (possibly overlapping) unique constraints, not just the PK because foreign keys don't necessarily reference the primary key. But the PK case is certainly the most common and it'd certainly be nice to cover that case. IMHO generic column level locking would hardly ever be used. Locking for RI seems to be 99% of the use case, which means we'd be OK if we found a way of only locking an arbitary number of unique col groups. By definition, each of these column groups is covered by a unique index. It occurs to me that if we had visibility in unique indexes, this would allow the index rows to be separately lockable to the main row. That's exactly what we need here. It also occurs to me that putting visibility in indexes doesn't prevent us from optimizing away index inserts for UPDATEs. There is no particular reason why the xmin and xmax of a unique index exactly matches the xmin and xmax of the main row. [I said the opposite to Jim Nasby a few days ago, regrettably]. The indexes would record the xmin and xmax of the row, while the main heap would have the xmin and xmax of the individual row versions. If we did both HOT + visibility in unique indexes then we would be able to eliminate the contention between INSERTs and UPDATEs with RI. As Tom pointed out this would complicate deadlock detection, but then so would any column-level locking scheme, so that isn't an argument against any one in particular. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2007-02-02 at 16:50 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2007-02-02 at 15:57 -0500, Tom Lane wrote: , and it doesn't scale to more than two holders, and I don't think it works for combinations of share and exclusive lock. Also, what happened to the third type of lock? Well, we just need to record the maximum two lock holders (given the semantics described). The third lock type is both locks at once. You're not going to support shared locks? That will be a big step backwards ... I did say that Shared locks were supported also. The lack of ordering of multitransactions is a hole in my suggestion, so I need to reconsider. Anyway, implementation aside, I wanted to agree the overall TODO, so we can think through the best way over a long period, if you agree in general. No, I don't. I think knowledge of which columns are in a PK is quite a few levels away from the semantics of row locking. To point out just one problem, what happens when you add or drop a PK? Or drop and replace with a different column set? Yes, I know dropping one requires exclusive lock on the table, but the transaction doing it could hold row locks within the table, and now it's very unclear what they mean. There are issues, yes. Dropping PKs is a very irregular occurrence nor is it likely to be part of a complex transaction. It wouldn't bother me to say that if a transaction already holds a RowExclusiveLock or a RowShareLock it cannot upgrade to an AccessExclusiveLock. For me, PKs are intimately tied to the rows they uniquely identify; we should be mentally linking them and seeing them as a replacement for Oids, which we still see as intimately linked at low level. We'll be missing many optimizations if we don't, we've discussed others this week already. The TODO I was requesting you consider was this: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. That is, IMHO, a general statement of an important unresolved issue with our Referential Integrity implementation. That is in no way intended as any form of negative commentary on the excellent detailed work that has got us so far already. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Referential Integrity and SHARE locks
On Sat, 3 Feb 2007, Simon Riggs wrote: On Fri, 2007-02-02 at 16:50 -0500, Tom Lane wrote: No, I don't. I think knowledge of which columns are in a PK is quite a few levels away from the semantics of row locking. To point out just one problem, what happens when you add or drop a PK? Or drop and replace with a different column set? Yes, I know dropping one requires exclusive lock on the table, but the transaction doing it could hold row locks within the table, and now it's very unclear what they mean. There are issues, yes. Dropping PKs is a very irregular occurrence nor is it likely to be part of a complex transaction. It wouldn't bother me to say that if a transaction already holds a RowExclusiveLock or a RowShareLock it cannot upgrade to an AccessExclusiveLock. The lock check seems like a strange constraint, given that it's not necessarily going to be anything that conflicts with the row locks. I'm not sure there'd be a better idea given this sort of scheme, but it still seems strange. The TODO I was requesting you consider was this: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. That is, IMHO, a general statement of an important unresolved issue with our Referential Integrity implementation. That is in no way intended as any form of negative commentary on the excellent detailed work that has got us so far already. Well, if we really want to solve that completely then we really need column locking, or at least locking at the level of arbitrary (possibly overlapping) unique constraints, not just the PK because foreign keys don't necessarily reference the primary key. But the PK case is certainly the most common and it'd certainly be nice to cover that case. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Referential Integrity and SHARE locks
On 2/2/2007 4:51 AM, Simon Riggs wrote: It sounds like if we don't put a SHARE lock on the referenced table then we can end the transaction in an inconsistent state if the referenced table has concurrent UPDATEs or DELETEs. BUT those operations do impose locking rules back onto the referencing tables that would not be granted until after any changes to the referencing table complete, whereupon they would restrict or cascade. So an inconsistent state doesn't seem possible to me. What am I missing? You're missing MVCC. The newly inserted reference only becomes visible when it is committed. If the order of actions is insert and check for PK, other transaction deletes PK and commits, inserted FK commits ... the other transaction didn't see it coming. Jan -- #==# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #== [EMAIL PROTECTED] # ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] Referential Integrity and SHARE locks
I'm reading the SQL Standard and I can't find anywhere that says that we need to place SHARE locks on rows in the referenced table. RI_FKey_check() clearly does that. What I do see is this: 4. For each row of the referenced table, its matching rows, unique matching rows, and non-unique matching rows are determined immediately prior to the execution of any SQL procedure statement. No new matching rows are added during the execution of that SQL procedure statement. The association between a referenced row and a non-unique matching row is dropped during the execution of that SQL-statement if the referenced row is either marked for deletion or updated to a distinct value on any referenced column that corresponds to a non-null referencing column. This occurs immediately after such a mark for deletion or update of the referenced row. Unique matching rows and non-unique matching rows for a referenced row are evaluated immediately after dropping the association between that referenced row and a non-unique matching row. under General Rules for referential constraint definition That explicitly says are determined immediately prior to the execution. To me, that implies that a Read Committed snapshot is sufficient to read referenced rows and that no lock is required. Why do we need a SHARE lock at all, on the **referenc(ed)** table? It sounds like if we don't put a SHARE lock on the referenced table then we can end the transaction in an inconsistent state if the referenced table has concurrent UPDATEs or DELETEs. BUT those operations do impose locking rules back onto the referencing tables that would not be granted until after any changes to the referencing table complete, whereupon they would restrict or cascade. So an inconsistent state doesn't seem possible to me. What am I missing? -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2007-02-02 at 10:51, Simon Riggs wrote: [snip] Why do we need a SHARE lock at all, on the **referenc(ed)** table? It sounds like if we don't put a SHARE lock on the referenced table then we can end the transaction in an inconsistent state if the referenced table has concurrent UPDATEs or DELETEs. BUT those operations do impose locking rules back onto the referencing tables that would not be granted until after any changes to the referencing table complete, whereupon they would restrict or cascade. So an inconsistent state doesn't seem possible to me. What am I missing? Well, here we do have a patch (deployed on production servers) which does not put the shared lock on the referenced table, and it lets in occasionally rows in the referencing tables which do not have parent rows in the referenced table. I'm not sure what is the mechanism, but it does happen, I can assure you. It happens rare enough that is not disturbing for us, compared to the deadlocks which happen without the patch - that's another matter... In our application we never update any key ids, so only deletes/inserts come in play, and I guess it happens when a referenced row is deleted just between a newly inserted child row checks that the parent row exists and the row is really inserted. Or something like that... Cheers, Csaba. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Referential Integrity and SHARE locks
Csaba Nagy wrote: On Fri, 2007-02-02 at 10:51, Simon Riggs wrote: [snip] Why do we need a SHARE lock at all, on the **referenc(ed)** table? Well, here we do have a patch (deployed on production servers) which does not put the shared lock on the referenced table, and it lets in occasionally rows in the referencing tables which do not have parent rows in the referenced table. I'm not sure what is the mechanism, but it does happen, I can assure you. It happens rare enough that is not disturbing for us, compared to the deadlocks which happen without the patch - that's another matter... You say below the cut that you're not updating keys, so presumably it's other columns. Which leads me to something I've wondered for a while - why do we lock the whole row? Is it just a matter of not optimised that yet or is there a good reason why locking just some columns isn't practical. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Referential Integrity and SHARE locks
You say below the cut that you're not updating keys, so presumably it's other columns. Which leads me to something I've wondered for a while - why do we lock the whole row? Is it just a matter of not optimised that yet or is there a good reason why locking just some columns isn't practical. For the conditions of generating the deadlock, see: http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php The reason of the occasional orphan rows is not completely clear to me, but it must be some kind of race condition while inserting/deleting/?updating concurrently the parent/child tables. Cheers, Csaba. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Referential Integrity and SHARE locks
Csaba Nagy wrote: The reason of the occasional orphan rows is not completely clear to me, but it must be some kind of race condition while inserting/deleting/?updating concurrently the parent/child tables. I guess the following sequence would generate a orphaned row. A: executes insert into table_child parent_id=1 B: executes delete from table_parent where id=1 A: RI trigger checks for matching row in table_parent B: The row with id=1 is marked as deleted in table_parent A: The new row with parent_id=1 is inserted into table_child B: The delete is commited A: The insert is comitted. Any ordering that marks the row as deleted between the execution of the ri-trigger and the insertion of the new row would lead to the same result.. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2 Feb 2007, Simon Riggs wrote: It sounds like if we don't put a SHARE lock on the referenced table then we can end the transaction in an inconsistent state if the referenced table has concurrent UPDATEs or DELETEs. BUT those operations do impose locking rules back onto the referencing tables that would not be granted until after any changes to the referencing table complete, whereupon they would restrict or cascade. So an inconsistent state doesn't seem possible to me. What locking back to the referencing table are you thinking about? The row locks are insufficient because that doesn't prevent an insert of a new row that matches the criteria previously locked against AFAIK. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2007-02-02 at 12:01 +0100, Csaba Nagy wrote: You say below the cut that you're not updating keys, so presumably it's other columns. Which leads me to something I've wondered for a while - why do we lock the whole row? Is it just a matter of not optimised that yet or is there a good reason why locking just some columns isn't practical. For the conditions of generating the deadlock, see: http://archives.postgresql.org/pgsql-general/2006-12/msg00029.php The reason of the occasional orphan rows is not completely clear to me, but it must be some kind of race condition while inserting/deleting/?updating concurrently the parent/child tables. Thanks very much to Csaba, Richard and Florian for insight on this. As you might have guessed across my recent posts, I'm coping with a locking problem that is occurring on RI checks. Similarly to Csaba's example, this is a port from Oracle. My earlier thinking was that Oracle appears to be able to avoid locking and my thought was that this was simply a rather dodgy interpretation of the SQL Standard. Anyway, I'm not happy with simply forgetting the SHARE lock; that clearly leads to invalid states in some cases, even if I need to have a strong cup of coffee in the morning before I see them. Using SELECT ... FOR SHARE in RI checks is better than using FOR UPDATE, but its still too heavy a lock for many scenarios. In particular, an INSERT on the referencing table can be blocked because of an RI check against the referenced table, when there is a concurrent UPDATE (on referenced table). So RI works well when the referenced tables are mostly read-only, but we see lots of problems when we perform regular updates against both referencing and referenced tables. When we perform an INSERT into the referencing table, we want to be certain that the FK value we are inserting still exists within the referenced table. A DELETE on the referenced table should prevent the INSERT, as should an UPDATE which changes the Primary Key. However, an UPDATE which simply UPDATEs a non-PK column on the referenced table should neither block nor prevent the INSERT on the referencing table. We might think of using a SELECT ... FOR SHARE NOWAIT but that will return an ERROR. Even if we wrap the request in a subtransaction the query will still fail even when a permissible non-PK UPDATE is taking place, so that alone is not good enough. Various other thoughts about new lock modes yield nothing useful either, after close analysis. So changing the lock *strength* is not the right thing to do, but changing the lock footprint does seem worthwhile. My initial proposal is to change the way write locking works, so as to implement simplified column-level locking. Rather than locking each column individually, we can lock the columns in one of two groups, plus the full row. Thus we have three types of write lock: 1. full row write lock as well as two mutually exclusive groups of columns: 2.a) PK cols 2.b) all columns apart from the PK cols So type (1) conflicts with either (2a) or (2b). (2a) and (2b) do not conflict with one another. Shared and Write locks conflict as before at the various levels. INSERT, DELETE - full row write lock UPDATE - will place a write lock on either: full row or all-non-PK-cols, depending upon whether the SET clause touches the PK columns. (So you cannot UPDATE the PK while the non-PK cols are being UPDATEd...) If no FK references this table, we will always take a full row write lock. SELECT FOR UPDATE - full row write lock SELECT FOR SHARE - will place full row lock by default, but in cases where the SELECT doesn't reference anything apart from the PK columns, we would place the lock only on the PK-cols. (Might be easier to do this only for RI check queries.) With this model, an INSERT or FK UPDATE on the referencing table that generates a SELECT FOR SHARE onto the referenced table will conflict with a DELETE or a PK UPDATE, but won't conflict at all with a non-PK UPDATE. This would be possible by using 2 additional tuple info bits to flag that the lock held was either HEAP_LOCKED_PK_ONLY or HEAP_LOCKED_NON_PK_COLS. When both lock types are held simultaneously we will replace xmax with a multitransaction id, where we hold only two transactionids at most - the first Xid is the holder of the PK cols lock, the second Xid is the holder of the non-PK cols lock. As a non-PK UPDATE is carried out, the details of any share locks on the PK cols are carried forward onto the new row version. So all of the machinery we need is already in place, we just need to extend it somewhat. Clearly an 8.4+ item, but seems worth getting onto the TODO list if we agree to it: TODO: Develop non-conflicting locking scheme to allow RI checks to co-exist peacefully with non-PK UPDATEs on the referenced table. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)---
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2007-02-02 at 10:35 -0800, Stephan Szabo wrote: On Fri, 2 Feb 2007, Simon Riggs wrote: It sounds like if we don't put a SHARE lock on the referenced table then we can end the transaction in an inconsistent state if the referenced table has concurrent UPDATEs or DELETEs. BUT those operations do impose locking rules back onto the referencing tables that would not be granted until after any changes to the referencing table complete, whereupon they would restrict or cascade. So an inconsistent state doesn't seem possible to me. What locking back to the referencing table are you thinking about? The row locks are insufficient because that doesn't prevent an insert of a new row that matches the criteria previously locked against AFAIK. Probably best to read the later posts; this one was at the beginning of my thought train, so is slightly off track, as later posters remind me. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Referential Integrity and SHARE locks
Simon Riggs [EMAIL PROTECTED] writes: Thus we have three types of write lock: 1. full row write lock as well as two mutually exclusive groups of columns: 2.a) PK cols 2.b) all columns apart from the PK cols This appears to require that we add enough fields to row headers to represent *three* locking transactions instead of the current one. Given the amount of griping about row header overhead that normally flows across this list, I can't see such a proposal getting accepted. This would be possible by using 2 additional tuple info bits to flag that the lock held was either HEAP_LOCKED_PK_ONLY or HEAP_LOCKED_NON_PK_COLS. When both lock types are held simultaneously we will replace xmax with a multitransaction id, where we hold only two transactionids at most - the first Xid is the holder of the PK cols lock, the second Xid is the holder of the non-PK cols lock. You haven't thought that through: it fails to distinguish who holds which lock (mxact membership is not ordered), and it doesn't scale to more than two holders, and I don't think it works for combinations of share and exclusive lock. Also, what happened to the third type of lock? Implementation details aside, I'm a tad concerned about introducing deadlock failures that did not happen before, in scenarios where transactions touch the row multiple times and end up needing to acquire one of these locks while already holding another. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Referential Integrity and SHARE locks
On Fri, 2007-02-02 at 15:57 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Thus we have three types of write lock: 1. full row write lock as well as two mutually exclusive groups of columns: 2.a) PK cols 2.b) all columns apart from the PK cols This appears to require that we add enough fields to row headers to represent *three* locking transactions instead of the current one. Given the amount of griping about row header overhead that normally flows across this list, I can't see such a proposal getting accepted. Yeh, I said ouch myself. This would be possible by using 2 additional tuple info bits to flag that the lock held was either HEAP_LOCKED_PK_ONLY or HEAP_LOCKED_NON_PK_COLS. When both lock types are held simultaneously we will replace xmax with a multitransaction id, where we hold only two transactionids at most - the first Xid is the holder of the PK cols lock, the second Xid is the holder of the non-PK cols lock. You haven't thought that through: it fails to distinguish who holds which lock (mxact membership is not ordered) OK, sorry, I thought there was some ordering possible. , and it doesn't scale to more than two holders, and I don't think it works for combinations of share and exclusive lock. Also, what happened to the third type of lock? Well, we just need to record the maximum two lock holders (given the semantics described). The third lock type is both locks at once. Implementation details aside, I'm a tad concerned about introducing deadlock failures that did not happen before, in scenarios where transactions touch the row multiple times and end up needing to acquire one of these locks while already holding another. Well, right now we have deadlocks and lots of locking. Updating PKs isn't a regular occurence, so I'd rather swap a common deadlock for an uncommon one, any day. Anyway, implementation aside, I wanted to agree the overall TODO, so we can think through the best way over a long period, if you agree in general. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Referential Integrity and SHARE locks
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2007-02-02 at 15:57 -0500, Tom Lane wrote: , and it doesn't scale to more than two holders, and I don't think it works for combinations of share and exclusive lock. Also, what happened to the third type of lock? Well, we just need to record the maximum two lock holders (given the semantics described). The third lock type is both locks at once. You're not going to support shared locks? That will be a big step backwards ... Anyway, implementation aside, I wanted to agree the overall TODO, so we can think through the best way over a long period, if you agree in general. No, I don't. I think knowledge of which columns are in a PK is quite a few levels away from the semantics of row locking. To point out just one problem, what happens when you add or drop a PK? Or drop and replace with a different column set? Yes, I know dropping one requires exclusive lock on the table, but the transaction doing it could hold row locks within the table, and now it's very unclear what they mean. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Referential Integrity and SHARE locks
Simon Riggs wrote: My earlier thinking was that Oracle appears to be able to avoid locking and my thought was that this was simply a rather dodgy interpretation of the SQL Standard. Anyway, I'm not happy with simply forgetting the SHARE lock; that clearly leads to invalid states in some cases, even if I need to have a strong cup of coffee in the morning before I see them. I think oracle is in a completly different situation here - Oracle imposes limits on the maximum size of a transaction dues to various reasons I believe - one being the size of the rollback segment. AFAIK, postgres doesn't impose any such limits (apart from problems with long-running transactions an vacuums, and possibly if you do set constraints all deferred) - which is why row locks have to be stored on-disk in the tuple header, and not in some shared-memory segment. Now, _if_ you're already imposing limits on transaction size, than it becomes quite feasable IMHO to also limit the number of row-locks a transaction can take - and to just store them in memory. This again makes column-level locking much easier I'd think. Using SELECT ... FOR SHARE in RI checks is better than using FOR UPDATE, but its still too heavy a lock for many scenarios. I think it's not too heavy, but it's actually the wrong kind of lock for ri checks. Both SHARE and EXCLUSIVE row locks are, well, _row_ locks - the lock a specific tuple. This is fine, if you want to guarantee that a certain tuple stays as it is as long as still need it. But it's not really what a RI constraints wants to ensure. An RI constraint actually wants to force a specific _condition_ (namely, the existence of a row with a certain value in a certain column) to be true, not prevent a specific physical tuple from being modifier. Now, a generic mechanism for condition locking is probably impossible to implement with large performance sacrifices - but AFAICS the cases needed for RI checks are always of the form Is there a row where (field1, field2, ...) = (a, b, c, ...). And - at least for RI-Checks done when updating the _referencing_ table, postgres already forces an index to exist on (field1, field2, ...) I think. The condition There is a row where (field1, field2, ...) = (a,b,c,...) is the same as saying There as an index entry for (a,b,c,) that points to a live row. _If_ is was possible to somehow take a lock on a index key (not a certain index entry, but rather _all_ entries with a given key), than that could maybe be used for more efficient RI locks. I guess this would need some sort of tuple-visibility-in-index entries, but it seems that there a few people interested in making this happen. greetings, Florian Pflug ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings