On Tue, Jun 18, 2024 at 1:18 PM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Tue, Jun 18, 2024 at 12:11 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > On Tue, Jun 18, 2024 at 11:54 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > > > > > > On Mon, Jun 17, 2024 at 8:51 PM Robert Haas <robertmh...@gmail.com> wrote: > > > > > > > > On Mon, Jun 17, 2024 at 1:42 AM Amit Kapila <amit.kapil...@gmail.com> > > > > wrote: > > > > > The difference w.r.t the existing mechanisms for holding deleted data > > > > > is that we don't know whether we need to hold off the vacuum from > > > > > cleaning up the rows because we can't say with any certainty whether > > > > > other nodes will perform any conflicting operations in the future. > > > > > Using the example we discussed, > > > > > Node A: > > > > > T1: INSERT INTO t (id, value) VALUES (1,1); > > > > > T2: DELETE FROM t WHERE id = 1; > > > > > > > > > > Node B: > > > > > T3: UPDATE t SET value = 2 WHERE id = 1; > > > > > > > > > > Say the order of receiving the commands is T1-T2-T3. We can't predict > > > > > whether we will ever get T-3, so on what basis shall we try to prevent > > > > > vacuum from removing the deleted row? > > > > > > > > The problem arises because T2 and T3 might be applied out of order on > > > > some nodes. Once either one of them has been applied on every node, no > > > > further conflicts are possible. > > > > > > If we decide to skip the update whether the row is missing or deleted, > > > we indeed reach the same end result regardless of the order of T2, T3, > > > and Vacuum. Here's how it looks in each case: > > > > > > Case 1: T1, T2, Vacuum, T3 -> Skip the update for a non-existing row > > > -> end result we do not have a row. > > > Case 2: T1, T2, T3 -> Skip the update for a deleted row -> end result > > > we do not have a row. > > > Case 3: T1, T3, T2 -> deleted the row -> end result we do not have a row. > > > > > > > In case 3, how can deletion be successful? The row required to be > > deleted has already been updated. > > Hmm, I was considering this case in the example given by you above[1], > so we have updated some fields of the row with id=1, isn't this row > still detectable by the delete because delete will find this by id=1 > as we haven't updated the id? I was making the point w.r.t. the > example used above. >
Your point is correct w.r.t the example but I responded considering a general update-delete ordering. BTW, it is not clear to me how update_delete conflict will be handled with what Robert and you are saying. I'll try to say what I understood. If we assume that there are two nodes A & B as mentioned in the above example and DELETE has applied on both nodes, now say UPDATE has been performed on node B then irrespective of whether we consider the conflict as update_delete or update_missing, the data will remain same on both nodes. So, in such a case, we don't need to bother differentiating between those two types of conflicts. Is that what we can interpret from above? -- With Regards, Amit Kapila.