[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116570#comment-17116570 ] Michael Semb Wever commented on CASSANDRA-15805: [~slebresne], i've re-run #131 for you as https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/134/pipeline (taking away the stress and cdc stages, as they don't exist in 3.0) and #132 as https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/135/pipeline (just a retry). (The CI failure in #132 is addressed in CASSANDRA-15826.) > Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones > interacts with collection tombstones > -- > > Key: CASSANDRA-15805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15805 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/SSTable >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > The legacy reading code ({{LegacyLayout}} and > {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly > the case where a range tombstone covering multiple rows interacts with a > collection tombstone. > A simple example of this problem is if one runs on 2.X: > {noformat} > CREATE TABLE t ( > k int, > c1 text, > c2 text, > a text, > b set, > c text, > PRIMARY KEY((k), c1, c2) > ); > // Delete all rows where c1 is 'A' > DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'; > // Inserts a row covered by that previous range tombstone > INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, > 'bar') USING TIMESTAMP 2; > // Delete the collection of that previously inserted row > DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'; > {noformat} > If the following is ran on 2.X (with everything either flushed in the same > table or compacted together), then this will result in the inserted row being > duplicated (one part containing the {{a}} column, the other the {{c}} one). > I will note that this is _not_ a duplicate of CASSANDRA-15789 and this > reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, > the additional code added to CASSANDRA-15789 to force merging duplicated rows > if they are produced _will_ end up fixing this as a consequence (assuming > there is no variation of this problem that leads to other visible issues than > duplicated rows). That said, I "think" we'd still rather fix the source of > the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111019#comment-17111019 ] Sylvain Lebresne commented on CASSANDRA-15805: -- Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how that will go). ||branch||CI|| | [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | [ci-cassandra #122|https://ci-cassandra.apache.org/job/Cassandra-devbranch/122/] | | [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | [ci-cassandra #123|https://ci-cassandra.apache.org/job/Cassandra-devbranch/123/] | > Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones > interacts with collection tombstones > -- > > Key: CASSANDRA-15805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15805 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/SSTable >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > The legacy reading code ({{LegacyLayout}} and > {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly > the case where a range tombstone covering multiple rows interacts with a > collection tombstone. > A simple example of this problem is if one runs on 2.X: > {noformat} > CREATE TABLE t ( > k int, > c1 text, > c2 text, > a text, > b set, > c text, > PRIMARY KEY((k), c1, c2) > ); > // Delete all rows where c1 is 'A' > DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'; > // Inserts a row covered by that previous range tombstone > INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, > 'bar') USING TIMESTAMP 2; > // Delete the collection of that previously inserted row > DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'; > {noformat} > If the following is ran on 2.X (with everything either flushed in the same > table or compacted together), then this will result in the inserted row being > duplicated (one part containing the {{a}} column, the other the {{c}} one). > I will note that this is _not_ a duplicate of CASSANDRA-15789 and this > reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, > the additional code added to CASSANDRA-15789 to force merging duplicated rows > if they are produced _will_ end up fixing this as a consequence (assuming > there is no variation of this problem that leads to other visible issues than > duplicated rows). That said, I "think" we'd still rather fix the source of > the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110514#comment-17110514 ] Aleksey Yeschenko commented on CASSANDRA-15805: --- Nit, feel free to ignore: in {{UnfilteredDeserializer}} on line 701, redundant {{this}}. Otherwise pretty straightforward and LGTM. > Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones > interacts with collection tombstones > -- > > Key: CASSANDRA-15805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15805 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/SSTable >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > The legacy reading code ({{LegacyLayout}} and > {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly > the case where a range tombstone covering multiple rows interacts with a > collection tombstone. > A simple example of this problem is if one runs on 2.X: > {noformat} > CREATE TABLE t ( > k int, > c1 text, > c2 text, > a text, > b set, > c text, > PRIMARY KEY((k), c1, c2) > ); > // Delete all rows where c1 is 'A' > DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'; > // Inserts a row covered by that previous range tombstone > INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, > 'bar') USING TIMESTAMP 2; > // Delete the collection of that previously inserted row > DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'; > {noformat} > If the following is ran on 2.X (with everything either flushed in the same > table or compacted together), then this will result in the inserted row being > duplicated (one part containing the {{a}} column, the other the {{c}} one). > I will note that this is _not_ a duplicate of CASSANDRA-15789 and this > reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, > the additional code added to CASSANDRA-15789 to force merging duplicated rows > if they are produced _will_ end up fixing this as a consequence (assuming > there is no variation of this problem that leads to other visible issues than > duplicated rows). That said, I "think" we'd still rather fix the source of > the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108165#comment-17108165 ] Marcus Eriksson commented on CASSANDRA-15805: - This LGTM, two minor ignoreable comments; * lazily initialise the {{outOfOrderAtom}} queue in {{AtomIterator}} as it will be infrequently used (and maybe name it {{outOfOrderAtoms}} ) * consume the peeked atom ({{atoms.next()}}) before calling {{atoms.pushOutOfOrder}} in {{UnfilteredIterator#readRow}} to make it obvious we can't consume the atom we just pushed > Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones > interacts with collection tombstones > -- > > Key: CASSANDRA-15805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15805 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/SSTable >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > The legacy reading code ({{LegacyLayout}} and > {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly > the case where a range tombstone covering multiple rows interacts with a > collection tombstone. > A simple example of this problem is if one runs on 2.X: > {noformat} > CREATE TABLE t ( > k int, > c1 text, > c2 text, > a text, > b set, > c text, > PRIMARY KEY((k), c1, c2) > ); > // Delete all rows where c1 is 'A' > DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'; > // Inserts a row covered by that previous range tombstone > INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, > 'bar') USING TIMESTAMP 2; > // Delete the collection of that previously inserted row > DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'; > {noformat} > If the following is ran on 2.X (with everything either flushed in the same > table or compacted together), then this will result in the inserted row being > duplicated (one part containing the {{a}} column, the other the {{c}} one). > I will note that this is _not_ a duplicate of CASSANDRA-15789 and this > reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, > the additional code added to CASSANDRA-15789 to force merging duplicated rows > if they are produced _will_ end up fixing this as a consequence (assuming > there is no variation of this problem that leads to other visible issues than > duplicated rows). That said, I "think" we'd still rather fix the source of > the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17106271#comment-17106271 ] Sylvain Lebresne commented on CASSANDRA-15805: -- There is probably a few variations for how to fix this, but what feels the more intuitive to me is to: # slightly modify the ctor of {{LegacyRangeTombstone}} so the {{atom5}} of my previous comment use an inclusive start. Mostly because that make the rest of the logic a bit simpler imo (we can still assume that when we get a atom whose cluster strictly sort after the currently grouper row, we're done with that row). # modify {{UnfilteredDeserializer.OldFormatDeserializer}} so that when, while grouping a row, it encounters a RT that covers it, is "splits" that into a row tombstone covering the row, and push back the handling of the rest of the tombstone to when the row is truly finished. I've pushed a patch doing so for 3.0 below (thanks to [~markus] for triggering CI on this): ||branch||unit tests||dtests||jvm dtests||jvm upgrade dtest|| | https://github.com/pcmanus/cassandra/commits/C-15805-3.0 | [utests|https://circleci.com/gh/krummas/cassandra/3289] | [vnodes|https://circleci.com/gh/krummas/cassandra/3292] [no-vnodes|https://circleci.com/gh/krummas/cassandra/3293] | [jvm dtests|https://circleci.com/gh/krummas/cassandra/3290] | [upgrade dtests|https://circleci.com/gh/krummas/cassandra/3294] | I'll note that the branch contains another small fix, that is a lot less important. Namely, [the return at the beginning of {{CellGrouper#addCollectionTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1516] should be {{true}}, not {{false}} as it currently is. The test in question checks if a collection tombstone happens to not be selected by the query we're decoding data for. If it isn't included, we can ignore the tombstone, so we can/should return, but not with {{false}} as that imply the row is finished, which it probably isn't. Now the reason I say that last problem is less important is that in practice, only thrift queries should run into this (since CQL queries queries all column effectively) so even if we duplicate the row here, it won't matter when the result is converted back to thrift (besides, having a collection tombstone implies that this is a thrift query on a CQL table, which is dodgy in the first place). Anyway, the code is still obviously wrong and the fix trivial, so included it it nonetheless (in a separate commit). > Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones > interacts with collection tombstones > -- > > Key: CASSANDRA-15805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15805 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/SSTable >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne >Priority: Normal > Fix For: 3.0.x, 3.11.x > > > The legacy reading code ({{LegacyLayout}} and > {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly > the case where a range tombstone covering multiple rows interacts with a > collection tombstone. > A simple example of this problem is if one runs on 2.X: > {noformat} > CREATE TABLE t ( > k int, > c1 text, > c2 text, > a text, > b set, > c text, > PRIMARY KEY((k), c1, c2) > ); > // Delete all rows where c1 is 'A' > DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'; > // Inserts a row covered by that previous range tombstone > INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, > 'bar') USING TIMESTAMP 2; > // Delete the collection of that previously inserted row > DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'; > {noformat} > If the following is ran on 2.X (with everything either flushed in the same > table or compacted together), then this will result in the inserted row being > duplicated (one part containing the {{a}} column, the other the {{c}} one). > I will note that this is _not_ a duplicate of CASSANDRA-15789 and this > reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, > the additional code added to CASSANDRA-15789 to force merging duplicated rows > if they are produced _will_ end up fixing this as a consequence (assuming > there is no variation of this problem that leads to other visible issues than > duplicated rows). That said, I "think" we'd still rather fix the source of > the issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail:
[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105536#comment-17105536 ] Sylvain Lebresne commented on CASSANDRA-15805: -- To understand why this happens, let me write down the atoms that the example of the description generates on 2.X (using a simplified representation that I hope is clear enough): {noformat} atom1: RT([A:_, A:X:b:_])@1, // beginning of all 'A' rows to beginning of A:X's b column atom2: Cell(A:X:)@2, // row marker for A:X atom3: Cell(A:X:a=foo)@2,// value of a in A:X atom4: RT([A:X:b:_, A:X:b:!])@3, // collection tombstone for b in A:X's atom5: RT([A:X:b:!, A:!])@1, // remainder of covering RT, from end of b in A:X to end of all 'A' rows atom6: Cell(A:X:c=bar)@2 // value of c in A:X {noformat} Those atoms are deserialized into {{LegacyCell}} and {{LegacyRangeTombstone}} on 3.X as: {noformat} atom1: RT(Bound(INCL_START_BOUND(A), collection=null)-Bound(EXCL_END_BOUND(A:B), collection=null), deletedAt=1, localDeletion=1589204864) atom2: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=null, collElt=null), v=, ts=2, ldt=2147483647, ttl=0) atom3: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=a, collElt=null), v=foo, ts=2, ldt=2147483647, ttl=0) atom4: RT(Bound(INCL_START_BOUND(A:X), collection=b)-Bound(INCL_END_BOUND(A:X), collection=b), deletedAt=3, localDeletion=1589204864) atom5: RT(Bound(EXCL_START_BOUND(A:X), collection=null)-Bound(INCL_END_BOUND(A), collection=null), deletedAt=1, localDeletion=1589204864) atom6: LegacyCell(REGULAR, name=Cellname(clustering=A:X, column=c, collElt=null), v=bar, ts=2, ldt=2147483647, ttl=0) {noformat} I'll point out that those are a direct translation of the 2.X atoms except for {{atom1}} and {{atom5}} that are slightly different: * instead of {{atom1}} stopping at the beginning of the row {{b}} column, it extends to the end of the row. * and instead of {{atom5}} staring after that {{b}} column, it starts after the row. Do note however that the order of atoms is still the one above, so that atom is effectively out-of-order. The reason for those differences is the logic [at the beginning of {{LegacyLayout.RangeTombstone}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1883], whose comment is trying to explain, but is basically due to the legacy layer having to map all 2.X RTs into either a 3.X range tombstone (so one over multiple rows), a row tombstone or a collection one. Anyway, as mentioned above, the problem is that {{atom5}} is out of order. What currently happens is that when {{atom5}} is encountered by {{UnfilteredDeserialized.OldFormatDeserializer}}, it will be passed to the {{CellGrouper}} currently grouping the row, and will end up in the [{{CellGrouper#addGenericTombstone}} method|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1544]. But, because that atom starts strictly after the row being grouped, the method returns {{false}} and the row is generated a first time. Later, we get {{atom6}} which restarts the row with the value of column {{c}}, after which it is generated a second time. > Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones > interacts with collection tombstones > -- > > Key: CASSANDRA-15805 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15805 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Coordination, Local/SSTable >Reporter: Sylvain Lebresne >Priority: Normal > > The legacy reading code ({{LegacyLayout}} and > {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly > the case where a range tombstone covering multiple rows interacts with a > collection tombstone. > A simple example of this problem is if one runs on 2.X: > {noformat} > CREATE TABLE t ( > k int, > c1 text, > c2 text, > a text, > b set, > c text, > PRIMARY KEY((k), c1, c2) > ); > // Delete all rows where c1 is 'A' > DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'; > // Inserts a row covered by that previous range tombstone > INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, > 'bar') USING TIMESTAMP 2; > // Delete the collection of that previously inserted row > DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'; > {noformat} > If the following is ran on 2.X (with everything either flushed in the same > table or compacted together), then this will result in the inserted row being > duplicated (one part containing the {{a}} column, the other the {{c}} one). > I will note that this is