[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810931#comment-16810931 ]

Sam Tunnicliffe commented on CASSANDRA-15072:
---------------------------------------------

This looks safe to me with respect to "the dark corners", as the new counter is only used in this very specific use case, so if the CI looks good I'm +1 on the patch.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -----------------------------------------------------
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Coordination
> Reporter: Muir Manders
> Assignee: Blake Eggleston
> Priority: High
> Attachments: eriksw-repro.sh
>
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application started getting
> back incomplete results for range queries. When all nodes were upgraded
> (before upgrading sstables), we stopped getting incomplete results. I was
> able to reproduce it and listed steps below. It seems to require the random
> partitioner and compact storage to reproduce reliably. It also reproduces
> coming from 2.1.21 and 2.2.14. You seem to get the bad behavior when an old
> node is your coordinator and it has to talk to an upgraded replica.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh <<SCHEMA
> CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> CONSISTENCY QUORUM;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> ccm node2 stop
> ccm node2 setdir -v 3.11.4
> ccm node2 start
> # here I use 3.X cqlsh to connect to 2.X node so I can lower the page size (to
> # allow for simpler test setup)
> cqlsh 127.0.0.3 <<QUERY
> CONSISTENCY QUORUM;
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> ----+-------+-----
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> ----+-------+-----
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810575#comment-16810575 ]

Sam Tunnicliffe commented on CASSANDRA-15072:
---------------------------------------------

[~bdeggleston] sure, I'll review asap
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808193#comment-16808193 ]

Blake Eggleston commented on CASSANDRA-15072:
---------------------------------------------

No problem. Yes, mixed mode just means you're upgrading your cluster. I don't know the exact cause, but you've summarized what I think is probably happening. Specifically, the legacy read path on the 3.0 nodes is probably always interpreting single cells as rows for compact storage tables, even ones without clustering columns.
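Blake's hypothesis can be illustrated with a small model. The sketch below is hypothetical and is not Cassandra's read path: it represents the repro's data as legacy (partition key, column name, value) cells and contrasts two ways of counting rows. In a compact table with clustering columns each cell is a CQL row, while in a compact table without clustering columns all cells of a partition form one row. `LegacyCell` and `count_rows` are invented names for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LegacyCell:
    """Hypothetical model of a pre-3.0 storage cell: (partition key, column name, value)."""
    partition_key: str
    column: str
    value: str


def count_rows(cells: List[LegacyCell], has_clustering_columns: bool) -> int:
    """Count CQL rows in a list of legacy cells (illustrative model only).

    With clustering columns, a compact table stores one CQL row per cell.
    Without clustering columns, every cell of a partition belongs to the same
    CQL row, so the row count equals the number of distinct partition keys.
    """
    if has_clustering_columns:
        return len(cells)
    return len({cell.partition_key for cell in cells})


# The two partitions from the repro, each with a 'bar' and a 'foo' cell.
repro_cells = [
    LegacyCell("2", "bar", "there"),
    LegacyCell("2", "foo", "hi"),
    LegacyCell("1", "bar", "there"),
    LegacyCell("1", "foo", "hi"),
]

print(count_rows(repro_cells, has_clustering_columns=False))  # 2: correct for test.test
print(count_rows(repro_cells, has_clustering_columns=True))   # 4: what a cell-per-row count reports
```

The gap between 2 and 4 is the kind of mismatch that would make a limit or page quota run out early.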
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808142#comment-16808142 ]

Muir Manders commented on CASSANDRA-15072:
------------------------------------------

Thanks for helping us investigate this issue. Do you think you understand the exact cause at this point?

{quote}It looks like the mixed mode read path is treating the table as a proper compact storage table though, and treating each cell as a row
{quote}

Does "mixed mode" refer to the mixed 2.X <=> 3.X Cassandra versions?

From a high level, it sounds like a 2.X coordinator and a 3.X replica disagree about compact storage cells vs. rows, and about how many are needed to satisfy a limit or page quota. Is that still what you think is going on?
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808114#comment-16808114 ]

Blake Eggleston commented on CASSANDRA-15072:
---------------------------------------------

Huh, I did not know that. I guess that makes sense though. So then this is just an upgrade bug.
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808060#comment-16808060 ]

Muir Manders commented on CASSANDRA-15072:
------------------------------------------

[https://docs.datastax.com/en/cql/3.3/cql/cql_using/useCompactStorage.html] also explicitly states the implied inverse:

{quote}
A compact table with a primary key that is not compound can have multiple columns that are not part of the primary key.
{quote}
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808049#comment-16808049 ]

Peter Sanford commented on CASSANDRA-15072:
-------------------------------------------

{quote}Tables with compact storage can only have a single column, so you shouldn’t be able to create a compact storage table with 2 columns.
{quote}

According to [http://cassandra.apache.org/doc/latest/cql/ddl.html], that restriction only applies to tables with clustering columns:

{quote}if a compact table has at least one clustering column, then it must have exactly one column outside of the primary key ones.
{quote}

We have a lot of tables created from Thrift (compact storage) that do not have clustering columns and have more than one column in the CQL schema.
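The documented rule quoted above can be expressed as a small check. This is a hypothetical sketch, not Cassandra's actual DDL validation code; `validate_compact_table` and its parameters are invented names. It encodes only the quoted rule: the one-column restriction applies when the table has clustering columns.

```python
from typing import Sequence


def validate_compact_table(clustering_columns: Sequence[str],
                           regular_columns: Sequence[str]) -> None:
    """Raise ValueError if a COMPACT STORAGE table breaks the quoted DDL rule.

    Rule: if a compact table has at least one clustering column, it must have
    exactly one column outside the primary key. Without clustering columns,
    several regular columns are allowed.
    """
    if clustering_columns and len(regular_columns) != 1:
        raise ValueError(
            "compact table with clustering columns must have exactly one "
            f"non-primary-key column, got {len(regular_columns)}"
        )


# test.test from the repro: PRIMARY KEY (id), no clustering, two regular columns -> accepted.
validate_compact_table(clustering_columns=[], regular_columns=["foo", "bar"])

# A hypothetical table with a clustering column and two regular columns -> rejected.
try:
    validate_compact_table(clustering_columns=["ts"], regular_columns=["foo", "bar"])
except ValueError as err:
    print("rejected:", err)
```

Under this rule, `test.test` is a legitimate compact table, which is consistent with Peter's point that many Thrift-era tables look the same.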
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808019#comment-16808019 ]

Blake Eggleston commented on CASSANDRA-15072:
---------------------------------------------

This is a great repro script, thanks. A couple of observations:
* test.test has 2 columns, and uses compact storage, which shouldn’t be possible
* node1 & node3 are the replicas of the missing partition (we’re querying from the un-upgraded node2, for those following along).
* doing a point read ({{select * from test.test where id='1';}}) returns the expected partition
* using LIMIT 2 instead of PAGING 2 has the same problem
* LIMIT 3 returns a partial row: {{1 | there | null}}
* LIMIT 4 returns the entire row: {{1 | there | hi}}

Tables with compact storage can only have a single column, so you shouldn’t be able to create a compact storage table with 2 columns. Instead of throwing an error, though, it seems to silently treat the table as a normal table. This might be why no one has noticed that our DDL validation is broken.

It looks like the mixed mode read path is treating the table as a proper compact storage table, though, and treating each cell as a row, which is why you see partial rows start to appear as you increase the limit. If you remove compact storage from the DDL, or only use a single column, everything works normally.

I'll think on the best way to address this.
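The partial rows that appear as the limit grows can be reproduced with a toy model. This sketch is hypothetical and does not reproduce the 3.x legacy read path. It assumes the replica emits the repro's cells in order: partition '2' before '1', and `bar` before `foo` within each partition. In the buggy mode, every emitted cell counts against the row limit. Under those assumptions it matches the LIMIT 2, 3 and 4 observations listed above. `read_with_limit` is an invented name.

```python
from typing import Dict, List, Optional, Tuple

# Cells in the order the repro returns them: partition '2' first, then '1';
# within a partition, columns in name order (bar before foo).
CELLS: List[Tuple[str, str, str]] = [
    ("2", "bar", "there"),
    ("2", "foo", "hi"),
    ("1", "bar", "there"),
    ("1", "foo", "hi"),
]


def read_with_limit(limit: int,
                    count_cells_as_rows: bool) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (id, bar, foo) rows, spending `limit` either per cell or per CQL row."""
    rows: Dict[str, Dict[str, str]] = {}  # insertion-ordered: partition key -> {column: value}
    consumed = 0
    for key, column, value in CELLS:
        if count_cells_as_rows:
            # Buggy model: every cell uses up one unit of the row limit.
            if consumed == limit:
                break
            consumed += 1
        elif key not in rows:
            # Correct model: only a new partition (a new CQL row) uses up the limit.
            if consumed == limit:
                break
            consumed += 1
        rows.setdefault(key, {})[column] = value
    return [(key, cols.get("bar"), cols.get("foo")) for key, cols in rows.items()]


for limit in (2, 3, 4):
    print(limit, read_with_limit(limit, count_cells_as_rows=True))
# 2 [('2', 'there', 'hi')]
# 3 [('2', 'there', 'hi'), ('1', 'there', None)]
# 4 [('2', 'there', 'hi'), ('1', 'there', 'hi')]
```

With `count_cells_as_rows=False`, a limit of 2 returns both complete rows. That matches what the upgraded node returns in the original report.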
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807354#comment-16807354 ]

Erik Swanson commented on CASSANDRA-15072:
------------------------------------------

Please see the attached [^eriksw-repro.sh], which includes aggressive flushing, draining, and deleting commit logs when stopped to ensure they play no part. With these steps, the truncated results behavior when querying node2 (the un-upgraded node) is 100% reproducible for me for an unlimited number of queries with CONSISTENCY ALL.

{quote}do you mean data you'd inserted before the upgrade reappeared?
{quote}

Yes. After the last node was upgraded to Cassandra 3.11.4, we no longer saw truncated results regardless of which node we queried. All data that should have been in the results was correctly returned for all queries after that point.
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807327#comment-16807327 ]

Blake Eggleston commented on CASSANDRA-15072:
---------------------------------------------

Ok, I can repro your issue with the updated script. It looks like you’re hitting a commit log bug that was introduced in 2.1 and fixed in 3.0 (CASSANDRA-13987). If you drain node 1 & 2 before shutting them down, this should stop happening. I’d also expected that putting a sleep longer than the commit log sync interval before shutting down node 1 would fix the problem, but it didn’t. I’m still looking at why that is.

When you say:
{quote}When all nodes were upgraded (before upgrading sstables), we stopped getting incomplete results
{quote}
do you mean data you'd inserted before the upgrade reappeared?
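Blake's workaround of draining each node before shutting it down can be folded into the repro's per-node upgrade steps. The sketch below only builds the command lines and does not run them. It assumes ccm's `nodetool` passthrough (`ccm nodeN nodetool drain`) and leaves execution, for example via `subprocess.run`, to the reader. `upgrade_steps` is a hypothetical helper.

```python
import shlex
from typing import List


def upgrade_steps(node: str, version: str, drain_first: bool = True) -> List[List[str]]:
    """Build the ccm commands that upgrade one node, optionally draining it first.

    `nodetool drain` flushes memtables and stops the node from accepting writes,
    so no data is left only in the commit log when the node is stopped.
    """
    steps: List[List[str]] = []
    if drain_first:
        steps.append(["ccm", node, "nodetool", "drain"])  # assumed ccm passthrough to nodetool
    steps.append(["ccm", node, "stop"])
    steps.append(["ccm", node, "setdir", "-v", version])
    steps.append(["ccm", node, "start"])
    return steps


# Same nodes and target version as the original repro.
for node in ("node1", "node2"):
    for cmd in upgrade_steps(node, "3.11.4"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists means they can be passed to `subprocess.run` unchanged, without shell quoting issues.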
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807315#comment-16807315 ]

Erik Swanson commented on CASSANDRA-15072:
------------------------------------------

Muir's colleague here: I get 100% reproducibility for repeated queries with the following changes:
# Create the keyspace with replication_factor 2
# Do the inserts with CONSISTENCY ALL
# Upgrade the two nodes that contain data (node1, node3); keep the node that does not contain any sstables for test.test (node2) back at 2.1.17

After those steps, I get full results 100% of the time when querying node1 and node3, and truncated results 100% of the time when querying node2. This is using cqlsh as packaged with 3.11.4.
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807227#comment-16807227 ] Muir Manders commented on CASSANDRA-15072:

It seems with my updated steps that only the first query against node3 reproduces it. After that it returns both rows. If you restart node3, it reproduces it again for one query. This is not the behavior we experienced in production (i.e. the problem did not go away). I wonder if I have actually reproduced our issue or not...
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807221#comment-16807221 ] Muir Manders commented on CASSANDRA-15072:

Yes, we saw a lot of incomplete results in a real cluster. We read and write at quorum.

Oops, you are right about my repro. I modified the steps to reproduce it at quorum (I upgraded two out of three nodes instead of just one, changed the reads/writes to be quorum, and connected to node 3 to perform the reproduction query).
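Muir's updated steps upgrade two of the three replicas and then read at QUORUM through the remaining 2.x node. Assuming QUORUM means rf/2 + 1 replicas (integer division), a pigeonhole check shows why that setup forces the old coordinator to talk to an upgraded replica: a quorum that avoided every upgraded node would have to fit inside the old nodes alone. The bash sketch below encodes that check; quorum and every_quorum_includes_upgraded are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: QUORUM is assumed to be rf/2 + 1 replicas (integer division).
# A quorum avoiding all upgraded replicas must fit inside the rf - upgraded old ones.
set -euo pipefail

quorum() { echo $(( $1 / 2 + 1 )); }

# Prints "yes" if every possible quorum includes at least one upgraded replica.
every_quorum_includes_upgraded() {
  local rf=$1 upgraded=$2
  if (( rf - upgraded < $(quorum "$rf") )); then echo yes; else echo no; fi
}

echo "rf=3, 2 upgraded: $(every_quorum_includes_upgraded 3 2)"   # updated steps
echo "rf=3, 1 upgraded: $(every_quorum_includes_upgraded 3 1)"   # only node1 upgraded
```

With RF 3 a quorum is 2 replicas and only one replica is still on 2.x, so every quorum contains an upgraded node; with only node1 upgraded, the (node2, node3) quorum would not.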
[jira] [Commented] (CASSANDRA-15072) Incomplete range results during 2.X -> 3.11.4 upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-15072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807185#comment-16807185 ] Blake Eggleston commented on CASSANDRA-15072:

Are you seeing incomplete results like this in a real cluster? If so, what consistency level are you reading and writing at? The ccm script you have here _does_ return incomplete results, but it’s also writing and reading at CL ONE (the cqlsh default), so that’s not unexpected. I modified the script here to read and write at QUORUM, and haven't gotten any incomplete results.

> Incomplete range results during 2.X -> 3.11.4 upgrade
> -----------------------------------------------------
>
> Key: CASSANDRA-15072
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15072
> Project: Cassandra
> Issue Type: Bug
> Reporter: Muir Manders
> Priority: High
>
> Hello
> During an upgrade from 2.1.17 to 3.11.4, our application started getting
> back incomplete results for range queries. When all nodes were upgraded
> (before upgrading sstables), we stopped getting incomplete results. I was
> able to reproduce it and listed steps below. It seems to require the random
> partitioner and compact storage to reproduce reliably. It also reproduces
> coming from 2.1.21 and 2.2.14.
> {noformat}
> ccm create test -v 2.1.17 -n 3
> ccm updateconf 'partitioner: org.apache.cassandra.dht.RandomPartitioner'
> ccm node1 updateconf 'initial_token: 0'
> ccm node2 updateconf 'initial_token: 56713727820156410577229101238628035242'
> ccm node3 updateconf 'initial_token: 113427455640312821154458202477256070484'
> ccm start
> ccm node1 cqlsh <<SCHEMA
> CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy',
> 'replication_factor': 3};
> CREATE COLUMNFAMILY test.test (
>   id text,
>   foo text,
>   bar text,
>   PRIMARY KEY (id)
> ) WITH COMPACT STORAGE;
> INSERT INTO test.test (id, foo, bar) values ('1', 'hi', 'there');
> INSERT INTO test.test (id, foo, bar) values ('2', 'hi', 'there');
> SCHEMA
> ccm node1 stop
> ccm node1 setdir -v 3.11.4
> ccm node1 start
> # need to use new cqlsh so we can configure page size
> cqlsh 127.0.0.2 <<QUERY
> PAGING 2;
> select * from test.test;
> QUERY
> {noformat}
> This results in:
> {noformat}
> Page size: 2
>  id | bar   | foo
> ----+-------+-----
>   2 | there |  hi
> (1 rows)
> {noformat}
> Running it against the upgraded node (node1):
> {noformat}
> Page size: 2
>  id | bar   | foo
> ----+-------+-----
>   2 | there |  hi
>   1 | there |  hi
> (2 rows)
> {noformat}
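Blake's point about CL ONE follows from the usual overlap rule: a read is only guaranteed to reach a replica that acknowledged the write when the number of replicas read plus the number written exceeds the replication factor. The bash sketch below applies that check, assuming QUORUM = rf/2 + 1 and ALL = rf; replicas_for_cl and overlap_guaranteed are hypothetical helpers, not cqlsh commands.

```bash
#!/usr/bin/env bash
# Sketch: reads are guaranteed to overlap writes only when read + write replicas > rf.
set -euo pipefail

# Hypothetical helper mapping a few consistency level names to replica counts.
replicas_for_cl() {
  local cl=$1 rf=$2
  case $cl in
    ONE) echo 1 ;;
    QUORUM) echo $(( rf / 2 + 1 )) ;;
    ALL) echo "$rf" ;;
    *) echo "unsupported consistency level: $cl" >&2; return 1 ;;
  esac
}

overlap_guaranteed() {
  local write_cl=$1 read_cl=$2 rf=$3 w r
  w=$(replicas_for_cl "$write_cl" "$rf")
  r=$(replicas_for_cl "$read_cl" "$rf")
  if (( r + w > rf )); then echo yes; else echo no; fi
}

echo "RF 3, write ONE, read ONE: $(overlap_guaranteed ONE ONE 3)"
echo "RF 3, write QUORUM, read QUORUM: $(overlap_guaranteed QUORUM QUORUM 3)"
```

With RF 3, ONE/ONE gives 1 + 1 = 2, which does not exceed 3, while QUORUM/QUORUM gives 2 + 2 = 4.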