Re: Inconsistent results after restore with Cassandra 3.11.1
CQL SELECT queries are returning 0 rows even though the data is actually present in the SSTables. But when I load/restore the same data with sstableloader, the data can be queried without any issues. I am using NetworkTopologyStrategy for all keyspaces.

Thanks
Re: Inconsistent results after restore with Cassandra 3.11.1
Can you define "inconsistent" results? What's the topology of the cluster? What were you expecting, and what did you get?
Inconsistent results after restore with Cassandra 3.11.1
Hello,

Does anyone experience inconsistent results after restoring Cassandra 3.11.1 with the refresh command? Was there a bug in this version of Cassandra?

Thanks in advance.

Regards,
Sandeep
Re: Very odd & inconsistent results from SASI query
Apologies for the stream-of-consciousness replies, but are the dropped message stats output by tpstats an accumulation since the node came up, or are there processes which clear and/or time-out the info?
Re: Very odd & inconsistent results from SASI query
No dropped messages in tpstats on any of the nodes.
Re: Very odd & inconsistent results from SASI query
Appreciate the reply, Kurt.

I sanitized it out of the traces, but all trace outputs listed the same node for all three queries (1 working, 2 not working). Read repair chance is set to 0.0, as recommended when using TWCS.

I'll check tpstats - in this environment, load is not an issue, but network issues may be.
Re: Very odd & inconsistent results from SASI query
As secondary indexes are stored individually on each node, what you're suggesting sounds exactly like a consistency issue. The fact that you read 0 cells on one query implies the node that served the query did not have any data for the row. The reason you would sometimes see different behaviour is likely read repairs. The fact that the repair resolved the issue pretty much guarantees it's a consistency issue.

You should check for dropped mutations in tpstats/logs and, if they are occurring, try to stop that from happening (it's probably load related). You could also try performing reads and writes at LOCAL_QUORUM for stronger consistency, but note this has a performance/latency impact.
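The overlap guarantee behind the LOCAL_QUORUM suggestion can be checked with a small sketch (plain Python, not Cassandra code): with RF=3, any 2-replica write set intersects any 2-replica read set, so a quorum read always sees a quorum write, while ONE + ONE gives no such guarantee.

```python
from itertools import combinations

RF = 3
replicas = {"n1", "n2", "n3"}

def overlaps(write_cl, read_cl):
    """True if every write set of size write_cl intersects every
    read set of size read_cl, i.e. reads are guaranteed to see
    the write on at least one contacted replica."""
    for w in combinations(sorted(replicas), write_cl):
        for r in combinations(sorted(replicas), read_cl):
            if not set(w) & set(r):
                return False
    return True

# ONE write + ONE read: the read can land on a replica that
# missed the write (e.g. a dropped mutation) and return 0 rows.
print(overlaps(1, 1))   # False
# QUORUM write + QUORUM read: 2 + 2 > 3, so the two sets must
# intersect and at least one replica returns the row.
print(overlaps(2, 2))   # True
```

This is just the R + W > RF rule; it says nothing about timeouts, only about which combinations can ever miss a write.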
Re: Very odd & inconsistent results from SASI query
A wrinkle further confounds the issue: running a repair on the node which was servicing the queries has cleared things up, and all the queries now work. That doesn't make a whole lot of sense to me - my assumption was that a repair shouldn't have fixed it.
Very odd & inconsistent results from SASI query
Cassandra 3.9, 4 nodes, rf=3

Hi folks, we're seeing 0 results returned from queries that (a) should return results, and (b) will return results with minor tweaks.

I've attached the sanitized trace outputs for the following 3 queries (pk1 and pk2 are partition keys, ck1 is the clustering key, val1 is a SASI-indexed non-key column):

Q1: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 ALLOW FILTERING;
Q1 works - it returns a list of records, one of which has val1='abcdefghijklmn'.

Q2: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefghi%' LIMIT 1001 ALLOW FILTERING;
Q2 does not work - 0 results returned. The only difference from Q1 is one additional character in the LIKE comparison.

Q3: SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 = 'abcdefghijklmn' LIMIT 1001 ALLOW FILTERING;
Q3 does not work - 0 results returned.

As I've written above, the data set *does* include a record with val1='abcdefghijklmn'.

Confounding the issue is that this behavior is inconsistent. For different values of val1, I'll have scenarios where Q3 works but Q1 and Q2 do not. Now, that particular behavior I could explain with index/LIKE problems, but it is Q3 that sometimes does not work, and that's a simple equality comparison (although still using the index).

Further confounding the issue is that if my testers run these same queries with the same parameters tomorrow, they're likely to work correctly.

The only thing I've been able to glean from tracing execution is that the queries that work follow "Executing read..." with "Executing single partition query on t1" and so forth, whereas the queries that don't work simply follow "Executing read..." with "Read 0 live and 0 tombstone cells", with no actual work seemingly done. But that's not helping me narrow this down much.
Thanks for your time - appreciate any help.

Trace for the query that found results (which include the record where val1='abcdefghijklmn'):

Parsing SELECT * FROM t1 WHERE pk1 = 2017 AND pk2 = 11 AND ck1 >= '2017-03-16' AND ck1 <= '2017-03-17' AND val1 LIKE 'abcdefgh%' LIMIT 1001 ALLOW FILTERING; [Native-Transport-Requests-1]
Preparing statement [Native-Transport-Requests-1]
Index mean cardinalities are idx_my_idx:-9223372036854775808. Scanning with idx_my_idx. [Native-Transport-Requests-1]
Computing ranges to query [Native-Transport-Requests-1]
Submitting range requests on 1 ranges with a concurrency of 1 (-1.08086395E16 rows per range expected) [Native-Transport-Requests-1]
Submitted 1 concurrent range requests [Native-Transport-Requests-1]
Executing read on keyspace.t1 using index idx_my_idx [Native-Transport-Requests-1]
Executing single-partition query on t1 [ReadStage-2]
Acquiring sstable references [ReadStage-2]
Key cache hit for sstable 2223 [ReadStage-2]
Skipped 34/35 non-slice-intersecting sstables, included 1 due to tombstones [ReadStage-2]
Key cache hit for sstable 2221 [ReadStage-2]
Merged data from memtables and 2 sstables [ReadStage-2]
Read 1 live and 0 tombstone cells [ReadStage-2]
Re: inconsistent results
Change your consistency level in the cqlsh shell while you query, from ONE to QUORUM to ALL. If you see your results change, that's a consistency issue. (Assuming these are simple inserts; if there are deletes, updates to collections, etc. in the mix, then things get a bit more complex.)

To diagnose why the issue exists, helpful metrics are the various dropped-message counts from nodetool tpstats. Overloaded clusters will experience consistency issues as a result of dropped mutations.

It's helpful to think of things in terms of guarantees. If you write with CL=ONE or LOCAL_ONE, you're getting exactly one guaranteed write. In a healthy system with tons of excess capacity, you will likely see much better consistency than that; the hint system will replicate the write to other nodes, which will perform the write if they can.

Since it appears you're seeing inconsistency at CL=ONE, plus timeouts at CL=QUORUM, it's quite likely your cluster is not capable of keeping up with the consistency level you require. Why your cluster is overloaded is another question entirely, but if you discover that's the case, in my experience the most common causes are excessive GC due to bad heap settings and data-model issues that cause massive partitions.
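A toy model (plain Python, not Cassandra code) of the dropped-write scenario described above: one of three replicas misses an insert, so a CL=ONE read returns different row counts depending on which replica the coordinator picks, while a read that merges all replicas is stable.

```python
# RF=3 replica contents for one partition; n3 missed the second
# insert (e.g. a dropped mutation under load). Rows are modeled
# as (partition_key, clustering_key) tuples.
replicas = {
    "n1": {("blah", 1), ("blah", 2)},  # has both rows
    "n2": {("blah", 1), ("blah", 2)},  # has both rows
    "n3": {("blah", 1)},               # missed the second insert
}

def read(nodes):
    """Merge the rows returned by the contacted replicas."""
    merged = set()
    for n in nodes:
        merged |= replicas[n]
    return merged

# CL=ONE: the coordinator may pick any single replica, so the
# same query flips between 2 rows and 1 row.
for n in replicas:
    print(n, len(read([n])))           # n1 -> 2, n2 -> 2, n3 -> 1

# CL=ALL: merging every replica always yields both rows.
print(len(read(["n1", "n2", "n3"])))   # 2
```

Raising the consistency level in cqlsh changes how many replicas are merged, which is exactly why flipping results at higher CLs points at a consistency problem.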
Re: inconsistent results
I suspect this is true, but it has proven to be significantly harder to track down. Either Cassandra is tickling some bug that nothing else does, or something strange is going on internally. On an otherwise quiet system, I'd see instant results most of the time, intermixed with queries (reads) that would time out and fail. I agree this needs to be addressed, but I'd like to understand what is currently going on with my queries. If it is thought to be a consistency problem, how can that be verified?

-JE
Re: inconsistent results
I'm sorry, yes. The primary key is (foo_prefix, foo), with foo_prefix being the partition key. The query is:

select * from table WHERE foo_prefix='blah';

-JE
Re: inconsistent results
Well, if it's the primary key there should only ever be one result. Is this the partition key and you also have a clustering key?
Re: inconsistent results
If you're getting a lot of timeouts you will almost certainly end up with consistency issues. You're going to need to fix the root cause (your cluster instability) or this sort of issue will be commonplace.
Re: inconsistent results
I'll try the repair. Using quorum tends to lead to too many timeout problems though. :(

-JE
Re: inconsistent results
Super simple:

select * from table WHERE primary_key='foo';

-JE
Re: inconsistent results
Repair might help. But you will end up in this situation again unless you read/write using quorum (may be local).
Re: inconsistent results
What is your query? I've seen this once when using secondary indices, as it has to reach out to all nodes for the answer. If a node doesn't respond in time, those records seem to get dropped.
Re: inconsistent results
All client interactions are from Python (python-driver 3.7.1) using the default consistency (LOCAL_ONE, I think). Should I try repairing all nodes to make sure all data is consistent?

-JE
Re: inconsistent results
What consistency levels are you using for reads/writes?

> On 14 Feb 2017, at 22:27, Josh England wrote:
>
> I'm running Cassandra 3.9 on CentOS 6.7 in a 6-node cluster. I've got a
> situation where the same query sometimes returns 2 records (correct), and
> sometimes only returns 1 record (incorrect). I've ruled out the
> application and the indexing since this is reproducible directly from a
> cqlsh shell with a simple select statement. What is the best way to debug
> what is happening here?
>
> -JE
Re: Inconsistent results with Quorum at different times
Hi Jaydeep.

> Now when I read using quorum then sometimes it returns data D1 and
> sometimes it returns empty results. After tracing I found that when N1 and
> N2 are chosen then we get empty data, when (N1/N2) and N3 are chosen then
> D1 data is returned.

This is an acceptable situation (i.e. a node might not have received the delete), and indeed the inconsistencies are not supposed to happen when reading at quorum. If the tombstone is younger than the data and you read data + tombstone, Cassandra should return empty. Does your tracing confirm you are actually reading using QUORUM?

> N3:
> SSTable: Partition key K1 is valid and has data D1 with lower time-stamp
> T1 (T1 < T2)

Have you checked the timestamps using sstable2json / sstabledump? Sometimes clock drift can have this kind of weird effect and might have produced a T1 > T2.

Can you reproduce it easily? If so, this would indeed deserve some attention and possibly a JIRA.

What is expected is for the Last Write Wins (LWW) algorithm to apply. In your situation it should guarantee you consistency as long as you have the tombstones on the 2 other nodes, so at least for 10 days (the default). After that, consistency will depend on whether the tombstone made its way to the last node as well; if not, zombie data would reappear, as mentioned by Jaydeep.

I wrote a detailed blog post about tombstones and the consistency issues they can cause; it might be useful. I think your understanding is correct though.
thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com
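The last-write-wins reconciliation described in this thread can be sketched in a few lines (a toy model with a hypothetical merge helper, not Cassandra's implementation):

```python
# A cell is (timestamp, value); value None marks a tombstone.
# A replica that holds nothing at all for the key contributes
# no cell (modeled as None in the list). Toy sketch only.
T1, T2 = 100, 200                     # T1 < T2, as in the scenario

def merge(cells):
    """Return the value a coordinator reconciles from the cells
    sent back by the contacted replicas (None = empty result)."""
    cells = [c for c in cells if c is not None]
    if not cells:
        return None
    ts, value = max(cells, key=lambda c: c[0])  # newest wins
    return value

tomb = (T2, None)                     # delete at T2 (on N1, N2)
data = (T1, "D1")                     # older live data (on N3)

print(merge([tomb, data]))            # None: tombstone beats D1
print(merge([tomb, tomb]))            # None: N1 + N2 quorum
print(merge([data]))                  # 'D1': only N3 consulted

# After gc_grace_seconds, compaction drops the tombstones from
# N1 and N2. If N3 never received the delete, a quorum read of
# N1 + N3 now sees only the old cell: zombie data reappears.
print(merge([None, data]))            # 'D1'
```

The sketch shows why a quorum read that includes a tombstone-holding node returns empty, while a read set whose only surviving cell is the old data resurrects it once the tombstones are gone.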
Re: Inconsistent results with Quorum at different times
Hi Jaydeep,

Yes, dealing with tombstones in Cassandra is very tricky.

Cassandra keeps tombstones to mark deleted columns and distributes them (hinted handoff, full repair, read repair...) to the other nodes that missed the initial remove request. But Cassandra can't afford to keep those tombstones forever and has to wipe them. The tradeoff is that after a time, GCGraceSeconds, configured on each column family, the tombstones are fully dropped during compactions and are not distributed to the other nodes anymore. If one node didn't have the chance to receive a tombstone during this period, and kept an old column value, then the deleted column will reappear.

So I guess in your case that the time T2 is older than this GCGraceSeconds?

The best way to avoid all those phantom columns coming back from the dead is to run a full repair on your cluster at least once every GCGraceSeconds. Did you try this?

--
Nicolas
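The GCGraceSeconds window can be made concrete with a tiny sketch (hypothetical helper name; the 10-day default is the per-table gc_grace_seconds value of 864000 seconds):

```python
# A tombstone written at t_delete becomes purgeable by compaction
# once now >= t_delete + gc_grace, so every node must receive it
# (e.g. via a full repair) within that window. Toy sketch only.
GC_GRACE_SECONDS = 10 * 24 * 3600    # default: 864000 s = 10 days

def tombstone_purgeable(t_delete, now, gc_grace=GC_GRACE_SECONDS):
    return now >= t_delete + gc_grace

t_delete = 0
print(tombstone_purgeable(t_delete, 9 * 24 * 3600))    # False: still protected
print(tombstone_purgeable(t_delete, 11 * 24 * 3600))   # True: may be compacted away
```

This is why the advice is to complete a full repair at least once per gc_grace_seconds: the repair must land before the first print above flips to True.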
Inconsistent results with Quorum at different times
Hi,

We have a three-node (N1, N2, N3) cluster (RF=3) and data in SSTables as follows:

N1:
SSTable: Partition key K1 is marked as a tombstone at time T2

N2:
SSTable: Partition key K1 is marked as a tombstone at time T2

N3:
SSTable: Partition key K1 is valid and has data D1 with lower time-stamp T1 (T1 < T2)

Now when I read using quorum, sometimes it returns data D1 and sometimes it returns empty results. After tracing, I found that when N1 and N2 are chosen we get empty data; when (N1/N2) and N3 are chosen, D1 data is returned.

My point is that when we read with quorum our results have to be consistent; here the same query gives different results at different times.

Isn't this a big problem with Cassandra @QUORUM (with tombstones)?

Thanks,
Jaydeep
Re: Cassandra CLI showing inconsistent results during gets
All inserts are at LOCAL_QUORUM in DC1. I am confused because attempt 1 shows the column, attempt 2 reports not found, and attempt 3 shows it again. These attempts were successive, with no time delay, from the same CLI!!! The data is also definitely not being touched by CUD operations from somewhere else during these times. -- Ravi On Friday, June 27, 2014, Chris Lohfink clohf...@blackbirdit.com wrote: Where was the 09_09 column inserted from? Are you sure whatever did the insert is doing a LOCAL_QUORUM in the same DC the CLI is in? The insert may return before all the nodes respond (i.e. 2 of the 3 in the local DC), so the remaining node can report not having the data. Once all the nodes respond, the coordinator checks the digests from all the responses, sees there's an inconsistency, and does a read repair, which would explain the column showing up in following queries. Chris On Jun 26, 2014, at 10:06 AM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote: I ran the following set of commands via CLI on our servers. There is a data discrepancy that I encountered during gets, as below... We are running version 1.2.4 with replication factor 3 (DC1) / 2 (DC2). Reads and writes are at LOCAL_QUORUM. create column family TestCF with key_validation_class=AsciiType AND comparator = 'CompositeType(AsciiType,LongType)' AND compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64}; [default@Sample] consistencylevel AS LOCAL_QUORUM; Consistency level is set to 'LOCAL_QUORUM'. [default@Sample] get TestCF [ascii('17732218001')] ['177322104550009_:177322104560008']; = (column=177322104550009_:177322104560008, value=31373733323231303030303034353530303039, timestamp=1397743374931) Elapsed time: 8.64 msec(s). //Do a full row dump which shows the above column [default@Sample] get TestCF [ascii('17732218001')]; ... 
= (column=177322104547019_:177322104560001, value=31373733323231303030303034353437303139, timestamp=1397743139121) = (column=177322104550009_:177322104560008, value=31373733323231303030303034353530303039, timestamp=1397743374931) = (column=177322104560003_:177322104560005, value=31373733323231303030303034353630303033, timestamp=1397743323261) = (column=177322104562001_:177322104564003, value=31373733323231303030303034353632303031, timestamp=1397749523707) --- Returned 4771 results. Elapsed time: 518 msec(s). //Try again [default@Sample] get TestCF[ascii('17732218001')] ['177322104550009_:177322104560008']; = (column=177322104550009_:177322104560008, value=31373733323231303030303034353530303039, timestamp=1397743374931) Elapsed time: 8.03 msec(s). //Here the CLI flipped to value not found [default@Sample] get TestCF[ascii('17732218001')] ['177322104550009_:177322104550009']; Value was not found Elapsed time: 12 msec(s). //Query again, and the value is found [default@Sample] get TestCF[ascii('17732218001')] ['177322104550009_:177322104550009']; = (column=177322104550009_:177322104550009, value=31373733323231303030303034353530303039, timestamp=1397743374931) Elapsed time: 23 msec(s). Is this just a CLI bug, or is something deeper brewing? Our app faced a serious issue in code involving this query. Is it a known issue? Any help is much appreciated -- Ravi
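The sequence Chris describes — a LOCAL_QUORUM write acknowledged before every local replica has the column, then a digest mismatch on read triggering read repair — can be sketched in plain Python (invented replica names; this illustrates the repair mechanism, not the exact CLI flip, and is not Cassandra code):

```python
# Three local replicas; the write was acked after A and B responded, so C
# briefly lacks the column (assume its hint has not been delivered yet).
replicas = {"A": {"col": "v"}, "B": {"col": "v"}, "C": {}}

def read_quorum(chosen):
    """Read from the chosen replicas; on a digest mismatch, read-repair the
    stale replica so that subsequent reads are consistent."""
    answers = {name: replicas[name].get("col") for name in chosen}
    winner = next((v for v in answers.values() if v is not None), None)
    if winner is not None and len(set(answers.values())) > 1:
        for name in chosen:  # push the winning value to the stale replica(s)
            replicas[name]["col"] = winner
    return winner

assert read_quorum(["B", "C"]) == "v"  # mismatch detected, C gets repaired
assert replicas["C"]["col"] == "v"     # later reads agree, as in the thread
```

Until that first mismatching read happens, which answer a query sees depends on which replicas the coordinator consults, which matches the CLI flipping between found and not found.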
Cassandra CLI showing inconsistent results during gets
I ran the following set of commands via CLI on our servers. There is a data discrepancy that I encountered during gets, as below... We are running version 1.2.4 with replication factor 3 (DC1) / 2 (DC2). Reads and writes are at LOCAL_QUORUM. create column family TestCF with key_validation_class=AsciiType AND comparator = 'CompositeType(AsciiType,LongType)' AND compression_options={sstable_compression:SnappyCompressor, chunk_length_kb:64}; [default@Sample] consistencylevel AS LOCAL_QUORUM; Consistency level is set to 'LOCAL_QUORUM'. [default@Sample] get TestCF [ascii('17732218001')] ['177322104550009_:177322104560008']; = (column=177322104550009_:177322104560008, value=31373733323231303030303034353530303039, timestamp=1397743374931) Elapsed time: 8.64 msec(s). //Do a full row dump which shows the above column [default@Sample] get TestCF [ascii('17732218001')]; ... = (column=177322104547019_:177322104560001, value=31373733323231303030303034353437303139, timestamp=1397743139121) = (column=177322104550009_:177322104560008, value=31373733323231303030303034353530303039, timestamp=1397743374931) = (column=177322104560003_:177322104560005, value=31373733323231303030303034353630303033, timestamp=1397743323261) = (column=177322104562001_:177322104564003, value=31373733323231303030303034353632303031, timestamp=1397749523707) --- Returned 4771 results. Elapsed time: 518 msec(s). //Try again [default@Sample] get TestCF[ascii('17732218001')] ['177322104550009_:177322104560008']; = (column=177322104550009_:177322104560008, value=31373733323231303030303034353530303039, timestamp=1397743374931) Elapsed time: 8.03 msec(s). //Here the CLI flipped to value not found [default@Sample] get TestCF[ascii('17732218001')] ['177322104550009_:177322104550009']; Value was not found Elapsed time: 12 msec(s). 
//Query again, and the value is found [default@Sample] get TestCF[ascii('17732218001')] ['177322104550009_:177322104550009']; = (column=177322104550009_:177322104550009, value=31373733323231303030303034353530303039, timestamp=1397743374931) Elapsed time: 23 msec(s). Is this just a CLI bug, or is something deeper brewing? Our app faced a serious issue in code involving this query. Is it a known issue? Any help is much appreciated -- Ravi
Re: Inconsistent results using secondary indexes between two DC
2011/5/23 Jonathan Ellis jbel...@gmail.com: It was installed as 0.7.2 and upgraded with each new official release. I bet that's the problem, then. https://issues.apache.org/jira/browse/CASSANDRA-2244 could cause indexes to not be updated for releases < 0.7.4. You'll want to rebuild the index. By the way - is it possible to force the rebuild of the secondary indexes from scratch? Yes, just remove the index definition from the column_metadata, then re-add it. (0.7.7 adds DROP INDEX to the cli.) Thanks a lot! After dropping the index and creating it on the column once again, queries using secondary indexes now work as expected. -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- KosciaK mail: kosci...@gmail.com www : http://kosciak.net/ -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Re: Inconsistent results using secondary indexes between two DC
It was installed as 0.7.2 and upgraded with each new official release. As I wrote in another message in this thread, the nodes are now upgraded to 0.7.6 but one of the problematic nodes still seems to return inconsistent data. By the way - is it possible to force the rebuild of the secondary indexes from scratch? 2011/5/20 Jonathan Ellis jbel...@gmail.com: Has this cluster always been on 0.7.5 or was it upgraded from an earlier version? -- KosciaK mail: kosci...@gmail.com www : http://kosciak.net/
Re: Inconsistent results using secondary indexes between two DC
On Mon, May 23, 2011 at 5:47 AM, Wojciech Pietrzok kosci...@gmail.com wrote: It was installed as 0.7.2 and upgraded with each new official release. I bet that's the problem, then. https://issues.apache.org/jira/browse/CASSANDRA-2244 could cause indexes to not be updated for releases < 0.7.4. You'll want to rebuild the index. By the way - is it possible to force the rebuild of the secondary indexes from scratch? Yes, just remove the index definition from the column_metadata, then re-add it. (0.7.7 adds DROP INDEX to the cli.) -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Inconsistent results using secondary indexes between two DC
I've already tried running nodetool repair several times before, but it didn't seem to help. Now I've upgraded Cassandra to 0.7.6 and run nodetool scrub and nodetool repair (twice). One of the problematic nodes seems to return correct results now, but the second one still returns inconsistent data. 2011/5/19 mcasandra mohitanch...@gmail.com: I am wondering if running nodetool repair will help in any way -- KosciaK mail: kosci...@gmail.com www : http://kosciak.net/
Re: Inconsistent results using secondary indexes between two DC
Has this cluster always been on 0.7.5 or was it upgraded from an earlier version? On Thu, May 19, 2011 at 3:26 AM, Wojciech Pietrzok kosci...@gmail.com wrote: Just checked. The rows seem to be present in the CF on all nodes (in both datacenters), but they are not indexed correctly. On each node I've used sstablekeys for all CF_NAME-f-XX-Data.db files. In cassandra-cli (using a node that behaves correctly) I made the query get CF_NAME where foo = bar and got the correct number of results. Using grep I checked that all the keys are present in the lists returned by sstablekeys - none was missing, so it seems that the rows are present on all nodes. When doing the same query on the nodes in the second DC (using ConsistencyLevel.ONE) the results are invalid. Sometimes I get 15 rows (the expected, correct number), sometimes 3 rows, or 10 rows. What's interesting is that every time I get only 3 rows, it's the same list of 3 rows on both affected nodes. 2011/5/17 Jonathan Ellis jbel...@gmail.com: Nothing comes to mind. I'd start by using sstable2json to see if the missing rows are in the main data CF -- i.e., are they just unindexed, or are they missing completely? On Sun, May 15, 2011 at 4:33 PM, Wojciech Pietrzok kosci...@gmail.com wrote: Hello, I've noticed strange behaviour of Cassandra when using secondary indexes. There are 2 Data Centers, each with 2 nodes, RF=4, with Cassandra 0.7.5 installed on all nodes. When I connect to one of the nodes in DC1 and perform a query using secondary indexes (get ColumnFamily where column = 'foo' in cassandra-cli) I always get the correct number of rows returned, no matter which ConsistencyLevel is set. When I connect to one of the nodes in DC2 and perform the same query using ConsistencyLevel LOCAL_QUORUM the results are correct. But using ConsistencyLevel ONE Cassandra doesn't return the correct number of rows (it seems that most of the time some of the rows are missing). Tried running nodetool repair and nodetool scrub but this doesn't seem to help. What might be the cause of such behaviour? -- KosciaK mail: kosci...@gmail.com www : http://kosciak.net/ -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Inconsistent results using secondary indexes between two DC
Just checked. The rows seem to be present in the CF on all nodes (in both datacenters), but they are not indexed correctly. On each node I've used sstablekeys for all CF_NAME-f-XX-Data.db files. In cassandra-cli (using a node that behaves correctly) I made the query get CF_NAME where foo = bar and got the correct number of results. Using grep I checked that all the keys are present in the lists returned by sstablekeys - none was missing, so it seems that the rows are present on all nodes. When doing the same query on the nodes in the second DC (using ConsistencyLevel.ONE) the results are invalid. Sometimes I get 15 rows (the expected, correct number), sometimes 3 rows, or 10 rows. What's interesting is that every time I get only 3 rows, it's the same list of 3 rows on both affected nodes. 2011/5/17 Jonathan Ellis jbel...@gmail.com: Nothing comes to mind. I'd start by using sstable2json to see if the missing rows are in the main data CF -- i.e., are they just unindexed, or are they missing completely? On Sun, May 15, 2011 at 4:33 PM, Wojciech Pietrzok kosci...@gmail.com wrote: Hello, I've noticed strange behaviour of Cassandra when using secondary indexes. There are 2 Data Centers, each with 2 nodes, RF=4, with Cassandra 0.7.5 installed on all nodes. When I connect to one of the nodes in DC1 and perform a query using secondary indexes (get ColumnFamily where column = 'foo' in cassandra-cli) I always get the correct number of rows returned, no matter which ConsistencyLevel is set. When I connect to one of the nodes in DC2 and perform the same query using ConsistencyLevel LOCAL_QUORUM the results are correct. But using ConsistencyLevel ONE Cassandra doesn't return the correct number of rows (it seems that most of the time some of the rows are missing). Tried running nodetool repair and nodetool scrub but this doesn't seem to help. What might be the cause of such behaviour? -- KosciaK mail: kosci...@gmail.com www : http://kosciak.net/
Re: Inconsistent results using secondary indexes between two DC
I am wondering if running nodetool repair will help in any way.
Re: Inconsistent results using secondary indexes between two DC
Nothing comes to mind. I'd start by using sstable2json to see if the missing rows are in the main data CF -- i.e., are they just unindexed, or are they missing completely? On Sun, May 15, 2011 at 4:33 PM, Wojciech Pietrzok kosci...@gmail.com wrote: Hello, I've noticed strange behaviour of Cassandra when using secondary indexes. There are 2 Data Centers, each with 2 nodes, RF=4, with Cassandra 0.7.5 installed on all nodes. When I connect to one of the nodes in DC1 and perform a query using secondary indexes (get ColumnFamily where column = 'foo' in cassandra-cli) I always get the correct number of rows returned, no matter which ConsistencyLevel is set. When I connect to one of the nodes in DC2 and perform the same query using ConsistencyLevel LOCAL_QUORUM the results are correct. But using ConsistencyLevel ONE Cassandra doesn't return the correct number of rows (it seems that most of the time some of the rows are missing). Tried running nodetool repair and nodetool scrub but this doesn't seem to help. What might be the cause of such behaviour? -- KosciaK -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Inconsistent results using secondary indexes between two DC
Hello, I've noticed strange behaviour of Cassandra when using secondary indexes. There are 2 Data Centers, each with 2 nodes, RF=4, with Cassandra 0.7.5 installed on all nodes. When I connect to one of the nodes in DC1 and perform a query using secondary indexes (get ColumnFamily where column = 'foo' in cassandra-cli) I always get the correct number of rows returned, no matter which ConsistencyLevel is set. When I connect to one of the nodes in DC2 and perform the same query using ConsistencyLevel LOCAL_QUORUM the results are correct. But using ConsistencyLevel ONE Cassandra doesn't return the correct number of rows (it seems that most of the time some of the rows are missing). Tried running nodetool repair and nodetool scrub but this doesn't seem to help. What might be the cause of such behaviour? -- KosciaK
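Because each node serves a secondary-index lookup from its own local index, a replica whose index is stale silently returns only the rows it has indexed. A minimal plain-Python sketch of that behaviour (the node names and row counts are invented to match the 15-vs-3 observation in this thread; this is not Cassandra code):

```python
# Each replica answers an index query from its *local* index, so a replica
# with a stale index returns only the rows it indexed, not all rows it stores.
all_matches = {f"row{i}" for i in range(15)}  # rows truly matching foo = 'bar'

local_index = {
    "dc1_node": set(all_matches),          # fully indexed: all 15 rows
    "dc2_node": {"row0", "row1", "row2"},  # stale index: always the same 3
}

def index_query(replica):
    # At ConsistencyLevel.ONE there is no cross-replica merge to paper over
    # a stale index: whatever this one replica's index says is the answer.
    return local_index[replica]

assert len(index_query("dc1_node")) == 15
assert index_query("dc2_node") == {"row0", "row1", "row2"}  # same 3 every time
```

At LOCAL_QUORUM or after a repair, a healthy replica's answer can mask the stale one, which is consistent with the observations (and with the fix of rebuilding the index) reported later in this thread.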