Re: Inconsistent results with Quorum at different times

2016-09-19 Thread Alain RODRIGUEZ
Hi Jaydeep.


> Now when I read using QUORUM, it sometimes returns data D1 and sometimes
> returns empty results. After tracing I found that when N1 and N2 are
> chosen we get empty data, and when (N1/N2) and N3 are chosen D1 is
> returned.


This is an acceptable situation (i.e. a node might not have received the
delete), but the inconsistencies are indeed not supposed to happen when
reading at QUORUM. If the tombstone is younger than the data and you read
data + tombstone, Cassandra should return an empty result.
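
To make that concrete, here is a minimal sketch in plain Java (my own
illustration, not Cassandra's actual read path) of the last-write-wins
reconciliation: whichever quorum of replicas is contacted, as long as one
of them holds the tombstone at T2 > T1, the merged result should be empty.

    // A minimal sketch, NOT Cassandra's real internals: last-write-wins
    // reconciliation of the responses a QUORUM read happens to receive.
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    public class LwwReconcile {

        static final class ReplicaResponse {
            final String value;   // null stands for a tombstone
            final long timestamp; // write timestamp (microseconds in Cassandra)

            ReplicaResponse(String value, long timestamp) {
                this.value = value;
                this.timestamp = timestamp;
            }
        }

        // The response with the highest timestamp wins; if the winner is a
        // tombstone, the merged result is empty.
        static Optional<String> reconcile(List<ReplicaResponse> responses) {
            ReplicaResponse winner = responses.stream()
                    .max(Comparator.comparingLong((ReplicaResponse r) -> r.timestamp))
                    .orElseThrow(IllegalStateException::new);
            return Optional.ofNullable(winner.value);
        }

        public static void main(String[] args) {
            long t1 = 1000, t2 = 2000; // T1 < T2, as in your scenario
            // Quorum {N1, N2}: two tombstones at T2 -> empty
            System.out.println(reconcile(Arrays.asList(
                    new ReplicaResponse(null, t2),
                    new ReplicaResponse(null, t2)))); // Optional.empty
            // Quorum {N1 or N2, N3}: tombstone at T2 + data D1 at T1 -> empty too
            System.out.println(reconcile(Arrays.asList(
                    new ReplicaResponse(null, t2),
                    new ReplicaResponse("D1", t1)))); // Optional.empty
        }
    }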

Does your tracing confirm you are actually reading using QUORUM?
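
One way to check, as a rough sketch using the DataStax Java driver 3.x
(the contact point, keyspace and table names below are placeholders):
force QUORUM on the statement, enable tracing, and print what the
coordinator reports.

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryTrace;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class QuorumTraceCheck {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1").build();
                 Session session = cluster.connect()) {

                // Placeholder query: adapt keyspace, table and key.
                Statement stmt = new SimpleStatement(
                        "SELECT * FROM my_ks.my_table WHERE k = 'K1'")
                        .setConsistencyLevel(ConsistencyLevel.QUORUM)
                        .enableTracing();

                ResultSet rs = session.execute(stmt);
                QueryTrace trace = rs.getExecutionInfo().getQueryTrace();

                // The trace parameters should include the consistency level
                // the coordinator used; the events show which replicas replied.
                System.out.println("consistency_level = "
                        + trace.getParameters().get("consistency_level"));
                for (QueryTrace.Event e : trace.getEvents()) {
                    System.out.println(e.getSource() + " : " + e.getDescription());
                }
            }
        }
    }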

N3:
> SSTable: Partition key K1 is valid and has data D1 with a lower timestamp
> T1 (T1 < T2)
>

Have you checked the timestamps using sstable2json / sstabledump?
Sometimes clock drift can have this kind of weird effect and might have
produced T1 > T2.

Can you reproduce it easily? If so, this would indeed deserve some
attention and possibly a JIRA. What is expected is for the Last Write Wins
(LWW) algorithm to apply; in your situation it should guarantee
consistency as long as the tombstones are present on the two other nodes,
so for at least 10 days (the default). After that, consistency will depend
on whether the tombstone made its way to the last node as well or not; if
not, zombie data would reappear, as mentioned by Jaydeep.
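
Those 10 days come from the default gc_grace_seconds of 864000. Roughly,
the purge rule looks like the following sketch (my simplification, not
the actual compaction code):

    import java.util.concurrent.TimeUnit;

    public class TombstonePurge {
        static final int DEFAULT_GC_GRACE_SECONDS = 864_000; // 10 days

        static boolean isPurgeable(long localDeletionTimeSeconds,
                                   int gcGraceSeconds,
                                   long nowSeconds) {
            // Until this returns true, the tombstone must be kept around (and
            // shipped by repair) so replicas that missed the delete learn it.
            return localDeletionTimeSeconds + gcGraceSeconds < nowSeconds;
        }

        public static void main(String[] args) {
            long now = TimeUnit.MILLISECONDS.toSeconds(System.currentTimeMillis());
            long fiveDaysAgo = now - TimeUnit.DAYS.toSeconds(5);
            long twelveDaysAgo = now - TimeUnit.DAYS.toSeconds(12);
            System.out.println(isPurgeable(fiveDaysAgo, DEFAULT_GC_GRACE_SECONDS, now));   // false
            System.out.println(isPurgeable(twelveDaysAgo, DEFAULT_GC_GRACE_SECONDS, now)); // true
        }
    }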

I wrote a detailed blog post about tombstones and consistency issues; it
might be useful. I think your understanding is correct though.

thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html

C*heers,
---
Alain Rodriguez - @arodream - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


2016-09-17 1:46 GMT+02:00 Nicolas Douillet:

> Hi Jaydeep,
>
> Yes, dealing with tombstones in Cassandra is very tricky.
>
> Cassandra keeps tombstones to mark deleted columns and distributes them
> (via hinted handoff, full repair, read repair, ...) to the other nodes
> that missed the initial delete request. But Cassandra can't afford to keep
> those tombstones forever and has to wipe them. The tradeoff is that after
> a time, GCGraceSeconds, configured on each column family, the tombstones
> are fully dropped during compactions and are not distributed to the other
> nodes anymore.
> If one node didn't have the chance to receive this tombstone during this
> period, and kept an old column value, then the deleted column will
> reappear.
>
> So I guess that in your case the time T2 is older than this
> GCGraceSeconds?
>
> The best way to prevent all those phantom columns from coming back from
> the dead is to run a full repair on your cluster at least once every
> GCGraceSeconds. Did you try this?
>
> --
> Nicolas
>
>
> On Sat, Sep 17, 2016 at 00:05, Jaydeep Chovatia <
> chovatia.jayd...@gmail.com> wrote:
>
>> Hi,
>>
>> We have three node (N1, N2, N3) cluster (RF=3) and data in SSTable as
>> following:
>>
>> N1:
>> SSTable: Partition key K1 is marked as tombstone at time T2
>>
>> N2:
>> SSTable: Partition key K1 is marked as tombstone at time T2
>>
>> N3:
>> SSTable: Partition key K1 is valid and has data D1 with a lower timestamp
>> T1 (T1 < T2)
>>
>>
>> Now when I read using QUORUM, it sometimes returns data D1 and sometimes
>> returns empty results. After tracing I found that when N1 and N2 are
>> chosen we get empty data, and when (N1/N2) and N3 are chosen D1 is
>> returned.
>>
>> My point is that when we read with QUORUM our results have to be
>> consistent, but here the same query gives different results at different
>> times.
>>
>> Isn't this a big problem with Cassandra @QUORUM (with tombstone)?
>>
>>
>> Thanks,
>> Jaydeep
>>
>

