Re: Cassandra Delete vs Update

2020-05-23 Thread Laxmikant Upadhyay
Thanks all for your answers.
Thanks, Jeff, for the clarification. The only thing I could not get is how CAS
(I assume you are talking about compare-and-set) will help track the offset
consumed within a partition. But I got a good idea of what you are trying to
explain. Capping the partition size and deleting by partition are two important
points to remember. Thanks for your help.

On Sat, May 23, 2020, 6:23 PM Jeff Jirsa  wrote:

>
> Using Cassandra as a queue is possible if you really, really understand the
> data model, but most people will do it wrong the first few times.
>
> Cap your partition size. The times I’ve seen this done were near 10 MB
> partitions and used a special hook into internals to track partition size
> via index offsets so they knew when to switch to the next partition.
> Don’t delete records, delete partitions.
> Maybe use CAS to know when to flip to the next partition.
> Maybe use CAS to track your consumed offset within a partition.
> CQL row-level tombstones don’t matter in Cassandra 3+ - they’re just point
> deletes after the storage engine rewrite.
>
> You’re still probably better off running Kafka in the spare CPU and memory
> you’d use for this. Understand it’s nontrivial to set up, but it’s also
> nontrivial to do this properly.
>
>
>
> On May 23, 2020, at 9:26 AM, Laxmikant Upadhyay 
> wrote:
>
> 
> Thank you so much for the quick response. I completely agree with Jeff and
> Gabor that it is an anti-pattern to build a queue in Cassandra. But the plan
> is to reuse the existing Cassandra infrastructure without any additional cost
> (like Kafka).
> So even if the data is partitioned properly (max 10 MB per date), will it
> still be an issue if I read the partition only once a day? Even if I update
> the status instead of deleting the row?
>
> On Sat, May 23, 2020, 4:36 PM Gábor Auth  wrote:
>
>> Hi,
>>
>> On Sat, May 23, 2020 at 4:09 PM Laxmikant Upadhyay <
>> laxmikant@gmail.com> wrote:
>>
>>> I think that we should avoid tombstones, especially row-level ones, so we
>>> should go with option-1. Kindly suggest on the above or any other better approach?
>>>
>>
>> Why don't you use a queue implementation like ActiveMQ, Kafka or something
>> similar? Cassandra is not suitable for this at all; it is an anti-pattern in
>> the Cassandra world.
>>
>> --
>> Bye,
>> Auth Gábor (https://iotguru.cloud)
>>
>


Re: Cassandra Delete vs Update

2020-05-23 Thread Jeff Jirsa

Using Cassandra as a queue is possible if you really, really understand the data
model, but most people will do it wrong the first few times.

Cap your partition size. The times I’ve seen this done were near 10 MB
partitions and used a special hook into internals to track partition size via
index offsets so they knew when to switch to the next partition.
Don’t delete records, delete partitions.
Maybe use CAS to know when to flip to the next partition.
Maybe use CAS to track your consumed offset within a partition.
CQL row-level tombstones don’t matter in Cassandra 3+ - they’re just point
deletes after the storage engine rewrite.
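
To make the CAS idea concrete, here is a rough sketch with a made-up
ks.queue_offsets table and placeholder values - a real design would differ,
this is only meant to illustrate the lightweight-transaction pattern:

CREATE TABLE ks.queue_offsets (
    queue_name text,
    bucket int,
    consumed_offset bigint,
    PRIMARY KEY (queue_name, bucket)
);

-- "Flip" to the next partition exactly once: only the consumer whose
-- INSERT wins the Paxos round moves on to the new bucket.
INSERT INTO ks.queue_offsets (queue_name, bucket, consumed_offset)
VALUES ('events', 8, 0)
IF NOT EXISTS;

-- Advance the consumed offset only if it is still where this consumer
-- last saw it (compare-and-set).
UPDATE ks.queue_offsets
SET consumed_offset = 42
WHERE queue_name = 'events' AND bucket = 8
IF consumed_offset = 41;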

You’re still probably better off running Kafka in the spare CPU and memory
you’d use for this. Understand it’s nontrivial to set up, but it’s also
nontrivial to do this properly.



> On May 23, 2020, at 9:26 AM, Laxmikant Upadhyay  
> wrote:
> 
> 
> Thank you so much for the quick response. I completely agree with Jeff and
> Gabor that it is an anti-pattern to build a queue in Cassandra. But the plan is
> to reuse the existing Cassandra infrastructure without any additional cost
> (like Kafka).
> So even if the data is partitioned properly (max 10 MB per date), will it still
> be an issue if I read the partition only once a day? Even if I update the
> status instead of deleting the row?
> 
>> On Sat, May 23, 2020, 4:36 PM Gábor Auth  wrote:
>> Hi,
>> 
>>> On Sat, May 23, 2020 at 4:09 PM Laxmikant Upadhyay 
>>>  wrote:
>>> I think that we should avoid tombstones, especially row-level ones, so we
>>> should go with option-1. Kindly suggest on the above or any other better approach?
>> 
>> Why don't you use a queue implementation like ActiveMQ, Kafka or something
>> similar? Cassandra is not suitable for this at all; it is an anti-pattern in
>> the Cassandra world.
>> 
>> -- 
>> Bye,
>> Auth Gábor (https://iotguru.cloud)


Re: Decommissioned nodes are in UNREACHABLE state

2020-05-23 Thread Jai Bheemsen Rao Dhanwada
any inputs here?

On Sat, May 2, 2020 at 12:49 PM Jai Bheemsen Rao Dhanwada <
jaibheem...@gmail.com> wrote:

> Hello Alain,
>
> Thanks for your suggestions.
>
> Surprisingly, the node which is in the UNREACHABLE state is not present in
> any of the system tables. I am wondering where the information is coming
> from.
> I checked system.peers for the IP in the UNREACHABLE state and it's not
> present. I tried a restart of the Cassandra service as well.
>
> On Thu, Jun 20, 2019 at 5:59 AM Alain RODRIGUEZ 
> wrote:
>
>> Hello,
>>
>> Assuming your nodes have been out for a while and you don't need the data
>> after 60 days (or cannot get it anyway), the way to fix this is to force the
>> nodes out. I would try, in this order:
>>
>> - nodetool removenode HOSTID
>> - nodetool removenode force
>>
>> These two might well not work at this stage, but if they do, this is a
>> clean way to do so.
>> Now, to really push the ghost nodes to the exit door, it often takes:
>>
>> - nodetool assassinate
>>
>> (I think Cassandra 2.1 doesn't have it; you might have to use JMX, more
>> details here: https://thelastpickle.com/blog/2018/09/18/assassinate.html):
>>
>> echo "run -b org.apache.cassandra.net:type=Gossiper
>>> unsafeAssassinateEndpoint $IP_TO_ASSASSINATE"  | java -jar
>>> jmxterm-1.0.0-uber.jar -l $IP_OF_LIVE_NODE:7199
>>
>>
>> This should really remove the traces of the node, with no safety, no
>> streaming, no checks - it just gets rid of it. So use it with a lot of care
>> and understanding. In your situation I guess this is what will work.
>>
>> As a last attempt, you could try removing traces of the dead node(s) from
>> the 'system.peers' table on all the live nodes. This table is local to each
>> node, so the DELETE command has to be sent to every node (that has a trace
>> of an old node).
>>
>> - cqlsh -e "DELETE FROM system.peers WHERE peer = '$IP_TO_REMOVE';"
>>
>> but I see the node IPs in UNREACHABLE state in "nodetool describecluster"
>>> output. I believe they appear only for 72 hours, but in my case I see
>>> those nodes as UNREACHABLE forever (more than 60 days)
>>
>>
>> To be more accurate, you should never see a leaving node as unreachable, I
>> believe (not even for 72 hours). The 72 hours is the time Gossip should
>> continue referencing the old nodes. Typically, when you remove the ghost
>> nodes, they should no longer appear in 'nodetool describecluster' at all,
>> I would say immediately, but they will still appear in 'nodetool gossipinfo'
>> with a 'left' or 'removed' status.
>>
>> I hope that helps and that one of the above will do the trick (I'd bet on
>> the assassinate :)). Also, sorry it took us a while to answer this
>> relatively common question :)
>>
>> C*heers,
>> ---
>> Alain Rodriguez - al...@thelastpickle.com
>> France / Spain
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> Le jeu. 13 juin 2019 à 00:55, Jai Bheemsen Rao Dhanwada <
>> jaibheem...@gmail.com> a écrit :
>>
>>> Hello,
>>>
>>> I have a Cassandra cluster running version 2.1.16 of Cassandra, where I
>>> have decommissioned a few nodes from the cluster using "nodetool
>>> decommission", but I see the node IPs in the UNREACHABLE state in the
>>> "nodetool describecluster" output. I believe they should appear only for
>>> 72 hours, but in my case I see those nodes as UNREACHABLE forever (more
>>> than 60 days). A rolling restart of the nodes didn't remove them. Any idea
>>> what could be causing this?
>>>
>>> Note: I don't see them in the nodetool status output.
>>>
>>


Re: Cassandra Delete vs Update

2020-05-23 Thread Gábor Auth
Hi,

On Sat, May 23, 2020 at 6:26 PM Laxmikant Upadhyay 
wrote:

> Thank you so much for the quick response. I completely agree with Jeff and
> Gabor that it is an anti-pattern to build a queue in Cassandra. But the plan
> is to reuse the existing Cassandra infrastructure without any additional cost
> (like Kafka).
> So even if the data is partitioned properly (max 10 MB per date), will it
> still be an issue if I read the partition only once a day? Even if I update
> the status instead of deleting the row?
>

Both options generate unnecessary records; there is no big difference
between them. But if the load isn't too high - and 10 MByte per day isn't
too much - it doesn't matter.

I also have a lot of little tables (oh, column families) that shouldn't really
be in Cassandra, but since they have a very minimal load, I don't give a
shit... :)

-- 
Bye,
Auth Gábor (https://iotguru.cloud)


Re: Cassandra Delete vs Update

2020-05-23 Thread Laxmikant Upadhyay
Thank you so much for the quick response. I completely agree with Jeff and
Gabor that it is an anti-pattern to build a queue in Cassandra. But the plan is
to reuse the existing Cassandra infrastructure without any additional cost
(like Kafka).
So even if the data is partitioned properly (max 10 MB per date), will it still
be an issue if I read the partition only once a day? Even if I update the
status instead of deleting the row?

On Sat, May 23, 2020, 4:36 PM Gábor Auth  wrote:

> Hi,
>
> On Sat, May 23, 2020 at 4:09 PM Laxmikant Upadhyay <
> laxmikant@gmail.com> wrote:
>
>> I think that we should avoid tombstones, especially row-level ones, so we
>> should go with option-1. Kindly suggest on the above or any other better approach?
>>
>
> Why don't you use a queue implementation like ActiveMQ, Kafka or something
> similar? Cassandra is not suitable for this at all; it is an anti-pattern in
> the Cassandra world.
>
> --
> Bye,
> Auth Gábor (https://iotguru.cloud)
>


Re: Cassandra Delete vs Update

2020-05-23 Thread Gábor Auth
Hi,

On Sat, May 23, 2020 at 4:09 PM Laxmikant Upadhyay 
wrote:

> I think that we should avoid tombstones, especially row-level ones, so we
> should go with option-1. Kindly suggest on the above or any other better approach?
>

Why don't you use a queue implementation like ActiveMQ, Kafka or something
similar? Cassandra is not suitable for this at all; it is an anti-pattern in
the Cassandra world.

-- 
Bye,
Auth Gábor (https://iotguru.cloud)


Re: Cassandra Delete vs Update

2020-05-23 Thread Jeff Jirsa


You’re building a queue

Just use Kafka.


> On May 23, 2020, at 7:09 AM, Laxmikant Upadhyay  
> wrote:
> 
> 
> Hi All,
> I have a query regarding Cassandra data modelling:  I have created two tables:
> 
> 1. CREATE TABLE ks.records_by_id ( id uuid PRIMARY KEY,  status text, details 
> text);
> 2. CREATE TABLE ks.records_by_date ( date date, id uuid,  status text, 
> PRIMARY KEY(date, id));
> 
> I need to fetch records by date and then process each of them. Which of the
> following options will be better when the record is processed?
> 
> Option-1 : 
> BEGIN BATCH
> UPDATE ks.records_by_id SET status = 'processed' WHERE id = ;
> UPDATE ks.records_by_date SET status = 'processed' WHERE id =  and 
> date='date1';
> APPLY BATCH ;
> 
> Option-2
> BEGIN BATCH
> UPDATE ks.records_by_id SET status = 'processed' WHERE id = ;
> DELETE FROM ks.records_by_date WHERE id =  and date='date1';
> APPLY BATCH ;
> 
> Option-1 will not create tombstones, but I need to filter the records based on
> status='pending' at the application layer for each date. Option-2 will create
> tombstones (however, the number of tombstones will be limited in a partition),
> but it will not require application-side filtering.
> 
> I think that we should avoid tombstones, especially row-level ones, so we should
> go with option-1. Kindly suggest on the above or any other better approach?
> 
> -- 
> 
> regards,
> Laxmikant Upadhyay
> 


Re: Cassandra Delete vs Update

2020-05-23 Thread Aakash Pandhi
Laxmikant,

You mentioned that you need to filter records based on status='pending' in
option-1. I don't see that filtering being done in that option. You are setting
status to 'processed' when the partition key is matched for the table. For the
delete (option-2), it will completely remove the whole partition from the
records_by_date table, if that's what you want.

Regards,
Aakash Pandhi

On Saturday, May 23, 2020, 09:09:48 AM CDT, Laxmikant Upadhyay wrote:
Hi All,
I have a query regarding Cassandra data modelling: I have created two tables:
1. CREATE TABLE ks.records_by_id ( id uuid PRIMARY KEY,  status text, details 
text);
2. CREATE TABLE ks.records_by_date ( date date, id uuid,  status text, PRIMARY 
KEY(date, id));

I need to fetch records by date and then process each of them. Which of the
following options will be better when the record is processed?

Option-1 : 
BEGIN BATCH
UPDATE ks.records_by_id SET status = 'processed' WHERE id = ;
UPDATE ks.records_by_date SET status = 'processed' WHERE id =  and 
date='date1';
APPLY BATCH ;

Option-2
BEGIN BATCH
UPDATE ks.records_by_id SET status = 'processed' WHERE id = ;
DELETE FROM ks.records_by_date WHERE id =  and date='date1';
APPLY BATCH ;

Option-1 will not create tombstones, but I need to filter the records based on
status='pending' at the application layer for each date. Option-2 will create
tombstones (however, the number of tombstones will be limited in a partition),
but it will not require application-side filtering.

I think that we should avoid tombstones, especially row-level ones, so we should
go with option-1. Kindly suggest on the above or any other better approach?

-- 

regards,
Laxmikant Upadhyay

Cassandra Delete vs Update

2020-05-23 Thread Laxmikant Upadhyay
Hi All,
I have a query regarding Cassandra data modelling:  I have created two
tables:

1. CREATE TABLE ks.records_by_id ( id uuid PRIMARY KEY,  status text,
details text);
2. CREATE TABLE ks.records_by_date ( date date, id uuid,  status text,
PRIMARY KEY(date, id));

I need to fetch records by date and then process each of them. Which of the
following options will be better when the record is processed?

*Option-1 : *
BEGIN BATCH
UPDATE ks.records_by_id SET status = 'processed' WHERE id = ;
UPDATE ks.records_by_date SET status = 'processed' WHERE id =  and
date='date1';
APPLY BATCH ;

*Option-2*
BEGIN BATCH
UPDATE ks.records_by_id SET status = 'processed' WHERE id = ;
DELETE FROM ks.records_by_date WHERE id =  and date='date1';
APPLY BATCH ;

Option-1 will not create tombstones, but I need to filter the records based on
status='pending' at the application layer for each date (a rough sketch of that
read path is below). Option-2 will create tombstones (however, the number of
tombstones will be limited in a partition), but it will not require
application-side filtering.
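
For clarity, the Option-1 read path would look roughly like this (just a sketch
using the tables above; the status filter happens in application code because
status is not part of the primary key):

SELECT id, status FROM ks.records_by_date WHERE date = 'date1';
-- the application then keeps only the rows whose status is still 'pending'
-- and marks them 'processed' with the batch from Option-1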

I think that we should avoid tombstones, especially row-level ones, so we should
go with option-1. Kindly suggest on the above or any other better approach?

-- 

regards,
Laxmikant Upadhyay