Cassandra collection tombstones

2019-01-24 Thread Ayub M
I have created a table with a collection, inserted a record, and took an
sstabledump of it, and I see there is a range tombstone for it in the
sstable. Does this tombstone ever get removed? Also, when I run
sstablemetadata on the only sstable, it shows "Estimated droppable
tombstones" as 0.5, and similarly it shows one record, with the insert time
as the epoch, under "Estimated tombstone drop times: 1548384720: 1". Does
this mean that when I run sstablemetadata on a table having collections, the
estimated droppable tombstone ratio and drop-time values are not true,
dependable values because of collection/list range tombstones?
CREATE TABLE ks.nmtest (
    reservation_id text,
    order_id text,
    c1 int,
    order_details map<text, text>,
    PRIMARY KEY (reservation_id, order_id)
) WITH CLUSTERING ORDER BY (order_id ASC);

user@cqlsh:ks> insert into nmtest (reservation_id, order_id, c1, order_details) values ('3','3',3,{'key':'value'});
user@cqlsh:ks> select * from nmtest;

 reservation_id | order_id | c1 | order_details
----------------+----------+----+------------------
              3 |        3 |  3 | {'key': 'value'}

(1 rows)
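For reference, the range tombstone comes from writing the whole collection in
one go: when a map column is set via INSERT (or UPDATE ... SET col = {...}),
Cassandra first records a deletion over the collection so any previous entries
are wiped, even on a brand-new row, and that is the deletion_info that shows up
in the sstabledump below. Appending entries instead avoids that marker. A
minimal sketch with the DataStax Python driver (the contact point, keyspace and
the extra map entry are placeholders):

```python
# Sketch, not a full program: assumes a local node and the ks.nmtest table
# created above (pip install cassandra-driver).
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('ks')

# Writing the whole map replaces any previous contents, so Cassandra records a
# deletion over the collection first -- the range tombstone seen in the dump.
session.execute(
    "INSERT INTO nmtest (reservation_id, order_id, c1, order_details) "
    "VALUES ('3', '3', 3, {'key': 'value'})")

# Appending to the map only adds cells; no collection tombstone is written.
# ('key2'/'value2' are placeholder values.)
session.execute(
    "UPDATE nmtest SET order_details = order_details + {'key2': 'value2'} "
    "WHERE reservation_id = '3' AND order_id = '3'")

cluster.shutdown()
```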

[root@localhost nmtest-e1302500201d11e983bb693c02c04c62]# sstabledump mc-5-big-Data.db
WARN  02:52:19,596 memtable_cleanup_threshold has been deprecated and should be removed from cassandra.yaml
[
  {
    "partition" : {
      "key" : [ "3" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 41,
        "clustering" : [ "3" ],
        "liveness_info" : { "tstamp" : "2019-01-25T02:51:13.574409Z" },
        "cells" : [
          { "name" : "c1", "value" : 3 },
          { "name" : "order_details", "deletion_info" : { "marked_deleted" : "2019-01-25T02:51:13.574408Z", "local_delete_time" : "2019-01-25T02:51:13Z" } },
          { "name" : "order_details", "path" : [ "key" ], "value" : "value" }
        ]
      }
    ]
  }
]

SSTable: /data/data/ks/nmtest-e1302500201d11e983bb693c02c04c62/mc-5-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.01
Minimum timestamp: 1548384673574408
Maximum timestamp: 1548384673574409
SSTable min local deletion time: 1548384673
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 1.0714285714285714
TTL min: 0
TTL max: 0
First token: -155496620801056360 (key=3)
Last token: -155496620801056360 (key=3)
minClustringValues: [3]
maxClustringValues: [3]
Estimated droppable tombstones: 0.5
SSTable Level: 0
Repaired at: 0
Replay positions covered: {CommitLogPosition(segmentId=1548382769966,
position=6243201)=CommitLogPosition(segmentId=1548382769966,
position=6433666)}
totalColumnsSet: 2
totalRows: 1
Estimated tombstone drop times:
1548384720: 1
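As a rough cross-check of those numbers (a back-of-the-envelope sketch only,
not Cassandra's exact internals: the 60-second bucketing of drop times and the
use of totalColumnsSet as the denominator are assumptions that happen to
reproduce this output):

```python
import math

# Values read off the sstabledump / sstablemetadata output above.
local_delete_time = 1548384673   # collection deletion's local_delete_time (epoch s)
cells_written = 2                # totalColumnsSet: c1 + one map entry
droppable_tombstones = 1         # the single collection range tombstone

# Assumption: drop times are bucketed into 60-second windows (rounded up),
# which reproduces the "1548384720: 1" histogram entry.
print(int(math.ceil(local_delete_time / 60.0)) * 60)   # 1548384720

# Assumption: the ratio is droppable tombstones over cells written,
# which reproduces "Estimated droppable tombstones: 0.5".
print(droppable_tombstones / float(cells_written))     # 0.5

# Regardless of the estimate, the tombstone itself can only be purged once
# gc_grace_seconds (default 864000 s = 10 days) have elapsed and the sstable
# takes part in a compaction.
gc_grace_seconds = 864000
print(local_delete_time + gc_grace_seconds)            # 1549248673
```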

Does the tombstone_threshold compaction option depend on the ratio that
sstablemetadata reports? If so, then for tables having collections this is
not a true threshold, right?


Re: Re: Re: how to configure the Token Allocation Algorithm

2019-01-24 Thread Ahmed Eljami
Hi folks,

What about adding new keyspaces to the existing cluster, e.g. test_2 with the
same RF?

Will they use the same logic as the existing keyspace test? Or should I
restart the nodes and add the new keyspace to cassandra.yaml?

Thanks.

On Tue, Oct 2, 2018 at 10:28, Varun Barala wrote:

> Hi,
>
> Managing `initial_token` by yourself will give you more control over
> scale-in and scale-out.
> Let's say you have a three-node cluster with `num_tokens: 1`.
>
> And your initial token ranges look like this:
>
> Datacenter: datacenter1
> ==========
> Address    Rack    Status  State   Load       Owns     Token
>                                                        3074457345618258602
> 127.0.0.1  rack1   Up      Normal  98.96 KiB  66.67%   -9223372036854775808
> 127.0.0.2  rack1   Up      Normal  98.96 KiB  66.67%   -3074457345618258603
> 127.0.0.3  rack1   Up      Normal  98.96 KiB  66.67%   3074457345618258602
>
> Now let's say you want to scale the cluster out to twice the current
> throughput (meaning you are adding 3 more nodes).
>
> If you are using AWS EBS volumes, you can reuse the same volumes and spin up
> three more nodes at the midpoints of the existing ranges, which means your
> new nodes already have data.
> Once you have mounted the volumes on your new nodes:
> * You need to delete every system table except the schema-related tables.
> * You need to generate the system.local table yourself, with `Bootstrap
> state` set to completed and the schema version the same as on the other
> existing nodes.
> * You need to remove the extra data on all the machines using cleanup
> commands.
>
> This is how you can scale out a Cassandra cluster in minutes. In case you
> want to add nodes one by one, you would need to write a small tool which
> always figures out the biggest range in the existing cluster and splits it
> in half.
>
> However, I have never tested this thoroughly, but it should work
> conceptually. Here we are taking advantage of the fact that we already have
> the volumes (data) for the new nodes beforehand, so we do not need to
> bootstrap them.
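A minimal sketch of the "find the biggest range and split it in half" tool
described in the quoted message, assuming Murmur3 tokens and using the three
tokens from the ring shown earlier as input:

```python
# Sketch: given the tokens currently owned in a Murmur3 ring, find the widest
# range and return its midpoint as the initial_token for the next node.
RING_SIZE = 2**64  # Murmur3 tokens live in [-2**63, 2**63 - 1]

def next_initial_token(current_tokens):
    tokens = sorted(current_tokens)
    best_start, best_width = None, -1
    for i, start in enumerate(tokens):
        end = tokens[(i + 1) % len(tokens)]
        width = (end - start) % RING_SIZE   # range width, wrapping the ring
        if width > best_width:
            best_start, best_width = start, width
    # Midpoint of the widest range, wrapped back into the token domain.
    return (best_start + best_width // 2 + 2**63) % RING_SIZE - 2**63

# The three tokens from the ring shown earlier in this message.
print(next_initial_token([-9223372036854775808,
                          -3074457345618258603,
                          3074457345618258602]))   # 6148914691236517205
```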
>
> Thanks & Regards,
> Varun Barala
>
> On Tue, Oct 2, 2018 at 2:31 PM, onmstester onmstester wrote:
>
>>
>> On Mon, 01 Oct 2018 18:36:03 +0330, Alain RODRIGUEZ wrote:
>>
>> Hello again :),
>>
>> I thought a little bit more about this question, and I was actually
>> wondering if something like this would work:
>>
>> Imagine a 3-node cluster, and create the nodes using:
>> For the 3 nodes: `num_tokens: 4`
>> Node 1: `initial_token: -9223372036854775808, -4611686018427387905, -2, 4611686018427387901`
>> Node 2: `initial_token: -7686143364045646507, -3074457345618258604, 1537228672809129299, 6148914691236517202`
>> Node 3: `initial_token: -6148914691236517206, -1537228672809129303, 3074457345618258600, 7686143364045646503`
>>
>> If you know the initial size of your cluster, you can calculate the
>> total number of tokens (number of nodes * vnodes) and use the
>> formula/Python code below to get the tokens. Then use the first token for
>> the first node, move to the second node, use the second token, and repeat.
>> In my case there is a total of 12 tokens (3 nodes, 4 tokens each):
>> ```
>> >>> number_of_tokens = 12
>> >>> [str(((2**64 // number_of_tokens) * i) - 2**63) for i in range(number_of_tokens)]
>> ['-9223372036854775808', '-7686143364045646507', '-6148914691236517206',
>>  '-4611686018427387905', '-3074457345618258604', '-1537228672809129303',
>>  '-2', '1537228672809129299', '3074457345618258600', '4611686018427387901',
>>  '6148914691236517202', '7686143364045646503']
>> ```
>>
>>
>> Using manual initial_token (your idea), how could I add a new node to a
>> long-running cluster? What would the procedure be?
>>
>>
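Putting the formula and the round-robin assignment from the quoted message
together, a small sketch (the node and vnode counts are just the example
values from the thread):

```python
# Sketch: spread nodes * vnodes tokens evenly over the Murmur3 ring, then hand
# them out round-robin so each node gets an interleaved set of initial tokens.
def evenly_spread_tokens(nodes, vnodes):
    total = nodes * vnodes
    tokens = [((2**64 // total) * i) - 2**63 for i in range(total)]
    # node k gets tokens k, k + nodes, k + 2*nodes, ...
    return {k: tokens[k::nodes] for k in range(nodes)}

for node, toks in sorted(evenly_spread_tokens(nodes=3, vnodes=4).items()):
    print("Node %d initial_token: %s" % (node + 1, ", ".join(map(str, toks))))
```

Running it reproduces the three initial_token lists quoted above; each list
would then go into that node's cassandra.yaml as a comma-separated
initial_token value before the node's first start.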

-- 
Regards,

Ahmed ELJAMI