Re: nodetool load does not match du

2020-02-03 Thread Erick Ramirez
> I thought that the snapshot size was not counted in the load.
>

That's correct. I suggested looking at what nodetool tablestats reports so
you can compare that against du/df outputs for clues as to why there is
such a large discrepancy. Cheers!


Re: nodetool load does not match du

2020-02-03 Thread Sergio
   - The amount of file system data under the cassandra data directory after
   excluding all content in the snapshots subdirectories. Because all SSTable
   data files are included, any data that is not cleaned up (such as
   TTL-expired cells or tombstoned data) is counted.

https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/tools/toolsStatus.html
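
A rough way to sanity-check that definition on a node (the path below is an
example; point du at whatever data_file_directories is set to, e.g. somewhere
under /mnt on these hosts):

$ du -sh --exclude=snapshots /var/lib/cassandra/data
$ nodetool info | grep Load

Excluding the snapshots subdirectories should bring the du figure much closer
to the reported Load than a plain du -sh of the whole data directory.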



On Mon, Feb 3, 2020 at 11:43 PM Sergio wrote:

> Thanks, Erick!
>
> I thought that the snapshot size was not counted in the load.
>
> On Mon, Feb 3, 2020 at 11:24 PM Erick Ramirez <flightc...@gmail.com> wrote:
>
>> Why do df -h and du -sh show a big discrepancy? Is nodetool load
>>> computed with df -h?
>>>
>>
>> In Linux terms, df reports the filesystem disk usage while du is an
>> *estimate* of the file space usage. What that means is that the operating
>> system uses different accounting between the two utilities. If you're
>> looking for a more detailed explanation, just do a search for "df vs du".
>>
>> With nodetool load, do you have any snapshots still on disk? This usually
>> accounts for the discrepancy. Snapshots are hard links to the same inodes
>> as the original SSTables -- put simply, they're "pointers" to the original
>> files so they don't take up additional space on disk.
>>
>> If you think there's a real issue, one way to troubleshoot is to do a du
>> on the table subdirectory then compare it to the size reported by nodetool
>> tablestats . Cheers!
>>
>


Re: nodetool load does not match du

2020-02-03 Thread Sergio
Thanks, Erick!

I thought that the snapshot size was not counted in the load.

On Mon, Feb 3, 2020 at 11:24 PM Erick Ramirez wrote:

> Why do df -h and du -sh show a big discrepancy? Is nodetool load
>> computed with df -h?
>>
>
> In Linux terms, df reports the filesystem disk usage while du is an
> *estimate* of the file space usage. What that means is that the operating
> system uses different accounting between the two utilities. If you're
> looking for a more detailed explanation, just do a search for "df vs du".
>
> With nodetool load, do you have any snapshots still on disk? This usually
> accounts for the discrepancy. Snapshots are hard links to the same inodes
> as the original SSTables -- put simply, they're "pointers" to the original
> files so they don't take up additional space on disk.
>
> If you think there's a real issue, one way to troubleshoot is to do a du
> on the table subdirectory then compare it to the size reported by nodetool
> tablestats . Cheers!
>


Re: [EXTERNAL] How to reduce vnodes without downtime

2020-02-03 Thread Sergio
After reading this

*I would only consider moving a cluster to 4 tokens if it is larger than
100 nodes. If you read through the paper that Erick mentioned, written
by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
availability of large scale clusters.*

and

With 16 tokens, that is vastly improved, but you still have up to 64 nodes
each node needs to query against, so you're again, hitting every node
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I
wouldn't use 16 here, and I doubt any of you would either.  I've advocated
for 4 tokens because you'd have overlap with only 16 nodes, which works
well for small clusters as well as large.  Assuming I was creating a new
cluster for myself (in a hypothetical brand new application I'm building) I
would put this in production.  I have worked with several teams where I
helped them put 4 token clusters in prod and it has worked very well.  We
didn't see any wild imbalance issues.

from
https://lists.apache.org/thread.html/r55d8e68483aea30010a4162ae94e92bc63ed74d486e6c642ee66f6ae%40%3Cuser.cassandra.apache.org%3E

Sorry guys, but I am kind of confused now about what the recommended
approach for the number of *vnodes* should be.
Right now I am handling a cluster with just 9 nodes and a data size of
100-200GB per node.

I am seeing some imbalance and I was worried because I have 256 vnodes.

--  Address      Load        Tokens  Owns  Host ID                               Rack
UN  10.1.30.112  115.88 GiB  256     ?     e5108a8e-cc2f-4914-a86e-fccf770e3f0f  us-east-1b
UN  10.1.24.146  127.42 GiB  256     ?     adf40fa3-86c4-42c3-bf0a-0f3ee1651696  us-east-1b
UN  10.1.26.181  133.44 GiB  256     ?     0a8f07ba-a129-42b0-b73a-df649bd076ef  us-east-1b
UN  10.1.29.202  113.33 GiB  256     ?     d260d719-eae3-48ab-8a98-ea5c7b8f6eb6  us-east-1b
UN  10.1.31.60   183.63 GiB  256     ?     3647fcca-688a-4851-ab15-df36819910f4  us-east-1b
UN  10.1.24.175  118.09 GiB  256     ?     bba1e80b-8156-4399-bd6a-1b5ccb47bddb  us-east-1b
UN  10.1.29.223  137.24 GiB  256     ?     450fbb61-3817-419a-a4c6-4b652eb5ce01  us-east-1b

The weird part is related to this post, where I don't find a match between
the load and du -sh * for the node 10.1.31.60, and I was trying to figure
out whether the reason was the number of vnodes.

Two off-topic questions:

1) Does Cassandra keep a copy of the data per rack, so that to keep things
balanced I would have to add 3 racks at a time in a single Datacenter?

2) Is it better to keep a single Datacenter with a single Rack spanning 3
different availability zones with replication factor = 3, or to have one
Rack and one Availability Zone per Datacenter and, if needed, redirect the
client to a fallback Datacenter in case one of the availability zones is
not reachable?

Right now we are separating the Datacenter for reads from the one that
handles the writes...

Thanks for your help!

Sergio




On Sun, Feb 2, 2020 at 6:36 PM Anthony Grasso <anthony.gra...@gmail.com> wrote:

> Hi Sergio,
>
> There is a misunderstanding here. My post makes no recommendation for the
> value of num_tokens. Rather, it focuses on how to use
> the allocate_tokens_for_keyspace setting when creating a new cluster.
>
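> For anyone who hasn't read the post, the setting lives in cassandra.yaml.
> Very roughly, a node brought up following it ends up with something like
> this (the keyspace name is just a placeholder; the post covers the full
> procedure, including how the first nodes use initial_token):
>
>     num_tokens: 4
>     allocate_tokens_for_keyspace: my_keyspace
>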
> Whilst a value of 4 is used for num_tokens in the post, it was chosen for
> demonstration purposes. Specifically it makes:
>
>- the uneven token distribution in a small cluster very obvious,
>- identifying the endpoints displayed in nodetool ring easy, and
>- the initial_token setup less verbose and easier to follow.
>
> I will add an editorial note to the post with the above information
> so there is no confusion about why 4 tokens were used.
>
> I would only consider moving a cluster to 4 tokens if it is larger than
> 100 nodes. If you read through the paper that Erick mentioned, written
> by Joe Lynch & Josh Snyder, they show that the num_tokens impacts the
> availability of large scale clusters.
>
> If you are after more details about the trade-offs between different sized
> token values, please see the discussion on the dev mailing list: "[Discuss]
> num_tokens default in Cassandra 4.0".
>
> Regards,
> Anthony
>
> On Sat, 1 Feb 2020 at 10:07, Sergio  wrote:
>
>>
>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>> This is the article with the 4-token recommendation.
>> @Erick Ramirez, which is the dev thread for the default 32 tokens
>> recommendation?
>>
>> Thanks,
>> Sergio
>>
>> On Fri, Jan 31, 2020 at 2:49 PM Erick Ramirez <flightc...@gmail.com> wrote:
>>
>>> There's an active discussion going on right now in a separate dev
>>> 

Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread onmstester onmstester
Thank you so much



Sent using https://www.zoho.com/mail/






 Forwarded message 
From: Max C. 
To: 
Date: Tue, 04 Feb 2020 08:37:21 +0330
Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
 Forwarded message 



Let’s say you have a 6 node cluster, with RF=3, and no vnodes.  In that case 
each piece of data is stored as follows:



Node : Replicas

N1: N2 N3

N2: N3 N4

N3: N4 N5

N4: N5 N6

N5: N6 N1

N6: N1 N2



With this setup, there are some circumstances where you could lose 2 nodes (ex: 
N1 & N4) and still be able to maintain CL=quorum.  If your cluster is very 
large, then you could lose even more — and that’s a good thing, because if you 
have hundreds/thousands of nodes then you don’t want the world to come tumbling 
down if  > 1 node is down.  Or maybe you want to upgrade the OS on your nodes, 
and want to (with very careful planning!) do it by taking down more than 1 node 
at a time.



… but if you have a large number of vnodes, then a given node will share a 
small segment of data with LOTS of other nodes, which destroys this property.  
The more vnodes, the less likely you’re able to handle > 1 node down.



For example, see this diagram in the Datastax docs —



https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html#Distributingdatausingvnodes



In that bottom picture, you can’t knock out 2 nodes and still maintain 
CL=quorum.  Ex:  If you knock out node 1 & 4, then ranges B & L would no longer 
meet CL=quorum;  but you can do that in the top diagram, since there are no 
ranges shared between node 1 & 4.



Hope that helps.



- Max





On Feb 3, 2020, at 8:39 pm, onmstester onmstester 
 wrote:



Sorry if it's trivial, but I do not understand how num_tokens affects
availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate losing
at most one node, and all of the tokens assigned to that node would also be
assigned to two other nodes no matter what num_tokens is, right?



Sent using https://www.zoho.com/mail/






 Forwarded message 
From: Jon Haddad 
To: 
Date: Tue, 04 Feb 2020 01:15:21 +0330
Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
 Forwarded message 



I think it's a good idea to take a step back and get a high level view of 
the problem we're trying to solve. 
 
First, high token counts result in decreased availability as each node has 
data overlap with more nodes in the cluster.  Specifically, a node can
share data with RF-1 * 2 * num_tokens.  So a 256 token cluster at RF=3 is 
going to almost always share data with every other node in the cluster that 
isn't in the same rack, unless you're doing something wild like using more 
than a thousand nodes in a cluster.  We advertise 
 
With 16 tokens, that is vastly improved, but you still have up to 64 nodes 
each node needs to query against, so you're again, hitting every node 
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I 
wouldn't use 16 here, and I doubt any of you would either.  I've advocated 
for 4 tokens because you'd have overlap with only 16 nodes, which works 
well for small clusters as well as large.  Assuming I was creating a new 
cluster for myself (in a hypothetical brand new application I'm building) I 
would put this in production.  I have worked with several teams where I 
helped them put 4 token clusters in prod and it has worked very well.  We 
didn't see any wild imbalance issues. 
 
As Mick's pointed out, our current method of using random token assignment 
for the default number of tokens is problematic for 4 tokens.  I fully agree with
this, and I think if we were to try to use 4 tokens, we'd want to address 
this in tandem.  We can discuss how to better allocate tokens by default 
(something more predictable than random), but I'd like to avoid the 
specifics of that for the sake of this email. 
 
To Alex's point, repairs are problematic with lower token counts due to 
over streaming.  I think this is a pretty serious issue and we'd have to
address it before going all the way down to 4.  This, in my opinion, is a 
more complex problem to solve and I think trying to fix it here could make 
shipping 4.0 take even longer, something none of us want. 
 
For the sake of shipping 4.0 without adding extra overhead and time, I'm ok 
with moving to 16 tokens, and in the process adding extensive documentation 
outlining what we recommend for production use.  I think we should also try 
to figure out something better than random as the default to fix the data 
imbalance issues.  I've got a few ideas here I've been noodling on. 
 
As long as folks are fine with potentially changing the default again in C* 
5.0 (after another discussion / debate), 16 is enough of an improvement 
that I'm OK with the change, and willing to author 

Re: nodetool load does not match du

2020-02-03 Thread Erick Ramirez
>
> Why do df -h and du -sh show a big discrepancy? Is nodetool load
> computed with df -h?
>

In Linux terms, df reports the filesystem disk usage while du is an
*estimate* of the file space usage. What that means is that the operating
system uses different accounting between the two utilities. If you're
looking for a more detailed explanation, just do a search for "df vs du".
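
If you want to dig into the df-vs-du side specifically, a couple of quick
checks (just a sketch, not an exhaustive list; /mnt is the data mount in this
case):

$ df -h /mnt          # block-level usage as the filesystem sees it
$ du -sh /mnt         # sum of what du can reach by walking the directory tree
$ sudo lsof +L1 /mnt  # files deleted but still held open -- counted by df, invisible to du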

With nodetool load, do you have any snapshots still on disk? This usually
accounts for the discrepancy. Snapshots are hard links to the same inodes
as the original SSTables -- put simply, they're "pointers" to the original
files so they don't take up additional space on disk.
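
To check whether snapshots account for it, something along these lines should
do (the find path assumes the data directory lives under /mnt; adjust to your
layout):

$ nodetool listsnapshots
$ find /mnt -path '*/snapshots/*' -name '*Data.db' -links +1 | head
$ nodetool clearsnapshot    # on 3.x, no arguments clears snapshots for all keyspaces

The -links +1 filter shows snapshot files that still share an inode with a
live SSTable, i.e. files that cost no extra disk space yet.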

If you think there's a real issue, one way to troubleshoot is to do a du on
the table subdirectory then compare it to the size reported by nodetool
tablestats . Cheers!
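
For example (the keyspace/table and data path below are placeholders, not
actual names from this cluster):

$ du -sh /mnt/cassandra/data/my_keyspace/my_table-*/
$ nodetool tablestats my_keyspace.my_table | grep -i "space used"

If the two still disagree after ignoring any snapshots and backups
subdirectories under the table, then something other than snapshots is
holding the space.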


nodetool load does not match du

2020-02-03 Thread Sergio Bilello
Hello!
I was trying to understand the below differences:
Cassandra 3.11.4
i3xlarge aws nodes

$ du -sh /mnt
123G    /mnt

$ nodetool info
ID : 3647fcca-688a-4851-ab15-df36819910f4
Gossip active  : true
Thrift active  : true
Native Transport active: true
Load   : 183.55 GiB
Generation No  : 1570757970
Uptime (seconds)   : 10041867
Heap Memory (MB)   : 3574.09 / 7664.00
Off Heap Memory (MB)   : 441.70
Data Center: live
Rack   : us-east-1b
Exceptions : 0
Key Cache  : entries 1430578, size 100 MiB, capacity 100 MiB, 
10075279019 hits, 13328775396 requests, 0.756 recent hit rate, 14400 save 
period in seconds
Row Cache  : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 
requests, NaN recent hit rate, 0 save period in seconds
Counter Cache  : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 
requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache: entries 7680, size 479.97 MiB, capacity 480 MiB, 
1835784783 misses, 11836353728 requests, 0.845 recent hit rate, 141.883 
microseconds miss latency
Percent Repaired   : 0.10752808456509523%
Token  : (invoke with -T/--tokens to see all 256 tokens)

$ df -h
Filesystem  Size  Used Avail Use% Mounted on
devtmpfs 15G 0   15G   0% /dev
tmpfs15G   72K   15G   1% /dev/shm
tmpfs15G  1.4G   14G  10% /run
tmpfs15G 0   15G   0% /sys/fs/cgroup
/dev/xvda1   50G  9.9G   41G  20% /
/dev/nvme0n1885G  181G  705G  21% /mnt
tmpfs   3.0G 0  3.0G   0% /run/user/995
tmpfs   3.0G 0  3.0G   0% /run/user/1009

Why do df -h and du -sh show a big discrepancy? Is nodetool load computed
with df -h?






Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread Max C.
Let’s say you have a 6 node cluster, with RF=3, and no vnodes.  In that case 
each piece of data is stored as follows:

Node : Replicas
N1: N2 N3
N2: N3 N4
N3: N4 N5
N4: N5 N6
N5: N6 N1
N6: N1 N2

With this setup, there are some circumstances where you could lose 2 nodes (ex: 
N1 & N4) and still be able to maintain CL=quorum.  If your cluster is very 
large, then you could lose even more — and that’s a good thing, because if you 
have hundreds/thousands of nodes then you don’t want the world to come tumbling 
down if  > 1 node is down.  Or maybe you want to upgrade the OS on your nodes, 
and want to (with very careful planning!) do it by taking down more than 1 node 
at a time.

… but if you have a large number of vnodes, then a given node will share a 
small segment of data with LOTS of other nodes, which destroys this property.  
The more vnodes, the less likely you’re able to handle > 1 node down.

For example, see this diagram in the Datastax docs —

https://docs.datastax.com/en/dse/5.1/dse-arch/datastax_enterprise/dbArch/archDataDistributeVnodesUsing.html#Distributingdatausingvnodes
 


In that bottom picture, you can’t knock out 2 nodes and still maintain 
CL=quorum.  Ex:  If you knock out node 1 & 4, then ranges B & L would no longer 
meet CL=quorum;  but you can do that in the top diagram, since there are no 
ranges shared between node 1 & 4.
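
If you'd rather not eyeball it, here's a rough shell sketch of the same check
for the 6-node ring above (assuming node Ni's data is replicated to Ni+1 and
Ni+2, as in the table):

N=6; RF=3
for a in $(seq 0 $((N-1))); do
  for b in $(seq $((a+1)) $((N-1))); do
    ok=1
    for r in $(seq 0 $((N-1))); do            # range whose first replica is node r+1
      up=0
      for k in $(seq 0 $((RF-1))); do
        n=$(( (r + k) % N ))
        [ $n -ne $a ] && [ $n -ne $b ] && up=$((up + 1))
      done
      [ $up -lt 2 ] && ok=0                   # fewer than 2 of 3 replicas up = no quorum
    done
    [ $ok -eq 1 ] && echo "N$((a+1)) and N$((b+1)) can both be down, quorum still holds"
  done
done

It prints only the three "opposite" pairs (N1/N4, N2/N5, N3/N6); any other
pair takes out two replicas of some range.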

Hope that helps.

- Max


> On Feb 3, 2020, at 8:39 pm, onmstester onmstester 
>  wrote:
> 
> Sorry if it's trivial, but I do not understand how num_tokens affects
> availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate losing
> at most one node, and all of the tokens assigned to that node would also be
> assigned to two other nodes no matter what num_tokens is, right?
> 
> Sent using Zoho Mail 
> 
> 
>  Forwarded message 
> From: Jon Haddad <j...@jonhaddad.com>
> To: <d...@cassandra.apache.org>
> Date: Tue, 04 Feb 2020 01:15:21 +0330
> Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
>  Forwarded message 
> 
> I think it's a good idea to take a step back and get a high level view of 
> the problem we're trying to solve. 
> 
> First, high token counts result in decreased availability as each node has 
> data overlap with more nodes in the cluster. Specifically, a node can
> share data with RF-1 * 2 * num_tokens. So a 256 token cluster at RF=3 is 
> going to almost always share data with every other node in the cluster that 
> isn't in the same rack, unless you're doing something wild like using more 
> than a thousand nodes in a cluster. We advertise 
> 
> With 16 tokens, that is vastly improved, but you still have up to 64 nodes 
> each node needs to query against, so you're again, hitting every node 
> unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs). I 
> wouldn't use 16 here, and I doubt any of you would either. I've advocated 
> for 4 tokens because you'd have overlap with only 16 nodes, which works 
> well for small clusters as well as large. Assuming I was creating a new 
> cluster for myself (in a hypothetical brand new application I'm building) I 
> would put this in production. I have worked with several teams where I 
> helped them put 4 token clusters in prod and it has worked very well. We 
> didn't see any wild imbalance issues. 
> 
> As Mick's pointed out, our current method of using random token assignment 
> for the default number of tokens is problematic for 4 tokens. I fully agree with
> this, and I think if we were to try to use 4 tokens, we'd want to address 
> this in tandem. We can discuss how to better allocate tokens by default 
> (something more predictable than random), but I'd like to avoid the 
> specifics of that for the sake of this email. 
> 
> To Alex's point, repairs are problematic with lower token counts due to 
> over streaming. I think this is a pretty serious issue and we'd have to
> address it before going all the way down to 4. This, in my opinion, is a 
> more complex problem to solve and I think trying to fix it here could make 
> shipping 4.0 take even longer, something none of us want. 
> 
> For the sake of shipping 4.0 without adding extra overhead and time, I'm ok 
> with moving to 16 tokens, and in the process adding extensive documentation 
> outlining what we recommend for production use. I think we should also try 
> to figure out something better than random as the default to fix the data 
> imbalance issues. I've got a few ideas here I've been noodling on. 
> 
> As long as folks are fine with potentially changing the default again in C* 
> 5.0 (after another discussion / debate), 16 is enough of an improvement 
> that I'm OK with the change, and willing to author the docs to help people 
> set up their first cluster. For folks that go into production 

Re: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread Jeff Jirsa
The more vnodes you have on each host, the more likely it becomes that any
2 hosts are adjacent/neighbors/replicas.
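
Jon's rule of thumb from the quoted thread -- roughly 2 * (RF-1) * num_tokens
potential replica neighbours, capped of course by how many other nodes there
are -- is easy to tabulate. A back-of-the-envelope sketch, not an exact model:

RF=3
for T in 1 4 16 256; do
  echo "num_tokens=$T -> up to $(( 2 * (RF - 1) * T )) replica neighbours per node"
done

At RF=3 that's 4, 16, 64 and 1024 respectively, which is why a 256-token node
ends up sharing data with essentially every other node in any realistically
sized cluster, while 4 tokens keeps the overlap to about 16 nodes.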


On Mon, Feb 3, 2020 at 8:39 PM onmstester onmstester
 wrote:

> Sorry if it's trivial, but I do not understand how num_tokens affects
> availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate losing
> at most one node, and all of the tokens assigned to that node would also be
> assigned to two other nodes no matter what num_tokens is, right?
>
> Sent using Zoho Mail 
>
>
>  Forwarded message 
> From: Jon Haddad 
> To: 
> Date: Tue, 04 Feb 2020 01:15:21 +0330
> Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
>  Forwarded message 
>
> I think it's a good idea to take a step back and get a high level view of
> the problem we're trying to solve.
>
> First, high token counts result in decreased availability as each node has
> data overlap with more nodes in the cluster. Specifically, a node can
> share data with RF-1 * 2 * num_tokens. So a 256 token cluster at RF=3 is
> going to almost always share data with every other node in the cluster
> that
> isn't in the same rack, unless you're doing something wild like using more
> than a thousand nodes in a cluster. We advertise
>
> With 16 tokens, that is vastly improved, but you still have up to 64 nodes
> each node needs to query against, so you're again, hitting every node
> unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs). I
> wouldn't use 16 here, and I doubt any of you would either. I've advocated
> for 4 tokens because you'd have overlap with only 16 nodes, which works
> well for small clusters as well as large. Assuming I was creating a new
> cluster for myself (in a hypothetical brand new application I'm building)
> I
> would put this in production. I have worked with several teams where I
> helped them put 4 token clusters in prod and it has worked very well. We
> didn't see any wild imbalance issues.
>
> As Mick's pointed out, our current method of using random token assignment
> for the default number of tokens is problematic for 4 tokens. I fully agree with
> this, and I think if we were to try to use 4 tokens, we'd want to address
> this in tandem. We can discuss how to better allocate tokens by default
> (something more predictable than random), but I'd like to avoid the
> specifics of that for the sake of this email.
>
> To Alex's point, repairs are problematic with lower token counts due to
> over streaming. I think this is a pretty serious issue and we'd have to
> address it before going all the way down to 4. This, in my opinion, is a
> more complex problem to solve and I think trying to fix it here could make
> shipping 4.0 take even longer, something none of us want.
>
> For the sake of shipping 4.0 without adding extra overhead and time, I'm
> ok
> with moving to 16 tokens, and in the process adding extensive
> documentation
> outlining what we recommend for production use. I think we should also try
> to figure out something better than random as the default to fix the data
> imbalance issues. I've got a few ideas here I've been noodling on.
>
> As long as folks are fine with potentially changing the default again in
> C*
> 5.0 (after another discussion / debate), 16 is enough of an improvement
> that I'm OK with the change, and willing to author the docs to help people
> set up their first cluster. For folks that go into production with the
> defaults, we're at least not setting them up for total failure once their
> clusters get large like we are now.
>
> In future versions, we'll probably want to address the issue of data
> imbalance by building something in that shifts individual tokens around. I
> don't think we should try to do this in 4.0 either.
>
> Jon
>
>
>
> On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna 
> wrote:
>
> > I think Mick and Anthony make some valid operational and skew points for
> > smaller/starting clusters with 4 num_tokens. There’s an arbitrary line
> > between small and large clusters but I think most would agree that most
> > clusters are on the small to medium side. (A small nuance is afaict the
> > probabilities have to do with quorum on a full token range, ie it has to do
> > with the size of a datacenter, not the full cluster.)
> >
> > As I read this discussion I’m personally more inclined to go with 16 for
> > now. It’s true that if we could fix the skew and topology gotchas for
> those
> > starting things up, 4 would be ideal from an availability perspective.
> > However we’re still in the brainstorming stage for how to address those
> > challenges. I think we should create tickets for those issues and go
> with
> > 16 for 4.0.
> >
> > This is about an out of the box experience. It balances availability,
> > operations (such as skew and general bootstrap friendliness and
> > streaming/repair), and cluster sizing. Balancing all of those, I think
> for
> > now I’m more comfortable 

Fwd: Re: [Discuss] num_tokens default in Cassandra 4.0

2020-02-03 Thread onmstester onmstester
Sorry if it's trivial, but I do not understand how num_tokens affects
availability. With RF=3 and CLW,CLR=quorum, the cluster could tolerate losing
at most one node, and all of the tokens assigned to that node would also be
assigned to two other nodes no matter what num_tokens is, right?


Sent using https://www.zoho.com/mail/




 Forwarded message 
From: Jon Haddad 
To: 
Date: Tue, 04 Feb 2020 01:15:21 +0330
Subject: Re: [Discuss] num_tokens default in Cassandra 4.0
 Forwarded message 


I think it's a good idea to take a step back and get a high level view of 
the problem we're trying to solve. 
 
First, high token counts result in decreased availability as each node has 
data overlap with more nodes in the cluster.  Specifically, a node can
share data with RF-1 * 2 * num_tokens.  So a 256 token cluster at RF=3 is 
going to almost always share data with every other node in the cluster that 
isn't in the same rack, unless you're doing something wild like using more 
than a thousand nodes in a cluster.  We advertise 
 
With 16 tokens, that is vastly improved, but you still have up to 64 nodes 
each node needs to query against, so you're again, hitting every node 
unless you go above ~96 nodes in the cluster (assuming 3 racks / AZs).  I 
wouldn't use 16 here, and I doubt any of you would either.  I've advocated 
for 4 tokens because you'd have overlap with only 16 nodes, which works 
well for small clusters as well as large.  Assuming I was creating a new 
cluster for myself (in a hypothetical brand new application I'm building) I 
would put this in production.  I have worked with several teams where I 
helped them put 4 token clusters in prod and it has worked very well.  We 
didn't see any wild imbalance issues. 
 
As Mick's pointed out, our current method of using random token assignment 
for the default number of tokens is problematic for 4 tokens.  I fully agree with
this, and I think if we were to try to use 4 tokens, we'd want to address 
this in tandem.  We can discuss how to better allocate tokens by default 
(something more predictable than random), but I'd like to avoid the 
specifics of that for the sake of this email. 
 
To Alex's point, repairs are problematic with lower token counts due to 
over streaming.  I think this is a pretty serious issue and we'd have to
address it before going all the way down to 4.  This, in my opinion, is a 
more complex problem to solve and I think trying to fix it here could make 
shipping 4.0 take even longer, something none of us want. 
 
For the sake of shipping 4.0 without adding extra overhead and time, I'm ok 
with moving to 16 tokens, and in the process adding extensive documentation 
outlining what we recommend for production use.  I think we should also try 
to figure out something better than random as the default to fix the data 
imbalance issues.  I've got a few ideas here I've been noodling on. 
 
As long as folks are fine with potentially changing the default again in C* 
5.0 (after another discussion / debate), 16 is enough of an improvement 
that I'm OK with the change, and willing to author the docs to help people 
set up their first cluster.  For folks that go into production with the 
defaults, we're at least not setting them up for total failure once their 
clusters get large like we are now. 
 
In future versions, we'll probably want to address the issue of data 
imbalance by building something in that shifts individual tokens around.  I 
don't think we should try to do this in 4.0 either. 
 
Jon 
 
 
 
On Fri, Jan 31, 2020 at 2:04 PM Jeremy Hanna 
 
wrote: 
 
> I think Mick and Anthony make some valid operational and skew points for 
> smaller/starting clusters with 4 num_tokens. There’s an arbitrary line 
> between small and large clusters but I think most would agree that most 
> clusters are on the small to medium side. (A small nuance is afaict the 
> probabilities have to do with quorum on a full token range, ie it has to do 
> with the size of a datacenter, not the full cluster.)
> 
> As I read this discussion I’m personally more inclined to go with 16 for 
> now. It’s true that if we could fix the skew and topology gotchas for those 
> starting things up, 4 would be ideal from an availability perspective. 
> However we’re still in the brainstorming stage for how to address those 
> challenges. I think we should create tickets for those issues and go with 
> 16 for 4.0. 
> 
> This is about an out of the box experience. It balances availability, 
> operations (such as skew and general bootstrap friendliness and 
> streaming/repair), and cluster sizing. Balancing all of those, I think for 
> now I’m more comfortable with 16 as the default with docs on considerations 
> and tickets to unblock 4 as the default for all users. 
> 
> >>> On Feb 1, 2020, at 6:30 AM, Jeff Jirsa  

Re: Apache vs Datastax cassandra

2020-02-03 Thread Erick Ramirez
Adarsh, a very *friendly* note that anyone is more than welcome to ask
questions -- in fact as a group it's encouraged -- but a *gentle reminder*
that this mailing list is for open-source Apache Cassandra. By all means,
feel free to respond; I'm not saying at all that it's not allowed (I'm just
another user after all, not affiliated with the ASF), though it might be
more appropriate to post your question on community.datastax.com. Cheers!

On Mon, Feb 3, 2020 at 8:30 PM Adarsh Kumar  wrote:

> Hello All,
>
> We have a product that uses Postgres/Cassandra as the datastore. We use both
> Apache and DataStax Cassandra depending on the client's requirements, but
> never got the chance to explore what exactly the difference between these two
> is. Apart from OpsCenter and DSE Graph, I want to know more from a Cassandra
> perspective.
>
> Also, please provide a link to any blog if available.
>
> Thanks in advance.
>
> Thanks & Regards,
> Adarsh K
>


Re: [EXTERNAL] How to reduce vnodes without downtime

2020-02-03 Thread Sergio
Thanks Erick!

Best,

Sergio

On Sun, Feb 2, 2020, 10:07 PM Erick Ramirez  wrote:

> If you are after more details about the trade-offs between different sized
>> token values, please see the discussion on the dev mailing list: "[Discuss]
>> num_tokens default in Cassandra 4.0".
>>
>> Regards,
>> Anthony
>>
>> On Sat, 1 Feb 2020 at 10:07, Sergio  wrote:
>>
>>>
>>> https://thelastpickle.com/blog/2019/02/21/set-up-a-cluster-with-even-token-distribution.html
>>> This is the article with the 4-token recommendation.
>>> @Erick Ramirez, which is the dev thread for the default 32 tokens
>>> recommendation?
>>>
>>> Thanks,
>>> Sergio
>>>
>>
> Sergio, my apologies for not replying. For some reason, your reply went to
> my spam folder and I didn't see it.
>
> Thanks, Anthony, for responding. I was indeed referring to that dev
> thread. Cheers!
>
>


Re: [EXTERNAL] How to reduce vnodes without downtime

2020-02-03 Thread Maxim Parkachov
Hi guys,

thanks a lot for useful tips. I obviously underestimated complexity of such
change.

Thanks again,
Maxim.

>


Apache vs Datastax cassandra

2020-02-03 Thread Adarsh Kumar
Hello All,

We have a product that uses Postgres/Cassandra as the datastore. We use both
Apache and DataStax Cassandra depending on the client's requirements, but
never got the chance to explore what exactly the difference between these two
is. Apart from OpsCenter and DSE Graph, I want to know more from a Cassandra
perspective.

Also, please provide a link to any blog if available.

Thanks in advance.

Thanks & Regards,
Adarsh K