Adding nodes with NetworkTopologyStrategy (NTS) is easier, in my opinion. If you set it up right, you don’t need to worry about replica placement.

> On May 4, 2017, at 7:43 AM, Cogumelos Maravilha <[email protected]>
> wrote:
>
> Hi Alain, thanks for your quick reply.
>
> Regarding SimpleStrategy, perhaps you are right, but it's so easy to add
> nodes.
>
> I'm not using vnodes and the default 256. The information that I posted
> is from a regular nodetool status keyspace.
>
> My partition key is a sequential bigint, but nodetool cfstats shows that
> the number of keys is not balanced (data from 3 nodes):
>
> Number of keys (estimate): 442779640
> Number of keys (estimate): 736380940
> Number of keys (estimate): 451097313
>
> Should I use nodetool rebuild?
>
> Running:
>
> nodetool getendpoints mykeyspace data 9213395123941039285
>
> 10.1.1.52
> 10.1.1.185
>
> nodetool getendpoints mykeyspace data 9213395123941039286
>
> 10.1.1.161
> 10.1.1.19
>
> All nodes are working hard because my TTL is 18 days and daily data
> ingestion is around 120,000,000 records:
>
> nodetool compactionstats -H
> pending tasks: 3
> - mykeyspace.data: 3
>
> id                                   compaction type     keyspace   table completed  total      unit  progress
> c49599b1-308d-11e7-ba5b-67e232f1bee1 Remove deleted data mykeyspace data  133.89 GiB 158.33 GiB bytes 84.56%
> c49599b0-308d-11e7-ba5b-67e232f1bee1 Remove deleted data mykeyspace data  136.2 GiB  278.96 GiB bytes 48.83%
> Active compaction remaining time : 0h00m00s
>
> nodetool compactionstats -H
> pending tasks: 2
> - mykeyspace.data: 2
>
> id                                   compaction type keyspace   table completed total      unit  progress
> b6e8ce80-30d4-11e7-a2be-9b830f114108 Compaction      mykeyspace data  4.05 GiB  133.02 GiB bytes 3.04%
> Active compaction remaining time : 2h17m34s
>
> The nodetool repair is incremental by default in this C* version, and it
> runs on all nodes at different hours. I don't want snapshots, which is
> why I'm clearing them twice a day (I'm not sure whether a snapshot is
> created with -pr).
>
> The cleanup has already been removed; it was there because the last node
> was added a few days ago.
>
> I'm using garbagecollect to force the cleanup since I'm running out of
> space.
>
> Regards.
>
> On 05/04/2017 12:50 PM, Alain RODRIGUEZ wrote:
>> Hi,
>>
>> CREATE KEYSPACE mykeyspace WITH replication = {'class':
>> 'SimpleStrategy', 'replication_factor': '2'} AND durable_writes = false;
>>
>> The SimpleStrategy is never recommended for production clusters, as it
>> does not recognise racks or datacenters, which can cause availability
>> issues and unpredictable latency when using those. I would not even use
>> it for testing purposes; I see no point in most cases.
>>
>> Even if this should be changed, carefully but as soon as possible imho,
>> it is probably not related to your main issue at hand.
>>
>> If nodes are imbalanced, there are 3 main questions that come to my
>> mind:
>>
>> 1. Are the tokens well distributed among the available nodes?
>> 2. Is the data correctly balanced on the token ring (i.e. are the 'id'
>>    values of the 'mykeyspace.data' table well spread between the nodes)?
>> 3. Are the compaction processes running smoothly on every node?
>>
>> Point 1 depends on whether you are using vnodes or not, and on the
>> number of vnodes ('num_tokens' in cassandra.yaml).
>>
>> - If not using vnodes, you have to manually set the positions of the
>>   nodes and move them around when adding more nodes so things remain
>>   balanced.
>> - If using vnodes, make sure to use a high enough number of vnodes so
>>   the distribution is 'good enough' (more than 32 in most cases; the
>>   default is 256, which leads to quite balanced rings but brings other
>>   issues).
>>
>> UN 10.1.1.161 398.39 GiB 256 28.9%
>> UN 10.1.1.19  765.32 GiB 256 29.9%
>> UN 10.1.1.52  574.24 GiB 256 28.2%
>> UN 10.1.1.213 817.56 GiB 256 28.2%
>> UN 10.1.1.85  638.82 GiB 256 28.2%
>> UN 10.1.1.245 408.95 GiB 256 28.7%
>> UN 10.1.1.185 574.63 GiB 256 27.9%
>>
>> You can get the token ownership information by running 'nodetool status
>> <mykeyspace>'. Adding the keyspace name to the command gives you the real
>> ownership.
>> Also, RF = 2 means the total of the ownership should be 200%,
>> ideally evenly balanced. I am not sure about the command you ran here.
>> Also, as general advice, let us know the command you ran and what you
>> expect us to see in the output.
>>
>> Still, the tokens seem to be well distributed, and I guess you are using
>> the default 'num_tokens': 256. So I believe you are not having this
>> issue. But the delta between the data held on each node is up to x2
>> (400 GB on some nodes, 800 GB on some others).
>>
>> Point 2 highly depends on the workload. Are your partitions evenly
>> distributed among the nodes? It depends on your primary key. Using a
>> UUID as the partition key is often a good idea, but it depends on your
>> needs as well, of course. You could look at the distribution on the
>> distinct nodes through 'nodetool cfstats'.
>>
>> Point 3: even if the tokens are perfectly distributed and the primary
>> key perfectly randomized, some nodes can have a disk issue or any other
>> problem that makes compactions fall behind. This would lead such a node
>> to hold more data and, in some cases, to not evict tombstones properly,
>> increasing the disk space used. Other than that, you can have a big
>> SSTable being compacted on a node, making the size of the node grow
>> quite suddenly (that's why 20 to 50% of the disk should always be free,
>> depending on the compaction strategy in use and the number of concurrent
>> compactions). Here, running 'nodetool compactionstats -H' on all the
>> nodes would probably help you to troubleshoot.
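
The two observations above (ownership adding up to 200% with RF = 2, and a load delta of about x2) can be checked against the numbers posted in this thread. This is only a minimal sketch: the figures are copied by hand from the quoted status output, and the helper names `ownership_total` and `load_spread` are made up for illustration, not part of any Cassandra tooling.

```python
# Figures copied from the 'nodetool status' output quoted in this thread.
# Each tuple: (address, load in GiB, effective ownership in percent).
NODES = [
    ("10.1.1.161", 398.39, 28.9),
    ("10.1.1.19", 765.32, 29.9),
    ("10.1.1.52", 574.24, 28.2),
    ("10.1.1.213", 817.56, 28.2),
    ("10.1.1.85", 638.82, 28.2),
    ("10.1.1.245", 408.95, 28.7),
    ("10.1.1.185", 574.63, 27.9),
]

REPLICATION_FACTOR = 2  # from the CREATE KEYSPACE statement in the thread


def ownership_total(nodes):
    """Sum the ownership column, rounded to one decimal like the output."""
    return round(sum(owns for _, _, owns in nodes), 1)


def load_spread(nodes):
    """Ratio between the most loaded and the least loaded node."""
    loads = [load for _, load, _ in nodes]
    return max(loads) / min(loads)


if __name__ == "__main__":
    expected = 100 * REPLICATION_FACTOR
    print(f"ownership total: {ownership_total(NODES)}% (expected {expected}%)")
    print(f"load spread: x{load_spread(NODES):.2f}")
```

If the total matches 100 times the replication factor, the posted output is most likely the per-keyspace (effective) ownership, so the token distribution itself does not explain the difference in load.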
>>
>> About crontab
>>
>> 08 05 * * * root nodetool repair -pr
>> 11 11 * * * root fstrim -a
>> 04 12 * * * root nodetool clearsnapshot
>> 33 13 * * 2 root nodetool cleanup
>> 35 15 * * * root nodetool garbagecollect
>> 46 19 * * * root nodetool clearsnapshot
>> 50 23 * * * root nodetool flush
>>
>> I don't understand what you are trying to achieve with some of the
>> commands:
>>
>> nodetool repair -pr
>>
>> Repairing the cluster regularly is good in most cases, but as the
>> default changes between versions, I would specify whether the repair is
>> supposed to be 'incremental' or 'full', and whether it is supposed to be
>> 'sequential' or 'parallel', for example. Also, as the dataset grows,
>> some issues will appear with repairs. Just search for 'repairs
>> cassandra' on Google or any search engine you are using and you will see
>> that repair is a complex topic. Look for videos and you will find a lot
>> of information about it from nice talks like these 2 from the last
>> summit:
>>
>> https://www.youtube.com/watch?v=FrF8wQuXXks
>> https://www.youtube.com/watch?v=1Sz_K8UID6E
>>
>> Also, some nice tools exist to help with repairs:
>>
>> The Reaper (originally made at Spotify, now maintained by The Last
>> Pickle): https://github.com/thelastpickle/cassandra-reaper
>> 'cassandra_range_repair.py':
>> https://github.com/BrianGallew/cassandra_range_repair
>>
>> 11 11 * * * root fstrim -a
>>
>> I am not really sure about this one, but it looks fine as long as
>> 'fstrim' does not create performance issues while it is running.
>>
>> 04 12 * * * root nodetool clearsnapshot
>>
>> This will automatically erase any snapshot you might want to keep. It
>> might be good to specify what snapshot you want to remove and name it.
>> Some snapshots will be created and not removed when using a sequential
>> repair. So I believe clearing specific snapshots is a good idea to save
>> disk space.
>>
>> 33 13 * * 2 root nodetool cleanup
>>
>> This is to be run on all the nodes after adding a new node. It will just
>> remove data from existing nodes that 'gave' some token ranges to the new
>> node. To do so, it will compact all the SSTables. It doesn't seem to be
>> a good idea to 'cron' that.
>>
>> 35 15 * * * root nodetool garbagecollect
>>
>> This is also a heavy operation that you should not need on a regular
>> basis:
>> http://cassandra.apache.org/doc/latest/tools/nodetool/garbagecollect.html
>> What problem are you trying to solve here? Your data uses TTLs and TWCS,
>> so expired SSTables should be going away without any issue.
>>
>> 46 19 * * * root nodetool clearsnapshot
>>
>> Again? What for?
>>
>> 50 23 * * * root nodetool flush
>>
>> This will force all tables to be flushed at the same time, no matter
>> their sizes or any other considerations. It is not to be used unless you
>> are doing some testing or debugging, or are on your way to shutting down
>> the node.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - [email protected]
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>> 2017-05-04 11:38 GMT+01:00 Cogumelos Maravilha <[email protected]>:
>> Hi all,
>>
>> I'm using C* 3.10.
>>
>> CREATE KEYSPACE mykeyspace WITH replication = {'class':
>> 'SimpleStrategy', 'replication_factor': '2'} AND durable_writes = false;
>>
>> CREATE TABLE mykeyspace.data (
>>     id bigint PRIMARY KEY,
>>     kafka text
>> ) WITH bloom_filter_fp_chance = 0.5
>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>     AND comment = ''
>>     AND compaction = {'class':
>> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy',
>> 'compaction_window_size': '10', 'compaction_window_unit': 'HOURS',
>> 'max_threshold': '32', 'min_threshold': '6'}
>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>     AND crc_check_chance = 0.0
>>     AND dclocal_read_repair_chance = 0.1
>>     AND default_time_to_live = 1555200
>>     AND gc_grace_seconds = 10800
>>     AND max_index_interval = 2048
>>     AND memtable_flush_period_in_ms = 0
>>     AND min_index_interval = 128
>>     AND read_repair_chance = 0.0
>>     AND speculative_retry = '99PERCENTILE';
>>
>> UN 10.1.1.161 398.39 GiB 256 28.9%
>> UN 10.1.1.19  765.32 GiB 256 29.9%
>> UN 10.1.1.52  574.24 GiB 256 28.2%
>> UN 10.1.1.213 817.56 GiB 256 28.2%
>> UN 10.1.1.85  638.82 GiB 256 28.2%
>> UN 10.1.1.245 408.95 GiB 256 28.7%
>> UN 10.1.1.185 574.63 GiB 256 27.9%
>>
>> At crontab in all nodes (only the time changes):
>>
>> 08 05 * * * root nodetool repair -pr
>> 11 11 * * * root fstrim -a
>> 04 12 * * * root nodetool clearsnapshot
>> 33 13 * * 2 root nodetool cleanup
>> 35 15 * * * root nodetool garbagecollect
>> 46 19 * * * root nodetool clearsnapshot
>> 50 23 * * * root nodetool flush
>>
>> How can I fix this?
>>
>> Thanks in advance.
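
For reference, the figures in the schema and in the messages above can be turned into a rough expectation of how many keys each node should hold. This is only a back-of-the-envelope sketch under stated assumptions: ingestion is constant at the quoted daily rate, every row is its own partition (which holds here because 'id' is the whole primary key), and data is spread perfectly evenly. The function names are hypothetical.

```python
import math

# Values taken from the schema and the messages quoted in this thread.
DEFAULT_TTL_SECONDS = 1_555_200  # default_time_to_live
WINDOW_HOURS = 10                # compaction_window_size, unit HOURS
ROWS_PER_DAY = 120_000_000       # quoted daily ingestion
REPLICATION_FACTOR = 2           # from CREATE KEYSPACE
NODE_COUNT = 7                   # nodes listed in the status output

SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def ttl_days():
    """TTL expressed in days."""
    return DEFAULT_TTL_SECONDS / SECONDS_PER_DAY


def twcs_windows():
    """Number of TWCS time windows needed to cover one full TTL period."""
    ttl_hours = DEFAULT_TTL_SECONDS // SECONDS_PER_HOUR
    return math.ceil(ttl_hours / WINDOW_HOURS)


def live_rows():
    """Rows still within their TTL at steady state (assumes constant ingestion)."""
    return ROWS_PER_DAY * DEFAULT_TTL_SECONDS // SECONDS_PER_DAY


def replicas_per_node():
    """Expected row replicas per node (assumes a perfectly even spread)."""
    return live_rows() * REPLICATION_FACTOR / NODE_COUNT


if __name__ == "__main__":
    print(f"TTL: {ttl_days():.0f} days, covered by {twcs_windows()} windows")
    print(f"live rows: {live_rows():,}")
    print(f"expected replicas per node: {replicas_per_node():,.0f}")
```

Under these assumptions the expected figure per node is roughly 617 million, which sits between the three 'Number of keys (estimate)' values reported earlier in the thread. Those values are themselves estimates, so this is a sanity check rather than a diagnosis.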
