Hi Sean,

> I will start – knowing that others will have additional help/questions

I hope so; I really need help with this :)

> What heap size are you using? Sounds like you are using the CMS garbage
> collector.

Yes, I'm using the CMS garbage collector. I have not used G1 because I read
it wasn't recommended, but if you are saying it will help with my use case I
have no objection to using it; I will try. I have 3 nodes: node1 has 32 GB
of RAM, node2 and node3 have 16 GB each. I'm currently using 50% of the RAM
as heap on each node.
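For reference, this is a sketch of what I would change in conf/jvm.options
to switch from CMS to G1 (the flag names come from the commented G1 section
that ships with Cassandra 3.11; the pause target and the half-the-RAM heap
sizes are just my starting assumptions):

    # sketch: comment out the CMS section of conf/jvm.options, then enable:
    -XX:+UseG1GC
    # do less remembered-set work during stop-the-world pauses (helps tail latency)
    -XX:G1RSetUpdatingPauseTimePercent=5
    # main G1 tunable; keep it below the timeouts in cassandra.yaml
    -XX:MaxGCPauseMillis=500
    # heap = half the available RAM (16G on node1; 8G on node2 and node3)
    -Xms16G
    -Xmx16G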
> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.

I'm not sure. 'dstat' and 'iostat' tell me that read throughput is
constantly above 100 MB/s and %util is close to 100%, and under those
conditions the node freezes. The HDD specs give a maximum transfer rate of
175 MB/s for node1 and 155 MB/s for node2 and node3. Unfortunately,
switching from spinning disks to SSDs is not an option.

> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key. Can you
> give us the table schema? I’m also concerned about the IF EXISTS in your
> delete. I think that invokes a lightweight transaction – costly for
> performance. Is it really required for your use case?

I don't need the 'IF EXISTS'; it is actually a leftover from an old query,
and I can remove it (I have sketched the simplified statement right after
the schema below). Here is the schema:

CREATE KEYSPACE my_keyspace
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'}
    AND durable_writes = false;

CREATE TABLE my_keyspace.my_table (
    pkey text,
    event_datetime timestamp,
    f1 text,
    f2 text,
    f3 text,
    f4 text,
    f5 int,
    f6 bigint,
    f7 bigint,
    f8 text,
    f9 text,
    PRIMARY KEY (pkey, event_datetime)
) WITH CLUSTERING ORDER BY (event_datetime DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 90000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
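As mentioned above, the simplified delete would just be the current
statement with the lightweight transaction dropped (same bind parameters):

    delete from my_keyspace.my_table where pkey = ? and event_datetime = ?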
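One more note on the schema: with gc_grace_seconds = 90000 (25 hours), my
repairs have to complete within that window, otherwise deleted data can come
back. This is roughly the schedule I run from cron on each node (a sketch;
'-pr' repairs only the node's primary ranges so each range is repaired once
across the cluster):

    # /etc/cron.d/cassandra-repair (sketch): nightly primary-range repair
    0 3 * * * root nodetool repair -pr my_keyspace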
Thank you very much
Marco

On Fri, Jan 11, 2019 at 4:14 PM Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> I will start – knowing that others will have additional help/questions.
>
> What heap size are you using? Sounds like you are using the CMS garbage
> collector. That takes some arcane knowledge and lots of testing to tune.
> I would start with G1 and use ½ the available RAM as the heap size. I
> would want 32 GB RAM as a minimum on the hosts.
>
> Spinning disks are a problem, too. Can you tell if the IO is getting
> overwhelmed? SSDs are much preferred.
>
> Read before write is usually an anti-pattern for Cassandra. From your
> queries, it seems you have a partition key and clustering key. Can you
> give us the table schema? I’m also concerned about the IF EXISTS in your
> delete. I think that invokes a lightweight transaction – costly for
> performance. Is it really required for your use case?
>
> Sean Durity
>
> *From:* Marco Gasparini <marco.gaspar...@competitoor.com>
> *Sent:* Friday, January 11, 2019 8:20 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] fine tuning for wide rows and mixed workload system
>
> Hello everyone,
>
> I need some advice in order to solve my use case problem. I have already
> tried some solutions, but they didn't work out. Can you help me with the
> following configuration, please? Any help is much appreciated.
>
> I'm using:
> - Cassandra 3.11.3
> - java version "1.8.0_191"
>
> My use case has the following constraints:
> - about 1M reads per day (and rising)
> - about 2M writes per day (and rising)
> - there is a high peak of requests in less than 2 hours in which the
>   system receives half of the whole day's traffic (500K reads, 1M writes)
> - each request consists of 1 read and 2 writes (1 delete + 1 insert):
>   * the read query selects at most 3 records by partition key:
>     select * from my_keyspace.my_table where pkey = ? limit 3
>   * then one record is deleted:
>     delete from my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS
>   * finally the new data is stored:
>     insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id, ft, ftt, ...) values (?,?,?,?,?,?,...)
> - each row is pretty wide. I don't know the exact size, because there are
>   2 dynamic text columns that store between 1 MB and 50 MB of data each.
>   So reads are heavy, because I read 3 records of that size every time,
>   and writes are heavy as well, because each row is that wide.
>
> Currently, I own 3 nodes with the following properties:
>
> - node1:
>   * Intel Core i7-3770
>   * 2x HDD SATA 3.0 TB
>   * 4x RAM 8192 MB DDR3
>   * nominal transfer rate 175 MB/s
>
>   # blockdev --report /dev/sd[ab]
>   RO  RA   SSZ  BSZ   StartSec  Size           Device
>   rw  256  512  4096  0         3000592982016  /dev/sda
>   rw  256  512  4096  0         3000592982016  /dev/sdb
>
> - node2, node3:
>   * Intel Core i7-2600
>   * 2x HDD SATA 3.0 TB
>   * 4x RAM 4096 MB DDR3
>   * nominal transfer rate 155 MB/s
>
>   # blockdev --report /dev/sd[ab]
>   RO  RA   SSZ  BSZ   StartSec  Size           Device
>   rw  256  512  4096  0         3000592982016  /dev/sda
>   rw  256  512  4096  0         3000592982016  /dev/sdb
>
> Each node has 2 disks, but I have disabled the RAID option and created a
> single virtual disk in order to get more free space. Can this
> configuration create issues?
>
> I have already tried some configurations to make it work:
>
> 1) straightforward attempt
>    - default Cassandra configuration (cassandra.yaml)
>    - RF=1
>    - SizeTieredCompactionStrategy (write strategy)
>    - no row cache (given the wide rows, it is better not to use the row cache)
>    - gc_grace_seconds = 1 day (unfortunately, I had no repair schedule at all)
>    results:
>    - too many timeouts, losing data
>
> 2)
>    - added repair schedules
>    - RF=3 (in order to increase read speed)
>    results:
>    - too many timeouts, losing data
>    - high I/O consumption on every node (iostat shows 100% in %util on
>      each node, dstat shows hundreds of MB read per iteration)
>    - node2 frozen until I stopped the writes
>    - node3 almost frozen
>    - many pending MutationStage events in tpstats on node2
>    - many full GCs
>    - many HintsDispatchExecutor events in system.log
>
> current)
>    - added repair schedules
>    - RF=3
>    - set durable_writes = false in order to speed up writes
>    - increased the young generation heap size
>    - decreased SurvivorRatio in order to make more young-generation space
>      available, because of the wide-row data
>    - increased MaxTenuringThreshold from 1 to 3 in order to decrease read
>      latency
>    - increased Cassandra's on-heap and off-heap memtable sizes, because of
>      the wide-row data
>    - changed memtable_allocation_type to offheap_objects, because of the
>      wide-row data
>    results:
>    - better GC performance on node1 and node3
>    - still high I/O consumption on every node (iostat shows 100% in %util
>      on each node, dstat shows hundreds of MB read per iteration)
>    - still node2 completely frozen
>    - many pending MutationStage events in tpstats on node2
>    - many HintsDispatchExecutor events in system.log on every node
>
> I cannot go to AWS; I can only get dedicated servers.
> Do you have any suggestions for fine-tuning the system for this use case?
>
> Thank you
>
> Marco