- How many event_datetime records can you have per pkey?
During a working day I can have fewer than 10 event_datetime records per pkey. I keep at most 3 of them at any time, so each new event_datetime for a pkey triggers a delete and an insert in Cassandra.
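To make that lifecycle concrete, here is a minimal Python sketch of the "keep at most 3 event_datetimes per pkey" logic (our production code uses the NodeJS driver; this is only an illustration, and the assumption that the *oldest* record is the one replaced is mine):

```python
from collections import defaultdict

MAX_EVENTS_PER_PKEY = 3  # the cap of 3 records per pkey described above

def record_event(store, tombstones, pkey, event_datetime):
    """Keep at most MAX_EVENTS_PER_PKEY event_datetimes per pkey.

    Evicting a record mirrors the DELETE that precedes each INSERT in
    Cassandra; each such delete leaves a tombstone behind until
    gc_grace_seconds (plus a compaction) removes it.
    """
    events = store[pkey]
    if len(events) >= MAX_EVENTS_PER_PKEY:
        oldest = min(events)          # assumption: oldest is replaced
        events.remove(oldest)
        tombstones.append((pkey, oldest))  # deleted row -> tombstone
    events.add(event_datetime)

store = defaultdict(set)
tombstones = []
for dt in range(1, 11):  # up to ~10 events per pkey per day, as described
    record_event(store, tombstones, "pkey-1", dt)

print(sorted(store["pkey-1"]))  # [8, 9, 10] -- the 3 surviving records
print(len(tombstones))          # 7 -- one tombstone per replaced record
```

So a pkey that receives 10 events in a day ends the day with 3 live rows but 7 tombstones that reads must skip over until they are purged.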
- How many pkeys (roughly) do you have?
A few million, but the number is going to rise.

- In general, you only want to have at most 100 MB of data per partition (pkey). If it is larger than that, I would expect some timeouts. [...] I suspect you either have very wide rows or lots of tombstones.
I ran some nodetool commands in order to give you more data.

CFSTATS output:

nodetool cfstats my_keyspace.my_table -H
Total number of tables: 52
----------------
Keyspace : my_keyspace
        Read Count: 2441795
        Read Latency: 400.53986035478 ms
        Write Count: 5097368
        Write Latency: 6.494159368913525 ms
        Pending Flushes: 0
                Table: my_table
                SSTable count: 13
                Space used (live): 185.45 GiB
                Space used (total): 185.45 GiB
                Space used by snapshots (total): 0 bytes
                Off heap memory used (total): 80.66 MiB
                SSTable Compression Ratio: 0.2973552755387901
                Number of partitions (estimate): 762039
                Memtable cell count: 915
                Memtable data size: 43.75 MiB
                Memtable off heap memory used: 0 bytes
                Memtable switch count: 598
                Local read count: 2441795
                Local read latency: 93.186 ms
                Local write count: 5097368
                Local write latency: 3.189 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 5719
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 1.65 MiB
                Bloom filter off heap memory used: 1.65 MiB
                Index summary off heap memory used: 1.17 MiB
                Compression metadata off heap memory used: 77.83 MiB
                Compacted partition minimum bytes: 104
                Compacted partition maximum bytes: 20924300
                Compacted partition mean bytes: 529420
                Average live cells per slice (last five minutes): 2.0
                Maximum live cells per slice (last five minutes): 3
                Average tombstones per slice (last five minutes): 7.423841059602649
                Maximum tombstones per slice (last five minutes): 50
                Dropped Mutations: 0 bytes
----------------

CFHISTOGRAMS output:

nodetool cfhistograms my_keyspace my_table
my_keyspace/my_table histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%            10.00            379.02           1955.67            379022                 8
75%            12.00            654.95         186563.16            654949                17
95%            12.00          20924.30         268650.95           1629722                35
98%            12.00          20924.30         322381.14           2346799                42
99%            12.00          20924.30         386857.37           3379391                50
Min             0.00              6.87             88.15               104                 0
Max            12.00          25109.16         464228.84          20924300               179

I also enabled 'tracing on' in the cqlsh CLI and ran some queries in order to find out whether tombstones are scanned frequently, but in my small sample of queries I got answers that were almost all similar to the following:

Preparing statement [Native-Transport-Requests-1]
Executing single-partition query on my_table [ReadStage-2]
Acquiring sstable references [ReadStage-2]
Bloom filter allows skipping sstable 2581 [ReadStage-2]
Bloom filter allows skipping sstable 2580 [ReadStage-2]
Bloom filter allows skipping sstable 2575 [ReadStage-2]
Partition index with 2 entries found for sstable 2570 [ReadStage-2]
Bloom filter allows skipping sstable 2548 [ReadStage-2]
Bloom filter allows skipping sstable 2463 [ReadStage-2]
Bloom filter allows skipping sstable 2416 [ReadStage-2]
Partition index with 3 entries found for sstable 2354 [ReadStage-2]
Bloom filter allows skipping sstable 1784 [ReadStage-2]
Partition index with 5 entries found for sstable 1296 [ReadStage-2]
Partition index with 3 entries found for sstable 1002 [ReadStage-2]
Partition index with 3 entries found for sstable 372 [ReadStage-2]
Skipped 0/12 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2]
Merged data from memtables and 5 sstables [ReadStage-2]
Read 3 live rows and 0 tombstone cells [ReadStage-2]
Request complete

- Since you mention lots of deletes, I am thinking it could be tombstones. Are you getting any tombstone warnings or errors in your system.log?
For each pkey, each new event_datetime makes me delete one of the (at most 3) previously saved records in Cassandra. If a pkey doesn't exist in Cassandra yet, I store it with its event_datetime without deleting anything. In Cassandra's logs I don't have any tombstone warning or error.
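For what it's worth, even though my traced queries happened to read 0 tombstone cells, the cfstats averages above already hint at tombstone overhead. A quick back-of-the-envelope calculation (my interpretation of those two counters, not an official metric):

```python
# Last-five-minutes averages taken from the cfstats output above.
avg_live_per_slice = 2.0
avg_tombstones_per_slice = 7.423841059602649

# Fraction of scanned cells that are tombstones, i.e. read work
# spent skipping deleted data rather than returning live rows.
overhead = avg_tombstones_per_slice / (avg_live_per_slice + avg_tombstones_per_slice)
print(round(overhead, 2))  # 0.79 -> roughly 4 of every 5 cells scanned are tombstones
```

If that reading is right, the average slice scans far more tombstones than live cells, which would fit the high read latencies even without tombstone warnings in the log.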
- When you delete, are you deleting a full partition?
No, I delete single rows. This is the query for deletes:

delete from my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS;

- [..] And because only one node has the data, a single timeout means you won’t get any data.
I will try to increase the RF from 1 to 3.

I hope I have answered all your questions.
Thank you very much!

Regards
Marco

On Thu, 27 Dec 2018 at 21:09, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> Your RF is only 1, so the data only exists on one node. This is not
> typically how Cassandra is used. If you need the high availability and low
> latency, you typically set RF to 3 per DC.
>
> How many event_datetime records can you have per pkey? How many pkeys
> (roughly) do you have? In general, you only want to have at most 100 MB of
> data per partition (pkey). If it is larger than that, I would expect some
> timeouts. And because only one node has the data, a single timeout means
> you won’t get any data. Server timeouts default to just 10 seconds. The
> secret to Cassandra is to always select your data by at least the primary
> key (which you are doing). So, I suspect you either have very wide rows or
> lots of tombstones.
>
> Since you mention lots of deletes, I am thinking it could be tombstones.
> Are you getting any tombstone warnings or errors in your system.log? When
> you delete, are you deleting a full partition? If you are deleting just
> part of a partition over and over, I think you will be creating too many
> tombstones. I try to design my data partitions so that deletes are for a
> full partition. Then I won’t be reading through 1000s (or more) tombstones
> trying to find the live data.
>
> Sean Durity
>
> *From:* Marco Gasparini <marco.gaspar...@competitoor.com>
> *Sent:* Thursday, December 27, 2018 3:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Writes and Reads with high latency
>
> Hello Sean,
>
> here my schema and RF:
>
> -------------------------------------------------------------------------
> CREATE KEYSPACE my_keyspace WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DC1': '1'} AND durable_writes = true;
>
> CREATE TABLE my_keyspace.my_table (
>     pkey text,
>     event_datetime timestamp,
>     agent text,
>     ft text,
>     ftt text,
>     some_id bigint,
>     PRIMARY KEY (pkey, event_datetime)
> ) WITH CLUSTERING ORDER BY (event_datetime DESC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 90000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> -------------------------------------------------------------------------
>
> Queries I make are very simple:
>
> select pkey, event_datetime, ft, some_id, ftt from my_keyspace.my_table
> where pkey = ? limit ?;
>
> and
>
> insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id,
> ft, ftt) values (?,?,?,?,?,?);
>
> About the Retry policy, the answer is yes: when a write fails I store
> it somewhere else and, after a period, I try to write it to Cassandra
> again.
> This way I can store almost all my data, but when the problem is on the
> read side I don't apply any retry policy (and this is my problem).
>
> Thanks
> Marco
>
> On Fri, 21 Dec 2018 at 17:18, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
>
> Can you provide the schema and the queries? What is the RF of the keyspace
> for the data? Are you using any Retry policy on your Cluster object?
>
> Sean Durity
>
> *From:* Marco Gasparini <marco.gaspar...@competitoor.com>
> *Sent:* Friday, December 21, 2018 10:45 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Writes and Reads with high latency
>
> hello all,
>
> I have 1 DC of 3 nodes running Cassandra 3.11.3 with
> consistency level ONE and Java 1.8.0_191.
>
> Every day, there are many nodejs programs that send data to the
> Cassandra cluster via the NodeJs cassandra-driver.
> Every day I get about 600k requests. Each request makes the server:
> 1_ READ some data in Cassandra (by an id; usually I get 3 records),
> 2_ DELETE one of those records,
> 3_ WRITE the data into Cassandra.
>
> So every day I make many deletes.
>
> Every day I find errors like:
> "All host(s) tried for query failed. First host tried, 10.8.0.10:9042:
> Host considered as DOWN. See innerErrors...."
> "Server timeout during write query at consistency LOCAL_ONE (0 peer(s)
> acknowledged the write over 1 required)...."
> "Server timeout during write query at consistency SERIAL (0 peer(s)
> acknowledged the write over 1 required)...."
> "Server timeout during read query at consistency LOCAL_ONE (0 peer(s)
> acknowledged the read over 1 required)...."
>
> nodetool tablehistograms tells me this:
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>                               (micros)          (micros)           (bytes)
> 50%             8.00            379.02           1955.67            379022                 8
> 75%            10.00            785.94         155469.30            654949                17
> 95%            12.00          17436.92         268650.95           1629722                35
> 98%            12.00          25109.16         322381.14           2346799                42
> 99%            12.00          30130.99         386857.37           3379391                50
> Min             0.00              6.87             88.15               104                 0
> Max            12.00          43388.63         386857.37          20924300               179
>
> At the 99th percentile I noted that write and read latency are pretty high, but I
> don't know how to improve that. I can provide more statistics if needed.
>
> Is there any improvement I can make to the Cassandra configuration in
> order not to lose any data?
>
> Thanks
>
> Regards
> Marco
>
> ------------------------------
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
> ------------------------------