- How many event_datetime records can you have per pkey?
During a working day I can have fewer than 10 event_datetime records per pkey. I keep at most 3 of them at any time, so each new event_datetime for a pkey triggers a delete and an insert in Cassandra.
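To make that lifecycle concrete, here is a minimal Python sketch of the "keep at most 3 event_datetimes per pkey" logic (our production code uses the NodeJS driver; this is only an illustration, and the assumption that the *oldest* record is the one replaced is mine):

```python
from collections import defaultdict

MAX_EVENTS_PER_PKEY = 3  # the cap of 3 records per pkey described above

def record_event(store, tombstones, pkey, event_datetime):
    """Keep at most MAX_EVENTS_PER_PKEY event_datetimes per pkey.

    Evicting a record mirrors the DELETE that precedes each INSERT in
    Cassandra; each such delete leaves a tombstone behind until
    gc_grace_seconds (plus a compaction) removes it.
    """
    events = store[pkey]
    if len(events) >= MAX_EVENTS_PER_PKEY:
        oldest = min(events)          # assumption: oldest is replaced
        events.remove(oldest)
        tombstones.append((pkey, oldest))  # deleted row -> tombstone
    events.add(event_datetime)

store = defaultdict(set)
tombstones = []
for dt in range(1, 11):  # up to ~10 events per pkey per day, as described
    record_event(store, tombstones, "pkey-1", dt)

print(sorted(store["pkey-1"]))  # [8, 9, 10] -- the 3 surviving records
print(len(tombstones))          # 7 -- one tombstone per replaced record
```

So a pkey that receives 10 events in a day ends the day with 3 live rows but 7 tombstones that reads must skip over until they are purged.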
- How many pkeys (roughly) do you have?
A few million, but the number is going to rise.

- In general, you only want to have at most 100 MB of data per partition (pkey). If it is larger than that, I would expect some timeouts. [...] I suspect you either have very wide rows or lots of tombstones.
I ran some nodetool commands in order to give you more data.

CFSTATS output:

nodetool cfstats my_keyspace.my_table -H
Total number of tables: 52
----------------
Keyspace : my_keyspace
        Read Count: 2441795
        Read Latency: 400.53986035478 ms
        Write Count: 5097368
        Write Latency: 6.494159368913525 ms
        Pending Flushes: 0
                Table: my_table
                SSTable count: 13
                Space used (live): 185.45 GiB
                Space used (total): 185.45 GiB
                Space used by snapshots (total): 0 bytes
                Off heap memory used (total): 80.66 MiB
                SSTable Compression Ratio: 0.2973552755387901
                Number of partitions (estimate): 762039
                Memtable cell count: 915
                Memtable data size: 43.75 MiB
                Memtable off heap memory used: 0 bytes
                Memtable switch count: 598
                Local read count: 2441795
                Local read latency: 93.186 ms
                Local write count: 5097368
                Local write latency: 3.189 ms
                Pending flushes: 0
                Percent repaired: 0.0
                Bloom filter false positives: 5719
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 1.65 MiB
                Bloom filter off heap memory used: 1.65 MiB
                Index summary off heap memory used: 1.17 MiB
                Compression metadata off heap memory used: 77.83 MiB
                Compacted partition minimum bytes: 104
                Compacted partition maximum bytes: 20924300
                Compacted partition mean bytes: 529420
                Average live cells per slice (last five minutes): 2.0
                Maximum live cells per slice (last five minutes): 3
                Average tombstones per slice (last five minutes): 7.423841059602649
                Maximum tombstones per slice (last five minutes): 50
                Dropped Mutations: 0 bytes
----------------

CFHISTOGRAMS output:

nodetool cfhistograms my_keyspace my_table
my_keyspace/my_table histograms
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%            10.00            379.02           1955.67            379022                 8
75%            12.00            654.95         186563.16            654949                17
95%            12.00          20924.30         268650.95           1629722                35
98%            12.00          20924.30         322381.14           2346799                42
99%            12.00          20924.30         386857.37           3379391                50
Min             0.00              6.87             88.15               104                 0
Max            12.00          25109.16         464228.84          20924300               179

I also enabled 'tracing on' in the cqlsh CLI and ran some queries in order to find out whether tombstones are scanned frequently, but in my small sample of queries I got answers that were almost all similar to the following:

Preparing statement [Native-Transport-Requests-1]
Executing single-partition query on my_table [ReadStage-2]
Acquiring sstable references [ReadStage-2]
Bloom filter allows skipping sstable 2581 [ReadStage-2]
Bloom filter allows skipping sstable 2580 [ReadStage-2]
Bloom filter allows skipping sstable 2575 [ReadStage-2]
Partition index with 2 entries found for sstable 2570 [ReadStage-2]
Bloom filter allows skipping sstable 2548 [ReadStage-2]
Bloom filter allows skipping sstable 2463 [ReadStage-2]
Bloom filter allows skipping sstable 2416 [ReadStage-2]
Partition index with 3 entries found for sstable 2354 [ReadStage-2]
Bloom filter allows skipping sstable 1784 [ReadStage-2]
Partition index with 5 entries found for sstable 1296 [ReadStage-2]
Partition index with 3 entries found for sstable 1002 [ReadStage-2]
Partition index with 3 entries found for sstable 372 [ReadStage-2]
Skipped 0/12 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2]
Merged data from memtables and 5 sstables [ReadStage-2]
Read 3 live rows and 0 tombstone cells [ReadStage-2]
Request complete

- Since you mention lots of deletes, I am thinking it could be tombstones. Are you getting any tombstone warnings or errors in your system.log?
For each pkey, each new event_datetime makes me delete one of the (at most 3) previously saved records in Cassandra. If a pkey doesn't exist in Cassandra yet, I store it with its event_datetime without deleting anything. In Cassandra's logs I don't have any tombstone warning or error.
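For what it's worth, even though my traced queries happened to read 0 tombstone cells, the cfstats averages above already hint at tombstone overhead. A quick back-of-the-envelope calculation (my interpretation of those two counters, not an official metric):

```python
# Last-five-minutes averages taken from the cfstats output above.
avg_live_per_slice = 2.0
avg_tombstones_per_slice = 7.423841059602649

# Fraction of scanned cells that are tombstones, i.e. read work
# spent skipping deleted data rather than returning live rows.
overhead = avg_tombstones_per_slice / (avg_live_per_slice + avg_tombstones_per_slice)
print(round(overhead, 2))  # 0.79 -> roughly 4 of every 5 cells scanned are tombstones
```

If that reading is right, the average slice scans far more tombstones than live cells, which would fit the high read latencies even without tombstone warnings in the log.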
- When you delete, are you deleting a full partition?
No, I delete single rows. This is the query for deletes:

delete from my_keyspace.my_table where pkey = ? and event_datetime = ? IF EXISTS;

- [..] And because only one node has the data, a single timeout means you won’t get any data.
I will try to increase the RF from 1 to 3.

I hope I have answered all your questions.
Thank you very much!

Regards
Marco

On Thu, 27 Dec 2018 at 21:09, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:

> Your RF is only 1, so the data only exists on one node. This is not
> typically how Cassandra is used. If you need the high availability and low
> latency, you typically set RF to 3 per DC.
>
> How many event_datetime records can you have per pkey? How many pkeys
> (roughly) do you have? In general, you only want to have at most 100 MB of
> data per partition (pkey). If it is larger than that, I would expect some
> timeouts. And because only one node has the data, a single timeout means
> you won’t get any data. Server timeouts default to just 10 seconds. The
> secret to Cassandra is to always select your data by at least the primary
> key (which you are doing). So, I suspect you either have very wide rows or
> lots of tombstones.
>
> Since you mention lots of deletes, I am thinking it could be tombstones.
> Are you getting any tombstone warnings or errors in your system.log? When
> you delete, are you deleting a full partition? If you are deleting just
> part of a partition over and over, I think you will be creating too many
> tombstones. I try to design my data partitions so that deletes are for a
> full partition. Then I won’t be reading through 1000s (or more) tombstones
> trying to find the live data.
>
> Sean Durity
>
> *From:* Marco Gasparini <marco.gaspar...@competitoor.com>
> *Sent:* Thursday, December 27, 2018 3:01 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Writes and Reads with high latency
>
> Hello Sean,
>
> here my schema and RF:
>
> -------------------------------------------------------------------------
> CREATE KEYSPACE my_keyspace WITH replication = {'class':
> 'NetworkTopologyStrategy', 'DC1': '1'} AND durable_writes = true;
>
> CREATE TABLE my_keyspace.my_table (
>     pkey text,
>     event_datetime timestamp,
>     agent text,
>     ft text,
>     ftt text,
>     some_id bigint,
>     PRIMARY KEY (pkey, event_datetime)
> ) WITH CLUSTERING ORDER BY (event_datetime DESC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 90000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> -------------------------------------------------------------------------
>
> Queries I make are very simple:
>
> select pkey, event_datetime, ft, some_id, ftt from my_keyspace.my_table
> where pkey = ? limit ?;
>
> and
>
> insert into my_keyspace.my_table (event_datetime, pkey, agent, some_id,
> ft, ftt) values (?,?,?,?,?,?);
>
> About the Retry policy, the answer is yes: when a write fails I store
> it somewhere else and, after a period, I try to write it to Cassandra
> again.
> This way I can store almost all my data, but when the problem is on the
> read side I don't apply any retry policy (and this is my problem).
>
> Thanks
> Marco
>
> On Fri, 21 Dec 2018 at 17:18, Durity, Sean R <sean_r_dur...@homedepot.com> wrote:
>
> Can you provide the schema and the queries? What is the RF of the keyspace
> for the data? Are you using any Retry policy on your Cluster object?
>
> Sean Durity
>
> *From:* Marco Gasparini <marco.gaspar...@competitoor.com>
> *Sent:* Friday, December 21, 2018 10:45 AM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Writes and Reads with high latency
>
> hello all,
>
> I have 1 DC of 3 nodes running Cassandra 3.11.3 with
> consistency level ONE and Java 1.8.0_191.
>
> Every day, there are many nodejs programs that send data to the
> Cassandra cluster via the NodeJs cassandra-driver.
> Every day I get about 600k requests. Each request makes the server:
> 1_ READ some data in Cassandra (by an id; usually I get 3 records),
> 2_ DELETE one of those records,
> 3_ WRITE the data into Cassandra.
>
> So every day I make many deletes.
>
> Every day I find errors like:
> "All host(s) tried for query failed. First host tried, 10.8.0.10:9042:
> Host considered as DOWN. See innerErrors...."
> "Server timeout during write query at consistency LOCAL_ONE (0 peer(s)
> acknowledged the write over 1 required)...."
> "Server timeout during write query at consistency SERIAL (0 peer(s)
> acknowledged the write over 1 required)...."
> "Server timeout during read query at consistency LOCAL_ONE (0 peer(s)
> acknowledged the read over 1 required)...."
>
> nodetool tablehistograms tells me this:
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>                               (micros)          (micros)           (bytes)
> 50%             8.00            379.02           1955.67            379022                 8
> 75%            10.00            785.94         155469.30            654949                17
> 95%            12.00          17436.92         268650.95           1629722                35
> 98%            12.00          25109.16         322381.14           2346799                42
> 99%            12.00          30130.99         386857.37           3379391                50
> Min             0.00              6.87             88.15               104                 0
> Max            12.00          43388.63         386857.37          20924300               179
>
> At the 99th percentile I noted that write and read latency are pretty high, but I
> don't know how to improve that. I can provide more statistics if needed.
>
> Is there any improvement I can make to the Cassandra configuration in
> order not to lose any data?
>
> Thanks
>
> Regards
> Marco
>
> ------------------------------
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
> ------------------------------