Re: Compaction Strategy

rajasekhar kommineni Thu, 20 Sep 2018 11:14:49 -0700

Hi Ali,

Please find my answers below.

1) The table holds customer history data. We receive transaction data every day for multiple vendors, and a batch job runs that updates a customer's data if the customer made any transactions that day, or inserts it if the customer is new.

Reads happen when a customer visits, to calculate the relevancy of items based on the transactions the customer has made. I attached the tablestats and tablehistograms output to the file.

2) RAM: 30 GB, CPUs: 4, hard drive: Amazon EBS

3) Attached the cassandra.yaml to the file as well.

Thanks,

Keyspace : keyspace
        Read Count: 365449
        Read Latency: 1.6671318460304994 ms
        Write Count: 90843181
        Write Latency: 0.028631624447408993 ms
        Pending Flushes: 0
                Table: table_name
                SSTable count: 53
                Space used (live): 226.53 GiB
                Space used (total): 226.53 GiB
                Space used by snapshots (total): 0 bytes
                Off heap memory used (total): 292.33 MiB
                SSTable Compression Ratio: 0.1844571292083186
                Number of partitions (estimate): 7673279
                Memtable cell count: 10203
                Memtable data size: 617.24 MiB
                Memtable off heap memory used: 0 bytes
                Memtable switch count: 2904
                Local read count: 365449
                Local read latency: 0.114 ms
                Local write count: 90843181
                Local write latency: NaN ms
                Pending flushes: 0
                Percent repaired: 95.69
                Bloom filter false positives: 0
                Bloom filter false ratio: 0.00000
                Bloom filter space used: 117.26 MiB
                Bloom filter off heap memory used: 117.26 MiB
                Index summary off heap memory used: 23.64 MiB
                Compression metadata off heap memory used: 151.43 MiB
                Compacted partition minimum bytes: 73
                Compacted partition maximum bytes: 126934
                Compacted partition mean bytes: 29683
                Average live cells per slice (last five minutes): 1.0
                Maximum live cells per slice (last five minutes): 1
                Average tombstones per slice (last five minutes): 1.0
                Maximum tombstones per slice (last five minutes): 1
                Dropped Mutations: 50.27 KiB
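Ali's request was aimed at seeing latency, workload type, and SSTables per read. A minimal sketch, assuming the one-`Key: value`-per-line layout shown above, of pulling the numeric metrics out of `nodetool tablestats` text and computing the write-to-read ratio. `parse_tablestats` is a hypothetical helper, not part of nodetool, and it only handles one table's output at a time.

```python
import re

NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def parse_tablestats(text: str) -> dict[str, float]:
    """Hypothetical helper: map each 'Key: value' line to the leading number of its value.

    Units such as 'ms' or 'GiB' are dropped, non-numeric values (table names, 'NaN')
    are skipped, and the first occurrence of a key wins, so feed it one table at a time.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        match = NUMBER.match(value.strip())
        if match:
            metrics.setdefault(key.strip(), float(match.group()))
    return metrics


# A few lines copied from the tablestats output above.
SAMPLE = """\
Keyspace : keyspace
        Read Count: 365449
        Write Count: 90843181
                Table: table_name
                SSTable count: 53
                Local read latency: 0.114 ms
                Local write latency: NaN ms
"""

stats = parse_tablestats(SAMPLE)
ratio = stats["Write Count"] / stats["Read Count"]
print(f"SSTables: {stats['SSTable count']:.0f}, writes per read: {ratio:.0f}")
# prints "SSTables: 53, writes per read: 249"
```

With roughly 249 writes per read on this node, the workload is heavily write-dominated, which is the question Ali raised.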


----------------

Percentile  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                           (micros)      (micros)         (bytes)
50%             1.00           0.00         73.46           35425           1
75%             1.00           0.00        105.78           42510           1
95%             1.00           0.00        126.93           51012           1
98%             1.00           0.00        126.93           51012           1
99%             1.00           0.00        126.93           73457           1
Min             1.00           0.00         61.22              73           0
Max             1.00           0.00        126.93          126934           1



cluster_name: 'Cluster'
num_tokens: 256
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000 # 3 hours
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_directory: /var/lib/cassandra/hints
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128
batchlog_replay_throttle_in_kb: 1024
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
role_manager: CassandraRoleManager
roles_validity_in_ms: 2000
permissions_validity_in_ms: 2000
credentials_validity_in_ms: 2000
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
cdc_enabled: false
disk_failure_policy: stop
commit_failure_policy: stop
prepared_statements_cache_size_mb:
thrift_prepared_statements_cache_size_mb:
key_cache_size_in_mb: 1024
key_cache_save_period: 14400
row_cache_size_in_mb: 0
row_cache_save_period: 0
counter_cache_size_in_mb:
counter_cache_save_period: 7200
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32
concurrent_reads: 32
concurrent_writes: 32
concurrent_counter_writes: 32
concurrent_materialized_view_writes: 32
memtable_allocation_type: heap_buffers
index_summary_capacity_in_mb:
index_summary_resize_interval_in_minutes: 60
trickle_fsync: false
trickle_fsync_interval_in_kb: 10240
storage_port: 7000
ssl_storage_port: 7001
listen_address: prd-relevancy-csdra1.softcoin.com
start_native_transport: true
native_transport_port: 9042
start_rpc: false
rpc_address: prd-relevancy-csdra1.softcoin.com
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
incremental_backups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
client_encryption_options:
    enabled: false
    optional: false
    keystore: conf/.keystore
    keystore_password: cassandra
internode_compression: dc
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
    enabled: false
    chunk_length_kb: 64
    cipher: AES/CBC/PKCS5Padding
    key_alias: testing:1
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: cassandra
            store_type: JCEKS
            key_password: cassandra
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false
back_pressure_strategy:
    - class_name: org.apache.cassandra.net.RateBasedBackPressure
      parameters:
        - high_ratio: 0.90
          factor: 5
          flow: FAST

On Sep 20, 2018, at 10:53 AM, Ali Hubail <ali.hub...@petrolink.com> wrote:

Hello Rajasekhar,

It's not really clear to me what your workload is. As I understand it, you do heavy writes, but what about reads?
So, could you:

1) execute
nodetool tablestats
nodetool tablehistograms
nodetool compactionstats

These should show the latency, the workload type, and the number of SSTables used per read.

2) Specify your hardware specs, i.e., memory size, CPU, number of drives (for data SSTables), and type of hard drives (SSD/HDD).
3) Your cassandra.yaml (make sure to sanitize it).
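One way to do the sanitizing step, as a minimal sketch: mask the value of any key ending in `password` plus the keys that expose hostnames or IPs. The redaction list is an assumption to adjust for your environment, and `sanitize_yaml` is a hypothetical helper that works line by line rather than parsing YAML.

```python
import re

# Assumed redaction list (adjust to your environment): any key ending in "password"
# plus the cassandra.yaml keys that expose hostnames or IP addresses.
SENSITIVE_KEYS = (
    r"\w*password|listen_address|rpc_address|broadcast_address"
    r"|broadcast_rpc_address|seeds"
)
# Group 1: indentation and an optional list dash; group 2: the key; group 3: the colon.
SENSITIVE_LINE = re.compile(rf"^(\s*(?:-\s*)?)({SENSITIVE_KEYS})(\s*:\s*).*$")


def sanitize_yaml(text: str) -> str:
    """Replace the value of every sensitive key with REDACTED, keeping indentation."""
    out = []
    for line in text.splitlines():
        m = SENSITIVE_LINE.match(line)
        out.append(f"{m.group(1)}{m.group(2)}{m.group(3)}REDACTED" if m else line)
    return "\n".join(out)


sample = """\
listen_address: prd-relevancy-csdra1.softcoin.com
server_encryption_options:
    keystore_password: cassandra
storage_port: 7000"""
print(sanitize_yaml(sample))
```

A line-based approach keeps comments and ordering intact, which a YAML load/dump round trip would not; the trade-off is that it only catches keys on the list.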

You have a lot of updates, so your data is most likely scattered across different SSTables. The size-tiered compaction strategy (STCS) is much less expensive in compaction I/O than the leveled compaction strategy (LCS).
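A toy sketch of why heavy updates scatter a partition. It uses a simplified model (an assumption, not Cassandra's on-disk format) in which each daily batch flushes a new SSTable and cells for the same column are reconciled by write timestamp, last write wins. A read has to consult every SSTable that holds the partition until compaction merges them; on a real node, the SSTables column of `nodetool tablehistograms` reports how many SSTables reads actually touch. `Cell`, `read_partition`, and `compact` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int  # write timestamp; on read, the newest cell for a column wins


# Simplified model (an assumption): an SSTable is a dict of
# partition key -> {column -> Cell}; each daily batch flushes a new one.
SSTable = dict[str, dict[str, Cell]]


def merge_row(target: dict[str, Cell], row: dict[str, Cell]) -> None:
    """Last-write-wins merge of one row's cells into target."""
    for column, cell in row.items():
        if column not in target or cell.timestamp > target[column].timestamp:
            target[column] = cell


def read_partition(sstables: list[SSTable], key: str) -> tuple[dict[str, str], int]:
    """Return the merged row and how many SSTables had to be consulted."""
    merged: dict[str, Cell] = {}
    touched = 0
    for sstable in sstables:
        if key in sstable:
            touched += 1
            merge_row(merged, sstable[key])
    return {column: cell.value for column, cell in merged.items()}, touched


def compact(sstables: list[SSTable]) -> SSTable:
    """Rewrite several SSTables as one, keeping only the newest cell per column."""
    out: SSTable = {}
    for sstable in sstables:
        for key, row in sstable.items():
            merge_row(out.setdefault(key, {}), row)
    return out


day1 = {"cust42": {"last_txn": Cell("2018-09-17", 1), "vendor": Cell("A", 1)}}
day2 = {"cust42": {"last_txn": Cell("2018-09-18", 2)}}
day3 = {"cust42": {"last_txn": Cell("2018-09-19", 3)}, "cust7": {"vendor": Cell("B", 3)}}

print(read_partition([day1, day2, day3], "cust42"))
# ({'last_txn': '2018-09-19', 'vendor': 'A'}, 3)
print(read_partition([compact([day1, day2, day3])], "cust42"))
# ({'last_txn': '2018-09-19', 'vendor': 'A'}, 1)
```

The same answer comes back either way; compaction only changes how many SSTables the read has to visit, and it pays for that by rewriting the data.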

Stopping background compaction should be approached with caution. I think your problem has more to do with why STCS compaction is taking more resources than you expect.
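On the `compaction_throughput_mb_per_sec` question raised in this thread: the throttle trades intensity for duration rather than removing work. A back-of-the-envelope sketch, assuming a full major compaction of the table's 226.53 GiB ("Space used (live)" in the tablestats above), treating the setting as MiB/s, and assuming the throttle rather than disk or CPU is the bottleneck. `hours_at_throttle` is a hypothetical helper.

```python
MIB_PER_GIB = 1024


def hours_at_throttle(data_gib: float, throughput_mb_per_sec: float) -> float:
    """Lower-bound hours for compaction to process data_gib at a given throttle.

    Treats compaction_throughput_mb_per_sec as MiB/s and assumes the throttle,
    not disk or CPU, is the bottleneck. In cassandra.yaml, 0 disables throttling,
    so there is no throttle-based bound in that case.
    """
    if throughput_mb_per_sec <= 0:
        raise ValueError("throttling disabled; time depends on hardware instead")
    return data_gib * MIB_PER_GIB / throughput_mb_per_sec / 3600


# 226.53 GiB is the table's live size; 16 is the value in the cassandra.yaml above.
for throttle in (16, 32, 64):
    print(f"{throttle} MB/s: {hours_at_throttle(226.53, throttle):.1f} h")
# 16 MB/s: 4.0 h
# 32 MB/s: 2.0 h
# 64 MB/s: 1.0 h
```

Lowering the throttle spreads the same I/O over a longer window, while raising it shortens compactions but makes them compete harder with reads while they run.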

Regards,

Ali Hubail

Petrolink International Ltd

From: rajasekhar kommineni <rajaco...@gmail.com>
Date: 09/19/2018 04:44 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy

Hello, can anyone respond to my questions? Is it a good idea to disable auto compaction and schedule it every 3 days? I am unable to control compaction, and it is causing timeouts. Also, will reducing or increasing compaction_throughput_mb_per_sec eliminate the timeouts?

Thanks,

> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni <rajaco...@gmail.com> wrote:
>
> Hello Folks,
>
> I need advice in deciding the compaction strategy for my Cassandra cluster. There are multiple jobs that load data with few inserts and more updates, but no deletes. Currently I am using size-tiered compaction, but I see auto compactions kick in after the data load, and also read timeouts during compaction.
>
> Can anyone suggest a good compaction strategy for my cluster that will reduce the timeouts?
>
> Thanks,
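Auto compactions starting right after a data load is what STCS's size-tier bucketing would lead one to expect. A simplified sketch of that rule, using STCS's default options (bucket_low 0.5, bucket_high 1.5, min_sstable_size 50 MB, min_threshold 4, max_threshold 32): SSTables are sorted by size and grouped into tiers of similar size, and any tier with at least min_threshold members becomes eligible. The real strategy has further rules (for example, for choosing among eligible tiers) that are omitted here, and the SSTable sizes below are hypothetical.

```python
def stcs_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5, min_sstable_size_mb=50):
    """Group SSTable sizes into tiers of similar size (simplified STCS bucketing)."""
    buckets = []  # list of [average_size, members]
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg, members = bucket
            similar = avg * bucket_low < size < avg * bucket_high
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                members.append(size)
                bucket[0] = sum(members) / len(members)  # update the tier's average
                break
        else:  # no existing tier fits: start a new one
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Tiers with at least min_threshold SSTables are eligible (capped at max_threshold)."""
    return [
        tier[:max_threshold]
        for tier in stcs_buckets(sizes_mb)
        if len(tier) >= min_threshold
    ]


# Hypothetical node: three older ~10 GB SSTables plus four fresh ~600 MB flushes
# produced by a data load.
sizes = [10240, 600, 11000, 610, 9800, 620, 640]
print(compaction_candidates(sizes))  # [[600, 610, 620, 640]]
```

In this model the four fresh flushes form a tier and are compacted right away, while the three ~10 GB SSTables wait for a fourth of similar size; when that arrives, a single compaction rewrites roughly 40 GB at once, which is the kind of I/O burst that can line up with read timeouts.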
