Re: Newsletter / Marketing: Re: Compaction Strategy

2018-09-21 Thread Ali Hubail
I suspect that you are CPU bound rather than IO bound. There are a lot of 
areas to look into, but I would start with a few.
I could not tell much from the results you shared since, at the time, there 
were no writes happening. Switching to a different compaction strategy 
will most likely make things worse for you: as of now, you only use one 
sstable per read, and STCS is the least expensive compaction type.

For starters,

1) Revise cassandra.yaml for the common disk settings, e.g., concurrent_reads, 
concurrent_writes, etc.
 
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html
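For example, a quick way to see what a node is currently using (the config path is an assumption for a package install; adjust it for yours):

# inspect the current concurrency settings
grep -E '^(concurrent_reads|concurrent_writes|concurrent_counter_writes):' /etc/cassandra/cassandra.yaml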

2) Ensure that you optimize your OS for C*
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

What I would do next is monitor the system. The bottleneck you 
described is triggered by clients and is out of your control. So:
3) Monitor system resources.
If you have DSE, then use OpsCenter. Otherwise, you can use dstat; 
something like 'dstat -taf' would do it. You will have to run this for a 
long period of time until the timeouts occur.
That way you can get a general idea of which resources are saturating.
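A minimal sketch of that (the output path and the 60-second interval are only examples):

# timestamped stats for all devices/NICs, also written to CSV so you can line it up with the timeout times later
dstat -taf --output /tmp/dstat-$(hostname).csv 60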

4) If this is CPU bound, then reduce contention by setting 
concurrent_compactors to 1 in cassandra.yaml
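For example (path assumed; if the setting is commented out, Cassandra derives a default from the number of cores/disks):

grep -n 'concurrent_compactors' /etc/cassandra/cassandra.yaml
# then set it explicitly, e.g. concurrent_compactors: 1, and restart the node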

5) Monitor GC. There are a lot of tools that you can use to do so.
Most of the time, it's the GC that is not tuned well. If you are not using 
G1GC, then you might want to switch to it.
You can read briefly about GC tuning here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/gcPauses.html

6) This sounds naive, but check the logs to see if there is anything 
interesting there; you can see the GC pauses there as well.
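A couple of quick checks along the lines of 5) and 6) (log path assumed; nodetool gcstats should be available on recent C* versions):

nodetool gcstats                 # cumulative GC pause stats since the last call
grep GCInspector /var/log/cassandra/system.log | tail -20    # long GC pauses that C* itself logs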

Ali Hubail

Petrolink International Ltd.




rajasekhar kommineni  
09/20/2018 01:14 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Newsletter / Marketing: Re: Compaction Strategy






Hi Ali,

Please find my answers 

1) The table holds customer history data: we receive the transaction 
data every day for multiple vendors, and a batch job is executed which updates 
the data if the customer did any transactions that day; an insert happens 
if it is a new customer.
Reads happen when the customer visits, to calculate the relevancy of 
items based on the transactions they have done. I attached the tablestats & 
tablehistograms output to a file.

2) RAM: 30 GB, CPU: 4 cores, hard drive: Amazon EBS

3) Attached output to a file

Thanks,


On Sep 20, 2018, at 10:53 AM, Ali Hubail  wrote:

Hello Rajasekhar, 

It's not really clear to me what your workload is. As I understand it, you 
do heavy writes, but what about reads? 
So, could you: 

1) execute 
nodetool tablestats 
nodetool tablehistograms 
nodetool compactionstats 

we should be able to see the latency, workload type, and the # of sstable 
used for reads 

2) specify your hardware specs. i.e., memory size, cpu, # of drives (for 
data sstables), and type of harddrives (ssd/hdd) 
3) cassandra.yaml (make sure to sanitize it) 

You have a lot of updates, and your data is most likely scattered across 
different sstables. size compaction strategy (STCS) is much less expensive 
than level compaction strategy (LCS). 

Stopping the background compaction should be approached with caution, I 
think your problem is more to do with why STCS compaction is taking more 
resources than you expect. 

Regards, 

Ali Hubail

Petrolink International Ltd

Re: Compaction Strategy

2018-09-20 Thread rajasekhar kommineni
kups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 1
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 6
request_timeout_in_ms: 1
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra
client_encryption_options:
enabled: false
optional: false
keystore: conf/.keystore
keystore_password: cassandra
internode_compression: dc
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
enabled: false
chunk_length_kb: 64
cipher: AES/CBC/PKCS5Padding
key_alias: testing:1
key_provider:
  - class_name: org.apache.cassandra.security.JKSKeyProvider
parameters:
  - keystore: conf/.keystore
keystore_password: cassandra
store_type: JCEKS
key_password: cassandra
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 10
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false
back_pressure_strategy:
- class_name: org.apache.cassandra.net.RateBasedBackPressure
  parameters:
- high_ratio: 0.90
  factor: 5
  flow: FAST
prd-relevancy-csdra1:/tmp >
On Sep 20, 2018, at 10:53 AM, Ali Hubail <ali.hub...@petrolink.com> wrote:

Hello Rajasekhar,

It's not really clear to me what your
workload is. As I understand it, you do heavy writes, but what about reads?
So, could you:

1) execute 
nodetool tablestats 
nodetool tablehistograms
nodetool compactionstats

we should be able to see the latency,
workload type, and the # of sstable used for reads

2) specify your hardware specs. i.e.,
memory size, cpu, # of drives (for data sstables), and type of harddrives
(ssd/hdd)
3) cassandra.yaml (make sure to sanitize
it)

You have a lot of updates, and your
data is most likely scattered across different sstables. size compaction
strategy (STCS) is much less expensive than level compaction strategy (LCS).


Stopping the background compaction should
be approached with caution, I think your problem is more to do with why
STCS compaction is taking more resources than you expect.

Regards,

Ali Hubail

Petrolink International Ltd





rajasekhar kommineni <rajaco...@gmail.com> 09/19/2018 04:44 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy








Hello,

Can any one respond to my questions. Is it a good idea to disable auto
compaction and schedule it every 3 days. I am unable to control compaction
and it is causing timeouts. 

Also will reducing or increasing compaction_throughput_mb_per_sec eliminate
timeouts ?

Thanks,


> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni <rajaco...@gmail.com>
wrote:
> 
> Hello Folks,
> 
> I need advice in deciding the compaction strategy for my C cluster.
There are multiple jobs that will load the data with less inserts and more
updates but no deletes. Currently I am using Size Tired compaction, but
seeing auto compact

Re: Compaction Strategy

2018-09-20 Thread Ali Hubail
Hello Rajasekhar,

It's not really clear to me what your workload is. As I understand it, you 
do heavy writes, but what about reads?
So, could you:

1) execute 
nodetool tablestats 
nodetool tablehistograms
nodetool compactionstats

We should be able to see the latency, workload type, and the # of sstables 
used for reads (see the example just after this list).

2) Specify your hardware specs, i.e., memory size, CPU, # of drives (for 
data sstables), and type of hard drives (SSD/HDD).
3) cassandra.yaml (make sure to sanitize it)
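For example, something like this for the table in question (keyspace/table names are placeholders):

nodetool tablestats my_keyspace.my_table        # per-table latencies and sstable count
nodetool tablehistograms my_keyspace my_table   # distributions, including SSTables per read
nodetool compactionstats -H                     # pending/active compactions, human-readable sizes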

You have a lot of updates, and your data is most likely scattered across 
different sstables. Size-tiered compaction strategy (STCS) is much less 
expensive than leveled compaction strategy (LCS). 

Stopping the background compaction should be approached with caution; I 
think your problem is more about why STCS compaction is taking more 
resources than you expect.

Regards,

Ali Hubail

Petrolink International Ltd



rajasekhar kommineni  
09/19/2018 04:44 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy






Hello,

Can any one respond to my questions. Is it a good idea to disable auto 
compaction and schedule it every 3 days. I am unable to control compaction 
and it is causing timeouts. 

Also will reducing or increasing compaction_throughput_mb_per_sec 
eliminate timeouts ?

Thanks,


> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni  
wrote:
> 
> Hello Folks,
> 
> I need advice in deciding the compaction strategy for my C cluster. 
There are multiple jobs that will load the data with less inserts and more 
updates but no deletes. Currently I am using Size Tired compaction, but 
seeing auto compactions after the data load kicks, and also read timeouts 
during compaction.
> 
> Can anyone suggest good compaction strategy for my cluster which will 
reduce the timeouts.
> 
> 
> Thanks,
> 






Re: Compaction Strategy

2018-09-19 Thread Nitan Kainth
It's not recommended to disable compaction; you will end up with hundreds to 
thousands of sstables and increased read latency. If your data is immutable, 
meaning no updates/deletes, it will have the least impact.

Decreasing compaction throughput will release resources for the application, but 
don't let too many pending compaction tasks accumulate.
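As a rough sketch of how to experiment with that (the values and keyspace/table names are only examples):

nodetool getcompactionthroughput        # current throttle in MB/s
nodetool setcompactionthroughput 8      # lower the throttle; 0 removes throttling entirely
nodetool compactionstats                # watch the pending tasks count afterwards
# and if you do decide to pause compaction for a table, despite the caveats above:
nodetool disableautocompaction my_keyspace my_table
nodetool enableautocompaction my_keyspace my_table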

Sent from my iPhone

> On Sep 19, 2018, at 4:44 PM, rajasekhar kommineni  wrote:
> 
> Hello,
> 
> Can any one respond to my questions. Is it a good idea to disable auto 
> compaction and schedule it every 3 days. I am unable to control compaction 
> and it is causing timeouts. 
> 
> Also will reducing or increasing compaction_throughput_mb_per_sec eliminate 
> timeouts ?
> 
> Thanks,
> 
> 
>> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni  
>> wrote:
>> 
>> Hello Folks,
>> 
>> I need advice in deciding the compaction strategy for my C cluster. There 
>> are multiple jobs that will load the data with less inserts and more updates 
>> but no deletes. Currently I am using Size Tired compaction, but seeing auto 
>> compactions after the data load kicks, and also read timeouts during 
>> compaction.
>> 
>> Can anyone suggest good compaction strategy for my cluster which will reduce 
>> the timeouts.
>> 
>> 
>> Thanks,
>> 
> 
> 
> 




Re: Compaction Strategy

2018-09-19 Thread rajasekhar kommineni
Hello,

Can anyone respond to my questions? Is it a good idea to disable auto 
compaction and schedule it every 3 days? I am unable to control compaction and 
it is causing timeouts.

Also, will reducing or increasing compaction_throughput_mb_per_sec eliminate 
timeouts?

Thanks,


> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni  wrote:
> 
> Hello Folks,
> 
> I need advice in deciding the compaction strategy for my C* cluster. There are 
> multiple jobs that will load the data with fewer inserts and more updates but 
> no deletes. Currently I am using Size Tiered compaction, but seeing auto 
> compactions after the data load kicks in, and also read timeouts during 
> compaction.
> 
> Can anyone suggest a good compaction strategy for my cluster which will reduce 
> the timeouts?
> 
> 
> Thanks,
> 





Re: Compaction strategy for update heavy workload

2018-06-13 Thread kurt greaves
>
> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever.

How do you risk having data sticking around forever when everything is
TTL'd?

If you use really large buckets, what's the point of TWCS?

No one said anything about really large buckets. I'd also note that if the
data was so small per partition it would be entirely reasonable to not
bucket by partition key (and window) and thus updates would become
irrelevant.

Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.


While the numbers sound small, there must be some logical reason to have so
many nodes. In my experience STCS and LCS both have their own drawbacks with
regard to updates, more so when you have high data density, which sounds
like it might be the case here. It's not hard to test these things, and it's
important to get them right at the start to save yourself some
serious pain down the track.

On 13 June 2018 at 22:41, Jonathan Haddad  wrote:

> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever.  If you use really large buckets, what's the point of TWCS?
>
> Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.
> On Wed, Jun 13, 2018 at 3:34 PM kurt greaves  wrote:
> >
> > TWCS is probably still worth trying. If you mean updating old rows in
> TWCS "out of order updates" will only really mean you'll hit more SSTables
> on read. This might add a bit of complexity in your client if your
> bucketing partitions (not strictly necessary), but that's about it. As long
> as you're not specifying "USING TIMESTAMP" you still get the main benefit
> of efficient dropping of SSTables - C* only cares about the write timestamp
> of the data in regards to TTL's, not timestamps stored in your
> partition/clustering key.
> > Also keep in mind that you can specify the window size in TWCS, so if
> you can increase it enough to cover the "out of order" updates then that
> will also solve the problem w.r.t old buckets.
> >
> > In regards to LCS, the only way to really know if it'll be too much
> compaction overhead is to test it, but for the most part you should
> consider your read/write ratio, rather than the total number of
> reads/writes (unless it's so small that it's irrelevant, which it may well
> be).
> >
> > On 13 June 2018 at 19:25, manuj singh  wrote:
> >>
> >> Hi all,
> >> I am trying to determine compaction strategy for our use case.
> >> In our use case we will have updates on a row a few times. And we have
> a ttl also defined on the table level.
> >> Our typical workload is less then 1000 writes + reads per second. At
> the max it could go up to 2500 per second.
> >> We use SSD and have around 64 gb of ram on each node. Our cluster size
> is around 70 nodes.
> >>
> >> I looked at time series but we cant guarantee that the updates will
> happen within a give time window. And if we have out of order updates it
> might impact on when we remove that data from the disk.
> >>
> >> So i was looking at level tiered, which supposedly is good when you
> have updates. However its io bound and will affect the writes. everywhere i
> read it says its not good for write heavy workload.
> >> But Looking at our write velocity, is it really write heavy ?
> >>
> >> I guess what i am trying to find out is will level tiered compaction
> will impact the writes in our use case or it will be fine given our write
> rate is not that much.
> >> Also is there anything else i should keep in mind while deciding on the
> compaction strategy.
> >>
> >> Thanks!!
> >
> >
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>
>


Re: Compaction strategy for update heavy workload

2018-06-13 Thread Jonathan Haddad
I wouldn't use TWCS if there's updates, you're going to risk having
data that's never deleted and really small sstables sticking around
forever.  If you use really large buckets, what's the point of TWCS?

Honestly this is such a small workload you could easily use STCS or
LCS and you'd likely never, ever see a problem.
On Wed, Jun 13, 2018 at 3:34 PM kurt greaves  wrote:
>
> TWCS is probably still worth trying. If you mean updating old rows in TWCS 
> "out of order updates" will only really mean you'll hit more SSTables on 
> read. This might add a bit of complexity in your client if your bucketing 
> partitions (not strictly necessary), but that's about it. As long as you're 
> not specifying "USING TIMESTAMP" you still get the main benefit of efficient 
> dropping of SSTables - C* only cares about the write timestamp of the data in 
> regards to TTL's, not timestamps stored in your partition/clustering key.
> Also keep in mind that you can specify the window size in TWCS, so if you can 
> increase it enough to cover the "out of order" updates then that will also 
> solve the problem w.r.t old buckets.
>
> In regards to LCS, the only way to really know if it'll be too much 
> compaction overhead is to test it, but for the most part you should consider 
> your read/write ratio, rather than the total number of reads/writes (unless 
> it's so small that it's irrelevant, which it may well be).
>
> On 13 June 2018 at 19:25, manuj singh  wrote:
>>
>> Hi all,
>> I am trying to determine compaction strategy for our use case.
>> In our use case we will have updates on a row a few times. And we have a ttl 
>> also defined on the table level.
>> Our typical workload is less then 1000 writes + reads per second. At the max 
>> it could go up to 2500 per second.
>> We use SSD and have around 64 gb of ram on each node. Our cluster size is 
>> around 70 nodes.
>>
>> I looked at time series but we cant guarantee that the updates will happen 
>> within a give time window. And if we have out of order updates it might 
>> impact on when we remove that data from the disk.
>>
>> So i was looking at level tiered, which supposedly is good when you have 
>> updates. However its io bound and will affect the writes. everywhere i read 
>> it says its not good for write heavy workload.
>> But Looking at our write velocity, is it really write heavy ?
>>
>> I guess what i am trying to find out is will level tiered compaction will 
>> impact the writes in our use case or it will be fine given our write rate is 
>> not that much.
>> Also is there anything else i should keep in mind while deciding on the 
>> compaction strategy.
>>
>> Thanks!!
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade




Re: Compaction strategy for update heavy workload

2018-06-13 Thread kurt greaves
TWCS is probably still worth trying. If you mean updating old rows in TWCS,
"out of order updates" will only really mean you'll hit more SSTables on
read. This might add a bit of complexity in your client if you're bucketing
partitions (not strictly necessary), but that's about it. As long as you're
not specifying "USING TIMESTAMP" you still get the main benefit of
efficient dropping of SSTables - C* only cares about the *write timestamp* of
the data with regard to TTLs, not timestamps stored in your
partition/clustering key.
Also keep in mind that you can specify the window size in TWCS, so if you
can increase it enough to cover the "out of order" updates then that will
also solve the problem w.r.t. old buckets.

With regard to LCS, the only way to really know if it'll be too much
compaction overhead is to test it, but for the most part you should
consider your read/write ratio, rather than the total number of
reads/writes (unless it's so small that it's irrelevant, which it may well
be).
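For reference, a minimal sketch of setting TWCS with an explicit window on a table (the keyspace/table and the 7-day window are placeholders, not a recommendation):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': 7 };"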

On 13 June 2018 at 19:25, manuj singh  wrote:

> Hi all,
> I am trying to determine compaction strategy for our use case.
> In our use case we will have updates on a row a few times. And we have a
> ttl also defined on the table level.
> Our typical workload is less then 1000 writes + reads per second. At the
> max it could go up to 2500 per second.
> We use SSD and have around 64 gb of ram on each node. Our cluster size is
> around 70 nodes.
>
> I looked at time series but we cant guarantee that the updates will happen
> within a give time window. And if we have out of order updates it might
> impact on when we remove that data from the disk.
>
> So i was looking at level tiered, which supposedly is good when you have
> updates. However its io bound and will affect the writes. everywhere i read
> it says its not good for write heavy workload.
> But Looking at our write velocity, is it really write heavy ?
>
> I guess what i am trying to find out is will level tiered compaction will
> impact the writes in our use case or it will be fine given our write rate
> is not that much.
> Also is there anything else i should keep in mind while deciding on the
> compaction strategy.
>
> Thanks!!
>


Re: Compaction Strategy guidance

2014-11-25 Thread Jean-Armel Luce
Hi Andrei, Hi Nicolai,

Which version of C* are you using ?

There are some recommendations about the max storage per node :
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
handle 10x
(3-5TB).

I have the feeling that those recommendations are sensitive to many
criteria, such as:
- your hardware
- the compaction strategy
- ...

It looks like LCS lowers those limits.

Increasing the size of sstables might help if you have enough CPU and you
can put more load on your I/O system (@Andrei, I am interested in the
results of your experimentation with large sstable files).

From my point of view, there are some usage patterns where it is better to
have many small servers than a few large servers. Probably, it is better to
have many small servers if you need LCS for large tables.

Just my 2 cents.

Jean-Armel

2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of
 data per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob




Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Hi Jean-Armel, Nikolai,

1. Increasing sstable size doesn't work (well, I think, unless we
overscale - add more nodes than really necessary, which is
prohibitive for us in a way). Essentially there is no change. I gave
up and will go for STCS ;-(
2. We use 2.0.11 as of now.
3. We are running on EC2 c3.8xlarge instances with EBS volumes for data (GP SSD).

Jean-Armel, I believe that what you say about many small instances is
absolutely true. But it is not good in our case - we write a lot and
almost never read what we've written. That is, we want to be able to
read everything, but in reality we hardly read 1%, I think. This
implies that smaller instances are of no use in terms of read
performance for us. And generally instances/CPU/RAM are more expensive
than storage. So, we really would like to have instances with large
storage.
Andrei.





On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote:
 Hi Andrei, Hi Nicolai,

 Which version of C* are you using ?

 There are some recommendations about the max storage per node :
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to handle
 10x
 (3-5TB).

 I have the feeling that those recommendations are sensitive according many
 criteria such as :
 - your hardware
 - the compaction strategy
 - ...

 It looks that LCS lower those limitations.

 Increasing the size of sstables might help if you have enough CPU and you
 can put more load on your I/O system (@Andrei, I am interested by the
 results of your  experimentation about large sstable files)

 From my point of view, there are some usage patterns where it is better to
 have many small servers than a few large servers. Probably, it is better to
 have many small servers if you need LCS for large tables.

 Just my 2 cents.

 Jean-Armel

 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of data
 per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob





Re: Compaction Strategy guidance

2014-11-25 Thread Marcus Eriksson
If you are that write-heavy you should definitely go with STCS; LCS
optimizes for reads by doing more compactions.

/Marcus
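For completeness, switching a table to STCS is a one-liner (keyspace/table names are placeholders; existing sstables get reorganized by the new strategy over time):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy' };"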

On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov aiva...@iponweb.net wrote:

 Hi Jean-Armel, Nikolai,

 1. Increasing sstable size doesn't work (well, I think, unless we
 overscale - add more nodes than really necessary, which is
 prohibitive for us in a way). Essentially there is no change.  I gave
 up and will go for STCS;-(
 2. We use 2.0.11 as of now
 3. We are running on EC2 c3.8xlarge instances with EBS volumes for data
 (GP SSD)

 Jean-Armel, I believe that what you say about many small instances is
 absolutely true. But, is not good in our case - we write a lot and
 almost never read what we've written. That is, we want to be able to
 read everything, but in reality we hardly read 1%, I think. This
 implies that smaller instances are of no use in terms of read
 performance for us. And generally nstances/cpu/ram is more expensive
 than storage. So, we really would like to have instances with large
 storage.

 Andrei.





 On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com
 wrote:
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
 handle
  10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
 many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
 to
  have many small servers than a few large servers. Probably, it is better
 to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
 ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more than
  one instance of C* per host. Makes sense - it will reduce the amount
 of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 



Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Yep, Marcus, I know. It's mainly a question of the cost of those extra 2x
disks, you know. Our final setup will be more like 30TB, so doubling
it is still some cost. But I guess we will have to live with it.

On Tue, Nov 25, 2014 at 1:26 PM, Marcus Eriksson krum...@gmail.com wrote:
 If you are that write-heavy you should definitely go with STCS, LCS
 optimizes for reads by doing more compactions

 /Marcus

 On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov aiva...@iponweb.net wrote:

 Hi Jean-Armel, Nikolai,

 1. Increasing sstable size doesn't work (well, I think, unless we
 overscale - add more nodes than really necessary, which is
 prohibitive for us in a way). Essentially there is no change.  I gave
 up and will go for STCS;-(
 2. We use 2.0.11 as of now
 3. We are running on EC2 c3.8xlarge instances with EBS volumes for data
 (GP SSD)

 Jean-Armel, I believe that what you say about many small instances is
 absolutely true. But, is not good in our case - we write a lot and
 almost never read what we've written. That is, we want to be able to
 read everything, but in reality we hardly read 1%, I think. This
 implies that smaller instances are of no use in terms of read
 performance for us. And generally nstances/cpu/ram is more expensive
 than storage. So, we really would like to have instances with large
 storage.

 Andrei.





 On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com
 wrote:
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
  http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
  handle
  10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
  many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and
  you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
  to
  have many small servers than a few large servers. Probably, it is better
  to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
  ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more
  than
  one instance of C* per host. Makes sense - it will reduce the amount
  of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 




Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Hi Jean-Armel,

I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
there are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
2.0.10.

I have about 1.8TB of data per node now in total, which falls into that
range.

As I said, it is really a problem with a large amount of data in a single CF,
not the total amount of data. Quite often the nodes are idle yet have quite a
few pending compactions. I have discussed it with other members of the C*
community and DataStax guys, and they have confirmed my observation.

I believe that increasing the sstable size won't help at all and will probably
make things worse - everything else being equal, of course. But I
would like to hear from Andrei when he is done with his test.

Regarding the last statement - yes, C* clearly likes many small servers
more than fewer large ones. But it is all relative - and can all be
recalculated to $$$ :) C* is all about partitioning of everything -
storage, traffic... Less data per node and more nodes give you lower
latency, lower heap usage, etc. I think I have learned this with my
project. Somewhat the hard way, but still, nothing is better than personal
experience :)

On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Andrei, Hi Nicolai,

 Which version of C* are you using ?

 There are some recommendations about the max storage per node :
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
 handle 10x
 (3-5TB).

 I have the feeling that those recommendations are sensitive according many
 criteria such as :
 - your hardware
 - the compaction strategy
 - ...

 It looks that LCS lower those limitations.

 Increasing the size of sstables might help if you have enough CPU and you
 can put more load on your I/O system (@Andrei, I am interested by the
 results of your  experimentation about large sstable files)

 From my point of view, there are some usage patterns where it is better to
 have many small servers than a few large servers. Probably, it is better to
 have many small servers if you need LCS for large tables.

 Just my 2 cents.

 Jean-Armel

 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of
 data per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob






-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Nikolai,

Just in case you've missed my comment in the thread (guess you have) -
increasing sstable size does nothing (in our case at least). That is,
it's not worse, but the load pattern is still the same - doing nothing
most of the time. So, I switched to STCS and we will have to live with
the extra storage cost - storage is way cheaper than CPU etc. anyhow :-)

On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev ngrigor...@gmail.com wrote:
 Hi Jean-Armel,

 I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but there
 are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra 2.0.10.

 I have about 1,8Tb of data per node now in total, which falls into that
 range.

 As I said, it is really a problem with large amount of data in a single CF,
 not total amount of data. Quite often the nodes are idle yet having quite a
 bit of pending compactions. I have discussed it with other members of C*
 community and DataStax guys and, they have confirmed my observation.

 I believe that increasing the sstable size won't help at all and probably
 will make the things worse - everything else being equal, of course. But I
 would like to hear from Andrei when he is done with his test.

 Regarding the last statement - yes, C* clearly likes many small servers more
 than fewer large ones. But it is all relative - and can be all recalculated
 to $$$ :) C* is all about partitioning of everything - storage,
 traffic...Less data per node and more nodes give you lower latency, lower
 heap usage etc, etc. I think I have learned this with my project. Somewhat
 hard way but still, nothing is better than the personal experience :)

 On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Andrei, Hi Nicolai,

 Which version of C* are you using ?

 There are some recommendations about the max storage per node :
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
 handle 10x
 (3-5TB).

 I have the feeling that those recommendations are sensitive according many
 criteria such as :
 - your hardware
 - the compaction strategy
 - ...

 It looks that LCS lower those limitations.

 Increasing the size of sstables might help if you have enough CPU and you
 can put more load on your I/O system (@Andrei, I am interested by the
 results of your  experimentation about large sstable files)

 From my point of view, there are some usage patterns where it is better to
 have many small servers than a few large servers. Probably, it is better to
 have many small servers if you need LCS for large tables.

 Just my 2 cents.

 Jean-Armel

 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of 
 data
 per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob






 --
 Nikolai Grigoriev
 (514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Andrei,

Oh, yes, I have scanned the top of your previous email but overlooked the
last part.

I am using SSDs so I prefer to put in extra work to keep my system performing
and save expensive disk space. So far I've been able to size the system
more or less correctly, so these LCS limitations do not cause too much
trouble. But I do keep the CF sharding option as a backup - for me it will
be relatively easy to implement.

On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov aiva...@iponweb.net wrote:

 Nikolai,

 Just in case you've missed my comment in the thread (guess you have) -
 increasing sstable size does nothing (in our case at least). That is,
 it's not worse but the load pattern is still the same - doing nothing
 most of the time. So, I switched to STCS and we will have to live with
 extra storage cost - storage is way cheaper than cpu etc anyhow:-)

 On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:
  Hi Jean-Armel,
 
  I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
 there
  are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
 2.0.10.
 
  I have about 1,8Tb of data per node now in total, which falls into that
  range.
 
  As I said, it is really a problem with large amount of data in a single
 CF,
  not total amount of data. Quite often the nodes are idle yet having
 quite a
  bit of pending compactions. I have discussed it with other members of C*
  community and DataStax guys and, they have confirmed my observation.
 
  I believe that increasing the sstable size won't help at all and probably
  will make the things worse - everything else being equal, of course. But
 I
  would like to hear from Andrei when he is done with his test.
 
  Regarding the last statement - yes, C* clearly likes many small servers
 more
  than fewer large ones. But it is all relative - and can be all
 recalculated
  to $$$ :) C* is all about partitioning of everything - storage,
  traffic...Less data per node and more nodes give you lower latency, lower
  heap usage etc, etc. I think I have learned this with my project.
 Somewhat
  hard way but still, nothing is better than the personal experience :)
 
  On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com
 wrote:
 
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
  handle 10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
 many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and
 you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
 to
  have many small servers than a few large servers. Probably, it is
 better to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
 ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more
 than
  one instance of C* per host. Makes sense - it will reduce the amount
 of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 
 
 
 
  --
  Nikolai Grigoriev
  (514) 772-5178




-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Ah, clear then. SSD usage imposes a different bias in terms of costs;-)

On Tue, Nov 25, 2014 at 9:48 PM, Nikolai Grigoriev ngrigor...@gmail.com wrote:
 Andrei,

 Oh, yes, I have scanned the top of your previous email but overlooked the
 last part.

 I am using SSDs so I prefer to put extra work to keep my system performing
 and save expensive disk space. So far I've been able to size the system more
 or less correctly so these LCS limitations do not cause too much troubles.
 But I do keep the CF sharding option as backup - for me it will be
 relatively easy to implement it.


 On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov aiva...@iponweb.net wrote:

 Nikolai,

 Just in case you've missed my comment in the thread (guess you have) -
 increasing sstable size does nothing (in our case at least). That is,
 it's not worse but the load pattern is still the same - doing nothing
 most of the time. So, I switched to STCS and we will have to live with
 extra storage cost - storage is way cheaper than cpu etc anyhow:-)

 On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:
  Hi Jean-Armel,
 
  I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
  there
  are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
  2.0.10.
 
  I have about 1,8Tb of data per node now in total, which falls into that
  range.
 
  As I said, it is really a problem with large amount of data in a single
  CF,
  not total amount of data. Quite often the nodes are idle yet having
  quite a
  bit of pending compactions. I have discussed it with other members of C*
  community and DataStax guys and, they have confirmed my observation.
 
  I believe that increasing the sstable size won't help at all and
  probably
  will make the things worse - everything else being equal, of course. But
  I
  would like to hear from Andrei when he is done with his test.
 
  Regarding the last statement - yes, C* clearly likes many small servers
  more
  than fewer large ones. But it is all relative - and can be all
  recalculated
  to $$$ :) C* is all about partitioning of everything - storage,
  traffic...Less data per node and more nodes give you lower latency,
  lower
  heap usage etc, etc. I think I have learned this with my project.
  Somewhat
  hard way but still, nothing is better than the personal experience :)
 
  On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com
  wrote:
 
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
  http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
  handle 10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
  many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and
  you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
  to
  have many small servers than a few large servers. Probably, it is
  better to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
  ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more
  than
  one instance of C* per host. Makes sense - it will reduce the amount
  of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 
 
 
 
  --
  Nikolai Grigoriev
  (514) 772-5178




 --
 Nikolai Grigoriev
 (514) 772-5178


Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
Jean-Armel,

I have only two large tables, the rest are super-small. In the test cluster
of 15 nodes the largest table has about 110M rows. Its total size is about
1.26Gb per node (total disk space used per node for that CF). It's got
about 5K sstables per node - the sstable size is 256Mb. cfstats on a
healthy node looks like this:

Read Count: 8973748
Read Latency: 16.130059053251774 ms.
Write Count: 32099455
Write Latency: 1.6124713938912671 ms.
Pending Tasks: 0
Table: wm_contacts
SSTable count: 5195
SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0,
0, 0, 0]
Space used (live), bytes: 1266060391852
Space used (total), bytes: 1266144170869
SSTable Compression Ratio: 0.32604853410787327
Number of keys (estimate): 25696000
Memtable cell count: 71402
Memtable data size, bytes: 26938402
Memtable switch count: 9489
Local read count: 8973748
Local read latency: 17.696 ms
Local write count: 32099471
Local write latency: 1.732 ms
Pending tasks: 0
Bloom filter false positives: 32248
Bloom filter false ratio: 0.50685
Bloom filter space used, bytes: 20744432
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 172660
Average live cells per slice (last five minutes): 495.0
Average tombstones per slice (last five minutes): 0.0

Another table of similar structure (same number of rows) is about 4x
smaller. That table does not suffer from those issues - it compacts well
and efficiently.
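For anyone wanting to try the same experiment, a minimal sketch of setting the LCS sstable size on a table (the keyspace name is a placeholder; 256 is the value discussed above, and existing sstables are only rewritten as they get compacted again):

cqlsh -e "ALTER TABLE my_keyspace.wm_contacts WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 256 };"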

On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Nikolai,

 Please could you clarify a little bit what you call a large amount of
 data ?

 How many tables ?
 How many rows in your largest table ?
 How many GB in your largest table ?
 How many GB per node ?

 Thanks.



 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce jaluc...@gmail.com:

 Hi Nikolai,

 Thanks for those informations.

 Please could you clarify a little bit what you call 

 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev ngrigor...@gmail.com:

 Just to clarify - when I was talking about the large amount of data I
 really meant large amount of data per node in a single CF (table). LCS does
 not seem to like it when it gets thousands of sstables (makes 4-5 levels).

 When bootstraping a new node you'd better enable that option from
 CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
 mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it
 had 7,5K pending compactions, now it has almost stabilized ad 4,6K. Does
 not go down. Number of sstables at L0  is over 11K and it is slowly slowly
 building upper levels. Total number of sstables is 4x the normal amount.
 Now I am not entirely sure if this node will ever get back to normal life.
 And believe me - this is not because of I/O, I have SSDs everywhere and 16
 physical cores. This machine is barely using 1-3 cores at most of the time.
 The problem is that allowing STCS fallback is not a good option either - it
 will quickly result in a few 200Gb+ sstables in my configuration and then
 these sstables will never be compacted. Plus, it will require close to 2x
 disk space on EVERY disk in my JBOD configuration...this will kill the node
 sooner or later. This is all because all sstables after bootstrap end at L0
 and then the process slowly slowly moves them to other levels. If you have
 write traffic to that CF then the number of sstables and L0 will grow
 quickly - like it happens in my case now.

 Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
 is implemented it may be better.


 On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov aiva...@iponweb.net
 wrote:

 Stephane,

 We are having a somewhat similar C* load profile. Hence some comments
 in addition Nikolai's answer.
 1. Fallback to STCS - you can disable it actually
 2. Based on our experience, if you have a lot of data per node, LCS
 may work just fine. That is, till the moment you decide to join
 another node - chances are that the newly added node will not be able
 to compact what it gets from old nodes. In your case, if you switch
 strategy the same thing may happen. This is all due to limitations
 mentioned by Nikolai.

 Andrei,


 On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. smg...@gmail.com
 wrote:
  ABUSE
 
 
 
   I DON'T WANT ANY MORE MAILS, I AM FROM MEXICO
 
 
 
   From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
   Sent: Saturday, November 22, 2014 07:13 p.m.
   To: user@cassandra.apache.org
   Subject: Re: Compaction Strategy guidance
   Importance: High
 
 
 
  Stephane,
 
  As everything good, LCS comes at certain price.
 
  LCS will put most load on you I/O system (if you use spindles - you
 may need
  to be careful about that) and on CPU. Also LCS (by default) may fall
 back to
  STCS

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
Nikolai,

Are you sure about 1.26Gb? Like it doesn't look right - 5195 sstables
with 256Mb sstable size...

Andrei

On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev ngrigor...@gmail.com wrote:
 Jean-Armel,

 I have only two large tables, the rest is super-small. In the test cluster
 of 15 nodes the largest table has about 110M rows. Its total size is about
 1,26Gb per node (total disk space used per node for that CF). It's got about
 5K sstables per node - the sstable size is 256Mb. cfstats on a healthy
 node look like this:

 Read Count: 8973748
 Read Latency: 16.130059053251774 ms.
 Write Count: 32099455
 Write Latency: 1.6124713938912671 ms.
 Pending Tasks: 0
 Table: wm_contacts
 SSTable count: 5195
 SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0,
 0, 0, 0]
 Space used (live), bytes: 1266060391852
 Space used (total), bytes: 1266144170869
 SSTable Compression Ratio: 0.32604853410787327
 Number of keys (estimate): 25696000
 Memtable cell count: 71402
 Memtable data size, bytes: 26938402
 Memtable switch count: 9489
 Local read count: 8973748
 Local read latency: 17.696 ms
 Local write count: 32099471
 Local write latency: 1.732 ms
 Pending tasks: 0
 Bloom filter false positives: 32248
 Bloom filter false ratio: 0.50685
 Bloom filter space used, bytes: 20744432
 Compacted partition minimum bytes: 104
 Compacted partition maximum bytes: 3379391
 Compacted partition mean bytes: 172660
 Average live cells per slice (last five minutes): 495.0
 Average tombstones per slice (last five minutes): 0.0

 Another table of similar structure (same number of rows) is about 4x times
 smaller. That table does not suffer from those issues - it compacts well and
 efficiently.


Re: Compaction Strategy guidance

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of
 data per node and will make better use of the resources.


This is usually a Bad Idea to do in production.

=Rob


Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Stephane,

We have a somewhat similar C* load profile. Hence some comments
in addition to Nikolai's answer.
1. Fallback to STCS - you can disable it actually
2. Based on our experience, if you have a lot of data per node, LCS
may work just fine. That is, till the moment you decide to join
another node - chances are that the newly added node will not be able
to compact what it gets from old nodes. In your case, if you switch
strategy the same thing may happen. This is all due to limitations
mentioned by Nikolai.

Andrei,




Re: Compaction Strategy guidance

2014-11-23 Thread Nikolai Grigoriev
Just to clarify - when I was talking about the large amount of data I
really meant a large amount of data per node in a single CF (table). LCS does
not seem to like it when it gets thousands of sstables (it makes 4-5 levels).

When bootstrapping a new node you'd better enable that option from
CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
mess - I have a node that I bootstrapped ~2 weeks ago. Initially it
had 7.5K pending compactions; now it has almost stabilized at 4.6K and does
not go down. The number of sstables at L0 is over 11K and it is slowly,
slowly building the upper levels. The total number of sstables is 4x the
normal amount. Now I am not entirely sure this node will ever get back to
normal life. And believe me - this is not because of I/O: I have SSDs
everywhere and 16 physical cores. This machine is barely using 1-3 cores most
of the time. The problem is that allowing the STCS fallback is not a good
option either - it will quickly result in a few 200GB+ sstables in my
configuration and then these sstables will never be compacted. Plus, it will
require close to 2x the disk space on EVERY disk in my JBOD configuration...
this will kill the node sooner or later. This is all because all sstables
after bootstrap end up at L0 and then the process slowly, slowly moves them
to the other levels. If you have write traffic to that CF then the number of
sstables in L0 will grow quickly - like it happens in my case now.

Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301 is
implemented it may be better.
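
For reference, that CASSANDRA-6621 option is a JVM system property; a minimal
sketch of enabling it on the joining node before bootstrap (property name
taken from the ticket - verify it exists in your Cassandra version):

  # e.g. appended to cassandra-env.sh on the node being bootstrapped
  JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"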


-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-23 Thread Jean-Armel Luce
Hi Nikolai,

Thanks for this information.

Please could you clarify a little bit what you call 




Re: Compaction Strategy guidance

2014-11-23 Thread Jean-Armel Luce
Hi Nikolai,

Please could you clarify a little bit what you call a "large amount of
data"?

How many tables?
How many rows in your largest table?
How many GB in your largest table?
How many GB per node?

Thanks.
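
For anyone wanting to pull those numbers, a rough sketch (the keyspace/table
name and data path are placeholders, not from this thread):

  nodetool cfstats my_keyspace.my_table | egrep 'Number of keys|Space used'
  nodetool info | grep Load
  du -sh /var/lib/cassandra/data/my_keyspace/my_table*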








Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Jean-Armel,

I have the same problem/state as Nikolai. Here are my stats:
~ 1 table
~ 10B records
~ 2TB/node x 6 nodes

Nikolai,
I'm sort of wondering if switching to some larger sstable_size_in_mb
(say 4096 or 8192) with LCS may be a solution, even if not a permanent
one? As for huge sstables, I already have some 400-500GB sstables. The
only way I can see to compact them in the future is to split them
offline at some point. Does that make sense?

(I'm still doing a test drive and really need to understand how we are
going to handle that in production)

Andrei.
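
For what it's worth, the offline split can be done with the sstablesplit tool
that ships with Cassandra; a sketch, assuming the node is stopped first, with
the path and target size as placeholders only:

  sstablesplit --no-snapshot -s 256 \
    /var/lib/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-1234-Data.db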




Re: Compaction Strategy guidance

2014-11-22 Thread Nikolai Grigoriev
Stephane,

Like everything good, LCS comes at a certain price.

LCS will put more load on your I/O system (if you use spindles you may
need to be careful about that) and on CPU. Also, LCS (by default) may fall
back to STCS if it is falling behind (which is very possible with heavy
write activity) and this will result in higher disk space usage. LCS also
has a certain limitation I discovered lately: sometimes LCS may not be
able to use all your node's resources (algorithm limitations) and this
reduces the overall compaction throughput. This may happen if you have a
large column family with lots of data per node. STCS won't have this
limitation.

By the way, the primary goal of LCS is to reduce the number of sstables C*
has to look at to find your data. With LCS functioning properly this number
will most likely be between something like 1 and 3 for most of the reads.
But if you do few reads and are not concerned about latency today, most
likely LCS will only save you some disk space.
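
If one did decide to switch, the change itself is a single statement; a
sketch with a recent cqlsh (keyspace/table and sstable size are examples
only - and note the node will then re-level all existing data, which is a
lot of compaction work up front):

  cqlsh -e "ALTER TABLE my_keyspace.my_table
            WITH compaction = {'class': 'LeveledCompactionStrategy',
                               'sstable_size_in_mb': 160};"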

On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay sle...@looplogic.com
wrote:

 Hi there,

 use case:

 - Heavy write app, few reads.
 - Lots of updates of rows / columns.
 - Current performance is fine, for both writes and reads..
 - Currently using SizedCompactionStrategy

 We're trying to limit the amount of storage used during compaction. Should
 we switch to LeveledCompactionStrategy?

 Thanks




-- 
Nikolai Grigoriev
(514) 772-5178


RE: Compaction Strategy guidance

2014-11-22 Thread Servando Muñoz G .
ABUSE

I DO NOT WANT ANY MORE MAILS, I AM FROM MEXICO

From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
Sent: Saturday, November 22, 2014 07:13 PM
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy guidance
Importance: High

 




Re: compaction strategy

2011-05-11 Thread Terje Marthinussen



Sorry, I am not trying to pretend anything or blow it out of proportion.
Just reacting to what I see.

This is what I see after some stress testing of some pretty decent HW.

81  Up  Normal  181.6 GB   8.33%   Token(bytes[30])
82  Up  Normal  501.43 GB  8.33%   Token(bytes[313230])
83  Up  Normal  248.07 GB  8.33%   Token(bytes[313437])
84  Up  Normal  349.64 GB  8.33%   Token(bytes[313836])
85  Up  Normal  511.55 GB  8.33%   Token(bytes[323336])
86  Up  Normal  654.93 GB  8.33%   Token(bytes[333234])
87  Up  Normal  534.77 GB  8.33%   Token(bytes[333939])
88  Up  Normal  525.88 GB  8.33%   Token(bytes[343739])
89  Up  Normal  476.6 GB   8.33%   Token(bytes[353730])
90  Up  Normal  424.89 GB  8.33%   Token(bytes[363635])
91  Up  Normal  338.14 GB  8.33%   Token(bytes[383036])
92  Up  Normal  546.95 GB  8.33%   Token(bytes[6a])

.81 has been exposed to a full compaction. It had ~370GB before that and the
resulting sstable is 165GB.
The other nodes have only been doing minor compactions.

I think this is a problem.
You are of course free to disagree.

I do however recommend doing a simulation on potential worst case scenarios
if many of the buckets end up with 3 sstables and don't compact for a while.
The disk space requirements  get pretty bad even without getting into
theoretical worst cases.

Regards,
Terje
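
For reference, the full compaction on .81 would have been kicked off with
something along these lines (keyspace/CF names are placeholders):

  nodetool -h <address of .81> compact my_keyspace my_cf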


Re: compaction strategy

2011-05-11 Thread Jonathan Ellis
You are of course free to reduce the min per bucket to 2.

The fundamental idea of sstables + compaction is to trade disk space
for higher write performance. For most applications this is the right
trade to make on modern hardware... I don't think you'll get very far
trying to get the 2nd without the 1st.
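
A sketch of doing that at runtime with nodetool (keyspace/CF names are
placeholders; minimum lowered to 2, maximum left at the default 32):

  nodetool setcompactionthreshold my_keyspace my_cf 2 32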




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: compaction strategy

2011-05-10 Thread Terje Marthinussen
 Everyone may be well aware of that, but I'll still remark that a minor
 compaction
 will try to merge as many 20MB sstables as it can up to the max
 compaction
 threshold (which is configurable). So if you do accumulate some newly
 created
 sstable at some point in time, the next minor compaction will take all of
 them
 and thus not create a 40 MB sstable, then 80MB etc... Sure there will be
 more
 step than with a major compaction, but let's keep in mind we don't
 merge sstables
 2 by 2.


Well, you do kind of merge them 2 by 2 as you look for at least 4 at a time
;)
But yes, 20MB should become at least 80MB. Still quite a few hops to reach
100GB.

I'm also not too much in favor of triggering major compactions,
 because it mostly
 have a nasty effect (create one huge sstable). Now maybe we could expose
 the
 difference factor for which we'll consider sstables in the same bucket


The nasty side effect I am scared of is disk space and to keep the disk
space under control, I need to get down to 1 file.

As an example:
2 days ago, I looked at a system that had gone idle from compaction with
something like 24 sstables.
Disk use was 370GB.

After manually triggering a full compaction, I was left with a single sstable
which is 164 GB.

This means I may need more than 3x the full dataset to survive if certain
nasty events such as repairs or anticompactions should occur.
Way more than the recommended 2x.
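
Roughly, the arithmetic behind that (just restating the figures above, and
assuming the new sstable is written out before the old ones are released):

  steady state before the major compaction:  ~370 GB on disk
  fully compacted size:                      ~164 GB
  peak while the major compaction runs:      ~370 GB + ~164 GB = ~534 GB
  534 GB / 164 GB = ~3.3x the compacted dataset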

In the same system, I see nodes reaching up towards 900GB during compaction
and 5-600GB otherwise.
This is with OPP, so distribution is not 100% perfect, but I expect these
5-600GB nodes to compact down to the 200GB area if a full compaction is
triggered.

That is way way beyond the recommendation to have 2x the disk space.

You may disagree, but I think this is a problem.
Either we need to recommend 3-5x the best case disk usage or we need to fix
cassandra.

A simple improvement initially may be to change the bucketing strategy if
you cannot find suitable candidates.
I believe Lucene, for instance, has a strategy where it can mix a set of small
index fragments with one large one.
This may be possible to consider as a fallback strategy and just let
cassandra compact down to 1 file whenever it can.

Ultimately, I think segmenting on token space is the only way to fix this.
That segmentation could be done by building histograms of your token
distribution as you compact and the compaction can further adjust the
segments accordingly as full compactions take place.

This would seem simpler to do than a full vnode based infrastructure.

Terje


Re: compaction strategy

2011-05-10 Thread Sylvain Lebresne
On Tue, May 10, 2011 at 6:20 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:

 Everyone may be well aware of that, but I'll still remark that a minor
 compaction
 will try to merge as many 20MB sstables as it can up to the max
 compaction
 threshold (which is configurable). So if you do accumulate some newly
 created
 sstable at some point in time, the next minor compaction will take all of
 them
 and thus not create a 40 MB sstable, then 80MB etc... Sure there will be
 more
 step than with a major compaction, but let's keep in mind we don't
 merge sstables
 2 by 2.

 Well, you do kind of merge them 2 by 2 as you look for at least 4 at a time
 ;)
 But yes, 20MB should become at least 80MB. Still quite a few hops to reach
 100GB.

Not sure I follow you. 4 sstables is the minimum a compaction looks for
(by default).
If there are 30 sstables of ~20MB sitting there because compaction is behind,
you will compact those 30 sstables together (unless there is not enough space
for that, and considering you haven't changed the max compaction threshold (32
by default)). And you can increase the max threshold.
Don't get me wrong, I'm not pretending this works better than it does, but
let's not pretend either that it's worse than it is.


 I'm also not too much in favor of triggering major compactions,
 because it mostly
 have a nasty effect (create one huge sstable). Now maybe we could expose
 the
 difference factor for which we'll consider sstables in the same bucket

 The nasty side effect I am scared of is disk space and to keep the disk
 space under control, I need to get down to 1 file.

 As an example:
 2 days ago, I looked at a system that had gone idle from compaction with
 something like 24 sstables.
 Disk use was 370GB.

 After manually triggering full compaction,  I was left with a single sstable
 which is 164 GB large.

 This means I may need more than 3x the full dataset to survive if certain
 nasty events such as repairs or anti compactions should occur.
 Way more than the recommended 2x.

 In the same system, I see nodes reaching up towards 900GB during compaction
 and 5-600GB otherwise.
 This is with OPP, so distribution is not 100% perfect, but I expect these
 5-600GB nodes to compact down to the 200GB area if a full compaction is
 triggered.

 That is way way beyond the recommendation to have 2x the disk space.

 You may disagree, but I think this is a problem.

I absolutely do not disagree. I was just arguing that I'm not sure triggering a
major compaction based on some fuzzy heuristic is a good solution to the
problem.

And we do know that compaction could and should be improved, both to
make it have less impact on reads when it's behind:
https://issues.apache.org/jira/browse/CASSANDRA-2498
to allow for easily testing different strategies:
https://issues.apache.org/jira/browse/CASSANDRA-1610
as well as redesigning the mechanism:
https://issues.apache.org/jira/browse/CASSANDRA-1608
You'll see in particular in that last ticket's comments that segmenting on
token space has been suggested already, and there are probably a handful of
threads about vnodes in the mailing list archives.

And I personally think that yes, partitioning the sstables is a good idea.




Re: compaction strategy

2011-05-09 Thread Sylvain Lebresne
On Sat, May 7, 2011 at 7:20 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
 This is an all ssd system. I have no problems with read/write performance
 due to I/O.
 I do have a potential with the crazy explosion you can get in terms of disk
 use if compaction cannot keep up.

 As things falls behind and you get many generations of data, yes, read
 performance gets a problem due to the number of sstables.

 As things start falling behind, you have a bunch of minor compactions trying
 to merge 20MB (sstables cassandra generally dumps with current config when
 under pressure) into 40 MB into 80MB into

Everyone may be well aware of that, but I'll still remark that a minor
compaction will try to merge as many 20MB sstables as it can, up to the max
compaction threshold (which is configurable). So if you do accumulate some
newly created sstables at some point in time, the next minor compaction will
take all of them and thus not create a 40 MB sstable, then 80 MB, etc. Sure,
there will be more steps than with a major compaction, but let's keep in mind
we don't merge sstables 2 by 2.

I'm also not too much in favor of triggering major compactions, because it
mostly has a nasty effect (it creates one huge sstable). Now maybe we could
expose the difference factor for which we'll consider sstables to be in the
same bucket (i.e., of similar size). As a side note, I think that
https://issues.apache.org/jira/browse/CASSANDRA-1610, if done correctly,
could help in such situations in that one could try a strategy adapted
to its workload.






Re: compaction strategy

2011-05-09 Thread David Boxenhorn
"I'm also not too much in favor of triggering major compactions, because it
mostly have a nasty effect (create one huge sstable)."

If that is the case, why can't major compactions create many,
non-overlapping SSTables?

In general, it seems to me that non-overlapping SSTables have all the
advantages of big SSTables (i.e. you know exactly where the data is) without
the disadvantages that come with being big. Why doesn't Cassandra take
advantage of that in a major way?


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
If they each have their own copy of the data, then they are *not*
non-overlapping!

If you have non-overlapping SSTables (and you know the min/max keys), it's
like having one big SSTable because you know exactly where each row is, and
it becomes easy to merge a new SSTable in small batches, rather than in one
huge batch.

The only step that you have to add to the current merge process is, when you
are going to write a new SSTable and it's too big, to write N
(non-overlapping!) pieces instead.


On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen tmarthinus...@gmail.com
 wrote:

 Yes, agreed.

 I actually think cassandra has to.

 And if you do not go down to that single file, how do you avoid getting
 into a situation where you can very realistically end up with 4-5 big
 sstables each having its own copy of the same data massively increasing disk
 requirements?

 Terje






Re: compaction strategy

2011-05-09 Thread Terje Marthinussen
Sorry, I was referring to the claim that one big file was a problem, not
the non-overlapping part.

If you never compact to a single file, you never get rid of all
generations/duplicates.
With non-overlapping files covering small enough token ranges, compacting
down to one file is not a big issue.

Terje







Re: compaction strategy

2011-05-07 Thread Jonathan Ellis
On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
 1. Would it make sense to make full compactions occur a bit more aggressive.

I'd rather reduce the performance impact of being behind, than do more
full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

 2. I
 would think the code should be smart enough to either trigger a full
 compaction and scrap the current queue, or at least merge some of those
 pending tasks into larger ones

Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
with saying add capacity until compaction can mostly keep up. (Most
people's problem is making compaction LESS aggressive, hence
https://issues.apache.org/jira/browse/CASSANDRA-2156.)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: compaction strategy

2011-05-07 Thread Edward Capriolo
On Sat, May 7, 2011 at 8:54 AM, Jonathan Ellis jbel...@gmail.com wrote:
 On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
 tmarthinus...@gmail.com wrote:
 1. Would it make sense to make full compactions occur a bit more aggressively?

 I'd rather reduce the performance impact of being behind, than do more
 full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

 2. I
 would think the code should be smart enough to either trigger a full
 compaction and scrap the current queue, or at least merge some of those
 pending tasks into larger ones

 Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
 with saying add capacity until compaction can mostly keep up. (Most
 people's problem is making compaction LESS aggressive, hence
 https://issues.apache.org/jira/browse/CASSANDRA-2156.)

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


Adjusting compaction and memtable settings is a tuning thing. Tuning
is not usually a game changer. Every once in a while you hit
something wonderful and get a 20% or 30% improvement, but normally you
are in the 1%-3% gain range.

If you are seeing 600 pending compaction tasks regularly you almost
definitely need more hardware.


Re: compaction strategy

2011-05-07 Thread Peter Schuller
 If you are seeing 600 pending compaction tasks regularly you almost
 definitely need more hardware.

Note that the pending compactions count is pretty misleading and you can't
really draw conclusions just from the pending compactions
number/graph. For example, standard behavior during e.g. a long repair
may accumulate thousands of pending compactions that suddenly
drop to zero once the repair is done and a bunch of tasks that don't actually
need to do anything have completed. With the concurrent compaction
support I suppose this will be mitigated as long as you don't hit your
concurrency limit.

-- 
/ Peter Schuller


Re: compaction strategy

2011-05-07 Thread Terje Marthinussen
This is an all-SSD system. I have no problems with read/write performance
due to I/O.
I do have a potential problem with the crazy explosion in disk use you can
get if compaction cannot keep up.

As things fall behind and you get many generations of data, yes, read
performance becomes a problem due to the number of sstables.

As things start falling behind, you have a bunch of minor compactions trying
to merge 20 MB sstables (the size cassandra generally dumps with the current
config when under pressure) into 40 MB, into 80 MB, and so on.

Anyone want to do the math on how many times you are rewriting the data
going this route?
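
A rough back-of-the-envelope sketch of that math, assuming minor compactions
always merge two equal-sized sstables, so every byte is rewritten once per
size doubling:

// Back-of-the-envelope only: counts how many times each byte is rewritten
// if 20 MB flushes are repeatedly merged in equal-sized pairs until all
// data ends up in a single sstable.
public class CompactionRewriteMath {
    public static void main(String[] args) {
        double flushMb = 20;           // flush size mentioned above
        double totalMb = 100 * 1024;   // assume ~100 GB of data on the node
        int rewrites = 0;
        for (double size = flushMb; size < totalMb; size *= 2)
            rewrites++;
        // Prints roughly 13: each byte is rewritten about a dozen times
        // before everything lands in one file.
        System.out.printf("Each byte rewritten ~%d times to reach %.0f MB%n",
                          rewrites, totalMb);
    }
}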

There is just no way this can keep up. It will just fall more and more
behind.
The only way to recover, as far as I can see, would be to trigger a full
compaction?

It does not really make sense to me to go through all these minor merges
when a full compaction will do a much faster and better job.

Terje

On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
 tmarthinus...@gmail.com wrote:
  1. Would it make sense to make full compactions occur a bit more
  aggressively?

 I'd rather reduce the performance impact of being behind, than do more
 full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

  2. I
  would think the code should be smart enough to either trigger a full
  compaction and scrap the current queue, or at least merge some of those
  pending tasks into larger ones

 Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
 with saying add capacity until compaction can mostly keep up. (Most
 people's problem is making compaction LESS aggressive, hence
 https://issues.apache.org/jira/browse/CASSANDRA-2156.)

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: compaction strategy

2011-05-07 Thread Peter Schuller
 It does not really make sense to me to go through all these minor merges
 when a full compaction will do a much faster and better job.

In a system heavily reliant on caching (platter drives, data sizes much
larger than RAM), major compactions can be very detrimental to performance
due to the temporary spike in data size and the cache coldness they cause.
Sounds like it makes good sense in your situation, though.
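
For the disk-space side of that spike, a simple worst-case sketch
(illustrative numbers, assuming nothing gets purged during the compaction):
the old sstables stay on disk until the new merged one is complete, so the
node briefly needs free space on the order of the data being compacted.

// Illustrative worst-case headroom check for a major compaction:
// peak usage approaches old data + newly written data if nothing is purged.
public class MajorCompactionHeadroom {
    public static void main(String[] args) {
        double liveGb = 400;    // data participating in the major compaction (assumed)
        double volumeGb = 600;  // total volume size (assumed)
        double freeGb = volumeGb - liveGb;
        boolean enoughHeadroom = freeGb >= liveGb;
        System.out.printf("free=%.0f GB, worst-case needed=%.0f GB, enough=%b%n",
                          freeGb, liveGb, enoughHeadroom);
    }
}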

-- 
/ Peter Schuller