Re: Newsletter / Marketing: Re: Compaction Strategy

2018-09-21 Thread Ali Hubail
I suspect that you are CPU bound rather than IO bound. There are a lot of 
areas to look into, but I would start with a few.
I could not tell much from the results you shared since, at the time, there 
were no writes happening. Switching to a different compaction strategy 
will most likely make things worse for you: as of now, you only use one 
sstable per read, and STCS is the least expensive compaction type.

For starters,

1) Revise cassandra.yaml for the common disk settings, e.g., concurrent_reads, 
concurrent_writes, etc.
 
https://docs.datastax.com/en/cassandra/3.0/cassandra/configuration/configCassandra_yaml.html
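For example, a quick way to see what a node is currently using (the config path is an assumption for a package install; adjust it for yours):

# inspect the current concurrency settings
grep -E '^(concurrent_reads|concurrent_writes|concurrent_counter_writes):' /etc/cassandra/cassandra.yaml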

2) Ensure that you optimize your OS for C*
https://docs.datastax.com/en/dse/6.0/dse-admin/datastax_enterprise/config/configRecommendedSettings.html

What I would do next is monitor the system. The bottleneck you 
described is triggered by clients and is out of your control. So:
3) Monitor system resources.
If you have DSE, then use OpsCenter. Otherwise, you can use dstat; 
something like 'dstat -taf' would do it. You will have to run this for a 
long period of time until the timeouts occur.
That way you can get a general idea of which resources are saturating.
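A minimal sketch of that (the output path and the 60-second interval are only examples):

# timestamped stats for all devices/NICs, also written to CSV so you can line it up with the timeout times later
dstat -taf --output /tmp/dstat-$(hostname).csv 60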

4) If this is CPU bound, then reduce contention by setting 
concurrent_compactors to 1 in cassandra.yaml
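For example (path assumed; if the setting is commented out, Cassandra derives a default from the number of cores/disks):

grep -n 'concurrent_compactors' /etc/cassandra/cassandra.yaml
# then set it explicitly, e.g. concurrent_compactors: 1, and restart the node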

5) Monitor GC. There are a lot of tools that you can use to do so.
Most of the time, it's the GC that is not tuned well. If you are not using 
G1GC, then you might want to switch to it.
You can read briefly about GC tuning here:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
https://docs.datastax.com/en/dse-trblshoot/doc/troubleshooting/gcPauses.html

6) This sounds naive, but check the logs to see if there is anything 
interesting there; you can see the GC pauses there as well.
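A couple of quick checks along the lines of 5) and 6) (log path assumed; nodetool gcstats should be available on recent C* versions):

nodetool gcstats                 # cumulative GC pause stats since the last call
grep GCInspector /var/log/cassandra/system.log | tail -20    # long GC pauses that C* itself logs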

Ali Hubail

Petrolink International Ltd.




rajasekhar kommineni  
09/20/2018 01:14 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Newsletter / Marketing: Re: Compaction Strategy






Hi Ali,

Please find my answers 

1) The table holds customer history data: we receive the transaction 
data every day for multiple vendors, and a batch job is executed which updates 
the data if the customer did any transactions that day; an insert happens 
if it is a new customer.
Reads happen when the customer visits, to calculate the relevancy of 
items based on the transactions they have done. I attached the tablestats & 
tablehistograms output to a file.

2) RAM: 30 GB, CPU: 4 cores, hard drive: Amazon EBS

3) Attached output to a file

Thanks,


On Sep 20, 2018, at 10:53 AM, Ali Hubail  wrote:

Hello Rajasekhar, 

It's not really clear to me what your workload is. As I understand it, you 
do heavy writes, but what about reads? 
So, could you: 

1) execute 
nodetool tablestats 
nodetool tablehistograms 
nodetool compactionstats 

we should be able to see the latency, workload type, and the # of sstable 
used for reads 

2) specify your hardware specs. i.e., memory size, cpu, # of drives (for 
data sstables), and type of harddrives (ssd/hdd) 
3) cassandra.yaml (make sure to sanitize it) 

You have a lot of updates, and your data is most likely scattered across 
different sstables. size compaction strategy (STCS) is much less expensive 
than level compaction strategy (LCS). 

Stopping the background compaction should be approached with caution, I 
think your problem is more to do with why STCS compaction is taking more 
resources than you expect. 

Regards, 

Ali Hubail

Petrolink International Ltd

Re: Compaction Strategy

2018-09-20 Thread rajasekhar kommineni
kups: false
snapshot_before_compaction: false
auto_snapshot: true
column_index_size_in_kb: 64
column_index_cache_size_in_kb: 2
compaction_throughput_mb_per_sec: 16
sstable_preemptive_open_interval_in_mb: 50
read_request_timeout_in_ms: 5000
range_request_timeout_in_ms: 1
write_request_timeout_in_ms: 2000
counter_write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 6
request_timeout_in_ms: 1
slow_query_log_timeout_in_ms: 500
cross_node_timeout: false
endpoint_snitch: SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
server_encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra
client_encryption_options:
enabled: false
optional: false
keystore: conf/.keystore
keystore_password: cassandra
internode_compression: dc
inter_dc_tcp_nodelay: false
tracetype_query_ttl: 86400
tracetype_repair_ttl: 604800
enable_user_defined_functions: false
enable_scripted_user_defined_functions: false
windows_timer_interval: 1
transparent_data_encryption_options:
enabled: false
chunk_length_kb: 64
cipher: AES/CBC/PKCS5Padding
key_alias: testing:1
key_provider:
  - class_name: org.apache.cassandra.security.JKSKeyProvider
parameters:
  - keystore: conf/.keystore
keystore_password: cassandra
store_type: JCEKS
key_password: cassandra
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 10
batch_size_warn_threshold_in_kb: 5
batch_size_fail_threshold_in_kb: 50
unlogged_batch_across_partitions_warn_threshold: 10
compaction_large_partition_warning_threshold_mb: 100
gc_warn_threshold_in_ms: 1000
back_pressure_enabled: false
back_pressure_strategy:
- class_name: org.apache.cassandra.net.RateBasedBackPressure
  parameters:
- high_ratio: 0.90
  factor: 5
  flow: FAST
prd-relevancy-csdra1:/tmp >
On Sep 20, 2018, at 10:53 AM, Ali Hubail <ali.hub...@petrolink.com> wrote:

Hello Rajasekhar,

It's not really clear to me what your
workload is. As I understand it, you do heavy writes, but what about reads?
So, could you:

1) execute 
nodetool tablestats 
nodetool tablehistograms
nodetool compactionstats

we should be able to see the latency,
workload type, and the # of sstable used for reads

2) specify your hardware specs. i.e.,
memory size, cpu, # of drives (for data sstables), and type of harddrives
(ssd/hdd)
3) cassandra.yaml (make sure to sanitize
it)

You have a lot of updates, and your
data is most likely scattered across different sstables. size compaction
strategy (STCS) is much less expensive than level compaction strategy (LCS).


Stopping the background compaction should
be approached with caution, I think your problem is more to do with why
STCS compaction is taking more resources than you expect.

Regards,

Ali Hubail

Petrolink International Ltd





rajasekhar kommineni <rajaco...@gmail.com> 09/19/2018 04:44 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy








Hello,

Can any one respond to my questions. Is it a good idea to disable auto
compaction and schedule it every 3 days. I am unable to control compaction
and it is causing timeouts. 

Also will reducing or increasing compaction_throughput_mb_per_sec eliminate
timeouts ?

Thanks,


> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni <rajaco...@gmail.com>
wrote:
> 
> Hello Folks,
> 
> I need advice in deciding the compaction strategy for my C cluster.
There are multiple jobs that will load the data with less inserts and more
updates but no deletes. Currently I am using Size Tired compaction, but
seeing auto compact

Re: Compaction Strategy

2018-09-20 Thread Ali Hubail
Hello Rajasekhar,

It's not really clear to me what your workload is. As I understand it, you 
do heavy writes, but what about reads?
So, could you:

1) execute 
nodetool tablestats 
nodetool tablehistograms
nodetool compactionstats

We should be able to see the latency, workload type, and the # of sstables 
used for reads (see the example just after this list).

2) Specify your hardware specs, i.e., memory size, CPU, # of drives (for 
data sstables), and type of hard drives (SSD/HDD).
3) cassandra.yaml (make sure to sanitize it)
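For example, something like this for the table in question (keyspace/table names are placeholders):

nodetool tablestats my_keyspace.my_table        # per-table latencies and sstable count
nodetool tablehistograms my_keyspace my_table   # distributions, including SSTables per read
nodetool compactionstats -H                     # pending/active compactions, human-readable sizes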

You have a lot of updates, and your data is most likely scattered across 
different sstables. Size-tiered compaction strategy (STCS) is much less 
expensive than leveled compaction strategy (LCS). 

Stopping the background compaction should be approached with caution; I 
think your problem is more about why STCS compaction is taking more 
resources than you expect.

Regards,

Ali Hubail

Petrolink International Ltd



rajasekhar kommineni  
09/19/2018 04:44 PM
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy






Hello,

Can any one respond to my questions. Is it a good idea to disable auto 
compaction and schedule it every 3 days. I am unable to control compaction 
and it is causing timeouts. 

Also will reducing or increasing compaction_throughput_mb_per_sec 
eliminate timeouts ?

Thanks,


> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni  
wrote:
> 
> Hello Folks,
> 
> I need advice in deciding the compaction strategy for my C cluster. 
There are multiple jobs that will load the data with less inserts and more 
updates but no deletes. Currently I am using Size Tired compaction, but 
seeing auto compactions after the data load kicks, and also read timeouts 
during compaction.
> 
> Can anyone suggest good compaction strategy for my cluster which will 
reduce the timeouts.
> 
> 
> Thanks,
> 






Re: Compaction Strategy

2018-09-19 Thread Nitan Kainth
It's not recommended to disable compaction; you will end up with hundreds to 
thousands of sstables and increased read latency. If your data is immutable, 
meaning no updates/deletes, it will have the least impact.

Decreasing compaction throughput will release resources for the application, but 
don't let too many pending compaction tasks accumulate.
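As a rough sketch of how to experiment with that (the values and keyspace/table names are only examples):

nodetool getcompactionthroughput        # current throttle in MB/s
nodetool setcompactionthroughput 8      # lower the throttle; 0 removes throttling entirely
nodetool compactionstats                # watch the pending tasks count afterwards
# and if you do decide to pause compaction for a table, despite the caveats above:
nodetool disableautocompaction my_keyspace my_table
nodetool enableautocompaction my_keyspace my_table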

Sent from my iPhone

> On Sep 19, 2018, at 4:44 PM, rajasekhar kommineni  wrote:
> 
> Hello,
> 
> Can any one respond to my questions. Is it a good idea to disable auto 
> compaction and schedule it every 3 days. I am unable to control compaction 
> and it is causing timeouts. 
> 
> Also will reducing or increasing compaction_throughput_mb_per_sec eliminate 
> timeouts ?
> 
> Thanks,
> 
> 
>> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni  
>> wrote:
>> 
>> Hello Folks,
>> 
>> I need advice in deciding the compaction strategy for my C cluster. There 
>> are multiple jobs that will load the data with less inserts and more updates 
>> but no deletes. Currently I am using Size Tired compaction, but seeing auto 
>> compactions after the data load kicks, and also read timeouts during 
>> compaction.
>> 
>> Can anyone suggest good compaction strategy for my cluster which will reduce 
>> the timeouts.
>> 
>> 
>> Thanks,
>> 
> 
> 
> 




Re: Compaction Strategy

2018-09-19 Thread rajasekhar kommineni
Hello,

Can anyone respond to my questions? Is it a good idea to disable auto 
compaction and schedule it every 3 days? I am unable to control compaction and 
it is causing timeouts.

Also, will reducing or increasing compaction_throughput_mb_per_sec eliminate 
timeouts?

Thanks,


> On Sep 17, 2018, at 9:38 PM, rajasekhar kommineni  wrote:
> 
> Hello Folks,
> 
> I need advice in deciding the compaction strategy for my C* cluster. There are 
> multiple jobs that will load the data with fewer inserts and more updates but 
> no deletes. Currently I am using Size Tiered compaction, but seeing auto 
> compactions after the data load kicks in, and also read timeouts during 
> compaction.
> 
> Can anyone suggest a good compaction strategy for my cluster which will reduce 
> the timeouts?
> 
> 
> Thanks,
> 





Re: Compaction strategy for update heavy workload

2018-06-13 Thread kurt greaves
>
> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever.

How do you risk having data sticking around forever when everything is
TTL'd?

If you use really large buckets, what's the point of TWCS?

No one said anything about really large buckets. I'd also note that if the
data was so small per partition it would be entirely reasonable to not
bucket by partition key (and window) and thus updates would become
irrelevant.

Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.


While the numbers sound small, there must be some logical reason to have so
many nodes. In my experience STCS and LCS both have their own drawbacks with
regard to updates, more so when you have high data density, which sounds
like it might be the case here. It's not hard to test these things, and it's
important to get them right at the start to save yourself some
serious pain down the track.

On 13 June 2018 at 22:41, Jonathan Haddad  wrote:

> I wouldn't use TWCS if there's updates, you're going to risk having
> data that's never deleted and really small sstables sticking around
> forever.  If you use really large buckets, what's the point of TWCS?
>
> Honestly this is such a small workload you could easily use STCS or
> LCS and you'd likely never, ever see a problem.
> On Wed, Jun 13, 2018 at 3:34 PM kurt greaves  wrote:
> >
> > TWCS is probably still worth trying. If you mean updating old rows in
> TWCS "out of order updates" will only really mean you'll hit more SSTables
> on read. This might add a bit of complexity in your client if your
> bucketing partitions (not strictly necessary), but that's about it. As long
> as you're not specifying "USING TIMESTAMP" you still get the main benefit
> of efficient dropping of SSTables - C* only cares about the write timestamp
> of the data in regards to TTL's, not timestamps stored in your
> partition/clustering key.
> > Also keep in mind that you can specify the window size in TWCS, so if
> you can increase it enough to cover the "out of order" updates then that
> will also solve the problem w.r.t old buckets.
> >
> > In regards to LCS, the only way to really know if it'll be too much
> compaction overhead is to test it, but for the most part you should
> consider your read/write ratio, rather than the total number of
> reads/writes (unless it's so small that it's irrelevant, which it may well
> be).
> >
> > On 13 June 2018 at 19:25, manuj singh  wrote:
> >>
> >> Hi all,
> >> I am trying to determine compaction strategy for our use case.
> >> In our use case we will have updates on a row a few times. And we have
> a ttl also defined on the table level.
> >> Our typical workload is less then 1000 writes + reads per second. At
> the max it could go up to 2500 per second.
> >> We use SSD and have around 64 gb of ram on each node. Our cluster size
> is around 70 nodes.
> >>
> >> I looked at time series but we cant guarantee that the updates will
> happen within a give time window. And if we have out of order updates it
> might impact on when we remove that data from the disk.
> >>
> >> So i was looking at level tiered, which supposedly is good when you
> have updates. However its io bound and will affect the writes. everywhere i
> read it says its not good for write heavy workload.
> >> But Looking at our write velocity, is it really write heavy ?
> >>
> >> I guess what i am trying to find out is will level tiered compaction
> will impact the writes in our use case or it will be fine given our write
> rate is not that much.
> >> Also is there anything else i should keep in mind while deciding on the
> compaction strategy.
> >>
> >> Thanks!!
> >
> >
>
>
> --
> Jon Haddad
> http://www.rustyrazorblade.com
> twitter: rustyrazorblade
>
>
>


Re: Compaction strategy for update heavy workload

2018-06-13 Thread Jonathan Haddad
I wouldn't use TWCS if there's updates, you're going to risk having
data that's never deleted and really small sstables sticking around
forever.  If you use really large buckets, what's the point of TWCS?

Honestly this is such a small workload you could easily use STCS or
LCS and you'd likely never, ever see a problem.
On Wed, Jun 13, 2018 at 3:34 PM kurt greaves  wrote:
>
> TWCS is probably still worth trying. If you mean updating old rows in TWCS 
> "out of order updates" will only really mean you'll hit more SSTables on 
> read. This might add a bit of complexity in your client if your bucketing 
> partitions (not strictly necessary), but that's about it. As long as you're 
> not specifying "USING TIMESTAMP" you still get the main benefit of efficient 
> dropping of SSTables - C* only cares about the write timestamp of the data in 
> regards to TTL's, not timestamps stored in your partition/clustering key.
> Also keep in mind that you can specify the window size in TWCS, so if you can 
> increase it enough to cover the "out of order" updates then that will also 
> solve the problem w.r.t old buckets.
>
> In regards to LCS, the only way to really know if it'll be too much 
> compaction overhead is to test it, but for the most part you should consider 
> your read/write ratio, rather than the total number of reads/writes (unless 
> it's so small that it's irrelevant, which it may well be).
>
> On 13 June 2018 at 19:25, manuj singh  wrote:
>>
>> Hi all,
>> I am trying to determine compaction strategy for our use case.
>> In our use case we will have updates on a row a few times. And we have a ttl 
>> also defined on the table level.
>> Our typical workload is less then 1000 writes + reads per second. At the max 
>> it could go up to 2500 per second.
>> We use SSD and have around 64 gb of ram on each node. Our cluster size is 
>> around 70 nodes.
>>
>> I looked at time series but we cant guarantee that the updates will happen 
>> within a give time window. And if we have out of order updates it might 
>> impact on when we remove that data from the disk.
>>
>> So i was looking at level tiered, which supposedly is good when you have 
>> updates. However its io bound and will affect the writes. everywhere i read 
>> it says its not good for write heavy workload.
>> But Looking at our write velocity, is it really write heavy ?
>>
>> I guess what i am trying to find out is will level tiered compaction will 
>> impact the writes in our use case or it will be fine given our write rate is 
>> not that much.
>> Also is there anything else i should keep in mind while deciding on the 
>> compaction strategy.
>>
>> Thanks!!
>
>


-- 
Jon Haddad
http://www.rustyrazorblade.com
twitter: rustyrazorblade




Re: Compaction strategy for update heavy workload

2018-06-13 Thread kurt greaves
TWCS is probably still worth trying. If you mean updating old rows in TWCS,
"out of order updates" will only really mean you'll hit more SSTables on
read. This might add a bit of complexity in your client if you're bucketing
partitions (not strictly necessary), but that's about it. As long as you're
not specifying "USING TIMESTAMP" you still get the main benefit of
efficient dropping of SSTables - C* only cares about the *write timestamp* of
the data with regard to TTLs, not timestamps stored in your
partition/clustering key.
Also keep in mind that you can specify the window size in TWCS, so if you
can increase it enough to cover the "out of order" updates then that will
also solve the problem w.r.t. old buckets.

With regard to LCS, the only way to really know if it'll be too much
compaction overhead is to test it, but for the most part you should
consider your read/write ratio, rather than the total number of
reads/writes (unless it's so small that it's irrelevant, which it may well
be).
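For reference, a minimal sketch of setting TWCS with an explicit window on a table (the keyspace/table and the 7-day window are placeholders, not a recommendation):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': 7 };"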

On 13 June 2018 at 19:25, manuj singh  wrote:

> Hi all,
> I am trying to determine compaction strategy for our use case.
> In our use case we will have updates on a row a few times. And we have a
> ttl also defined on the table level.
> Our typical workload is less then 1000 writes + reads per second. At the
> max it could go up to 2500 per second.
> We use SSD and have around 64 gb of ram on each node. Our cluster size is
> around 70 nodes.
>
> I looked at time series but we cant guarantee that the updates will happen
> within a give time window. And if we have out of order updates it might
> impact on when we remove that data from the disk.
>
> So i was looking at level tiered, which supposedly is good when you have
> updates. However its io bound and will affect the writes. everywhere i read
> it says its not good for write heavy workload.
> But Looking at our write velocity, is it really write heavy ?
>
> I guess what i am trying to find out is will level tiered compaction will
> impact the writes in our use case or it will be fine given our write rate
> is not that much.
> Also is there anything else i should keep in mind while deciding on the
> compaction strategy.
>
> Thanks!!
>


Re: Compaction Strategy guidance

2014-11-25 Thread Jean-Armel Luce
Hi Andrei, Hi Nicolai,

Which version of C* are you using ?

There are some recommendations about the max storage per node :
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
handle 10x
(3-5TB).

I have the feeling that those recommendations are sensitive to many
criteria, such as:
- your hardware
- the compaction strategy
- ...

It looks like LCS lowers those limits.

Increasing the size of sstables might help if you have enough CPU and you
can put more load on your I/O system (@Andrei, I am interested in the
results of your experimentation with large sstable files).

From my point of view, there are some usage patterns where it is better to
have many small servers than a few large servers. Probably, it is better to
have many small servers if you need LCS for large tables.

Just my 2 cents.

Jean-Armel

2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of
 data per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob




Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Hi Jean-Armel, Nikolai,

1. Increasing sstable size doesn't work (well, I think, unless we
overscale - add more nodes than really necessary, which is
prohibitive for us in a way). Essentially there is no change. I gave
up and will go for STCS ;-(
2. We use 2.0.11 as of now.
3. We are running on EC2 c3.8xlarge instances with EBS volumes for data (GP SSD).

Jean-Armel, I believe that what you say about many small instances is
absolutely true. But it is not good in our case - we write a lot and
almost never read what we've written. That is, we want to be able to
read everything, but in reality we hardly read 1%, I think. This
implies that smaller instances are of no use in terms of read
performance for us. And generally instances/CPU/RAM are more expensive
than storage. So, we really would like to have instances with large
storage.
Andrei.





On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote:
 Hi Andrei, Hi Nicolai,

 Which version of C* are you using ?

 There are some recommendations about the max storage per node :
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to handle
 10x
 (3-5TB).

 I have the feeling that those recommendations are sensitive according many
 criteria such as :
 - your hardware
 - the compaction strategy
 - ...

 It looks that LCS lower those limitations.

 Increasing the size of sstables might help if you have enough CPU and you
 can put more load on your I/O system (@Andrei, I am interested by the
 results of your  experimentation about large sstable files)

 From my point of view, there are some usage patterns where it is better to
 have many small servers than a few large servers. Probably, it is better to
 have many small servers if you need LCS for large tables.

 Just my 2 cents.

 Jean-Armel

 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of data
 per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob





Re: Compaction Strategy guidance

2014-11-25 Thread Marcus Eriksson
If you are that write-heavy you should definitely go with STCS; LCS
optimizes for reads by doing more compactions.

/Marcus
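For completeness, switching a table to STCS is a one-liner (keyspace/table names are placeholders; existing sstables get reorganized by the new strategy over time):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy' };"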

On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov aiva...@iponweb.net wrote:

 Hi Jean-Armel, Nikolai,

 1. Increasing sstable size doesn't work (well, I think, unless we
 overscale - add more nodes than really necessary, which is
 prohibitive for us in a way). Essentially there is no change.  I gave
 up and will go for STCS;-(
 2. We use 2.0.11 as of now
 3. We are running on EC2 c3.8xlarge instances with EBS volumes for data
 (GP SSD)

 Jean-Armel, I believe that what you say about many small instances is
 absolutely true. But, is not good in our case - we write a lot and
 almost never read what we've written. That is, we want to be able to
 read everything, but in reality we hardly read 1%, I think. This
 implies that smaller instances are of no use in terms of read
 performance for us. And generally nstances/cpu/ram is more expensive
 than storage. So, we really would like to have instances with large
 storage.

 Andrei.





 On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com
 wrote:
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
 handle
  10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
 many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
 to
  have many small servers than a few large servers. Probably, it is better
 to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
 ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more than
  one instance of C* per host. Makes sense - it will reduce the amount
 of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 



Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Yep, Marcus, I know. It's mainly a question of the cost of those extra 2x
disks, you know. Our final setup will be more like 30TB, so doubling
it is still some cost. But I guess we will have to live with it.

On Tue, Nov 25, 2014 at 1:26 PM, Marcus Eriksson krum...@gmail.com wrote:
 If you are that write-heavy you should definitely go with STCS, LCS
 optimizes for reads by doing more compactions

 /Marcus

 On Tue, Nov 25, 2014 at 11:22 AM, Andrei Ivanov aiva...@iponweb.net wrote:

 Hi Jean-Armel, Nikolai,

 1. Increasing sstable size doesn't work (well, I think, unless we
 overscale - add more nodes than really necessary, which is
 prohibitive for us in a way). Essentially there is no change.  I gave
 up and will go for STCS;-(
 2. We use 2.0.11 as of now
 3. We are running on EC2 c3.8xlarge instances with EBS volumes for data
 (GP SSD)

 Jean-Armel, I believe that what you say about many small instances is
 absolutely true. But, is not good in our case - we write a lot and
 almost never read what we've written. That is, we want to be able to
 read everything, but in reality we hardly read 1%, I think. This
 implies that smaller instances are of no use in terms of read
 performance for us. And generally nstances/cpu/ram is more expensive
 than storage. So, we really would like to have instances with large
 storage.

 Andrei.





 On Tue, Nov 25, 2014 at 11:23 AM, Jean-Armel Luce jaluc...@gmail.com
 wrote:
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
  http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
  handle
  10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
  many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and
  you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
  to
  have many small servers than a few large servers. Probably, it is better
  to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
  ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more
  than
  one instance of C* per host. Makes sense - it will reduce the amount
  of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 




Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Hi Jean-Armel,

I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
there are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
2.0.10.

I have about 1.8TB of data per node now in total, which falls into that
range.

As I said, it is really a problem with a large amount of data in a single CF,
not the total amount of data. Quite often the nodes are idle yet have quite a
few pending compactions. I have discussed it with other members of the C*
community and DataStax guys, and they have confirmed my observation.

I believe that increasing the sstable size won't help at all and will probably
make things worse - everything else being equal, of course. But I
would like to hear from Andrei when he is done with his test.

Regarding the last statement - yes, C* clearly likes many small servers
more than fewer large ones. But it is all relative - and can all be
recalculated to $$$ :) C* is all about partitioning of everything -
storage, traffic... Less data per node and more nodes give you lower
latency, lower heap usage, etc. I think I have learned this with my
project. Somewhat the hard way, but still, nothing is better than personal
experience :)

On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Andrei, Hi Nicolai,

 Which version of C* are you using ?

 There are some recommendations about the max storage per node :
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
 handle 10x
 (3-5TB).

 I have the feeling that those recommendations are sensitive according many
 criteria such as :
 - your hardware
 - the compaction strategy
 - ...

 It looks that LCS lower those limitations.

 Increasing the size of sstables might help if you have enough CPU and you
 can put more load on your I/O system (@Andrei, I am interested by the
 results of your  experimentation about large sstable files)

 From my point of view, there are some usage patterns where it is better to
 have many small servers than a few large servers. Probably, it is better to
 have many small servers if you need LCS for large tables.

 Just my 2 cents.

 Jean-Armel

 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of
 data per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob






-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Nikolai,

Just in case you've missed my comment in the thread (guess you have) -
increasing sstable size does nothing (in our case at least). That is,
it's not worse, but the load pattern is still the same - doing nothing
most of the time. So, I switched to STCS and we will have to live with
the extra storage cost - storage is way cheaper than CPU etc. anyhow :-)

On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev ngrigor...@gmail.com wrote:
 Hi Jean-Armel,

 I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but there
 are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra 2.0.10.

 I have about 1,8Tb of data per node now in total, which falls into that
 range.

 As I said, it is really a problem with large amount of data in a single CF,
 not total amount of data. Quite often the nodes are idle yet having quite a
 bit of pending compactions. I have discussed it with other members of C*
 community and DataStax guys and, they have confirmed my observation.

 I believe that increasing the sstable size won't help at all and probably
 will make the things worse - everything else being equal, of course. But I
 would like to hear from Andrei when he is done with his test.

 Regarding the last statement - yes, C* clearly likes many small servers more
 than fewer large ones. But it is all relative - and can be all recalculated
 to $$$ :) C* is all about partitioning of everything - storage,
 traffic...Less data per node and more nodes give you lower latency, lower
 heap usage etc, etc. I think I have learned this with my project. Somewhat
 hard way but still, nothing is better than the personal experience :)

 On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Andrei, Hi Nicolai,

 Which version of C* are you using ?

 There are some recommendations about the max storage per node :
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

 For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
 handle 10x
 (3-5TB).

 I have the feeling that those recommendations are sensitive according many
 criteria such as :
 - your hardware
 - the compaction strategy
 - ...

 It looks that LCS lower those limitations.

 Increasing the size of sstables might help if you have enough CPU and you
 can put more load on your I/O system (@Andrei, I am interested by the
 results of your  experimentation about large sstable files)

 From my point of view, there are some usage patterns where it is better to
 have many small servers than a few large servers. Probably, it is better to
 have many small servers if you need LCS for large tables.

 Just my 2 cents.

 Jean-Armel

 2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of 
 data
 per node and will make better use of the resources.


 This is usually a Bad Idea to do in production.

 =Rob






 --
 Nikolai Grigoriev
 (514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Andrei,

Oh, yes, I have scanned the top of your previous email but overlooked the
last part.

I am using SSDs so I prefer to put in extra work to keep my system performing
and save expensive disk space. So far I've been able to size the system
more or less correctly, so these LCS limitations do not cause too much
trouble. But I do keep the CF sharding option as a backup - for me it will
be relatively easy to implement.

On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov aiva...@iponweb.net wrote:

 Nikolai,

 Just in case you've missed my comment in the thread (guess you have) -
 increasing sstable size does nothing (in our case at least). That is,
 it's not worse but the load pattern is still the same - doing nothing
 most of the time. So, I switched to STCS and we will have to live with
 extra storage cost - storage is way cheaper than cpu etc anyhow:-)

 On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:
  Hi Jean-Armel,
 
  I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
 there
  are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
 2.0.10.
 
  I have about 1,8Tb of data per node now in total, which falls into that
  range.
 
  As I said, it is really a problem with large amount of data in a single
 CF,
  not total amount of data. Quite often the nodes are idle yet having
 quite a
  bit of pending compactions. I have discussed it with other members of C*
  community and DataStax guys and, they have confirmed my observation.
 
  I believe that increasing the sstable size won't help at all and probably
  will make the things worse - everything else being equal, of course. But
 I
  would like to hear from Andrei when he is done with his test.
 
  Regarding the last statement - yes, C* clearly likes many small servers
 more
  than fewer large ones. But it is all relative - and can be all
 recalculated
  to $$$ :) C* is all about partitioning of everything - storage,
  traffic...Less data per node and more nodes give you lower latency, lower
  heap usage etc, etc. I think I have learned this with my project.
 Somewhat
  hard way but still, nothing is better than the personal experience :)
 
  On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com
 wrote:
 
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
 http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
  handle 10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
 many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and
 you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
 to
  have many small servers than a few large servers. Probably, it is
 better to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
 ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more
 than
  one instance of C* per host. Makes sense - it will reduce the amount
 of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 
 
 
 
  --
  Nikolai Grigoriev
  (514) 772-5178




-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Ah, clear then. SSD usage imposes a different bias in terms of costs;-)

On Tue, Nov 25, 2014 at 9:48 PM, Nikolai Grigoriev ngrigor...@gmail.com wrote:
 Andrei,

 Oh, yes, I have scanned the top of your previous email but overlooked the
 last part.

 I am using SSDs so I prefer to put extra work to keep my system performing
 and save expensive disk space. So far I've been able to size the system more
 or less correctly so these LCS limitations do not cause too much troubles.
 But I do keep the CF sharding option as backup - for me it will be
 relatively easy to implement it.


 On Tue, Nov 25, 2014 at 1:25 PM, Andrei Ivanov aiva...@iponweb.net wrote:

 Nikolai,

 Just in case you've missed my comment in the thread (guess you have) -
 increasing sstable size does nothing (in our case at least). That is,
 it's not worse but the load pattern is still the same - doing nothing
 most of the time. So, I switched to STCS and we will have to live with
 extra storage cost - storage is way cheaper than cpu etc anyhow:-)

 On Tue, Nov 25, 2014 at 5:53 PM, Nikolai Grigoriev ngrigor...@gmail.com
 wrote:
  Hi Jean-Armel,
 
  I am using latest and greatest DSE 4.5.2 (4.5.3 in another cluster but
  there
  are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
  2.0.10.
 
  I have about 1,8Tb of data per node now in total, which falls into that
  range.
 
  As I said, it is really a problem with large amount of data in a single
  CF,
  not total amount of data. Quite often the nodes are idle yet having
  quite a
  bit of pending compactions. I have discussed it with other members of C*
  community and DataStax guys and, they have confirmed my observation.
 
  I believe that increasing the sstable size won't help at all and
  probably
  will make the things worse - everything else being equal, of course. But
  I
  would like to hear from Andrei when he is done with his test.
 
  Regarding the last statement - yes, C* clearly likes many small servers
  more
  than fewer large ones. But it is all relative - and can be all
  recalculated
  to $$$ :) C* is all about partitioning of everything - storage,
  traffic...Less data per node and more nodes give you lower latency,
  lower
  heap usage etc, etc. I think I have learned this with my project.
  Somewhat
  hard way but still, nothing is better than the personal experience :)
 
  On Tue, Nov 25, 2014 at 3:23 AM, Jean-Armel Luce jaluc...@gmail.com
  wrote:
 
  Hi Andrei, Hi Nicolai,
 
  Which version of C* are you using ?
 
  There are some recommendations about the max storage per node :
 
  http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
 
  For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
  handle 10x
  (3-5TB).
 
  I have the feeling that those recommendations are sensitive according
  many
  criteria such as :
  - your hardware
  - the compaction strategy
  - ...
 
  It looks that LCS lower those limitations.
 
  Increasing the size of sstables might help if you have enough CPU and
  you
  can put more load on your I/O system (@Andrei, I am interested by the
  results of your  experimentation about large sstable files)
 
  From my point of view, there are some usage patterns where it is better
  to
  have many small servers than a few large servers. Probably, it is
  better to
  have many small servers if you need LCS for large tables.
 
  Just my 2 cents.
 
  Jean-Armel
 
  2014-11-24 19:56 GMT+01:00 Robert Coli rc...@eventbrite.com:
 
  On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev
  ngrigor...@gmail.com
  wrote:
 
  One of the obvious recommendations I have received was to run more
  than
  one instance of C* per host. Makes sense - it will reduce the amount
  of data
  per node and will make better use of the resources.
 
 
  This is usually a Bad Idea to do in production.
 
  =Rob
 
 
 
 
 
 
  --
  Nikolai Grigoriev
  (514) 772-5178




 --
 Nikolai Grigoriev
 (514) 772-5178


Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
Jean-Armel,

I have only two large tables, the rest are super-small. In the test cluster
of 15 nodes the largest table has about 110M rows. Its total size is about
1.26Gb per node (total disk space used per node for that CF). It's got
about 5K sstables per node - the sstable size is 256Mb. cfstats on a
healthy node looks like this:

Read Count: 8973748
Read Latency: 16.130059053251774 ms.
Write Count: 32099455
Write Latency: 1.6124713938912671 ms.
Pending Tasks: 0
Table: wm_contacts
SSTable count: 5195
SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0,
0, 0, 0]
Space used (live), bytes: 1266060391852
Space used (total), bytes: 1266144170869
SSTable Compression Ratio: 0.32604853410787327
Number of keys (estimate): 25696000
Memtable cell count: 71402
Memtable data size, bytes: 26938402
Memtable switch count: 9489
Local read count: 8973748
Local read latency: 17.696 ms
Local write count: 32099471
Local write latency: 1.732 ms
Pending tasks: 0
Bloom filter false positives: 32248
Bloom filter false ratio: 0.50685
Bloom filter space used, bytes: 20744432
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 172660
Average live cells per slice (last five minutes): 495.0
Average tombstones per slice (last five minutes): 0.0

Another table of similar structure (same number of rows) is about 4x
smaller. That table does not suffer from those issues - it compacts well
and efficiently.
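For anyone wanting to try the same experiment, a minimal sketch of setting the LCS sstable size on a table (the keyspace name is a placeholder; 256 is the value discussed above, and existing sstables are only rewritten as they get compacted again):

cqlsh -e "ALTER TABLE my_keyspace.wm_contacts WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 256 };"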

On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce jaluc...@gmail.com wrote:

 Hi Nikolai,

 Please could you clarify a little bit what you call a large amount of
 data ?

 How many tables ?
 How many rows in your largest table ?
 How many GB in your largest table ?
 How many GB per node ?

 Thanks.



 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce jaluc...@gmail.com:

 Hi Nikolai,

 Thanks for those informations.

 Please could you clarify a little bit what you call 

 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev ngrigor...@gmail.com:

 Just to clarify - when I was talking about the large amount of data I
 really meant large amount of data per node in a single CF (table). LCS does
 not seem to like it when it gets thousands of sstables (makes 4-5 levels).

 When bootstraping a new node you'd better enable that option from
 CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
 mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it
 had 7,5K pending compactions, now it has almost stabilized ad 4,6K. Does
 not go down. Number of sstables at L0  is over 11K and it is slowly slowly
 building upper levels. Total number of sstables is 4x the normal amount.
 Now I am not entirely sure if this node will ever get back to normal life.
 And believe me - this is not because of I/O, I have SSDs everywhere and 16
 physical cores. This machine is barely using 1-3 cores at most of the time.
 The problem is that allowing STCS fallback is not a good option either - it
 will quickly result in a few 200Gb+ sstables in my configuration and then
 these sstables will never be compacted. Plus, it will require close to 2x
 disk space on EVERY disk in my JBOD configuration...this will kill the node
 sooner or later. This is all because all sstables after bootstrap end at L0
 and then the process slowly slowly moves them to other levels. If you have
 write traffic to that CF then the number of sstables and L0 will grow
 quickly - like it happens in my case now.

 Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
 is implemented it may be better.


 On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov aiva...@iponweb.net
 wrote:

 Stephane,

 We are having a somewhat similar C* load profile. Hence some comments
 in addition Nikolai's answer.
 1. Fallback to STCS - you can disable it actually
 2. Based on our experience, if you have a lot of data per node, LCS
 may work just fine. That is, till the moment you decide to join
 another node - chances are that the newly added node will not be able
 to compact what it gets from old nodes. In your case, if you switch
 strategy the same thing may happen. This is all due to limitations
 mentioned by Nikolai.

 Andrei,


 On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. smg...@gmail.com
 wrote:
  ABUSE
 
 
 
   I DON'T WANT ANY MORE MAILS, I AM FROM MEXICO
 
 
 
   From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
   Sent: Saturday, November 22, 2014 07:13 p.m.
   To: user@cassandra.apache.org
   Subject: Re: Compaction Strategy guidance
   Importance: High
 
 
 
  Stephane,
 
  As everything good, LCS comes at certain price.
 
  LCS will put most load on you I/O system (if you use spindles - you
 may need
  to be careful about that) and on CPU. Also LCS (by default) may fall
 back to
  STCS

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
Nikolai,

Are you sure about 1.26Gb? Like it doesn't look right - 5195 sstables
with 256Mb sstable size...

Andrei

On Mon, Nov 24, 2014 at 5:09 PM, Nikolai Grigoriev ngrigor...@gmail.com wrote:
 Jean-Armel,

 I have only two large tables, the rest is super-small. In the test cluster
 of 15 nodes the largest table has about 110M rows. Its total size is about
 1,26Gb per node (total disk space used per node for that CF). It's got about
 5K sstables per node - the sstable size is 256Mb. cfstats on a healthy
 node look like this:

 Read Count: 8973748
 Read Latency: 16.130059053251774 ms.
 Write Count: 32099455
 Write Latency: 1.6124713938912671 ms.
 Pending Tasks: 0
 Table: wm_contacts
 SSTable count: 5195
 SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0,
 0, 0, 0]
 Space used (live), bytes: 1266060391852
 Space used (total), bytes: 1266144170869
 SSTable Compression Ratio: 0.32604853410787327
 Number of keys (estimate): 25696000
 Memtable cell count: 71402
 Memtable data size, bytes: 26938402
 Memtable switch count: 9489
 Local read count: 8973748
 Local read latency: 17.696 ms
 Local write count: 32099471
 Local write latency: 1.732 ms
 Pending tasks: 0
 Bloom filter false positives: 32248
 Bloom filter false ratio: 0.50685
 Bloom filter space used, bytes: 20744432
 Compacted partition minimum bytes: 104
 Compacted partition maximum bytes: 3379391
 Compacted partition mean bytes: 172660
 Average live cells per slice (last five minutes): 495.0
 Average tombstones per slice (last five minutes): 0.0

 Another table of similar structure (same number of rows) is about 4x times
 smaller. That table does not suffer from those issues - it compacts well and
 efficiently.


Re: Compaction Strategy guidance

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev ngrigor...@gmail.com
wrote:

 One of the obvious recommendations I have received was to run more than
 one instance of C* per host. Makes sense - it will reduce the amount of
 data per node and will make better use of the resources.


This is usually a Bad Idea to do in production.

=Rob


Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Stephane,

We have a somewhat similar C* load profile. Hence some comments
in addition to Nikolai's answer.
1. Fallback to STCS - you can disable it actually
2. Based on our experience, if you have a lot of data per node, LCS
may work just fine. That is, till the moment you decide to join
another node - chances are that the newly added node will not be able
to compact what it gets from old nodes. In your case, if you switch
strategy the same thing may happen. This is all due to limitations
mentioned by Nikolai.

Andrei,




Re: Compaction Strategy guidance

2014-11-23 Thread Nikolai Grigoriev
Just to clarify - when I was talking about the large amount of data I
really meant a large amount of data per node in a single CF (table). LCS does
not seem to like it when it gets thousands of sstables (it makes 4-5 levels).

When bootstrapping a new node you'd better enable that option from
CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
mess - I have a node that I bootstrapped ~2 weeks ago. Initially it
had 7.5K pending compactions; now it has almost stabilized at 4.6K and does
not go down. The number of sstables at L0 is over 11K and it is slowly,
slowly building the upper levels. The total number of sstables is 4x the
normal amount. Now I am not entirely sure this node will ever get back to
normal life. And believe me - this is not because of I/O: I have SSDs
everywhere and 16 physical cores. This machine is barely using 1-3 cores most
of the time. The problem is that allowing the STCS fallback is not a good
option either - it will quickly result in a few 200GB+ sstables in my
configuration and then these sstables will never be compacted. Plus, it will
require close to 2x the disk space on EVERY disk in my JBOD configuration...
this will kill the node sooner or later. This is all because all sstables
after bootstrap end up at L0 and then the process slowly, slowly moves them
to the other levels. If you have write traffic to that CF then the number of
sstables in L0 will grow quickly - like it happens in my case now.

Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301 is
implemented it may be better.
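
For reference, that CASSANDRA-6621 option is a JVM system property; a minimal
sketch of enabling it on the joining node before bootstrap (property name
taken from the ticket - verify it exists in your Cassandra version):

  # e.g. appended to cassandra-env.sh on the node being bootstrapped
  JVM_OPTS="$JVM_OPTS -Dcassandra.disable_stcs_in_l0=true"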


-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-23 Thread Jean-Armel Luce
Hi Nikolai,

Thanks for this information.

Please could you clarify a little bit what you call 




Re: Compaction Strategy guidance

2014-11-23 Thread Jean-Armel Luce
Hi Nikolai,

Please could you clarify a little bit what you call a "large amount of
data"?

How many tables?
How many rows in your largest table?
How many GB in your largest table?
How many GB per node?

Thanks.
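
For anyone wanting to pull those numbers, a rough sketch (the keyspace/table
name and data path are placeholders, not from this thread):

  nodetool cfstats my_keyspace.my_table | egrep 'Number of keys|Space used'
  nodetool info | grep Load
  du -sh /var/lib/cassandra/data/my_keyspace/my_table*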








Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Jean-Armel,

I have the same problem/state as Nikolai. Here are my stats:
~ 1 table
~ 10B records
~ 2TB/node x 6 nodes

Nikolai,
I'm sort of wondering if switching to some larger sstable_size_in_mb
(say 4096 or 8192) with LCS may be a solution, even if not a permanent
one? As for huge sstables, I already have some 400-500GB sstables. The
only way I can see to compact them in the future is to split them
offline at some point. Does that make sense?

(I'm still doing a test drive and really need to understand how we are
going to handle that in production)

Andrei.
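
For what it's worth, the offline split can be done with the sstablesplit tool
that ships with Cassandra; a sketch, assuming the node is stopped first, with
the path and target size as placeholders only:

  sstablesplit --no-snapshot -s 256 \
    /var/lib/cassandra/data/my_keyspace/my_table/my_keyspace-my_table-jb-1234-Data.db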




Re: Compaction Strategy guidance

2014-11-22 Thread Nikolai Grigoriev
Stephane,

Like everything good, LCS comes at a certain price.

LCS will put more load on your I/O system (if you use spindles you may
need to be careful about that) and on CPU. Also, LCS (by default) may fall
back to STCS if it is falling behind (which is very possible with heavy
write activity) and this will result in higher disk space usage. LCS also
has a certain limitation I discovered lately: sometimes LCS may not be
able to use all your node's resources (algorithm limitations) and this
reduces the overall compaction throughput. This may happen if you have a
large column family with lots of data per node. STCS won't have this
limitation.

By the way, the primary goal of LCS is to reduce the number of sstables C*
has to look at to find your data. With LCS functioning properly this number
will most likely be between something like 1 and 3 for most of the reads.
But if you do few reads and are not concerned about latency today, most
likely LCS will only save you some disk space.
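
If one did decide to switch, the change itself is a single statement; a
sketch with a recent cqlsh (keyspace/table and sstable size are examples
only - and note the node will then re-level all existing data, which is a
lot of compaction work up front):

  cqlsh -e "ALTER TABLE my_keyspace.my_table
            WITH compaction = {'class': 'LeveledCompactionStrategy',
                               'sstable_size_in_mb': 160};"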

On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay sle...@looplogic.com
wrote:

 Hi there,

 use case:

 - Heavy write app, few reads.
 - Lots of updates of rows / columns.
 - Current performance is fine, for both writes and reads..
 - Currently using SizedCompactionStrategy

 We're trying to limit the amount of storage used during compaction. Should
 we switch to LeveledCompactionStrategy?

 Thanks




-- 
Nikolai Grigoriev
(514) 772-5178


RE: Compaction Strategy guidance

2014-11-22 Thread Servando Muñoz G .
ABUSE

I DO NOT WANT ANY MORE MAILS, I AM FROM MEXICO

From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
Sent: Saturday, November 22, 2014 07:13 PM
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy guidance
Importance: High

 




Re: compaction strategy

2011-05-11 Thread Terje Marthinussen



Sorry, I am not trying to pretend anything or blow it out of proportion.
Just reacting to what I see.

This is what I see after some stress testing of some pretty decent HW.

81  Up  Normal  181.6 GB   8.33%   Token(bytes[30])
82  Up  Normal  501.43 GB  8.33%   Token(bytes[313230])
83  Up  Normal  248.07 GB  8.33%   Token(bytes[313437])
84  Up  Normal  349.64 GB  8.33%   Token(bytes[313836])
85  Up  Normal  511.55 GB  8.33%   Token(bytes[323336])
86  Up  Normal  654.93 GB  8.33%   Token(bytes[333234])
87  Up  Normal  534.77 GB  8.33%   Token(bytes[333939])
88  Up  Normal  525.88 GB  8.33%   Token(bytes[343739])
89  Up  Normal  476.6 GB   8.33%   Token(bytes[353730])
90  Up  Normal  424.89 GB  8.33%   Token(bytes[363635])
91  Up  Normal  338.14 GB  8.33%   Token(bytes[383036])
92  Up  Normal  546.95 GB  8.33%   Token(bytes[6a])

.81 has been exposed to a full compaction. It had ~370GB before that and the
resulting sstable is 165GB.
The other nodes have only been doing minor compactions.

I think this is a problem.
You are of course free to disagree.

I do however recommend doing a simulation on potential worst case scenarios
if many of the buckets end up with 3 sstables and don't compact for a while.
The disk space requirements  get pretty bad even without getting into
theoretical worst cases.

Regards,
Terje
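
For reference, the full compaction on .81 would have been kicked off with
something along these lines (keyspace/CF names are placeholders):

  nodetool -h <address of .81> compact my_keyspace my_cf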


Re: compaction strategy

2011-05-11 Thread Jonathan Ellis
You are of course free to reduce the min per bucket to 2.

The fundamental idea of sstables + compaction is to trade disk space
for higher write performance. For most applications this is the right
trade to make on modern hardware... I don't think you'll get very far
trying to get the 2nd without the 1st.
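
A sketch of doing that at runtime with nodetool (keyspace/CF names are
placeholders; minimum lowered to 2, maximum left at the default 32):

  nodetool setcompactionthreshold my_keyspace my_cf 2 32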




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: compaction strategy

2011-05-10 Thread Terje Marthinussen
 Everyone may be well aware of that, but I'll still remark that a minor
 compaction
 will try to merge as many 20MB sstables as it can up to the max
 compaction
 threshold (which is configurable). So if you do accumulate some newly
 created
 sstable at some point in time, the next minor compaction will take all of
 them
 and thus not create a 40 MB sstable, then 80MB etc... Sure there will be
 more
 step than with a major compaction, but let's keep in mind we don't
 merge sstables
 2 by 2.


Well, you do kind of merge them 2 by 2 as you look for at least 4 at a time
;)
But yes, 20MB should become at least 80MB. Still quite a few hops to reach
100GB.

I'm also not too much in favor of triggering major compactions,
 because it mostly
 have a nasty effect (create one huge sstable). Now maybe we could expose
 the
 difference factor for which we'll consider sstables in the same bucket


The nasty side effect I am scared of is disk space and to keep the disk
space under control, I need to get down to 1 file.

As an example:
2 days ago, I looked at a system that had gone idle from compaction with
something like 24 sstables.
Disk use was 370GB.

After manually triggering a full compaction, I was left with a single sstable
which is 164 GB.

This means I may need more than 3x the full dataset to survive if certain
nasty events such as repairs or anticompactions should occur.
Way more than the recommended 2x.
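
Roughly, the arithmetic behind that (just restating the figures above, and
assuming the new sstable is written out before the old ones are released):

  steady state before the major compaction:  ~370 GB on disk
  fully compacted size:                      ~164 GB
  peak while the major compaction runs:      ~370 GB + ~164 GB = ~534 GB
  534 GB / 164 GB = ~3.3x the compacted dataset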

In the same system, I see nodes reaching up towards 900GB during compaction
and 5-600GB otherwise.
This is with OPP, so distribution is not 100% perfect, but I expect these
5-600GB nodes to compact down to the 200GB area if a full compaction is
triggered.

That is way way beyond the recommendation to have 2x the disk space.

You may disagree, but I think this is a problem.
Either we need to recommend 3-5x the best case disk usage or we need to fix
cassandra.

A simple improvement initially may be to change the bucketing strategy if
you cannot find suitable candidates.
I believe Lucene, for instance, has a strategy where it can mix a set of small
index fragments with one large one.
This may be possible to consider as a fallback strategy and just let
cassandra compact down to 1 file whenever it can.

Ultimately, I think segmenting on token space is the only way to fix this.
That segmentation could be done by building histograms of your token
distribution as you compact and the compaction can further adjust the
segments accordingly as full compactions take place.

This would seem simpler to do than a full vnode based infrastructure.

Terje


Re: compaction strategy

2011-05-10 Thread Sylvain Lebresne
On Tue, May 10, 2011 at 6:20 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:

 Everyone may be well aware of that, but I'll still remark that a minor
 compaction
 will try to merge as many 20MB sstables as it can up to the max
 compaction
 threshold (which is configurable). So if you do accumulate some newly
 created
 sstable at some point in time, the next minor compaction will take all of
 them
 and thus not create a 40 MB sstable, then 80MB etc... Sure there will be
 more
 step than with a major compaction, but let's keep in mind we don't
 merge sstables
 2 by 2.

 Well, you do kind of merge them 2 by 2 as you look for at least 4 at a time
 ;)
 But yes, 20MB should become at least 80MB. Still quite a few hops to reach
 100GB.

Not sure I follow you. 4 sstables is the minimum a compaction looks for
(by default).
If there are 30 sstables of ~20MB sitting there because compaction is behind,
you will compact those 30 sstables together (unless there is not enough space
for that, and considering you haven't changed the max compaction threshold (32
by default)). And you can increase the max threshold.
Don't get me wrong, I'm not pretending this works better than it does, but
let's not pretend either that it's worse than it is.


 I'm also not too much in favor of triggering major compactions,
 because it mostly
 have a nasty effect (create one huge sstable). Now maybe we could expose
 the
 difference factor for which we'll consider sstables in the same bucket

 The nasty side effect I am scared of is disk space and to keep the disk
 space under control, I need to get down to 1 file.

 As an example:
 2 days ago, I looked at a system that had gone idle from compaction with
 something like 24 sstables.
 Disk use was 370GB.

 After manually triggering full compaction,  I was left with a single sstable
 which is 164 GB large.

 This means I may need more than 3x the full dataset to survive if certain
 nasty events such as repairs or anti compactions should occur.
 Way more than the recommended 2x.

 In the same system, I see nodes reaching up towards 900GB during compaction
 and 5-600GB otherwise.
 This is with OPP, so distribution is not 100% perfect, but I expect these
 5-600GB nodes to compact down to the 200GB area if a full compaction is
 triggered.

 That is way way beyond the recommendation to have 2x the disk space.

 You may disagree, but I think this is a problem.

I absolutely do not disagree. I was just arguing that I'm not sure triggering a
major compaction based on some fuzzy heuristic is a good solution to the
problem.

And we do know that compaction could and should be improved, both to
make it have less impact on reads when it's behind:
https://issues.apache.org/jira/browse/CASSANDRA-2498
to allow for easily testing different strategies:
https://issues.apache.org/jira/browse/CASSANDRA-1610
as well as redesigning the mechanism:
https://issues.apache.org/jira/browse/CASSANDRA-1608
You'll see in particular in that last ticket's comments that segmenting on
token space has been suggested already, and there are probably a handful of
threads about vnodes in the mailing list archives.

And I personally think that yes, partitioning the sstables is a good idea.




Re: compaction strategy

2011-05-09 Thread Sylvain Lebresne
On Sat, May 7, 2011 at 7:20 PM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
 This is an all ssd system. I have no problems with read/write performance
 due to I/O.
 I do have a potential with the crazy explosion you can get in terms of disk
 use if compaction cannot keep up.

 As things falls behind and you get many generations of data, yes, read
 performance gets a problem due to the number of sstables.

 As things start falling behind, you have a bunch of minor compactions trying
 to merge 20MB (sstables cassandra generally dumps with current config when
 under pressure) into 40 MB into 80MB into

Everyone may be well aware of that, but I'll still remark that a minor
compaction will try to merge as many 20MB sstables as it can, up to the max
compaction threshold (which is configurable). So if you do accumulate some
newly created sstables at some point in time, the next minor compaction will
take all of them and thus not create a 40 MB sstable, then 80 MB, etc. Sure,
there will be more steps than with a major compaction, but let's keep in mind
we don't merge sstables 2 by 2.

I'm also not too much in favor of triggering major compactions, because it
mostly has a nasty effect (it creates one huge sstable). Now maybe we could
expose the difference factor for which we'll consider sstables to be in the
same bucket (i.e., of similar size). As a side note, I think that
https://issues.apache.org/jira/browse/CASSANDRA-1610, if done correctly,
could help in such situations in that one could try a strategy adapted
to its workload.






Re: compaction strategy

2011-05-09 Thread David Boxenhorn
"I'm also not too much in favor of triggering major compactions, because it
mostly have a nasty effect (create one huge sstable)."

If that is the case, why can't major compactions create many,
non-overlapping SSTables?

In general, it seems to me that non-overlapping SSTables have all the
advantages of big SSTables (i.e. you know exactly where the data is) without
the disadvantages that come with being big. Why doesn't Cassandra take
advantage of that in a major way?


Re: compaction strategy

2011-05-09 Thread David Boxenhorn
If they each have their own copy of the data, then they are *not*
non-overlapping!

If you have non-overlapping SSTables (and you know the min/max keys), it's
like having one big SSTable because you know exactly where each row is, and
it becomes easy to merge a new SSTable in small batches, rather than in one
huge batch.

The only step that you have to add to the current merge process is, when you
are going to write a new SSTable and it's too big, to write N
(non-overlapping!) pieces instead.


On Mon, May 9, 2011 at 12:46 PM, Terje Marthinussen tmarthinus...@gmail.com
 wrote:

 Yes, agreed.

 I actually think cassandra has to.

 And if you do not go down to that single file, how do you avoid getting
 into a situation where you can very realistically end up with 4-5 big
 sstables each having its own copy of the same data massively increasing disk
 requirements?

 Terje






Re: compaction strategy

2011-05-09 Thread Terje Marthinussen
Sorry, I was referring to the claim that one big file was a problem, not
the non-overlapping part.

If you never compact to a single file, you never get rid of all
generations/duplicates.
With non-overlapping files covering small enough token ranges, compacting
down to one file is not a big issue.

Terje







Re: compaction strategy

2011-05-07 Thread Jonathan Ellis
On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
tmarthinus...@gmail.com wrote:
 1. Would it make sense to make full compactions occur a bit more aggressive.

I'd rather reduce the performance impact of being behind, than do more
full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

 2. I
 would think the code should be smart enough to either trigger a full
 compaction and scrap the current queue, or at least merge some of those
 pending tasks into larger ones

Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
with saying add capacity until compaction can mostly keep up. (Most
people's problem is making compaction LESS aggressive, hence
https://issues.apache.org/jira/browse/CASSANDRA-2156.)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: compaction strategy

2011-05-07 Thread Edward Capriolo
On Sat, May 7, 2011 at 8:54 AM, Jonathan Ellis jbel...@gmail.com wrote:
 On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
 tmarthinus...@gmail.com wrote:
 1. Would it make sense to make full compactions occur a bit more aggressively?

 I'd rather reduce the performance impact of being behind, than do more
 full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

 2. I
 would think the code should be smart enough to either trigger a full
 compaction and scrap the current queue, or at least merge some of those
 pending tasks into larger ones

 Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
 with saying add capacity until compaction can mostly keep up. (Most
 people's problem is making compaction LESS aggressive, hence
 https://issues.apache.org/jira/browse/CASSANDRA-2156.)

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com


Adjusting compaction and memtable settings is a tuning thing. Tuning
is not usually a game changer. Every once in a while you hit
something wonderful and get a 20% or 30% improvement, but normally you
are in the 1%-3% gain range.

If you are seeing 600 pending compaction tasks regularly you almost
definitely need more hardware.


Re: compaction strategy

2011-05-07 Thread Peter Schuller
 If you are seeing 600 pending compaction tasks regularly you almost
 definitely need more hardware.

Note that the pending compactions count is pretty misleading and you can't
really draw conclusions just from the pending compactions
number/graph. For example, standard behavior during e.g. a long repair
may accumulate thousands of pending compactions that suddenly
drop to zero once the repair is done and a bunch of tasks that don't actually
need to do anything have completed. With the concurrent compaction
support I suppose this will be mitigated as long as you don't hit your
concurrency limit.

-- 
/ Peter Schuller


Re: compaction strategy

2011-05-07 Thread Terje Marthinussen
This is an all-SSD system. I have no problems with read/write performance
due to I/O.
I do have a potential problem with the crazy explosion in disk use you can
get if compaction cannot keep up.

As things fall behind and you get many generations of data, yes, read
performance becomes a problem due to the number of sstables.

As things start falling behind, you have a bunch of minor compactions trying
to merge 20 MB sstables (the size cassandra generally dumps with the current
config when under pressure) into 40 MB, into 80 MB, and so on.

Anyone want to do the math on how many times you are rewriting the data
going this route?
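
A rough back-of-the-envelope sketch of that math, assuming minor compactions
always merge two equal-sized sstables, so every byte is rewritten once per
size doubling:

// Back-of-the-envelope only: counts how many times each byte is rewritten
// if 20 MB flushes are repeatedly merged in equal-sized pairs until all
// data ends up in a single sstable.
public class CompactionRewriteMath {
    public static void main(String[] args) {
        double flushMb = 20;           // flush size mentioned above
        double totalMb = 100 * 1024;   // assume ~100 GB of data on the node
        int rewrites = 0;
        for (double size = flushMb; size < totalMb; size *= 2)
            rewrites++;
        // Prints roughly 13: each byte is rewritten about a dozen times
        // before everything lands in one file.
        System.out.printf("Each byte rewritten ~%d times to reach %.0f MB%n",
                          rewrites, totalMb);
    }
}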

There is just no way this can keep up. It will just fall more and more
behind.
The only way to recover, as far as I can see, would be to trigger a full
compaction?

It does not really make sense to me to go through all these minor merges
when a full compaction will do a much faster and better job.

Terje

On Sat, May 7, 2011 at 9:54 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Sat, May 7, 2011 at 2:01 AM, Terje Marthinussen
 tmarthinus...@gmail.com wrote:
  1. Would it make sense to make full compactions occur a bit more
  aggressively?

 I'd rather reduce the performance impact of being behind, than do more
 full compactions: https://issues.apache.org/jira/browse/CASSANDRA-2498

  2. I
  would think the code should be smart enough to either trigger a full
  compaction and scrap the current queue, or at least merge some of those
  pending tasks into larger ones

 Not crazy but a queue-rewriter would be nontrivial. For now I'm okay
 with saying add capacity until compaction can mostly keep up. (Most
 people's problem is making compaction LESS aggressive, hence
 https://issues.apache.org/jira/browse/CASSANDRA-2156.)

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: compaction strategy

2011-05-07 Thread Peter Schuller
 It does not really make sense to me to go through all these minor merges
 when a full compaction will do a much faster and better job.

In a system heavily reliant on caching (platter drives, data sizes much
larger than RAM), major compactions can be very detrimental to performance
due to the temporary spike in data size and the cache coldness they cause.
Sounds like it makes good sense in your situation, though.
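
For the disk-space side of that spike, a simple worst-case sketch
(illustrative numbers, assuming nothing gets purged during the compaction):
the old sstables stay on disk until the new merged one is complete, so the
node briefly needs free space on the order of the data being compacted.

// Illustrative worst-case headroom check for a major compaction:
// peak usage approaches old data + newly written data if nothing is purged.
public class MajorCompactionHeadroom {
    public static void main(String[] args) {
        double liveGb = 400;    // data participating in the major compaction (assumed)
        double volumeGb = 600;  // total volume size (assumed)
        double freeGb = volumeGb - liveGb;
        boolean enoughHeadroom = freeGb >= liveGb;
        System.out.printf("free=%.0f GB, worst-case needed=%.0f GB, enough=%b%n",
                          freeGb, liveGb, enoughHeadroom);
    }
}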

-- 
/ Peter Schuller