Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Ah, clear then. SSD usage imposes a different bias in terms of costs;-)



Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Andrei,

Oh, yes, I scanned the top of your previous email but overlooked the
last part.

I am using SSDs, so I prefer to put in extra work to keep my system
performing and to save expensive disk space. So far I've been able to size
the system more or less correctly, so these LCS limitations do not cause
too much trouble. But I do keep the CF "sharding" option as a backup - for
me it will be relatively easy to implement.




-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Nikolai,

Just in case you've missed my comment in the thread (guess you have) -
increasing sstable size does nothing (in our case at least). That is,
it's not worse but the load pattern is still the same - doing nothing
most of the time. So, I switched to STCS and we will have to live with
extra storage cost - storage is way cheaper than cpu etc anyhow:-)
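
For readers who want to try the same switch, here is a minimal sketch of
how the change could be scripted. It only renders the CQL text and does not
talk to a cluster. The keyspace and table names (`my_ks`, `events`) and the
helper `compaction_cql` are hypothetical; the strategy class names and the
`sstable_size_in_mb` option are the compaction settings discussed in this
thread.

```python
def compaction_cql(keyspace, table, strategy, **options):
    """Render an ALTER TABLE statement that sets the compaction strategy.

    All option values are emitted as quoted strings, which CQL accepts
    for the compaction map.
    """
    settings = {"class": strategy}
    settings.update(options)
    body = ", ".join("'{}': '{}'".format(k, v) for k, v in settings.items())
    return "ALTER TABLE {}.{} WITH compaction = {{{}}};".format(
        keyspace, table, body
    )


# Hypothetical keyspace/table names, for illustration only.
print(compaction_cql("my_ks", "events", "SizeTieredCompactionStrategy"))
print(compaction_cql("my_ks", "events", "LeveledCompactionStrategy",
                     sstable_size_in_mb=256))
```

Changing the strategy on a large table triggers a lot of recompaction, so
it is probably worth trying on a single test node first.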



Re: Compaction Strategy guidance

2014-11-25 Thread Nikolai Grigoriev
Hi Jean-Armel,

I am using the latest and greatest DSE 4.5.2 (4.5.3 in another cluster, but
there are no relevant changes between 4.5.2 and 4.5.3) - thus, Cassandra
2.0.10.

I have about 1.8 TB of data per node now in total, which falls into that
range.

As I said, it is really a problem with a large amount of data in a single
CF, not the total amount of data. Quite often the nodes are idle yet have
quite a few pending compactions. I have discussed it with other members of
the C* community and the DataStax guys, and they have confirmed my
observation.

I believe that increasing the sstable size won't help at all and will
probably make things worse - everything else being equal, of course. But I
would like to hear from Andrei when he is done with his test.

Regarding the last statement - yes, C* clearly likes many small servers
more than fewer large ones. But it is all relative - and it can all be
recalculated to $$$ :) C* is all about partitioning everything - storage,
traffic... Less data per node and more nodes give you lower latency, lower
heap usage, etc. I think I have learned this with my project. The somewhat
hard way, but still, nothing is better than personal experience :)



-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Yep, Marcus, I know. It's mainly a question of the cost of those extra x2
disks, you know. Our "final" setup will be more like 30TB, so doubling
it is still some cost. But I guess we will have to live with it.
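
To put rough numbers on that doubling, a back-of-the-envelope sketch
follows. The 2x factor for STCS is the one mentioned in this thread; the
1.1x factor for LCS is an assumption added for comparison (a commonly
quoted rule of thumb, not a figure from this discussion), and
`provisioned_tb` is a hypothetical helper.

```python
def provisioned_tb(data_tb, headroom_factor):
    """Disk to provision so compaction has room to rewrite sstables."""
    return data_tb * headroom_factor


data_tb = 30  # the "final" setup size mentioned above

stcs_tb = provisioned_tb(data_tb, 2.0)  # 2x: the figure used in this thread
lcs_tb = provisioned_tb(data_tb, 1.1)   # ASSUMPTION: ~10% headroom for LCS

print("STCS: %.0f TB, LCS: %.0f TB, difference: %.0f TB"
      % (stcs_tb, lcs_tb, stcs_tb - lcs_tb))
```

The difference is what has to be weighed against the extra CPU and I/O
that LCS spends on compaction.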



Re: Compaction Strategy guidance

2014-11-25 Thread Marcus Eriksson
If you are that write-heavy you should definitely go with STCS; LCS
optimizes for reads by doing more compactions.

/Marcus



Re: Compaction Strategy guidance

2014-11-25 Thread Andrei Ivanov
Hi Jean-Armel, Nikolai,

1. Increasing sstable size doesn't work (well, I think, unless we
"overscale" - add more nodes than really necessary, which is
prohibitive for us in a way). Essentially there is no change.  I gave
up and will go for STCS;-(
2. We use 2.0.11 as of now
3. We are running on EC2 c3.8xlarge instances with EBS volumes for data (GP SSD)

Jean-Armel, I believe that what you say about many small instances is
absolutely true. But it is not a good fit in our case - we write a lot and
almost never read what we've written. That is, we want to be able to
read everything, but in reality we hardly read 1%, I think. This
implies that smaller instances are of no use in terms of read
performance for us. And generally instances/CPU/RAM are more expensive
than storage. So, we really would like to have instances with large
storage.

Andrei.







Re: Compaction Strategy guidance

2014-11-25 Thread Jean-Armel Luce
Hi Andrei, Hi Nicolai,

Which version of C* are you using ?

There are some recommendations about the max storage per node :
http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2

"For 1.0 we recommend 300-500GB. For 1.2 we are looking to be able to
handle 10x (3-5TB)".

I have the feeling that those recommendations depend on many criteria,
such as:
- your hardware
- the compaction strategy
- ...

It looks like LCS lowers those limits.

Increasing the size of sstables might help if you have enough CPU and you
can put more load on your I/O system (@Andrei, I am interested in the
results of your experiment with large sstable files).

From my point of view, there are some usage patterns where it is better to
have many small servers than a few large servers. Probably, it is better to
have many small servers if you need LCS for large tables.

Just my 2 cents.

Jean-Armel
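
As a rough illustration of the "many small servers" trade-off, the sketch
below turns a per-node storage target into a node count. The total data
size is a hypothetical figure and `nodes_needed` is a hypothetical helper;
the 500 GB and 5 TB targets are the upper ends of the ranges quoted above,
and 1.8 TB is the per-node figure reported elsewhere in this thread.

```python
import math


def nodes_needed(total_gb, per_node_gb):
    """Smallest node count that keeps each node at or below per_node_gb."""
    return math.ceil(total_gb / per_node_gb)


total_gb = 30_000  # ASSUMPTION: cluster-wide data size, replication included

for per_node_gb in (500, 1_800, 5_000):
    print(per_node_gb, "GB/node ->",
          nodes_needed(total_gb, per_node_gb), "nodes")
```

The node count is only the storage side of the picture; CPU, heap and
compaction throughput per node are what the rest of the thread is about.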



Re: Compaction Strategy guidance

2014-11-24 Thread Robert Coli
On Mon, Nov 24, 2014 at 6:48 AM, Nikolai Grigoriev 
wrote:

> One of the obvious recommendations I have received was to run more than
> one instance of C* per host. Makes sense - it will reduce the amount of
> data per node and will make better use of the resources.
>

This is usually a Bad Idea to do in production.

=Rob


Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
: 3379391
>> >> > Compacted partition mean bytes: 172660
>> >> > Average live cells per slice (last five minutes): 495.0
>> >> > Average tombstones per slice (last five minutes): 0.0
>> >> >
>> >> > Another table of similar structure (same number of rows) is about 4x
>> >> > smaller. That table does not suffer from those issues - it compacts
>> >> > well and efficiently.
>> >> >
>> >> > On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce 
>> >> > wrote:
>> >> >>
>> >> >> Hi Nikolai,
>> >> >>
>> >> >> Please could you clarify a little bit what you call "a large amount
>> >> >> of data"?
>> >> >>
>> >> >> How many tables ?
>> >> >> How many rows in your largest table ?
>> >> >> How many GB in your largest table ?
>> >> >> How many GB per node ?
>> >> >>
>> >> >> Thanks.
>> >> >>
>> >> >>
>> >> >>
>> >> >> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :
>> >> >>>
>> >> >>> Hi Nikolai,
>> >> >>>
>> >> >>> Thanks for those informations.
>> >> >>>
>> >> >>> Please could you clarify a little bit what you call "

Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
> >> >>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
> >> >>>>
> >> >>>> Just to clarify - when I was talking about the large amount of
> >> >>>> data I really meant a large amount of data per node in a single CF
> >> >>>> (table). LCS does not seem to like it when it gets thousands of
> >> >>>> sstables (makes 4-5 levels).
> >> >>>>
> >> >>>> When bootstrapping a new node you'd better enable that option from
> >> >>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will
> >> >>>> still be a mess - I have a node that I bootstrapped ~2 weeks ago.
> >> >>>> Initially it had 7.5K pending compactions, now it has almost
> >> >>>> stabilized at 4.6K. It does not go down. The number of sstables at
> >> >>>> L0 is over 11K and it is very slowly building the upper levels.
> >> >>>> The total number of sstables is 4x the normal amount. Now I am not
> >> >>>> entirely sure if this node will ever get back to normal life. And
> >> >>>> believe me - this is not because of I/O, I have SSDs everywhere
> >> >>>> and 16 physical cores. This machine is barely using 1-3 cores most
> >> >>>> of the time. The problem is that allowing STCS fallback is not a
> >> >>>> good option either - it will quickly result in a few 200 GB+
> >> >>>> sstables in my configuration and then these sstables will never be
> >> >>>> compacted. Plus, it will require close to 2x disk space on EVERY
> >> >>>> disk in my JBOD configuration... this will kill the node sooner or
> >> >>>> later. This is all because all sstables after bootstrap end up at
> >> >>>> L0 and then the process very slowly moves them to other levels. If
> >> >>>> you have write traffic to that CF then the number of sstables and
> >> >>>> L0 will grow quickly - like it happens in my case now.
> >> >>>>
> >> >>>> Once something like
> >> >>>> https://issues.apache.org/jira/browse/CASSANDRA-8301
> >> >>>> is implemented it may be better.
> >> >>>>
> >> >>>>
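
The "thousands of sstables (makes 4-5 levels)" remark can be checked with
a little arithmetic. The sketch below is only an estimate under stated
assumptions: a 160 MB target sstable size, each level holding 10x as many
sstables as the one below it (L1 = 10, L2 = 100, ...), and a table that is
already fully leveled. `lcs_levels` is a hypothetical helper, not part of
Cassandra.

```python
import math


def lcs_levels(table_bytes, sstable_mb=160, fanout=10):
    """Return (sstable_count, levels) for a fully leveled table.

    ASSUMPTIONS: fixed-size sstables of sstable_mb megabytes, and level N
    holding fanout**N sstables.
    """
    sstables = math.ceil(table_bytes / (sstable_mb * 1024 ** 2))
    levels, capacity, level_size = 0, 0, fanout
    while capacity < sstables:
        levels += 1
        capacity += level_size   # add this level's capacity
        level_size *= fanout     # next level is fanout times larger
    return sstables, levels


one_tib = 1024 ** 4
print(lcs_levels(one_tib))  # about 1 TiB of data in a single table
```

Under these assumptions, 1 TiB in one table comes to several thousand
sstables spread over four levels, which is in line with the numbers
reported above.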

Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
>> >>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
>> >>>> wrote:
>> >>>>>
>> >>>>> Stephane,
>> >>>>>
>> >>>>> We are having a somewhat similar C* load profile. Hence some
>> >>>>> comments in addition to Nikolai's answer.
>> >>>>> 1. Fallback to STCS - you can actually disable it.
>> >>>>> 2. Based on our experience, if you have a lot of data per node, LCS
>> >>>>> may work just fine. That is, until the moment you decide to join
>> >>>>> another node - chances are that the newly added node will not be
>> >>>>> able to compact what it gets from the old nodes. In your case, if
>> >>>>> you switch strategy the same thing may happen. This is all due to
>> >>>>> the limitations mentioned by Nikolai.
>> >>>>>
>> >>>>> Andrei
>> >>>>>
>> >>>>>
>> >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G.
>> >>>>> 
>> >>>>> wrote:
>> >>>>> > ABUSE
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>> >>>>> > Sent: Saturday, November 22, 2014 07:13 PM
>> >>>>> > To: user@cassandra.apache.org
>> >>>>> > Subject: Re: Compaction Strategy guidance
>> >>>>> > Importance: High
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > Stephane,
>> >>>>> >
>> >>>>> > As everything good, LCS comes at certain price.
>> >>>>> >
>> >>>>> > LCS will put most load on you I/O system (if you use spindles -
>> >>>>> > you
>> >>>>> > may need
>> >>>>> > to be careful about that) and on CPU. Also LCS (by default) may
>> >>>>> > fall
>> >>>>> > back to
>> >>>>> > STCS if it is falling behind (which is very possible with heavy
>> >>>>> > writing
>> >>>>> > activity) and this will result in higher disk space usage. Also
>> >>>>> > LCS
>> >>>>> > has
>> >>>>> > certain limitation I have discovered lately. Sometimes LCS may not
>> >>>>> > be
>> >>>>> > able
>> >>>>> > to use all your node's resources (algorithm limitations) and this
>> >>>>> > reduces
>> >>>>> > the overall compaction throughput. This may happen if you have a
>> >>>>> > large
>> >>>>> > column family with lots of data per node. STCS won't have this
>> >>>>> > limitation.
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > By the way, the primary goal of LCS is to reduce the number of
>> >>>>> > sstables C*
>> >>>>> > has to look at to find your data. With LCS properly functioning
>> >>>>> > this
>> >>>>> > number
>> >>>>> > will be most likely between something like 1 and 3 for most of the
>> >>>>> > reads.
>> >>>>> > But if you do few reads and not concerned about the latency today,
>> >>>>> > most
>> >>>>> > likely LCS may only save you some disk space.
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay
>> >>>>> > 
>> >>>>> > wrote:
>> >>>>> >
>> >>>>> > Hi there,
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > use case:
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > - Heavy write app, few reads.
>> >>>>> >
>> >>>>> > - Lots of updates of rows / columns.
>> >>>>> >
>> >>>>> > - Current performance is fine, for both writes and reads..
>> >>>>> >
>> >>>>> > - Currently using SizedCompactionStrategy
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > We're trying to limit the amount of storage used during
>> >>>>> > compaction.
>> >>>>> > Should
>> >>>>> > we switch to LeveledCompactionStrategy?
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > Thanks
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> >
>> >>>>> > --
>> >>>>> >
>> >>>>> > Nikolai Grigoriev
>> >>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Nikolai Grigoriev
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>> > --
>> > Nikolai Grigoriev
>> >
>
>
>
>
> --
> Nikolai Grigoriev
> (514) 772-5178


Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
later. This is all because all sstables after bootstrap end at L0 and
> then
> >>>> the process slowly slowly moves them to other levels. If you have
> write
> >>>> traffic to that CF then the number of sstables and L0 will grow
> quickly -
> >>>> like it happens in my case now.
> >>>>
> >>>> Once something like
> https://issues.apache.org/jira/browse/CASSANDRA-8301
> >>>> is implemented it may be better.
> >>>>
> >>>>
> >>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
> >>>> wrote:
> >>>>>
> >>>>> Stephane,
> >>>>>
> >>>>> We are having a somewhat similar C* load profile. Hence some comments
> >>>>> in addition Nikolai's answer.
> >>>>> 1. Fallback to STCS - you can disable it actually
> >>>>> 2. Based on our experience, if you have a lot of data per node, LCS
> >>>>> may work just fine. That is, till the moment you decide to join
> >>>>> another node - chances are that the newly added node will not be able
> >>>>> to compact what it gets from old nodes. In your case, if you switch
> >>>>> strategy the same thing may happen. This is all due to limitations
> >>>>> mentioned by Nikolai.
> >>>>>
> >>>>> Andrei,
> >>>>>
> >>>>>
> >>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G.  >
> >>>>> wrote:
> >>>>> > ABUSE
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
> >>>>> > Sent: Saturday, November 22, 2014 07:13 PM
> >>>>> > To: user@cassandra.apache.org
> >>>>> > Subject: Re: Compaction Strategy guidance
> >>>>> > Importance: High
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > Stephane,
> >>>>> >
> >>>>> > As everything good, LCS comes at certain price.
> >>>>> >
> >>>>> > LCS will put most load on you I/O system (if you use spindles - you
> >>>>> > may need
> >>>>> > to be careful about that) and on CPU. Also LCS (by default) may
> fall
> >>>>> > back to
> >>>>> > STCS if it is falling behind (which is very possible with heavy
> >>>>> > writing
> >>>>> > activity) and this will result in higher disk space usage. Also LCS
> >>>>> > has
> >>>>> > certain limitation I have discovered lately. Sometimes LCS may not
> be
> >>>>> > able
> >>>>> > to use all your node's resources (algorithm limitations) and this
> >>>>> > reduces
> >>>>> > the overall compaction throughput. This may happen if you have a
> >>>>> > large
> >>>>> > column family with lots of data per node. STCS won't have this
> >>>>> > limitation.
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > By the way, the primary goal of LCS is to reduce the number of
> >>>>> > sstables C*
> >>>>> > has to look at to find your data. With LCS properly functioning
> this
> >>>>> > number
> >>>>> > will be most likely between something like 1 and 3 for most of the
> >>>>> > reads.
> >>>>> > But if you do few reads and not concerned about the latency today,
> >>>>> > most
> >>>>> > likely LCS may only save you some disk space.
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay
> >>>>> > 
> >>>>> > wrote:
> >>>>> >
> >>>>> > Hi there,
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > use case:
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > - Heavy write app, few reads.
> >>>>> >
> >>>>> > - Lots of updates of rows / columns.
> >>>>> >
> >>>>> > - Current performance is fine, for both writes and reads..
> >>>>> >
> >>>>> > - Currently using SizedCompactionStrategy
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > We're trying to limit the amount of storage used during compaction.
> >>>>> > Should
> >>>>> > we switch to LeveledCompactionStrategy?
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > Thanks
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > --
> >>>>> >
> >>>>> > Nikolai Grigoriev
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Nikolai Grigoriev
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > Nikolai Grigoriev
> >
>



-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-24 Thread Andrei Ivanov
>>
>>>>> Stephane,
>>>>>
>>>>> We are having a somewhat similar C* load profile. Hence some comments
>>>>> in addition Nikolai's answer.
>>>>> 1. Fallback to STCS - you can disable it actually
>>>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>>>> may work just fine. That is, till the moment you decide to join
>>>>> another node - chances are that the newly added node will not be able
>>>>> to compact what it gets from old nodes. In your case, if you switch
>>>>> strategy the same thing may happen. This is all due to limitations
>>>>> mentioned by Nikolai.
>>>>>
>>>>> Andrei,
>>>>>
>>>>>
>>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. 
>>>>> wrote:
>>>>> > ABUSE
>>>>> >
>>>>> >
>>>>> >
>>>>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>>>>> >
>>>>> >
>>>>> >
>>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>>>> > Sent: Saturday, November 22, 2014 07:13 PM
>>>>> > To: user@cassandra.apache.org
>>>>> > Subject: Re: Compaction Strategy guidance
>>>>> > Importance: High
>>>>> >
>>>>> >
>>>>> >
>>>>> > Stephane,
>>>>> >
>>>>> > As everything good, LCS comes at certain price.
>>>>> >
>>>>> > LCS will put most load on you I/O system (if you use spindles - you
>>>>> > may need
>>>>> > to be careful about that) and on CPU. Also LCS (by default) may fall
>>>>> > back to
>>>>> > STCS if it is falling behind (which is very possible with heavy
>>>>> > writing
>>>>> > activity) and this will result in higher disk space usage. Also LCS
>>>>> > has
>>>>> > certain limitation I have discovered lately. Sometimes LCS may not be
>>>>> > able
>>>>> > to use all your node's resources (algorithm limitations) and this
>>>>> > reduces
>>>>> > the overall compaction throughput. This may happen if you have a
>>>>> > large
>>>>> > column family with lots of data per node. STCS won't have this
>>>>> > limitation.
>>>>> >
>>>>> >
>>>>> >
>>>>> > By the way, the primary goal of LCS is to reduce the number of
>>>>> > sstables C*
>>>>> > has to look at to find your data. With LCS properly functioning this
>>>>> > number
>>>>> > will be most likely between something like 1 and 3 for most of the
>>>>> > reads.
>>>>> > But if you do few reads and not concerned about the latency today,
>>>>> > most
>>>>> > likely LCS may only save you some disk space.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay
>>>>> > 
>>>>> > wrote:
>>>>> >
>>>>> > Hi there,
>>>>> >
>>>>> >
>>>>> >
>>>>> > use case:
>>>>> >
>>>>> >
>>>>> >
>>>>> > - Heavy write app, few reads.
>>>>> >
>>>>> > - Lots of updates of rows / columns.
>>>>> >
>>>>> > - Current performance is fine, for both writes and reads..
>>>>> >
>>>>> > - Currently using SizedCompactionStrategy
>>>>> >
>>>>> >
>>>>> >
>>>>> > We're trying to limit the amount of storage used during compaction.
>>>>> > Should
>>>>> > we switch to LeveledCompactionStrategy?
>>>>> >
>>>>> >
>>>>> >
>>>>> > Thanks
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>> > Nikolai Grigoriev
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nikolai Grigoriev
>>>>
>>>
>>
>
>
>
> --
> Nikolai Grigoriev
>


Re: Compaction Strategy guidance

2014-11-24 Thread Nikolai Grigoriev
Jean-Armel,

I have only two large tables; the rest are tiny. In the test cluster
of 15 nodes the largest table has about 110M rows. Its total size is about
1.26 TB per node (total disk space used per node for that CF). It has
about 5K sstables per node - the sstable size is 256 MB. cfstats on a
"healthy" node looks like this:

Read Count: 8973748
Read Latency: 16.130059053251774 ms.
Write Count: 32099455
Write Latency: 1.6124713938912671 ms.
Pending Tasks: 0
Table: wm_contacts
SSTable count: 5195
SSTables in each level: [27/4, 11/10, 104/100, 1053/1000, 4000, 0, 0, 0, 0]
Space used (live), bytes: 1266060391852
Space used (total), bytes: 1266144170869
SSTable Compression Ratio: 0.32604853410787327
Number of keys (estimate): 25696000
Memtable cell count: 71402
Memtable data size, bytes: 26938402
Memtable switch count: 9489
Local read count: 8973748
Local read latency: 17.696 ms
Local write count: 32099471
Local write latency: 1.732 ms
Pending tasks: 0
Bloom filter false positives: 32248
Bloom filter false ratio: 0.50685
Bloom filter space used, bytes: 20744432
Compacted partition minimum bytes: 104
Compacted partition maximum bytes: 3379391
Compacted partition mean bytes: 172660
Average live cells per slice (last five minutes): 495.0
Average tombstones per slice (last five minutes): 0.0

Another table of similar structure (same number of rows) is about 4x
smaller. That table does not suffer from those issues - it compacts well
and efficiently.
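
For anyone who wants to check the "SSTables in each level" line above by script, here is a minimal Python sketch. It assumes that an entry written as count/target marks a level holding more sstables than its target, and that a bare number is a level within its target; the function names are made up for this example and are not part of any Cassandra tool.

```python
def parse_levels(line):
    """Parse the bracketed 'SSTables in each level' value from cfstats.

    Returns one (count, target) tuple per level. target is None when
    the entry is a bare count (assumed: level is within its target).
    """
    inner = line.strip().strip("[]")
    levels = []
    for entry in inner.split(","):
        entry = entry.strip()
        if "/" in entry:
            count, target = entry.split("/")
            levels.append((int(count), int(target)))
        else:
            levels.append((int(entry), None))
    return levels


def over_target(levels):
    """Return (level, count, target) for every level above its target."""
    return [(i, c, t) for i, (c, t) in enumerate(levels)
            if t is not None and c > t]


line = "[27/4, 11/10, 104/100, 1053/1000, 4000, 0, 0, 0, 0]"
levels = parse_levels(line)
print("total sstables:", sum(count for count, _ in levels))
for level, count, target in over_target(levels):
    print("L%d: %d sstables, target %d" % (level, count, target))
```

The per-level counts add up to the reported SSTable count of 5195, and on this "healthy" node L0 to L3 are each somewhat above their targets.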

On Mon, Nov 24, 2014 at 2:30 AM, Jean-Armel Luce  wrote:

> Hi Nikolai,
>
> Please could you clarify a little bit what you call "a large amount of
> data" ?
>
> How many tables ?
> How many rows in your largest table ?
> How many GB in your largest table ?
> How many GB per node ?
>
> Thanks.
>
>
>
> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :
>
>> Hi Nikolai,
>>
>> Thanks for those informations.
>>
>> Please could you clarify a little bit what you call "
>>
>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
>>
>>> Just to clarify - when I was talking about the large amount of data I
>>> really meant large amount of data per node in a single CF (table). LCS does
>>> not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>>>
>>> When bootstraping a new node you'd better enable that option from
>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>>> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it
>>> had 7,5K pending compactions, now it has almost stabilized ad 4,6K. Does
>>> not go down. Number of sstables at L0  is over 11K and it is slowly slowly
>>> building upper levels. Total number of sstables is 4x the normal amount.
>>> Now I am not entirely sure if this node will ever get back to normal life.
>>> And believe me - this is not because of I/O, I have SSDs everywhere and 16
>>> physical cores. This machine is barely using 1-3 cores at most of the time.
>>> The problem is that allowing STCS fallback is not a good option either - it
>>> will quickly result in a few 200Gb+ sstables in my configuration and then
>>> these sstables will never be compacted. Plus, it will require close to 2x
>>> disk space on EVERY disk in my JBOD configuration...this will kill the node
>>> sooner or later. This is all because all sstables after bootstrap end at L0
>>> and then the process slowly slowly moves them to other levels. If you have
>>> write traffic to that CF then the number of sstables and L0 will grow
>>> quickly - like it happens in my case now.
>>>
>>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>>> is implemented it may be better.
>>>
>>>
>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
>>> wrote:
>>>
>>>> Stephane,
>>>>
>>>> We are having a somewhat similar C* load profile. Hence some comments
>>>> in addition Nikolai's answer.
>>>> 1. Fallback to STCS - you can disable it actually
>>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>>> may work just fine. That is, till the moment you decide to join
>>>> another node - chances are that the newly added node will not be able
>>>> to compact what it gets from old nodes. In your case, if you switch
>>>> strategy the same thing may happen. This is all due to limitations
>>>>

Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Jean-Armel,

I have the same problem/state as Nikolai. Here are my stats:
~ 1 table
~ 10B records
~ 2TB/node x 6 nodes

Nikolai,
I'm wondering whether switching to a larger sstable_size_in_mb
(say 4096 or 8192) with LCS might be a solution, even if
not a permanent one.
As for huge sstables, I already have some 400-500 GB sstables. The
only way I can see to compact them in the future is to split them
offline at some point. Does that make sense?
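
As a rough way to see what a larger sstable_size_in_mb would change, the Python sketch below counts how many LCS levels a given amount of data would need. It is an idealized model under stated assumptions: every level Ln (n >= 1) holds 10 times more sstables than the previous one, L1 holds 10, every sstable is exactly the configured size, and L0 is ignored. The function name is hypothetical.

```python
def lcs_levels_needed(data_bytes, sstable_size_mb, fanout=10):
    """Return (sstable_count, highest_level) for an idealized LCS layout.

    Assumes L1 holds `fanout` sstables and each further level holds
    `fanout` times more than the one before it; L0 is not modeled.
    """
    sstable_bytes = sstable_size_mb * 1024 * 1024
    sstables = -(-data_bytes // sstable_bytes)  # ceiling division
    level = 1
    capacity = fanout
    total = capacity
    while total < sstables:
        level += 1
        capacity *= fanout
        total += capacity
    return sstables, level


two_tb = 2 * 10**12  # the ~2TB/node figure from this message
for size_mb in (256, 4096, 8192):
    count, level = lcs_levels_needed(two_tb, size_mb)
    print("%5d MB sstables: %5d sstables, up to L%d" % (size_mb, count, level))
```

Under these assumptions, 2 TB in 256 MB sstables reaches L4, while 4096 MB or 8192 MB sstables stop at L3. The sketch only counts sstables and levels; it says nothing about whether compaction throughput actually improves.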

(I'm still doing a test drive and really need to understand how we are
going to handle that in production)

Andrei.



On Mon, Nov 24, 2014 at 10:30 AM, Jean-Armel Luce  wrote:
> Hi Nikolai,
>
> Please could you clarify a little bit what you call "a large amount of data"
> ?
>
> How many tables ?
> How many rows in your largest table ?
> How many GB in your largest table ?
> How many GB per node ?
>
> Thanks.
>
>
>
> 2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :
>>
>> Hi Nikolai,
>>
>> Thanks for those informations.
>>
>> Please could you clarify a little bit what you call "
>>
>> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
>>>
>>> Just to clarify - when I was talking about the large amount of data I
>>> really meant large amount of data per node in a single CF (table). LCS does
>>> not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>>>
>>> When bootstraping a new node you'd better enable that option from
>>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>>> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it had
>>> 7,5K pending compactions, now it has almost stabilized ad 4,6K. Does not go
>>> down. Number of sstables at L0  is over 11K and it is slowly slowly building
>>> upper levels. Total number of sstables is 4x the normal amount. Now I am not
>>> entirely sure if this node will ever get back to normal life. And believe me
>>> - this is not because of I/O, I have SSDs everywhere and 16 physical cores.
>>> This machine is barely using 1-3 cores at most of the time. The problem is
>>> that allowing STCS fallback is not a good option either - it will quickly
>>> result in a few 200Gb+ sstables in my configuration and then these sstables
>>> will never be compacted. Plus, it will require close to 2x disk space on
>>> EVERY disk in my JBOD configuration...this will kill the node sooner or
>>> later. This is all because all sstables after bootstrap end at L0 and then
>>> the process slowly slowly moves them to other levels. If you have write
>>> traffic to that CF then the number of sstables and L0 will grow quickly -
>>> like it happens in my case now.
>>>
>>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>>> is implemented it may be better.
>>>
>>>
>>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
>>> wrote:
>>>>
>>>> Stephane,
>>>>
>>>> We are having a somewhat similar C* load profile. Hence some comments
>>>> in addition Nikolai's answer.
>>>> 1. Fallback to STCS - you can disable it actually
>>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>>> may work just fine. That is, till the moment you decide to join
>>>> another node - chances are that the newly added node will not be able
>>>> to compact what it gets from old nodes. In your case, if you switch
>>>> strategy the same thing may happen. This is all due to limitations
>>>> mentioned by Nikolai.
>>>>
>>>> Andrei,
>>>>
>>>>
>>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. 
>>>> wrote:
>>>> > ABUSE
>>>> >
>>>> >
>>>> >
>>>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>>>> >
>>>> >
>>>> >
>>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>>> > Sent: Saturday, November 22, 2014 07:13 PM
>>>> > To: user@cassandra.apache.org
>>>> > Subject: Re: Compaction Strategy guidance
>>>> > Importance: High
>>>> >
>>>> >
>>>> >
>>>> > Stephane,
>>>> >
>>>> > As everything good, LCS comes at certain price.
>>>> >
>>>> > LCS will put most load on you I/O system (if you use spindles - you
>>>> > may need
>>>> > to b

Re: Compaction Strategy guidance

2014-11-23 Thread Jean-Armel Luce
Hi Nikolai,

Could you please clarify what you mean by "a large amount of
data"?

How many tables?
How many rows in your largest table?
How many GB in your largest table?
How many GB per node?

Thanks.



2014-11-24 8:27 GMT+01:00 Jean-Armel Luce :

> Hi Nikolai,
>
> Thanks for those informations.
>
> Please could you clarify a little bit what you call "
>
> 2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :
>
>> Just to clarify - when I was talking about the large amount of data I
>> really meant large amount of data per node in a single CF (table). LCS does
>> not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>>
>> When bootstraping a new node you'd better enable that option from
>> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
>> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it
>> had 7,5K pending compactions, now it has almost stabilized ad 4,6K. Does
>> not go down. Number of sstables at L0  is over 11K and it is slowly slowly
>> building upper levels. Total number of sstables is 4x the normal amount.
>> Now I am not entirely sure if this node will ever get back to normal life.
>> And believe me - this is not because of I/O, I have SSDs everywhere and 16
>> physical cores. This machine is barely using 1-3 cores at most of the time.
>> The problem is that allowing STCS fallback is not a good option either - it
>> will quickly result in a few 200Gb+ sstables in my configuration and then
>> these sstables will never be compacted. Plus, it will require close to 2x
>> disk space on EVERY disk in my JBOD configuration...this will kill the node
>> sooner or later. This is all because all sstables after bootstrap end at L0
>> and then the process slowly slowly moves them to other levels. If you have
>> write traffic to that CF then the number of sstables and L0 will grow
>> quickly - like it happens in my case now.
>>
>> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
>> is implemented it may be better.
>>
>>
>> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
>> wrote:
>>
>>> Stephane,
>>>
>>> We are having a somewhat similar C* load profile. Hence some comments
>>> in addition Nikolai's answer.
>>> 1. Fallback to STCS - you can disable it actually
>>> 2. Based on our experience, if you have a lot of data per node, LCS
>>> may work just fine. That is, till the moment you decide to join
>>> another node - chances are that the newly added node will not be able
>>> to compact what it gets from old nodes. In your case, if you switch
>>> strategy the same thing may happen. This is all due to limitations
>>> mentioned by Nikolai.
>>>
>>> Andrei,
>>>
>>>
>>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. 
>>> wrote:
>>> > ABUSE
>>> >
>>> >
>>> >
>>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>>> >
>>> >
>>> >
>>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>>> > Sent: Saturday, November 22, 2014 07:13 PM
>>> > To: user@cassandra.apache.org
>>> > Subject: Re: Compaction Strategy guidance
>>> > Importance: High
>>> >
>>> >
>>> >
>>> > Stephane,
>>> >
>>> > As everything good, LCS comes at certain price.
>>> >
>>> > LCS will put most load on you I/O system (if you use spindles - you
>>> may need
>>> > to be careful about that) and on CPU. Also LCS (by default) may fall
>>> back to
>>> > STCS if it is falling behind (which is very possible with heavy writing
>>> > activity) and this will result in higher disk space usage. Also LCS has
>>> > certain limitation I have discovered lately. Sometimes LCS may not be
>>> able
>>> > to use all your node's resources (algorithm limitations) and this
>>> reduces
>>> > the overall compaction throughput. This may happen if you have a large
>>> > column family with lots of data per node. STCS won't have this
>>> limitation.
>>> >
>>> >
>>> >
>>> > By the way, the primary goal of LCS is to reduce the number of
>>> sstables C*
>>> > has to look at to find your data. With LCS properly functioning this
>>> number
>>> > will be most likely between something like 1 and 3 for most of the
>>> reads.
>>> > But if you do few reads and not concerned about the latency today, most
>>> > likely LCS may only save you some disk space.
>>> >
>>> >
>>> >
>>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
>>> > wrote:
>>> >
>>> > Hi there,
>>> >
>>> >
>>> >
>>> > use case:
>>> >
>>> >
>>> >
>>> > - Heavy write app, few reads.
>>> >
>>> > - Lots of updates of rows / columns.
>>> >
>>> > - Current performance is fine, for both writes and reads..
>>> >
>>> > - Currently using SizedCompactionStrategy
>>> >
>>> >
>>> >
>>> > We're trying to limit the amount of storage used during compaction.
>>> Should
>>> > we switch to LeveledCompactionStrategy?
>>> >
>>> >
>>> >
>>> > Thanks
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > Nikolai Grigoriev
>>> > (514) 772-5178
>>>
>>
>>
>>
>> --
>> Nikolai Grigoriev
>> (514) 772-5178
>>
>
>


Re: Compaction Strategy guidance

2014-11-23 Thread Jean-Armel Luce
Hi Nikolai,

Thanks for that information.

Please could you clarify a little bit what you call "

2014-11-24 4:37 GMT+01:00 Nikolai Grigoriev :

> Just to clarify - when I was talking about the large amount of data I
> really meant large amount of data per node in a single CF (table). LCS does
> not seem to like it when it gets thousands of sstables (makes 4-5 levels).
>
> When bootstraping a new node you'd better enable that option from
> CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
> mess - I have a node that I have bootstrapped ~2 weeks ago. Initially it
> had 7,5K pending compactions, now it has almost stabilized ad 4,6K. Does
> not go down. Number of sstables at L0  is over 11K and it is slowly slowly
> building upper levels. Total number of sstables is 4x the normal amount.
> Now I am not entirely sure if this node will ever get back to normal life.
> And believe me - this is not because of I/O, I have SSDs everywhere and 16
> physical cores. This machine is barely using 1-3 cores at most of the time.
> The problem is that allowing STCS fallback is not a good option either - it
> will quickly result in a few 200Gb+ sstables in my configuration and then
> these sstables will never be compacted. Plus, it will require close to 2x
> disk space on EVERY disk in my JBOD configuration...this will kill the node
> sooner or later. This is all because all sstables after bootstrap end at L0
> and then the process slowly slowly moves them to other levels. If you have
> write traffic to that CF then the number of sstables and L0 will grow
> quickly - like it happens in my case now.
>
> Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301
> is implemented it may be better.
>
>
> On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov 
> wrote:
>
>> Stephane,
>>
>> We are having a somewhat similar C* load profile. Hence some comments
>> in addition Nikolai's answer.
>> 1. Fallback to STCS - you can disable it actually
>> 2. Based on our experience, if you have a lot of data per node, LCS
>> may work just fine. That is, till the moment you decide to join
>> another node - chances are that the newly added node will not be able
>> to compact what it gets from old nodes. In your case, if you switch
>> strategy the same thing may happen. This is all due to limitations
>> mentioned by Nikolai.
>>
>> Andrei,
>>
>>
>> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. 
>> wrote:
>> > ABUSE
>> >
>> >
>> >
>> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>> >
>> >
>> >
>> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
>> > Sent: Saturday, November 22, 2014 07:13 PM
>> > To: user@cassandra.apache.org
>> > Subject: Re: Compaction Strategy guidance
>> > Importance: High
>> >
>> >
>> >
>> > Stephane,
>> >
>> > As everything good, LCS comes at certain price.
>> >
>> > LCS will put most load on you I/O system (if you use spindles - you may
>> need
>> > to be careful about that) and on CPU. Also LCS (by default) may fall
>> back to
>> > STCS if it is falling behind (which is very possible with heavy writing
>> > activity) and this will result in higher disk space usage. Also LCS has
>> > certain limitation I have discovered lately. Sometimes LCS may not be
>> able
>> > to use all your node's resources (algorithm limitations) and this
>> reduces
>> > the overall compaction throughput. This may happen if you have a large
>> > column family with lots of data per node. STCS won't have this
>> limitation.
>> >
>> >
>> >
>> > By the way, the primary goal of LCS is to reduce the number of sstables
>> C*
>> > has to look at to find your data. With LCS properly functioning this
>> number
>> > will be most likely between something like 1 and 3 for most of the
>> reads.
>> > But if you do few reads and not concerned about the latency today, most
>> > likely LCS may only save you some disk space.
>> >
>> >
>> >
>> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
>> > wrote:
>> >
>> > Hi there,
>> >
>> >
>> >
>> > use case:
>> >
>> >
>> >
>> > - Heavy write app, few reads.
>> >
>> > - Lots of updates of rows / columns.
>> >
>> > - Current performance is fine, for both writes and reads..
>> >
>> > - Currently using SizedCompactionStrategy
>> >
>> >
>> >
>> > We're trying to limit the amount of storage used during compaction.
>> Should
>> > we switch to LeveledCompactionStrategy?
>> >
>> >
>> >
>> > Thanks
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Nikolai Grigoriev
>> > (514) 772-5178
>>
>
>
>
> --
> Nikolai Grigoriev
> (514) 772-5178
>


Re: Compaction Strategy guidance

2014-11-23 Thread Nikolai Grigoriev
Just to clarify - when I was talking about a large amount of data, I
really meant a large amount of data per node in a single CF (table). LCS does
not seem to like it when it gets thousands of sstables (which makes 4-5 levels).

When bootstrapping a new node you'd better enable the option from
CASSANDRA-6621 (the one that disables STCS in L0). But it will still be a
mess - I have a node that I bootstrapped ~2 weeks ago. Initially it
had 7.5K pending compactions; now it has almost stabilized at 4.6K and does
not go down. The number of sstables in L0 is over 11K and it is very slowly
building the upper levels. The total number of sstables is 4x the normal amount.
Now I am not entirely sure this node will ever get back to normal life.
And believe me - this is not because of I/O: I have SSDs everywhere and 16
physical cores, and this machine is barely using 1-3 cores most of the time.

The problem is that allowing STCS fallback is not a good option either - it
will quickly result in a few 200 GB+ sstables in my configuration, and then
these sstables will never be compacted. Plus, it will require close to 2x
disk space on EVERY disk in my JBOD configuration... this will kill the node
sooner or later. This is all because all sstables end up in L0 after bootstrap
and then the process very slowly moves them to the other levels. If you have
write traffic to that CF, the number of sstables in L0 will grow
quickly - as is happening in my case now.

Once something like https://issues.apache.org/jira/browse/CASSANDRA-8301 is
implemented it may be better.
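
To make the "close to 2x disk space" point concrete, here is a rough Python sketch. It assumes the worst case, in which a compaction merges nothing away, so the output is as large as all inputs combined and has to be written to the same disk while the inputs still exist. The function name and the example sizes are made up for illustration; they are not measurements from the cluster described above.

```python
def compaction_fits(input_sizes_gb, free_gb):
    """Return True if compacting the given sstables fits on the disk.

    Worst-case assumption: no data is merged away or dropped, so the
    output needs as much space as all inputs together, and the inputs
    are only removed after the output has been written.
    """
    return sum(input_sizes_gb) <= free_gb


# Hypothetical 1000 GB disk holding four 200 GB sstables.
disk_gb = 1000
sstables_gb = [200, 200, 200, 200]
free_gb = disk_gb - sum(sstables_gb)

print(compaction_fits(sstables_gb, free_gb))      # all four together
print(compaction_fits(sstables_gb[:1], free_gb))  # a single 200 GB sstable
```

In this worst case a disk has to stay at most about half full for its largest sstables to be compacted together, which is where the 2x figure comes from.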


On Sun, Nov 23, 2014 at 4:53 AM, Andrei Ivanov  wrote:

> Stephane,
>
> We are having a somewhat similar C* load profile. Hence some comments
> in addition Nikolai's answer.
> 1. Fallback to STCS - you can disable it actually
> 2. Based on our experience, if you have a lot of data per node, LCS
> may work just fine. That is, till the moment you decide to join
> another node - chances are that the newly added node will not be able
> to compact what it gets from old nodes. In your case, if you switch
> strategy the same thing may happen. This is all due to limitations
> mentioned by Nikolai.
>
> Andrei,
>
>
> On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G. 
> wrote:
> > ABUSE
> >
> >
> >
> > I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
> >
> >
> >
> > From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
> > Sent: Saturday, November 22, 2014 07:13 PM
> > To: user@cassandra.apache.org
> > Subject: Re: Compaction Strategy guidance
> > Importance: High
> >
> >
> >
> > Stephane,
> >
> > Like everything good, LCS comes at a certain price.
> >
> > LCS will put most of the load on your I/O system (if you use spindles - you
> > may need to be careful about that) and on CPU. Also LCS (by default) may
> > fall back to STCS if it is falling behind (which is very possible with heavy
> > write activity) and this will result in higher disk space usage. Also LCS
> > has a certain limitation I have discovered lately. Sometimes LCS may not be
> > able to use all your node's resources (algorithm limitations) and this
> > reduces the overall compaction throughput. This may happen if you have a
> > large column family with lots of data per node. STCS won't have this
> > limitation.
> >
> >
> >
> > By the way, the primary goal of LCS is to reduce the number of sstables C*
> > has to look at to find your data. With LCS functioning properly this number
> > will most likely be between 1 and 3 for most of the reads. But if you do few
> > reads and are not concerned about the latency today, most likely LCS will
> > only save you some disk space.
> >
> >
> >
> > On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
> > wrote:
> >
> > Hi there,
> >
> >
> >
> > use case:
> >
> >
> >
> > - Heavy write app, few reads.
> >
> > - Lots of updates of rows / columns.
> >
> > - Current performance is fine, for both writes and reads.
> >
> > - Currently using SizeTieredCompactionStrategy
> >
> >
> >
> > We're trying to limit the amount of storage used during compaction. Should
> > we switch to LeveledCompactionStrategy?
> >
> >
> >
> > Thanks
> >
> >
> >
> >
> > --
> >
> > Nikolai Grigoriev
> > (514) 772-5178
>



-- 
Nikolai Grigoriev
(514) 772-5178


Re: Compaction Strategy guidance

2014-11-23 Thread Andrei Ivanov
Stephane,

We are having a somewhat similar C* load profile. Hence some comments
in addition to Nikolai's answer.
1. Fallback to STCS - you can disable it actually
2. Based on our experience, if you have a lot of data per node, LCS
may work just fine. That is, till the moment you decide to join
another node - chances are that the newly added node will not be able
to compact what it gets from old nodes. In your case, if you switch
strategy the same thing may happen. This is all due to limitations
mentioned by Nikolai.
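
For reference, a minimal sketch of what "switching strategy" looks like in
practice. It only builds the CQL text; the keyspace and table names (my_ks,
my_cf) and the helper name compaction_cql are made up for illustration, and
the sstable_size_in_mb value is just an example. As far as I know the STCS
fallback in L0 is controlled separately, by a JVM system property
(cassandra.disable_stcs_in_l0) rather than a table option, so please verify
that against your Cassandra version before relying on it.

```python
def compaction_cql(keyspace, table, strategy, **options):
    """Build an ALTER TABLE statement that sets the compaction strategy.

    keyspace/table are hypothetical names supplied by the caller; options are
    passed through as compaction sub-options (e.g. sstable_size_in_mb).
    """
    params = {"class": strategy}
    params.update(options)

    def render(value):
        # Numbers are emitted bare, everything else as a quoted CQL string.
        return str(value) if isinstance(value, (int, float)) else "'%s'" % value

    body = ", ".join("'%s': %s" % (key, render(val)) for key, val in params.items())
    return "ALTER TABLE %s.%s WITH compaction = {%s};" % (keyspace, table, body)


if __name__ == "__main__":
    # Switch a (hypothetical) table to LCS, then back to STCS.
    print(compaction_cql("my_ks", "my_cf", "LeveledCompactionStrategy",
                         sstable_size_in_mb=160))
    print(compaction_cql("my_ks", "my_cf", "SizeTieredCompactionStrategy"))
```

Running the statement is cheap, but the re-compaction it triggers is not:
after the switch the node has to work through all existing sstables, which is
the same situation as a freshly joined node.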

Andrei,


On Sun, Nov 23, 2014 at 8:51 AM, Servando Muñoz G.  wrote:
> ABUSE
>
>
>
> I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO
>
>
>
> From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
> Sent: Saturday, November 22, 2014 07:13 PM
> To: user@cassandra.apache.org
> Subject: Re: Compaction Strategy guidance
> Importance: High
>
>
>
> Stephane,
>
> Like everything good, LCS comes at a certain price.
>
> LCS will put most of the load on your I/O system (if you use spindles - you
> may need to be careful about that) and on CPU. Also LCS (by default) may fall
> back to STCS if it is falling behind (which is very possible with heavy write
> activity) and this will result in higher disk space usage. Also LCS has a
> certain limitation I have discovered lately. Sometimes LCS may not be able
> to use all your node's resources (algorithm limitations) and this reduces
> the overall compaction throughput. This may happen if you have a large
> column family with lots of data per node. STCS won't have this limitation.
>
>
>
> By the way, the primary goal of LCS is to reduce the number of sstables C*
> has to look at to find your data. With LCS functioning properly this number
> will most likely be between 1 and 3 for most of the reads.
> But if you do few reads and are not concerned about the latency today, most
> likely LCS will only save you some disk space.
>
>
>
> On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
> wrote:
>
> Hi there,
>
>
>
> use case:
>
>
>
> - Heavy write app, few reads.
>
> - Lots of updates of rows / columns.
>
> - Current performance is fine, for both writes and reads.
>
> - Currently using SizeTieredCompactionStrategy
>
>
>
> We're trying to limit the amount of storage used during compaction. Should
> we switch to LeveledCompactionStrategy?
>
>
>
> Thanks
>
>
>
>
> --
>
> Nikolai Grigoriev
> (514) 772-5178


RE: Compaction Strategy guidance

2014-11-22 Thread Servando Muñoz G .
ABUSE

 

I DON'T WANT ANY MORE EMAILS, I'M FROM MEXICO

 

From: Nikolai Grigoriev [mailto:ngrigor...@gmail.com]
Sent: Saturday, November 22, 2014 07:13 PM
To: user@cassandra.apache.org
Subject: Re: Compaction Strategy guidance
Importance: High

 

Stephane,

Like everything good, LCS comes at a certain price.

LCS will put most of the load on your I/O system (if you use spindles - you may
need to be careful about that) and on CPU. Also LCS (by default) may fall back
to STCS if it is falling behind (which is very possible with heavy write
activity) and this will result in higher disk space usage. Also LCS has a
certain limitation I have discovered lately. Sometimes LCS may not be able to
use all your node's resources (algorithm limitations) and this reduces the
overall compaction throughput. This may happen if you have a large column
family with lots of data per node. STCS won't have this limitation.

 

By the way, the primary goal of LCS is to reduce the number of sstables C* has
to look at to find your data. With LCS functioning properly this number will
most likely be between 1 and 3 for most of the reads. But if you do few reads
and are not concerned about the latency today, most likely LCS will only save
you some disk space.

 

On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay  wrote:

Hi there,

 

use case:

 

- Heavy write app, few reads.

- Lots of updates of rows / columns.

- Current performance is fine, for both writes and reads.

- Currently using SizeTieredCompactionStrategy

 

We're trying to limit the amount of storage used during compaction. Should we 
switch to LeveledCompactionStrategy? 

 

Thanks




-- 

Nikolai Grigoriev
(514) 772-5178



Re: Compaction Strategy guidance

2014-11-22 Thread Nikolai Grigoriev
Stephane,

Like everything good, LCS comes at a certain price.

LCS will put most of the load on your I/O system (if you use spindles - you may
need to be careful about that) and on CPU. Also LCS (by default) may fall
back to STCS if it is falling behind (which is very possible with heavy
write activity) and this will result in higher disk space usage. Also LCS
has a certain limitation I have discovered lately. Sometimes LCS may not be
able to use all your node's resources (algorithm limitations) and this
reduces the overall compaction throughput. This may happen if you have a
large column family with lots of data per node. STCS won't have this
limitation.

By the way, the primary goal of LCS is to reduce the number of sstables C*
has to look at to find your data. With LCS functioning properly this number
will most likely be between 1 and 3 for most of the reads.
But if you do few reads and are not concerned about the latency today, most
likely LCS will only save you some disk space.
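
To make the "few sstables per read" point a bit more concrete, here is a rough
back-of-the-envelope sketch. It assumes an idealized LCS layout where level N
(N >= 1) holds up to sstable_size * 10^N of non-overlapping data and is filled
completely before the next level is used. The 160 MB sstable size, the fanout
of 10, the 1,000,000 MB data size and the function name lcs_levels are
assumptions for illustration, not measurements from any cluster in this thread.

```python
def lcs_levels(data_mb, sstable_size_mb=160, fanout=10):
    """Return (level, capacity_mb, used_mb) tuples for an idealized LCS layout.

    Levels are filled from L1 upwards; L0 is ignored because in a healthy
    node it only holds a handful of freshly flushed sstables.
    """
    levels = []
    remaining = data_mb
    level = 1
    while remaining > 0:
        capacity = sstable_size_mb * fanout ** level
        used = min(remaining, capacity)
        levels.append((level, capacity, used))
        remaining -= used
        level += 1
    return levels


if __name__ == "__main__":
    layout = lcs_levels(1000000)  # roughly 1 TB in a single CF (hypothetical)
    for level, capacity, used in layout:
        print("L%d: %d of %d MB used" % (level, used, capacity))
    # Within a level sstables do not overlap, so a single-partition read
    # touches at most one sstable per level (plus whatever sits in L0).
    print("levels in use:", len(layout))
```

Since each level is ten times larger than the previous one, most of the data
sits in the deepest level, which is why a well-behaved LCS table usually
answers a read from very few sstables. The same numbers also show why a node
whose sstables all start in L0 has a long way to go before it gets there.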

On Sat, Nov 22, 2014 at 6:25 PM, Stephane Legay 
wrote:

> Hi there,
>
> use case:
>
> - Heavy write app, few reads.
> - Lots of updates of rows / columns.
> - Current performance is fine, for both writes and reads.
> - Currently using SizeTieredCompactionStrategy
>
> We're trying to limit the amount of storage used during compaction. Should
> we switch to LeveledCompactionStrategy?
>
> Thanks
>



-- 
Nikolai Grigoriev
(514) 772-5178


Compaction Strategy guidance

2014-11-22 Thread Stephane Legay
Hi there,

use case:

- Heavy write app, few reads.
- Lots of updates of rows / columns.
- Current performance is fine, for both writes and reads.
- Currently using SizeTieredCompactionStrategy

We're trying to limit the amount of storage used during compaction. Should
we switch to LeveledCompactionStrategy?

Thanks