Re: Question about compaction strategy changes

2016-10-24 Thread kurt Greaves
On 24 October 2016 at 18:11, Seth Edwards wrote:

> The other thought is that we currently have data mixed in that does not
> have a TTL and we are strongly considering putting this data in its own
> table.


You should definitely do that. Having non-TTL'd data mixed in will result
in SSTables that never fully expire, because some small portion of each may
still be live data. Combined with the small number of compaction candidates,
it could take a long time for these SSTables to be compacted (possibly never).
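
A minimal Python sketch of why one cell without a TTL keeps a whole SSTable
alive. This is not Cassandra's code: `Cell` and `sstable_fully_expired` are
hypothetical names, and the check is simplified (it ignores gc_grace_seconds
and overlapping SSTables, which the real expiry check also has to consider).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    write_time: int       # seconds since epoch (illustrative values only)
    ttl: Optional[int]    # None means the cell never expires


def sstable_fully_expired(cells: List[Cell], now: int) -> bool:
    """Return True only if every cell has a TTL that has already elapsed."""
    return all(c.ttl is not None and c.write_time + c.ttl <= now for c in cells)


# 1000 TTL'd cells that all expired long ago, plus one cell without a TTL.
ttl_only = [Cell(write_time=0, ttl=3600) for _ in range(1000)]
mixed = ttl_only + [Cell(write_time=0, ttl=None)]

now = 10_000
ttl_only_droppable = sstable_fully_expired(ttl_only, now)  # whole file can go
mixed_droppable = sstable_fully_expired(mixed, now)        # one live cell blocks it
```

In this model the TTL-only file can be dropped as a unit, while the mixed
file has to wait for a compaction to rewrite it.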

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Question about compaction strategy changes

2016-10-24 Thread Seth Edwards
Thanks Jeff. We've been trying to find the optimal setting for our TWCS.
It's just two tables, and only one of them is a factor. Initially we set
the window to an hour, and then increased it to a day. It still seemed that
there were lots of small sstables on disk: dozens of small db files that
were maybe only a few megabytes each. These were all the most recent
sstables in the data directory. As we've increased the window size and the
tombstone_threshold, the newest db files on disk have become larger, as we
would expect.

The total size of the table in question is between 500GB and 550GB on each
node. At certain intervals it seems that all nodes begin a cycle of
compactions and the number of pending tasks goes up. During this period we
can see the compactions use up maybe 100 or 200GB, sometimes more, and then
when everything finishes, we gain most of that disk space back. We usually
have over 500GB free but it can trickle down to only 150GB free. I assume
solving this is about finding the optimal TWCS settings for our TTL data.

The other thought is that we currently have data mixed in that does not
have a TTL and we are strongly considering putting this data in its own
table.

On Mon, Oct 24, 2016 at 6:38 AM, Jeff Jirsa <jeff.ji...@crowdstrike.com>
wrote:

>
>
> If you drop window size, you may force some window-major compactions (if
> you go from 1 week windows to 1 day windows, you’ll have 6 days' worth of
> files start compacting into 1-day sstables).
>
> If you increase window size, you’ll likely have adjacent windows join (if
> you go from 1 day windows to 2 day windows, nearly every sstable will be
> joined with the one in the day adjacent to it).
>
>
>
> Short of altering compaction strategies, it seems unlikely that you’d see
> huge jumps where you’d run out of space. How many tables/CFs have TWCS
> enabled? How much space are you using, and how much is free?  Do you have
> hundreds with the same TWCS parameters?
>
>
>
> If you’re running very close to your capacity, you may want to consider
> dropping concurrent compactors down so fewer compaction tasks run at the
> same time. The extra disk space consumed by compaction in a TWCS setting
> should drop proportionally.
>
>
>
>
>
>
>
> From: Seth Edwards <s...@pubnub.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Sunday, October 23, 2016 at 7:03 PM
> To: user <user@cassandra.apache.org>
> Subject: Re: Question about compaction strategy changes
>
>
>
> More compactions meaning "rows to be compacted" or actual number of
> pending compactions? I assumed that when I run nodetool compactionstats the
> number of pending tasks would line up with the number of sstables that will
> be compacted. Most of the time this is idle, then we hit spots where it can
> jump into the thousands and we end up being short a few hundred GB of
> disk space.
>
>
>
> On Sun, Oct 23, 2016 at 5:49 PM, kurt Greaves <k...@instaclustr.com>
> wrote:
>
>
>
> On 22 October 2016 at 03:37, Seth Edwards <s...@pubnub.com> wrote:
>
> We're using TWCS and we notice that if we change the window unit or size
> options, it seems to implicitly start recompacting all sstables.
>
>
>
> If you increase the window unit or size you potentially increase the
> number of SSTable candidates for compaction inside each window, which is
> why you would see more compactions. If you decrease the window you
> shouldn't see any new compactions kicked off; however, be aware that you
> will have SSTables covering multiple windows, so until a full cycle of your
> TTL passes your read queries won't benefit from the smaller window size.
>
>
> Kurt Greaves
>
> k...@instaclustr.com
>
> www.instaclustr.com
>
>
> CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and
> may be legally privileged. If you are not the intended recipient, do not
> disclose, copy, distribute, or use this email or any attachments. If you
> have received this in error please let the sender know and then delete the
> email and all attachments.
>


Re: Question about compaction strategy changes

2016-10-24 Thread Jeff Jirsa
 

If you drop window size, you may force some window-major compactions (if you go
from 1 week windows to 1 day windows, you’ll have 6 days' worth of files start
compacting into 1-day sstables).

If you increase window size, you’ll likely have adjacent windows join (if you 
go from 1 day windows to 2 day windows, nearly every sstable will be joined 
with the one in the day adjacent to it).
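
A rough Python sketch of the window-joining effect described above. It is a
simplified model rather than the TWCS implementation: it assumes each SSTable
is placed in a window by its newest timestamp and that windows are aligned to
the epoch. `bucket_by_window` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List

DAY = 24 * 60 * 60


def bucket_by_window(max_timestamps: List[int], window_seconds: int) -> Dict[int, List[int]]:
    """Group SSTables (represented by their newest timestamp) by window start."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for ts in max_timestamps:
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(ts)
    return dict(buckets)


# One already-compacted SSTable per day for 8 days (timestamps at midday).
sstables = [day * DAY + DAY // 2 for day in range(8)]

one_day = bucket_by_window(sstables, 1 * DAY)
two_day = bucket_by_window(sstables, 2 * DAY)

# A window holding more than one SSTable is a candidate for compaction.
candidates_1d = sum(1 for b in one_day.values() if len(b) > 1)
candidates_2d = sum(1 for b in two_day.values() if len(b) > 1)
```

With 1-day windows every SSTable sits alone in its window; with 2-day windows
each one is paired with its neighbour, so every window becomes a candidate.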

 

Short of altering compaction strategies, it seems unlikely that you’d see huge 
jumps where you’d run out of space. How many tables/CFs have TWCS enabled? How 
much space are you using, and how much is free?  Do you have hundreds with the 
same TWCS parameters? 

 

If you’re running very close to your capacity, you may want to consider
dropping concurrent compactors down so fewer compaction tasks run at the same
time. The extra disk space consumed by compaction in a TWCS setting should
drop proportionally.
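
A back-of-the-envelope Python sketch of that proportionality. The assumption
(stated here, not taken from Cassandra itself) is that each running compaction
may temporarily need about as much extra space as the total size of its input
SSTables, since the inputs stay on disk until the output is written.
`worst_case_temp_space_gb` and the task sizes are hypothetical.

```python
from typing import List


def worst_case_temp_space_gb(task_input_sizes_gb: List[float], concurrent_compactors: int) -> float:
    """Upper bound on extra disk used if the largest tasks all run at once.

    Assumes each running compaction needs up to the combined size of its
    input SSTables as temporary space.
    """
    largest_first = sorted(task_input_sizes_gb, reverse=True)
    return sum(largest_first[:concurrent_compactors])


# Hypothetical pending compactions, by total input size in GB.
pending = [60.0, 50.0, 40.0, 30.0, 20.0]

with_four = worst_case_temp_space_gb(pending, concurrent_compactors=4)
with_two = worst_case_temp_space_gb(pending, concurrent_compactors=2)
```

Under these made-up numbers, going from four concurrent compactors to two
lowers the worst-case temporary usage from 180GB to 110GB, at the cost of the
backlog taking longer to clear.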

 

 

 

From: Seth Edwards <s...@pubnub.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sunday, October 23, 2016 at 7:03 PM
To: user <user@cassandra.apache.org>
Subject: Re: Question about compaction strategy changes

 

More compactions meaning "rows to be compacted" or actual number of pending
compactions? I assumed that when I run nodetool compactionstats the number of
pending tasks would line up with the number of sstables that will be compacted.
Most of the time this is idle, then we hit spots where it can jump into the
thousands and we end up being short a few hundred GB of disk space.

 

On Sun, Oct 23, 2016 at 5:49 PM, kurt Greaves <k...@instaclustr.com> wrote:

 

On 22 October 2016 at 03:37, Seth Edwards <s...@pubnub.com> wrote:

We're using TWCS and we notice that if we change the window unit or size
options, it seems to implicitly start recompacting all sstables.



If you increase the window unit or size you potentially increase the number of
SSTable candidates for compaction inside each window, which is why you would
see more compactions. If you decrease the window you shouldn't see any new
compactions kicked off; however, be aware that you will have SSTables covering
multiple windows, so until a full cycle of your TTL passes your read queries
won't benefit from the smaller window size.


Kurt Greaves 

k...@instaclustr.com

www.instaclustr.com

 





Re: Question about compaction strategy changes

2016-10-23 Thread kurt Greaves
More compactions meaning "actual number of compaction tasks". A compaction
task generally operates on many SSTables (how many depends on the chosen
compaction strategy). The number of pending tasks does not line up with the
number of SSTables that will be compacted: one task may compact many SSTables.
If your pending tasks are jumping "into the thousands" you're quite
possibly flushing data from memtables faster than you can compact the
resulting SSTables. Ideally your pending compactions shouldn't really go
above 10 (or even 5), and if they do you're possibly overloading the cluster.
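
A small Python sketch of why the pending-task count is much lower than the
SSTable count. It is only a model: `estimate_pending_tasks` is a hypothetical
function, and the thresholds of 4 and 32 are assumptions standing in for a
strategy's minimum and maximum number of SSTables per compaction.

```python
import math
from typing import List


def estimate_pending_tasks(bucket_sizes: List[int], min_threshold: int = 4, max_threshold: int = 32) -> int:
    """Estimate tasks when each task compacts up to max_threshold SSTables.

    Buckets with fewer than min_threshold SSTables are not yet eligible.
    """
    return sum(math.ceil(n / max_threshold) for n in bucket_sizes if n >= min_threshold)


# Hypothetical groups of SSTables that the strategy would compact together.
buckets = [40, 12, 3]

sstables_total = sum(buckets)
tasks = estimate_pending_tasks(buckets)
```

Here 55 SSTables turn into only 3 tasks, so a pending count in the thousands
implies a far larger number of SSTables waiting to be compacted.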


Re: Question about compaction strategy changes

2016-10-23 Thread Seth Edwards
More compactions meaning "rows to be compacted" or actual number of pending
compactions? I assumed that when I run nodetool compactionstats the number of
pending tasks would line up with the number of sstables that will be
compacted. Most of the time this is idle, then we hit spots where it can jump
into the thousands and we end up being short a few hundred GB of disk space.

On Sun, Oct 23, 2016 at 5:49 PM, kurt Greaves wrote:

>
> On 22 October 2016 at 03:37, Seth Edwards wrote:
>
>> We're using TWCS and we notice that if we change the window unit or size
>> options, it seems to implicitly start recompacting all sstables.
>
>
> If you increase the window unit or size you potentially increase the
> number of SSTable candidates for compaction inside each window, which is
> why you would see more compactions. If you decrease the window you
> shouldn't see any new compactions kicked off; however, be aware that you
> will have SSTables covering multiple windows, so until a full cycle of your
> TTL passes your read queries won't benefit from the smaller window size.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com
>


Re: Question about compaction strategy changes

2016-10-23 Thread kurt Greaves
On 22 October 2016 at 03:37, Seth Edwards wrote:

> We're using TWCS and we notice that if we change the window unit or size
> options, it seems to implicitly start recompacting all sstables.


If you increase the window unit or size you potentially increase the number
of SSTable candidates for compaction inside each window, which is why you
would see more compactions. If you decrease the window you shouldn't see
any new compactions kicked off; however, be aware that you will have
SSTables covering multiple windows, so until a full cycle of your TTL
passes your read queries won't benefit from the smaller window size.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Question about compaction strategy changes

2016-10-21 Thread Seth Edwards
Hello! We're using TWCS and we notice that if we change the window unit or
size options, it seems to implicitly start recompacting all sstables. Is
this indeed the case and, more importantly, does the same happen if we were
to adjust the gc_grace_seconds for this table?


Thanks!