[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2023-07-26 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17747681#comment-17747681
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

{quote}
Tiering to spinning disk or cheaper block devices is fine. It's a win. It's 
easy to reason about - probably just implement it via compaction, and the 
read and write paths stay exactly the same.

But I think the industry trends would suggest this is suboptimal - moving this 
to a fast object store (e.g. s3) would be even better. It's lower cost / higher 
durability, and it allows for other things "eventually", like sharing one 
sstable between replicas (or eventually erasure encoding pieces of data).

That turns this ticket from ~easy to ~hard, because you also have to touch the 
read path (or, more likely, change / add a new sstablereader that can read from 
object storage, and then figure out how you want to upload to object storage).

So "is there interest"? Probably - but in an s3 version of this feature, 
rather than spinning disk.
{quote}

Tiering with an object store is a lot more interesting and useful to me as well.  
I know many teams that would make use of this, and it could dramatically reduce 
cost depending on the size of the active dataset.  



> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Compaction
>Reporter: Marcus Eriksson
>Assignee: Lerh Chuan Low
>Priority: Normal
>  Labels: doc-impacting, dtcs
> Fix For: 5.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.
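
The age cutoff the description asks for can be sketched as a shell function. This is purely illustrative: the function name, file layout, and threshold are made up, and in practice Cassandra itself would have to own both directories so the read path still finds the moved files.

```shell
# Illustration of selecting sstables older than max_sstable_age_days and
# relocating them. An sstable is a group of component files sharing one
# prefix; we key off the -Data.db component's mtime and move the whole
# group together. Names and layout are hypothetical.
archive_old_sstables() {
  hot=$1; cold=$2; max_age_days=$3
  mkdir -p "$cold"
  for data in $(find "$hot" -maxdepth 1 -name '*-Data.db' -mtime +"$max_age_days"); do
    prefix=${data%-Data.db}
    mv "$prefix"-* "$cold"/
  done
}
```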



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2023-05-15 Thread Jeff Jirsa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722803#comment-17722803
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

I think a lot of people would still find it useful; however, since 2014, the 
way most people think about storage has changed.

 

Tiering to spinning disk or cheaper block devices is fine. It's a win. It's 
easy to reason about - probably just implement it via compaction, and the 
read and write paths stay exactly the same.

 

But I think the industry trends would suggest this is suboptimal - moving this 
to a fast object store (e.g. s3) would be even better. It's lower cost / higher 
durability, and it allows for other things "eventually", like sharing one 
sstable between replicas (or eventually erasure encoding pieces of data).

 

That turns this ticket from ~easy to ~hard, because you also have to touch the 
read path (or, more likely, change / add a new sstablereader that can read from 
object storage, and then figure out how you want to upload to object storage).
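
The read-path shape this implies can be sketched minimally, with a plain directory standing in for the object store so the idea is runnable. Everything here is an assumption: a real implementation would issue S3 GETs inside a new sstable reader, not `cp`/`cat` files.

```shell
# Sketch of a tiered read path: serve from the local tier if present,
# otherwise "download" from the remote tier first. "remote_dir" stands in
# for an object store bucket purely for illustration.
tiered_read() {
  local_dir=$1; remote_dir=$2; name=$3
  if [ ! -f "$local_dir/$name" ]; then
    cp "$remote_dir/$name" "$local_dir/$name"   # cache miss: fetch cold sstable
  fi
  cat "$local_dir/$name"                        # cache hit path
}
```

Subsequent reads of the same component hit the local copy, which is the property the ticket would need to preserve read latency.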

 

So "is there interest"? Probably - but in an s3 version of this feature, 
rather than spinning disk.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2023-05-15 Thread Claude Warren (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722765#comment-17722765
 ] 

Claude Warren commented on CASSANDRA-8460:
--

Is there still interest in moving this concept forward?  I am interested in 
exploring this option.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-04-18 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443412#comment-16443412
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Hi [~rustyrazorblade],

Sorry for the delay - it's usually a little tricky to get the setup right, and 
we also had a few hiccups with trunk. Over the last few weeks we've done 
benchmarking on AWS using 4 different 3-node clusters (each with its own 
dedicated stress box): an LVM setup, an HDD setup, a setup running my code, and 
an SSD setup. 

The details are in here: 
[https://docs.google.com/document/d/164qZ3zpG5pm_j4r9yWccmMiZh6XK4LsqnZBP7Iu3gII/edit#|https://docs.google.com/document/d/164qZ3zpG5pm_j4r9yWccmMiZh6XK4LsqnZBP7Iu3gII/edit]

It details how I ran the stress tests and how I set up LVM (as write-through). 
We've done 5 takes, and in those cases LVM doesn't seem to perform very well, 
even compared to the HDD. That said, the archiving code is also not as good as 
the SSD; I think that may be related to partitions being spread across both the 
slow and the fast disks. With LVM I know the cache volume is being used (based 
on CloudWatch), so it's unintuitive to me how this can be the case when LVM did 
really well in the fio benchmark you ran. Would you (or anyone, really) like to 
take a look and give your thoughts? Maybe the test is skewed against LVM and we 
can tune it better? Much appreciated :) 

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Lerh Chuan Low
>Priority: Major
>  Labels: doc-impacting, dtcs
> Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414966#comment-16414966
 ] 

Ben Slater commented on CASSANDRA-8460:
---

OK. Talking to Lerh, his code is just about at the point where we can do some 
initial benchmarking, so we'll run some tests to compare the two approaches and 
report what we get.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414820#comment-16414820
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

I'm not sure what else to tell you, [~slater_ben].  You just described a 
perfect use case for the Linux page cache, TWCS, and dm-cache.  I understand 
what you're trying to say - that somehow dm-cache won't cache the right data, 
and Cassandra will somehow do a better job than the kernel at understanding the 
data we need to keep hot, but so far my experience leads me to disagree with 
you that there would be an issue.  

Data that's recently been compacted and read is going to be in your page 
cache.  Data that's been recently "major compacted" in the previous TWCS window 
will be either in the page cache or in dm-cache.  After that, the data is just 
sitting around, so the access patterns will keep it either in cache or out, 
depending on when it's accessed.  

Ultimately what matters is reducing disk cache misses to a minimum.  You do 
that by keeping frequently accessed data in the cache.  Using a time element 
(recency) without factoring in hot spots will actually get you a _worse_ cache 
hit ratio, which will put more pressure on the slow disks, driving up seeks and 
making it harder to meet the SLA. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414793#comment-16414793
 ] 

Ben Slater commented on CASSANDRA-8460:
---

Fair enough - the example was a bit of an oversimplification even for how I 
would have guessed it worked. Having read up a bit 
([https://www.redhat.com/en/blog/improving-read-performance-dm-cache] and 
[https://www.kernel.org/doc/Documentation/device-mapper/cache-policies.txt]) I 
suspect we've actually got a bit of a different model of the use cases we are 
both imagining (and I haven't done a great job of describing what I have in 
mind).

Consider you're building an IoT application that collects sensor data and has 
some kind of UI for displaying readings. You want to provide an experience 
where accessing today's data (the most common use) is snappy, while still 
providing the ability to go back in time a year; as that's not common, it's 
fine for access to that data to be slower. 

In this scenario the recent data isn't "hot" in the sense that it is accessed 
many times (I'm not sure there is a well-defined term for what it is - maybe 
"high priority" is better?), so it's hard for a caching algorithm based on 
frequency of access (like smq) to work effectively (in fact, the first access 
is the one you want to be fast).

Does that make more sense as to where I'm coming from?




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414774#comment-16414774
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

{quote}
With LVM (possibly depending on its rules about how and when to cache - I 
admit I don't know a lot about the tuning possibilities there) you could end up 
with issues like one of your users deciding to do some analysis and extract a 
heap of old data, evicting the recent data from your cache and causing what you 
expected to be hot data to slow down. 
{quote}

Perhaps you should do some research on how lvmcache / dm-cache actually works 
before making arguments against it.  What you described about cache eviction is 
something dm-cache was specifically designed to avoid with its smq policy.  




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414766#comment-16414766
 ] 

Ben Slater commented on CASSANDRA-8460:
---

I'm not sure it's necessarily easier (because you now have two separate pools 
of disk to manage), but I think it is more predictable - your data will always 
be on the fast disk until it reaches the age you specify. With LVM (possibly 
depending on its rules about how and when to cache - I admit I don't know a lot 
about the tuning possibilities there) you could end up with issues like one of 
your users deciding to do some analysis and extract a heap of old data, 
evicting the recent data from your cache and causing what you expected to be 
hot data to slow down. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414748#comment-16414748
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

{quote}
Thinking some more about this, I think the other (and perhaps most important) 
advantage of implementing this in Cassandra is predictability for operators. 
It's easy to say, for example: if I want data < 1 month old to be fast, I need 
enough fast disk space for that and I know it will be consistently fast; after 
that, I need X disk space for the older data and I know it will be slower (and 
can even clearly tell users that). Trying to tune performance of the hot data 
(and avoid latency spikes) with Cassandra + LVM sounds pretty hard.
{quote}

I don't see how having Cassandra manage this makes things easier.  With LVM you 
just set up the cache, and it keeps as much hot data in the cache as it can.  
Maybe you only need a month's worth of data on your hot drive.  If you can't 
fit a month, lvmcache will manage that just fine, because it's a cache for hot 
blocks.  If you can fit 6 months in the cache, it'll do that fine too.  There 
isn't any need for configuration; it's literally designed to handle hot data, 
and you don't need to guess when to tier data off to your cold storage layer.  






[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414719#comment-16414719
 ] 

Ben Slater commented on CASSANDRA-8460:
---

Thinking some more about this, I think the other (and perhaps most important) 
advantage of implementing this in Cassandra is predictability for operators. 
It's easy to say, for example: if I want data < 1 month old to be fast, I need 
enough fast disk space for that and I know it will be consistently fast; after 
that, I need X disk space for the older data and I know it will be slower (and 
can even clearly tell users that). Trying to tune performance of the hot data 
(and avoid latency spikes) with Cassandra + LVM sounds pretty hard.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414714#comment-16414714
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

{quote}
The requirement we're looking to target, as per the original JIRA, is people 
who have data that is hot for a short period but that they then need to keep 
around for a long time with infrequent access (i.e. well-defined rules on hot 
vs cold, not deciding what is hot based on what was recently read).

Typically when I've seen this requirement people want: 
1) The best possible performance for the hot data
2) The lowest cost of storage for the cold data

It seems to me that with LVM we're not doing the best we could in terms of 
either of these.
{quote}

If you want the best possible read performance for hot data, there's not going 
to be a better option than the caching layer. Treating a disk as part of the 
Cassandra storage pool, rather than as a cache layer managed by the OS, 
introduces the need for explicit configuration and for explicitly managing 
where data lives. By this I mean you will need to keep some definition in the 
schema or code about when to keep things on the hot disk and when to move them 
off. My gut tells me this will result in an under-utilized disk, mostly because 
the more efficiently you use the fast disk, the greater the risk of failure. 
Imagine a large compaction happening on the hot disk - this patch will need to 
ensure it starts moving older data off to the slow drive, which is going to 
block compactions from happening on the hot disk.  

Regarding the low cost, I agree with you: duplicating the data on a cache drive 
is going to cost more than the aggregate of the space of the two drives.

{quote}
For performance, there is the write-through slowdown you mentioned; depending 
on where you draw the line between moving to slow disk and the final TWCS 
compaction, you might have compactions pushing data you want to be quick out of 
the cache; and if you used EBS for both the hot disk and the slow disk, you are 
increasing usage of the EBS bandwidth to copy to and from the cache (although 
using local SSD as the cache negates this last one).
{quote}

I'm not sure how much of a problem this is in practice. Cassandra's sequential 
writes are going to avoid a lot of the performance issues related to spinning 
disks. In my experience, the biggest problem limiting compaction throughput is 
going to be GC pauses, not the ability to write bytes to disk.

{quote}
In terms of cost, with LVM the fast disk is purely being used as a cache rather 
than as a primary store, so you are having to duplicate that amount of data 
storage - whether that is significant probably depends on your desired ratio of 
fast to slow disk and how cost-sensitive you are.
{quote}

Agreed. To me, the main benefit to having the fast disk involved is the ability 
to increase density significantly at very low cost. If you were to have a small 
SSD backed by 3-5TB of slow storage, that's a pretty good win in my opinion.

{quote}
Whether these downsides are worth the extra complexity is of course a matter of 
judgement rather than fact, so I'm happy to go with the community consensus 
here, but I thought I'd put in my POV.
{quote}

To be clear - I'm not shooting down the patch or saying it's a bad idea. I 
think there are some interesting aspects to it, with some valid use cases; I'd 
just like everyone to be aware of existing alternatives, as I didn't see anyone 
bring up lvmcache in the three years this ticket has existed.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Ben Slater (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414547#comment-16414547
 ] 

Ben Slater commented on CASSANDRA-8460:
---

Hi Jon

I've been setting the requirements from our (Instaclustr) point of view for 
Lerh here so I thought I'd weigh in on why I'd rather see a Cassandra based 
solution than LVM.

The requirement we're looking to target, as per the original JIRA, is people 
who have data that is hot for a short period but that they then need to keep 
around for a long time with infrequent access (i.e. well-defined rules on hot 
vs cold, not deciding what is hot based on what was recently read).

Typically when I've seen this requirement people want: 
1) The best possible performance for the hot data
2) Lowest cost of storage for the cold data

It seems to me that with LVM we're not doing the best we could in terms of 
either of these.

For performance, there is the write-through slowdown you mentioned; depending 
on where you draw the line between moving to slow disk and the final TWCS 
compaction, you might have compactions pushing data you want to be quick out of 
the cache; and if you used EBS for both the hot disk and the slow disk, you are 
increasing usage of the EBS bandwidth to copy to and from the cache (although 
using local SSD as the cache negates this last one).

In terms of cost, with LVM the fast disk is purely being used as a cache rather 
than as a primary store, so you are having to duplicate that amount of data 
storage - whether that is significant probably depends on your desired ratio of 
fast to slow disk and how cost-sensitive you are.

Whether these downsides are worth the extra complexity is of course a matter of 
judgement rather than fact, so I'm happy to go with the community consensus 
here, but I thought I'd put in my POV.

 

Cheers

Ben




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-26 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414484#comment-16414484
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

Hey [~Lerh Low], sorry for the delay.  I've been running some tests on lvm 
cache using fio for benchmarking rather than C*.  Cassandra adds a layer of 
complexity that won't help when it comes to raw benchmarks.

I ran some tests on EBS, SSD (i2.large), and EBS using SSD as a cache volume.  
I ran with this simple configuration to start:

{code}
[global]
size=10G
runtime=30m
directory=/bench/
bs=4k

[random-read]
rw=randread
numjobs=4

[sequential-write]
rw=write
{code}

||Metric||EBS||SSD||EBS + Cache||
|Random Read IOPS|1509|5748|5347|
|Random Read Bandwidth|6MB/s|22MB/s|21MB/s|
|Seq Write IOPS|40K|145K|39K|
|Seq Write Bandwidth|163MB/s|580MB/s|156MB/s|

I've set up the cache as writethrough, meaning we're going to be bottlenecked 
on the slow disk for writes.  Here's the setup:

{code}
root@ip-172-31-45-143:~# lvs -a
  LV  VG   Attr   LSize   PoolOrigin Data%  Meta%  
Move Log Cpy%Sync Convert
  [cache] test Cwi---C--- 700.00g7.25   0.55
0.00
  [cache_cdata]   test Cwi-ao 700.00g
  [cache_cmeta]   test ewi-ao  40.00g
  [lvol0_pmspare] test ewi---  40.00g
  origin  test Cwi-aoC---   1.50t [cache] [origin_corig] 7.25   0.55
0.00
  [origin_corig]  test owi-aoC---   1.50t
{code}
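
For anyone wanting to reproduce a layout like the one above, the provisioning would look roughly like this. The device names and sizes are assumptions (not necessarily what was used here), and this is a sketch of the standard lvmcache flow rather than the exact commands from the test:

```shell
# Hypothetical devices: /dev/xvdf = slow EBS volume, /dev/nvme0n1 = local SSD.
pvcreate /dev/xvdf /dev/nvme0n1
vgcreate test /dev/xvdf /dev/nvme0n1
lvcreate -n origin     -L 1536G test /dev/xvdf     # data LV on the slow disk
lvcreate -n cache      -L 700G  test /dev/nvme0n1  # cache data on the SSD
lvcreate -n cache_meta -L 40G   test /dev/nvme0n1  # cache metadata on the SSD
# Combine data + metadata LVs into a cache pool, then attach it to the origin
# in writethrough mode (writes hit the slow disk synchronously).
lvconvert --type cache-pool --poolmetadata test/cache_meta test/cache
lvconvert --type cache --cachemode writethrough --cachepool test/cache test/origin
mkfs.ext4 /dev/test/origin && mount /dev/test/origin /bench
```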

Generally speaking, TWCS uses considerably less I/O than any other strategy, 
and it already works fine with spinning disks on EBS, so I'm inclined to 
_personally_ lean towards using LVM.  It doesn't require any additional 
configuration once the volume is set up, and as I mentioned previously it's 
been baked into the Linux kernel for a long time now.  I haven't researched 
what's available on Windows, so that's something to keep in mind.

I'm not opposed to research or new features, but this seems to me to be adding 
complexity to solve a problem that's already been solved.  




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-25 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16413369#comment-16413369
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

https://github.com/apache/cassandra/compare/trunk...juiceblender:cassandra-8460-single-csm

Bump! Asking for kind souls to review what I have at the moment - the branch 
has now been updated (with tests) to reflect the features I've decided on. I've 
tried to make archiving as non-invasive as possible, but given the code 
organization (and thinking about it intuitively, it makes sense) some parts of 
archiving had to be known to superclasses, such as {{CompactionTask}} or 
{{CompactionAwareWriter}} being aware that there really are 2 different 
directories: one for hot data and one for cold. 

Highlights: 
- New enumeration {{DirectoryType}}. Can be either {{STANDARD}} or {{ARCHIVE}}. 
- The decision on whether or not a candidate should be archived is made in 
{{TimeWindowCompactionStrategy}}. The decisions can be:
* If archiving is turned off, candidates always get put into standard
* If candidates are already in archive, put them back into archive
* Otherwise, do the standard check: is their age past the archiving threshold? 
- CSM is aware that there is such a thing as {{DirectoryType}}. It keeps a 
running compaction strategy instance for every single directory, both archive 
and standard. 
- People can turn off archiving at any time via the archiving flags in the TWCS 
options. If they do, any archived SSTables will be moved back to the standard 
directories when they are next compacted; otherwise they stay in archive. 
(Maybe I could write a nodetool command to move archive back to standard.)
- If people turn on archiving, SSTables are moved to archive when they are next 
compacted. 
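
For illustration, the decision flow in the bullets above can be sketched as a small pure function. All names here ({{ArchivingDecision}}, {{decide}}, the millisecond parameters) are hypothetical stand-ins for illustration, not the actual API in the patch:

```java
import java.util.concurrent.TimeUnit;

public class ArchivingDecision {
    // Hypothetical mirror of the patch's DirectoryType enum.
    public enum DirectoryType { STANDARD, ARCHIVE }

    /**
     * Decide where a compaction candidate should land, per the rules above:
     * archiving off -> STANDARD; already archived -> stays in ARCHIVE;
     * otherwise archive once the sstable's age exceeds the threshold.
     */
    public static DirectoryType decide(boolean archivingEnabled,
                                       DirectoryType current,
                                       long sstableAgeMillis,
                                       long archiveAfterMillis) {
        if (!archivingEnabled)
            return DirectoryType.STANDARD;
        if (current == DirectoryType.ARCHIVE)
            return DirectoryType.ARCHIVE;
        return sstableAgeMillis > archiveAfterMillis ? DirectoryType.ARCHIVE
                                                     : DirectoryType.STANDARD;
    }

    public static void main(String[] args) {
        long tenDays = TimeUnit.DAYS.toMillis(10);
        // A 30-day-old standard sstable gets archived; with archiving off,
        // even an archived sstable moves back to standard on next compaction.
        System.out.println(decide(true, DirectoryType.STANDARD, TimeUnit.DAYS.toMillis(30), tenDays));  // ARCHIVE
        System.out.println(decide(false, DirectoryType.ARCHIVE, TimeUnit.DAYS.toMillis(30), tenDays));  // STANDARD
    }
}
```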

Also included comments in various places to try to say what I am trying to do. 

Finally, if you don't have time to look through the code, please at least look 
through just this file: {{ArchivingCompactionTest}}. All the methods have long 
names describing a feature of this archiving compaction, and please let me know 
if you disagree with any of them. 

There's still a lot left: dtests, Scrubber, how it interacts with repair (which 
I don't know much about yet), and functional testing. Potentially also a 
separate compaction executor, metrics, and concurrent compactors. 

Thanks!

Btw, any luck, Jon? I think I may look at writing some terraform scripts to 
spin up Cassandra on Debian, which may be useful for you. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-12 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16396263#comment-16396263
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Read you loud and clear on stress. I favoured stress because it reports 
operation and latency rates, and I didn't want to dig through the code just yet 
to see exactly what metrics stress reports - I trusted it as the default tool 
Cassandra ships with. I do have a custom Java class for doing inserts and reads 
(it doesn't do much beyond that); let me know if you would like it. I'm also 
curious what metric you think would be an accurate measure. Off the top of my 
head, from the client side I can think of the time from executing the query to 
receiving the answer, but I'm not sure I could make the case that the cached 
version is better than the uncached version based on that alone (the 
alternative is to dig through the stress code for more). 

I am not very familiar with the low-level things in Linux, so help with fio 
will be really appreciated (it doesn't help that my nodes actually run on 
CoreOS). I relied on CloudWatch to verify that my cache is working. When I have 
time in the coming days I may write a Python script to model TWCS (if you 
haven't got to it by then) - I agree it should be modeled with TWCS, which is 
why I was trying to make stress look like it. That said, it was interesting to 
see if it helped the other compaction strategies :) 
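
For what it's worth, the core of such a model is small: TWCS buckets sstables into fixed-size time windows by timestamp and only compacts sstables that fall into the same window. A toy sketch of that bucketing (in the project's language rather than Python; none of these names come from Cassandra itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TwcsWindowModel {
    // TWCS groups sstables into fixed-size time windows (keyed by their max
    // data timestamp) and only compacts within a window. This is a toy model
    // of that bucketing, not Cassandra's actual implementation.
    public static Map<Long, List<Long>> bucketByWindow(List<Long> maxTimestampsMillis,
                                                       long windowMillis) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : maxTimestampsMillis)
            windows.computeIfAbsent(ts / windowMillis, w -> new ArrayList<>()).add(ts);
        return windows;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        // Four sstables: one in window 0, two in window 1, one in window 3.
        Map<Long, List<Long>> w =
            bucketByWindow(Arrays.asList(10L, hour + 5, hour + 7, 3 * hour), hour);
        System.out.println(w.size());          // 3 distinct windows
        System.out.println(w.get(1L).size());  // 2 sstables share window 1
    }
}
```

Only the sstables sharing window 1 would be candidates for compaction together; old windows go quiet, which is why cold data stops seeing I/O.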




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-08 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392190#comment-16392190
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

I'll be honest, whenever I need to do performance testing, the last thing I 
reach for is stress, because I can't wrap my head around configuring it right.  
It's probably easier to create a ~50 line program to do the inserts.  I'll try 
to throw something together tomorrow.

Ultimately if this is going to benefit TWCS, it *has* to be tested with it, so 
we might as well do that up front.

It's the end of the day, and I'm not an expert in setting up lvmcache, so I'll 
have to try it out tomorrow as well. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-08 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392035#comment-16392035
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Sure. In the commands below {{/dev/md0}} is my RAID and {{/dev/xvdf}} is my SSD 
volume. 

{code}
# Create a physical volume on the RAID array and a volume group
# spanning both the RAID array and the SSD.
sudo pvcreate /dev/md0
sudo vgcreate VolGroupArray /dev/md0 /dev/xvdf

# Carve the SSD into a cache data LV and a (much smaller) cache metadata LV.
sudo lvcreate -n SadOldCache -L 99900M VolGroupArray /dev/xvdf
sudo lvcreate -n SadOldCacheMeta -L 100M VolGroupArray /dev/xvdf

# Combine the two into a cache pool.
sudo lvconvert --type cache-pool --poolmetadata VolGroupArray/SadOldCacheMeta VolGroupArray/SadOldCache

# Use the rest of the volume group (the RAID array) as the origin LV.
sudo lvcreate -l 100%FREE -n RaidHDD VolGroupArray /dev/md0
sudo lvs -a VolGroupArray

# Attach the cache pool to the origin LV and activate.
sudo lvconvert --type cache --cachepool VolGroupArray/SadOldCache VolGroupArray/RaidHDD
sudo vgchange -ay
{code}






[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-08 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392030#comment-16392030
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

On further thought, maybe it isn't ideal to bundle it together with compactions; 
it may deserve a totally new {{Archiving}} operation type. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-08 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392024#comment-16392024
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

Would you mind sharing the commands you used to set up the LVM Cache?  It's 
pretty easy to accidentally set up the pool as a balance of the 2 drives rather 
than using the SSD as a cache.

Both implementations are going to work best with TWCS, so I think that testing 
the workload with TWCS using time series writes is going to be a lot more 
productive than random writes, as you've noted.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-03-08 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392001#comment-16392001
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

[~rustyrazorblade], 

About {{disk_failure_policy}}: I wasn't aware of that, so I'll have to look 
into it. With my current patch as it stands, there are also other things, like 
Scrubber and streaming, which I have yet to get to. Thanks for the heads up! 
If such an implication is necessary then maybe we will have to enforce it in 
the code. 

About LVM cache: I spent some time following the man page and trying it out 
with cassandra-stress. I had spun up a few EC2 clusters, each using an 800GB 
RAID array; one was SSD backed, another was magnetic HDD backed, and the final 
one was magnetic HDD backed with 100GB of LVM writethrough cache. I inserted 
~200GB of data using cassandra-stress, waited for compactions to finish and 
then attempted a mixed (random) workload... the LVM cluster performed even 
worse than the plain HDD one. I guess this was to be expected, because the 
cache works best for hot data that is frequently read. 

I did briefly attempt a mixed workload where the queries are always trying to 
select the same data as much as possible (so {{gaussian(1..500M, 25000, 
1000)}}), and there wasn't any noticeable difference between the LVM and HDD 
backed cluster. 
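
As I understand the stress syntax, {{gaussian(min..max, mean, stddev)}} draws keys from a normal distribution clamped to the range, so nearly all reads concentrate in a narrow hot band around the mean. A toy sampler (hypothetical, purely to illustrate the skew - not cassandra-stress's actual implementation):

```java
import java.util.Random;

public class GaussianKeySample {
    // Roughly mirrors gaussian(min..max, mean, stddev): draw from a normal
    // distribution and clamp to the key range, concentrating the working set
    // around the mean.
    public static long sample(Random rng, long min, long max, double mean, double stddev) {
        double v = mean + rng.nextGaussian() * stddev;
        return Math.min(max, Math.max(min, Math.round(v)));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        long hits = 0;
        for (int i = 0; i < 100_000; i++) {
            long key = sample(rng, 1, 500_000_000L, 25_000, 1_000);
            if (Math.abs(key - 25_000) <= 3_000)  // within ~3 standard deviations
                hits++;
        }
        // Almost the entire workload lands in the narrow hot band around the
        // mean, which is the access pattern a cache should favour.
        System.out.println(hits > 99_000);
    }
}
```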

Not sure if you have used LVMCache with a workload before that worked out for 
you and you'd be willing to share details about it...? 

Just thinking about it further, the cache is also slightly different from the 
original proposal: the cache duplicates the data, while making Cassandra 
understand archiving does not. There's also a slight bonus for archiving, at 
least in the AWS scenario: the cache consumes the IOPS of the volumes due to 
the duplication (or amplification) of reads and writes to and from the cache.

Any thoughts? (And thank you for your input once again :)) My clusters are 
still running, so I'm happy to try a few configurations if you have any to 
suggest. For now I'm just going to refresh myself on the code and look into 
making it more presentable in case someone else swings by and is willing to 
give their thoughts. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-02-23 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16374864#comment-16374864
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

Hey [~Lerh Low]!  

First off, let me thank you for being open to alternative ideas, especially 
after writing a large chunk of code.  Not everyone is willing to take a step 
back and consider other options, I really appreciate it.

{quote}
Maybe you have stumbled upon the case where data has been resurrected in JBOD 
configuration in your experiences...? In theory since splitting by token range 
there should be no more such cases. It is safe.
{quote}

I had actually misremembered how CASSANDRA-6696 was implemented.  Looking back 
at the code and testing it manually I see the memtables are flushed to their 
respective disks initially.  It's nice to be wrong about this.

There's quite a bit going on here; I did a quick search but didn't see anything 
related to disk failure policy.  One thing that's going to be a bit tricky is 
that unless you have a 1:1 fast-disk-to-archive-disk relationship, you end up 
with some weird situations when using {{disk_failure_policy: best_effort}}, 
which is what CASSANDRA-6696 was all about in the first place.  If you lose 
your fast disk, will you still be able to query data that's on the archive 
disk for a given token range?  

It seems to me that using this feature would have to imply 
{{disk_failure_policy: stop}}, since either the failure of the archive or one 
of the disks in {{data_file_directories}} would result in incorrect results 
being returned.

lvmcache uses 
[dm-cache|https://www.kernel.org/doc/Documentation/device-mapper/cache.txt] 
under the hood, which keeps frequently accessed blocks on the fast device.  It 
shipped in Linux kernel 3.9, which was released in April 2013.  

Using lvmcache, if you were to create a logical volume per disk, with the SSD 
as your fast disk configured as a writethrough, you'd still honor the disk 
failure policy in the case of an archival or SSD failure, as well as have the 
flexibility of keeping any hot data readily available and not explicitly 
needing to move it off to another device when it's still active.  It adapts to 
your read and write patterns rather than requiring configuration.  Take a look 
at the [man page|http://man7.org/linux/man-pages/man7/lvmcache.7.html], it's 
pretty awesome.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-02-22 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373922#comment-16373922
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Hi Jon, 

Thanks for pitching in :)

Maybe you have stumbled upon the case where data has been resurrected in JBOD 
configuration in your experiences...? In theory since splitting by token range 
there should be no more such cases. It is safe. 

That said, I had not heard of lvmcache, so I'll go and have a look at it. I do 
agree that this patch as it is introduces a lot of code branches and 
complexity, and simple is a feature - which is why I was seeking feedback and 
becoming wary. lvmcache sounds good: readily available, works for every 
compaction strategy, and doesn't introduce all that complexity. I'll test it. 




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-02-20 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370760#comment-16370760
 ] 

Jon Haddad commented on CASSANDRA-8460:
---

Taking a look at this ticket, I've got a concern, and I'd like to suggest an 
alternative.  

Juggling multiple disks has been a bit of a pain so far and still has some 
weird behavior.  We're a little better now that we split by token ranges, but 
there's still (IIRC) a point in time where the failure of a single disk can 
resurrect some data which had just been tombstoned.  If this has been fixed, 
apologies - I haven't seen it.  I'm not quite sure that adding complexity to 
this already long-lasting pain point is going to help the project overall.

As an alternative, it's already possible to more or less get this behavior in a 
fashion that works with _every_ compaction strategy.  LVM (Linux only) is 
already ubiquitous.  Using lvmcache (backed by dmcache) already provides the 
ability to put your cold data on the slower spinning disks and leverage SSD for 
fast operations.  The benefit here is that you can keep a lot of your hot data 
on the fast drive and LVM will automatically handle making room for the newer 
files.  A second benefit is that you are not exposing yourself to the above 
mentioned issues with JBOD.  






[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-02-08 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357936#comment-16357936
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Bump! 

The branch has the latest updates I have (I've decided to go with a single CSM: 
https://github.com/apache/cassandra/compare/trunk...juiceblender:cassandra-8460-single-csm).
 I'm currently working my way through the unit tests; the weird thing is that 
when run in isolation they pass, but when run together they fail, as if 
something isn't being cleaned up properly. 

So far all the tests should work (as far as I can tell compared to 3.11) and I 
still have to add some tests for the archiving compaction. There are definitely 
a lot more things that require checking, and I also haven't gotten round to 
checking what happens when you turn it off, etc. So far I'm just trying to get 
the compaction infrastructure to be aware that an archive directory exists, 
plus the existing logic to actually perform the archiving compaction. I've also 
yet to test whether it picks up compactions in the archive directory when there 
legitimately are compactions to be done there. 

As before, comments welcome on this is going down the right path or not/is 
there a better way to do it.




[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-01-30 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346255#comment-16346255
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

I've tentatively started work on this, and it's turning out to be a bigger code 
change than I originally expected, so I would really love some feedback from 
community members who know more (and reviews of my initial patches). 

{{CompactionAwareWriter}}, {{DiskBoundaryManager}}, {{Directories}} and 
{{CompactionStrategyManager}} need to know about archives. I've gone ahead and 
created a new enumeration {{DirectoryType}} that can be either ARCHIVE or 
STANDARD. 

{{CompactionAwareWriter}} always calls {{maybeSwitchWriter(DecoratedKey)}} 
before calling {{realAppend}}. This is to handle the JBOD case: 
{{maybeSwitchWriter}} helps the writer write to the right location depending on 
the key, to make sure keys do not overlap across directories. So it needs to 
know which {{diskBoundaries}} it is actually using, so as not to get into a 
situation where it can't differentiate between an actual archive disk and an 
actual JBOD disk. 

It would be wise to re-use the logic in {{diskBoundaries}} to also handle the 
case when the archive directory has been configured as JBOD, so 
{{DiskBoundaryManager}} now also needs to know about archive directories. When 
it tries to {{getWriteableLocations}} or generate disk boundaries, it should be 
able to differentiate between archive and non-archive. 

The same goes for {{CompactionStrategyManager}}. We still need to be able to 
run separate compaction strategy instances in the archive directory to handle 
the case of repairs and streaming (so archived SSTables don't just accumulate 
indefinitely). Here's where I am not sure which way to proceed forward. 

Option 1: 
Have it so that {{ColumnFamilyStore}} still only maintains one CSM and DBM and 
one {{Directories}}. CSM, DBM and {{Directories}} all start knowing about the 
existence of an archive directory; this can either be an extra field, or an 
EnumMap:

{code}
new EnumMap<Directories.DirectoryType, DiskBoundaries>(Directories.DirectoryType.class)
{{
    put(Directories.DirectoryType.STANDARD,
        cfs.getDiskBoundaries(Directories.DirectoryType.STANDARD));
    put(Directories.DirectoryType.ARCHIVE,
        cfs.getDiskBoundaries(Directories.DirectoryType.ARCHIVE));
}};
{code}

The worry here for me is that some things may subtly break even as I fix up 
everything else that gets logged as errors... The CSM's own internal fields 
{{repaired}}, {{unrepaired}} and {{pendingRepaired}} will also need to become 
maps, otherwise the individual instances will again become confused, being 
unable to differentiate between an actual JBOD disk and an archive disk. Some 
of the APIs, e.g. reload, shutdown, enable, will all need some smarts on which 
directory type is needed (in some cases it won't matter). Every consumer of 
these APIs will also need to be updated. 

Here's how it looks in an initial go: 
https://github.com/apache/cassandra/compare/trunk...juiceblender:cassandra-8460?expand=1

Option 2:
Have it so that {{ColumnFamilyStore}} keeps 2 CSMs and 2 DBMs, of which the 
archiving equivalents are {{null}} if not applicable/reloaded. In this case 
there's a reasonable level of confidence that each CSM and BDM will just 'do 
the right thing', regardless whether it's an archive or not. In this case then 
every call to getting DBM or CSM (and there are a lot for getting CSM) will 
need to be evaluated and checked. 

Here's how it looks in an initial go: 
https://github.com/apache/cassandra/compare/trunk...juiceblender:cassandra-8460-single-csm?expand=1

Both still have work left on them (Scrubber, relocating SSTables, what happens 
when archiving is turned off, etc), but before I continue down this track I'm 
wondering if anyone can point out which way is better, or whether this is all 
misguided. And, in the event these are the changes that need to happen (I can't 
seem to find a way for just TWCS to be aware that there's an archive directory; 
CFS needs to know as well), is this still worth the complexity introduced? 

[~pavel.trukhanov] Re "Why can't we simply allow a CS instance to spread across 
two disks - SSD and corresponding archival HDD" -> I think in this case you're 
back in the situation where data can be resurrected. Other replicas can compact 
away tombstones (because the CS can see both directories), and then your last 
remaining replica, before it manages to do the same, has the SSD holding the 
tombstone become corrupted. Upon replacing the SSD with a new one and issuing 
repair, the deleted data is resurrected. Of course, this can be mitigated by 
making it clear to operators that every time there's a corrupt disk, every 
single disk needs to be replaced. 

Even if we did so, there will still be large code changes to make CSM 

[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-01-23 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336502#comment-16336502
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

[~pavel.trukhanov] That's a really good question. I can't think of any reason 
why other than it just being a relic of my thoughts from JBOD/making sure 
unrepaired/repaired/pending repaired SSTables stay in different disks...so if 
the user wanted to replace just the cold archive disk they could do so. Though 
I'm not sure if having a separate CS actually allows that. Hmm...

I guess it may become clearer to me as I dive into the code, but thank you for 
pointing it out :)

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Priority: Major
>  Labels: doc-impacting, dtcs
> Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-01-23 Thread Pavel Trukhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16335527#comment-16335527
 ] 

Pavel Trukhanov commented on CASSANDRA-8460:


Why can't we simply allow a CS instance to spread across two disks - SSD
and corresponding archival HDD - so it will see all the data for any
particular vnode at once and won't falsely resurrect anything?




> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Priority: Major
>  Labels: doc-impacting, dtcs
> Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.






[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-01-21 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333894#comment-16333894
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Thinking about this further, it looks like this will be reasonably complex. 

The main issue is that by introducing an archival directory, we now have 
multiple data directories, which is like a JBOD setup. 
https://issues.apache.org/jira/browse/CASSANDRA-6696 (Partition SSTables by 
token range) seeks to prevent resurrected tombstones; the scenario in which 
tombstones can be resurrected is described here: 
https://www.datastax.com/dev/blog/improving-jbod. 

However, with an archiving directory we can no longer guarantee that a single 
token range (or vnode) will live in one directory (unless I'm missing 
something: archiving is based on SSTable age, so it knows nothing about 
tokens).

At a high level, the situation goes like this: 

1. You have an SSD and an HDD. 
2. Key x is written to the SSD. 
3. After some time, x passes the archive age and ends up on the HDD. 
4. For some reason not quite clear, the user decides to write a tombstone for x 
(they shouldn't with TWCS). So we now have tomb(x) on the SSD. 

At this point, keep in mind that there are three separate 
{{CompactionStrategy}} (CS) instances running for each of the SSD and the HDD, 
managing the repaired, unrepaired, and pending-repair SSTables respectively - 
so three on the SSD and three on the HDD. These CS instances cannot see each 
other's candidates; when considering candidates for compaction, they see only 
the SSTables in their own directories. 

5. gc_grace_seconds passes and tomb(x) is compacted away, so x is now 
resurrected. In an actual JBOD setup this can't happen, because a single token 
range or vnode can only live in one directory - a guarantee that doesn't hold 
with an archiving setup. 

We can solve this by introducing a new flag that makes a tombstone droppable 
only if it lives in the archiving directory. Enforcing 
{{gc_grace > archive_days}} is not sufficient, because the node can always be 
taken offline, compactions disabled, and so on. 
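As a rough sketch of that proposed flag (all names here are illustrative, not Cassandra's actual API), the purge check would gate on tier as well as gc_grace:

```java
// Hypothetical sketch of the proposed flag: being past gc_grace alone is not
// enough to purge a tombstone; the sstable holding it must also already live
// on the cold (archive) tier, since gc_grace > archive_days cannot be enforced
// while a node is offline or compaction is disabled.
public class TombstonePurgeRule {
    /**
     * @param localDeletionTime when the tombstone was written (seconds)
     * @param gcBefore          purge horizon: now - gc_grace_seconds
     * @param onColdTier        whether the sstable lives in a cold directory
     */
    public static boolean mayPurge(long localDeletionTime, long gcBefore, boolean onColdTier) {
        return localDeletionTime < gcBefore && onColdTier;
    }
}
```

With this rule, the scenario above is safe: tomb(x) on the SSD is never purged, no matter how much time passes, until it has itself been moved to the cold tier.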

Consider the case where: 

6. The SSD is corrupted and needs to be replaced. In this case, the fix would 
be to replace the entire node, not just the SSD. This is to prevent tombstone 
resurrection but also that the system tables are gone (system tables live in 
the SSD), so a full replace is needed. 

This is the high-level design we came up with: 
* In the typical TTL use case, the TTL should always be greater than the 
archive days.
* Introduce a new YAML setting, possibly called {{cold_data_directories}}. 
'Cold' rather than 'archive' signals that we can't just forget the data there; 
compactions still need to happen in that directory, for joining nodes, 
streaming nodes, and keeping disk usage low.
* An option on TWCS to specify using the cold directory after a certain number 
of days.
* A new flag to handle the situation described above: tombstones cannot be 
dropped unless they are in the cold directory. This also implies we can't drop 
data using tombstones on the non-archived data - which pretty much means no 
manual deletions on the table; we should only use this when TTLing everything, 
writing once, and with read repair turned off.
* Separate compaction throughput and concurrent compactor settings for the 
cold directory.

Caveats with changes to flags/properties:
* Removing the cold flag from the yaml means we've lost the data in those 
directories.
* Removing the cold flag from the table only means data will no longer be 
archived to cold storage. Existing SSTables in the cold directory should still 
be loaded in; however, if compacted they move back to hot storage.
* Reducing the archive time on the table will just cause more data to be moved 
to the cold directory.
* Increasing the archive time means existing data that should no longer be 
archived could go back to the live set if compacted; otherwise it stays in 
cold storage with no negative impact.
* When promoting data to the cold directory, we need to check that there's no 
overlapping SSTable with a max timestamp greater than the candidate's minimum 
timestamp, same as TWCS expiry.
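To make the moving parts of this design concrete, the configuration could look something like the following. Every name here ({{cold_data_directories}}, the throughput knobs, the TWCS options) is a hypothetical placeholder from the sketch above, not an existing setting:

```yaml
# cassandra.yaml (hypothetical settings from the design sketch above)
cold_data_directories:
    - /mnt/hdd1/cassandra/data
cold_compaction_throughput_mb_per_sec: 8
cold_concurrent_compactors: 1

# Table level (shown as a comment; these would be TWCS compaction options):
#   compaction = {'class': 'TimeWindowCompactionStrategy',
#                 'cold_after_days': 30,
#                 'only_purge_tombstones_in_cold': true}
```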

There will still be significant I/O when it comes to 
compacting/repairing/streaming the SSTables in the cold directory, and it adds 
considerable complexity to the code base. It's not trivial to reason about 
either; it took my colleagues and me three hours. The one leftover question we 
had: when the table-level property is changed, will Cassandra need to be 
restarted for it to take effect, or is there a hook/property that is checked 
constantly?

Before we go ahead with it: does anybody notice anything we missed, or have 
any thoughts so far on the feature itself and whether the value it adds 
justifies the complexity introduced (if you have time)? It will be really 
appreciated! [~krummas] [~bdeggleston] [~jjirsa] [~stone] 




> Make it 

[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-01-18 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331512#comment-16331512
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

I no longer have a personal need for it, and it's not in my queue of things I 
plan on working on in the next 2 years. By all means, feel free to start with 
some of my code, but I haven't thought about specifics for quite some time. 

 

 

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Priority: Major
>  Labels: doc-impacting, dtcs
> Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.






[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2018-01-18 Thread Lerh Chuan Low (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331469#comment-16331469
 ] 

Lerh Chuan Low commented on CASSANDRA-8460:
---

Also just bumping this - wondering if you still have plans for it, [~jjirsa] or 
[~bdeggleston]? It looks like with the patch you had previously 
(https://github.com/jeffjirsa/cassandra/commit/cc0ab8f733eef63ed0eaea30cc6f471b467c3ec5#diff-f628011a74763c0d0abc369bc8f5762bR126)
 most of the code changes are still applicable. I am willing to give it a go. 

It sounds like we may still be uncertain about how to go about implementing 
this. My original thoughts align with Jeff's, where the archive directories 
also keep an instance of {{XCompactionStrategy}} running for each of the 
repaired, unrepaired, and pending-repair sets. The archived data will still 
have to be read and used eventually when doing repairs or when streaming to a 
new node, so it increasingly looks like it is not viable to put data into the 
archiving directory and just never touch it again - though I'm happy to 
implement it however people think is better, because there may be things that 
are not obvious to me. Flushing won't be aware that an archiving directory 
exists in this case and will keep flushing to the actual {{data_directories}}; 
eventually compaction will pick SSTables up and move them into 
{{archive_data_directories}}, if applicable. 

[~stone] does raise an interesting point, though, about decoupling this from 
the CS and using a periodic background task that archives SSTables. I'm 
guessing in that case you would archive based on... SSTable metadata min/max 
timestamp? Or just the last-modified time of the SSTable files? Would it be a 
YAML property, and if there is an SSTable with a max timestamp older than X 
days, archive it? 
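If the periodic-task route were taken, the decision itself is small either way; a sketch, assuming we archive off the SSTable's max timestamp (with file mtime as the fallback floated above) and a hypothetical YAML knob for the threshold:

```java
import java.util.concurrent.TimeUnit;

// Sketch of the decision a periodic archiver task would make per sstable.
// Whether maxTimestampMillis comes from sstable metadata or file mtime is one
// of the open questions above; archiveAfterDays is a hypothetical YAML setting.
public class AgeBasedArchiveCheck {
    public static boolean shouldArchive(long maxTimestampMillis, long nowMillis, int archiveAfterDays) {
        long cutoff = nowMillis - TimeUnit.DAYS.toMillis(archiveAfterDays);
        return maxTimestampMillis < cutoff;
    }
}
```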


> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Priority: Major
>  Labels: doc-impacting, dtcs
> Fix For: 4.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.






[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2017-02-06 Thread Pavel Trukhanov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853810#comment-15853810
 ] 

Pavel Trukhanov commented on CASSANDRA-8460:


Any plans on that one? 

And any thoughts with regards to TWCS?

[~bdeggleston] ? 

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: doc-impacting, dtcs
> Fix For: 3.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2016-07-05 Thread stone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15362257#comment-15362257
 ] 

stone commented on CASSANDRA-8460:
--

A simple implementation:
https://github.com/FS1360472174/cassandra/commit/a6b16962b6777c64d813e9d4420ac7b175efe007

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: doc-impacting, dtcs
> Fix For: 3.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2016-06-28 Thread stone (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15354399#comment-15354399
 ] 

stone commented on CASSANDRA-8460:
--

There are several questions about this issue:

1. From the application perspective, we rarely read this archived data, but 
when we scale the cluster - adding or decommissioning a node - we stream data 
between nodes. Since these archived SSTables are still in the token ring, how 
do we deal with them? We need to access them, and bootstrap may take a long 
time to finish when the archived data is large.

2. Why not separate "archiving SSTables" from the compaction strategy? 
Archiving an SSTable is not a real-time operation; we just need to execute the 
task periodically. In other words, there is high coupling between compaction 
and archiving data. We could provide an SSTable tool to archive data; 
splitting SSTables by date is the job of the compaction strategy, and then we 
don't care whether it is DTCS or TWCS.

3. In ArchivingDateTieredCompactionWriter.java we archive SSTables with 
SSTableWriter. Why not use a softlink instead - move the SSTable file and 
create a softlink? Actually, I'm not clear on how the SSTable files are moved 
by SSTableWriter.switchWriter(). I saw that Cassandra backs up data with 
hardlinks, so we could use softlinks to archive data.
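The softlink idea can be sketched in a few lines (names and paths are illustrative; a real SSTable generation has several component files, only one is shown):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the softlink approach: physically move the sstable component to
// the cold directory, then leave a symlink behind so the hot data directory
// still resolves it and the read path is unchanged.
public class SoftlinkArchiver {
    public static void archive(Path hotFile, Path coldDir) throws IOException {
        Path target = coldDir.resolve(hotFile.getFileName());
        Files.move(hotFile, target);               // relocate the data
        Files.createSymbolicLink(hotFile, target); // hot path now points at cold
    }
}
```

One design consequence: because the hot directory still "sees" the file through the link, nothing else in the read path needs to know the archive directory exists, which is the appeal of the approach.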

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: doc-impacting, dtcs
> Fix For: 3.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-10-24 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14972816#comment-14972816
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

I should probably cancel patch-available. Will need significant rebase due to 
CASSANDRA-8671 , and max_sstable_age_days probably isn't the right tuning knob 
to use assuming CASSANDRA-10280 makes it in. [~bdeggleston] - if there's a 
better way to implement since 8671, and you want to chat about how you'd like 
to see this implemented in IRC or email, I'll happily re-implement. 

> Make it possible to move non-compacting sstables to slow/big storage in DTCS
> 
>
> Key: CASSANDRA-8460
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Jeff Jirsa
>  Labels: dtcs
> Fix For: 3.x
>
>
> It would be nice if we could configure DTCS to have a set of extra data 
> directories where we move the sstables once they are older than 
> max_sstable_age_days. 
> This would enable users to have a quick, small SSD for hot, new data, and big 
> spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-07-28 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645490#comment-14645490
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

Pushed to https://github.com/jeffjirsa/cassandra/tree/cassandra-8460-2.2

1) Removed the time component of archive cutoff (no more 
archive_sstable_age_days), and refactored it to archive at max_sstable_age_days 
to match your original intent rather than projecting my own intentions
2) Reworked to explicitly shortcut and return if no archive disk is present
3) Created unit test (and fixed unit tests that this patch broke, primarily in 
DirectoriesTest)

[~krummas] - Can you review at your convenience? 

 Make it possible to move non-compacting sstables to slow/big storage in DTCS
 

 Key: CASSANDRA-8460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Jeff Jirsa
  Labels: dtcs
 Fix For: 3.x


 It would be nice if we could configure DTCS to have a set of extra data 
 directories where we move the sstables once they are older than 
 max_sstable_age_days. 
 This would enable users to have a quick, small SSD for hot, new data, and big 
 spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-06-23 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597381#comment-14597381
 ] 

Marcus Eriksson commented on CASSANDRA-8460:


bq. 1) If compaction strategy calls for archive, but no archive disk is 
available (not defined or otherwise full), I'm falling back to standard disk. 
Agree?
Can't we check before starting an archive compaction if there are any archive 
locations available? If there are none, we shouldn't compact, right?
bq. 2) I originally planned to explicitly prohibit compaction of N files in 
archival disk, but I couldn't convince myself if that made sense. Instead, I'm 
allowing it if sstable_max_age_days allows it (if you set archive lower than 
max age, you could conceivably compact on archival disk tier). Agree?
The way I originally envisioned this was that once an sstable hits 
max_sstable_age_days, we trigger a compaction that puts it on the slow disk, 
and then we never need to look at those sstables again (unless they eventually 
expire due to TTL). The idea behind max_sstable_age_days is that this is the 
point where we don't expect to do many reads anymore, so it would also be a 
good point to put them on slow disks

I guess it could be a problem if users increase max_sstable_age_days and we 
move the data back to the fast disks though, thoughts?

bq. 3) In the case where archived sstables can still be compacted, it's 
possible in some windows to have them compacted with sstables on the faster 
standard disk. In those cases, I'm making a judgement call that if any of the 
source sstables were archived, the resulting sstable will also be archived. 
Agree?
As in 2), I think we should never compact the sstables on the slow disks.

bq. 4) Finally, I was trying to determine the right way to tell if an sstable 
was already archived. The logic I eventually used was simply parsing the path 
of the sstable and seeing if it was in the array of archive directories ( 
https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad#diff-894e091348f28001de5b7fe88e65733fR1665
 ) . I'm not convinced this is best, but I didn't know if it was appropriate to 
extend sstablemetadata or similar to avoid this. Thoughts?
We do something similar in Directories.java: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Directories.java#L242
 - you should probably check absolute paths and use startsWith?
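That suggestion amounts to something like the following (a sketch mirroring the Directories.java pattern linked above, not Cassandra's actual code):

```java
import java.nio.file.Path;
import java.util.List;

// Sketch of the suggested check: resolve to absolute, normalized paths and
// use Path.startsWith against the configured archive directories.
public class ArchiveLocator {
    public static boolean isArchived(Path sstable, List<Path> archiveDirs) {
        Path abs = sstable.toAbsolutePath().normalize();
        return archiveDirs.stream()
                          .map(d -> d.toAbsolutePath().normalize())
                          .anyMatch(abs::startsWith);
    }
}
```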

 Make it possible to move non-compacting sstables to slow/big storage in DTCS
 

 Key: CASSANDRA-8460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Jeff Jirsa
  Labels: dtcs
 Fix For: 3.x


 It would be nice if we could configure DTCS to have a set of extra data 
 directories where we move the sstables once they are older than 
 max_sstable_age_days. 
 This would enable users to have a quick, small SSD for hot, new data, and big 
 spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-06-23 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598730#comment-14598730
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

Thanks for the feedback, [~krummas]!

{quote}
Can't we check before starting an archive compaction if there are any archive 
locations available? If there are none, we shouldn't compact, right?
{quote}

Yea. There's a few cases here, and I suppose that answer works for all of them: 

- CF compaction strategy specifies archive tier, but no disk is configured on 
the node
- CF compaction strategy specifies archive tier, but there's no free space 
- If we were to allow max_sstable_age_days > archive_sstables_age_days, there 
could be a use case where two sstables on archive storage would be eligible 
for compaction, but there might not be room for them to be combined. If we 
don't allow this, the potential edge case goes away.


{quote}
I guess it could be a problem if users increase max_sstable_age_days and we 
move the data back to the fast disks though, thoughts?
{quote}

Is that a problem? If the user wants to tune the parameter, we should support 
it. 

{quote}
As in 2), I think we should never compact the sstables on the slow disks.
{quote}

I'll write it however you want it, but my assumption was that if 
{{max_sstable_age_days}} is set and greater than 
{{archive_sstables_age_days}}, we would still compact; it's just obviously 
slower. In my mind, it's a cost/performance tradeoff for operators - the slow 
disk may not be SUPER slow, it may just be 10k iops instead of 20k iops, so 
compaction may be OK, just not the best fit for the hottest data. If you're 
adamant about not allowing compaction on the archive tier, I'll add a check so 
that {{max_sstable_age_days}} cannot be set higher than 
{{archive_sstables_age_days}}. 

{quote}
you should probably check absolute paths and use startsWith?
{quote}

Noted, I like that way better.

Thanks again. I'll work on finishing this up and adding some tests.

 Make it possible to move non-compacting sstables to slow/big storage in DTCS
 

 Key: CASSANDRA-8460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Jeff Jirsa
  Labels: dtcs
 Fix For: 3.x


 It would be nice if we could configure DTCS to have a set of extra data 
 directories where we move the sstables once they are older than 
 max_sstable_age_days. 
 This would enable users to have a quick, small SSD for hot, new data, and big 
 spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-06-16 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589084#comment-14589084
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

Pushed a version, which I believe works as described. Would appreciate some 
feedback, and then if it looks promising, I'll finish it up with adding unit 
tests.

https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad

A few explicit questions for [~krummas] and [~Bj0rn] : 

1) If compaction strategy calls for archive, but no archive disk is available 
(not defined or otherwise full), I'm falling back to standard disk. Agree? 
https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad#diff-2c2b50ecd5e8515531c5d041117c9b4fR371

2) I originally planned to explicitly prohibit compaction of N files in 
archival disk, but I couldn't convince myself if that made sense. Instead, I'm 
allowing it if sstable_max_age_days allows it (if you set archive lower than 
max age, you could conceivably compact on archival disk tier). Agree? 

3) In the case where archived sstables can still be compacted, it's possible in 
some windows to have them compacted with sstables on the faster standard disk. 
In those cases, I'm making a judgement call that if any of the source sstables 
were archived, the resulting sstable will also be archived. Agree? 
https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad#diff-7a9ada329d886c1871344b1d6fceec5cR56

4) Finally, I was trying to determine the right way to tell if an sstable was 
already archived. The logic I eventually used was simply parsing the path of 
the sstable and seeing if it was in the array of archive directories ( 
https://github.com/jeffjirsa/cassandra/commit/079b22136d178937b28b82326f132e33e96f6cad#diff-894e091348f28001de5b7fe88e65733fR1665
 ) . I'm not convinced this is best, but I didn't know if it was appropriate to 
extend sstablemetadata or similar to avoid this. Thoughts? 




 Make it possible to move non-compacting sstables to slow/big storage in DTCS
 

 Key: CASSANDRA-8460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
Assignee: Jeff Jirsa
  Labels: dtcs

 It would be nice if we could configure DTCS to have a set of extra data 
 directories where we move the sstables once they are older than 
 max_sstable_age_days. 
 This would enable users to have a quick, small SSD for hot, new data, and big 
 spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-06-12 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583935#comment-14583935
 ] 

Marcus Eriksson commented on CASSANDRA-8460:


bq. So my initial approach was to define a second config item, separate from 
data_file_directories
Yeah, let's keep it simple for now - add a new config variable like you suggest. 

 Make it possible to move non-compacting sstables to slow/big storage in DTCS
 

 Key: CASSANDRA-8460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
  Labels: dtcs

 It would be nice if we could configure DTCS to have a set of extra data 
 directories where we move the sstables once they are older than 
 max_sstable_age_days. 
 This would enable users to have a quick, small SSD for hot, new data, and big 
 spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-06-12 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14583809#comment-14583809
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

{quote}yes, I've been thinking maybe adding priorities or tags to the data 
directories, but that is probably not needed now. Adding a flag to each 
data_directory that states whether it is for archival storage or not is 
probably enough for now.{quote}

Asking for clarification to make sure I don't go too far into pony land:

So my initial approach was to define a second config item, separate from 
{{data_file_directories}} entirely, so that no other code needed to be aware of 
it except for classes explicitly wanting to use `archive` tier storage ( 
{{dd.getAllDataFileLocations()}} would not return the archive tier, but rather 
add a {{dd.getArchiveDataFileLocations()}} specifically for the slow class of 
storage).  

It sounds from your description you're envisioning changing the list of 
data_file_locations to a list of maps {noformat} 
[tag1:location1,tag1:location2,tag3:location3] {noformat} or {noformat} 
tag1:[location1,location2],tag3:[location3] {noformat} In this case, we'd also 
need to maintain backwards compatibility, which seems fairly straightforward 
to do (check whether the provided {{data_files_directory}} is an old-format 
list rather than a map and apply some default tag?)
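That compatibility shim is small either way; a sketch of the normalization, assuming (hypothetically) the YAML loader hands us either the old plain list or a tag-to-locations map:

```java
import java.util.List;
import java.util.Map;

// Sketch: accept either the old plain-list form of data_file_directories or
// a hypothetical tagged-map form, folding the old form under a default tag so
// downstream code only ever sees the map shape.
public class DataDirConfig {
    @SuppressWarnings("unchecked")
    public static Map<String, List<String>> normalize(Object raw) {
        if (raw instanceof Map)
            return (Map<String, List<String>>) raw;    // already tagged
        return Map.of("default", (List<String>) raw);  // old-format list
    }
}
```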

The first approach is clean and isolated, unlikely to introduce surprises, but 
potentially limits us from being able to do more interesting work with tagged 
data file directories later (ie: only store data for KS W in data directories 
tagged X, and KS Y in data directories tagged Z). Can you clarify which best 
fits your expectations? 


 Make it possible to move non-compacting sstables to slow/big storage in DTCS
 

 Key: CASSANDRA-8460
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8460
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
  Labels: dtcs

 It would be nice if we could configure DTCS to have a set of extra data 
 directories where we move the sstables once they are older than 
 max_sstable_age_days. 
 This would enable users to have a quick, small SSD for hot, new data, and big 
 spinning disks for data that is rarely read and never compacted.





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-05-18 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547604#comment-14547604
 ] 

Marcus Eriksson commented on CASSANDRA-8460:


bq. 1) Create a new notion of tiered storage configurable per node in yaml
yes, I've been thinking about adding priorities or tags to the data 
directories, but that is probably not needed now. Adding a flag to each 
data_directory that states whether it is for archival storage or not is 
probably enough for now.
bq. 2) Allow compaction strategies access to the various tiers with 
CASSANDRA-8671
yes, but CASSANDRA-8671 is mostly to give compaction strategies more control 
over flushing and streaming locations - with the CompactionAwareWriter 
interface added in 2.2 I think we get most of what we need for this ticket.
bq. 3) Extend DTCS to take advantage of CASSANDRA-8671 + slow tier from step 1 
as a compaction option
sounds good to me
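
A minimal sketch of what a per-directory archival flag might look like in 
cassandra.yaml (the {{path}} / {{archive}} keys are hypothetical, only meant 
to illustrate the idea, not a proposed format):

{noformat}
data_file_directories:
    - path: /mnt/ssd/data        # fast tier: flushes and active compactions
    - path: /mnt/spinning/data
      archive: true              # slow tier: sstables past max_sstable_age_days
{noformat}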



[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-05-16 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546975#comment-14546975
 ] 

Jeff Jirsa commented on CASSANDRA-8460:
---

[~krummas] Does it make sense to address this in a few parts? 

1) Create a new notion of tiered storage configurable per node in yaml (either 
one default tier for hot data {{data_file_directories}} and one tier for cold 
data {{archive_file_directories}}, or some form of arbitrary named tiers? )
2) Allow compaction strategies access to the various tiers with CASSANDRA-8671 
( tagging [~bdeggleston] for visibility )
3) Extend DTCS to take advantage of CASSANDRA-8671 + slow tier from step 1 as a 
compaction option such as {{WITH compaction = {'class': 
'DateTieredCompactionStrategy', 'timestamp_resolution':'resolution', 
'base_time_seconds':'3600', 'max_sstable_age_days':'7', 
'max_sstable_age_disk_tier':'archive' }; }} ? 
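
Spelled out as a full (hypothetical) table definition, the option in step 3 
might read as follows; note that {{max_sstable_age_disk_tier}} does not exist 
today, and the tier name {{archive}} is assumed to come from step 1:

{noformat}
CREATE TABLE events (
    id uuid,
    ts timestamp,
    payload blob,
    PRIMARY KEY (id, ts)
) WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'timestamp_resolution': 'MICROSECONDS',
    'base_time_seconds': '3600',
    'max_sstable_age_days': '7',
    'max_sstable_age_disk_tier': 'archive'
};
{noformat}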





[jira] [Commented] (CASSANDRA-8460) Make it possible to move non-compacting sstables to slow/big storage in DTCS

2015-03-31 Thread Jim Plush (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14389447#comment-14389447
 ] 

Jim Plush commented on CASSANDRA-8460:
--

We also have this use case... for the PB+ size clusters where 90% of the data 
is cold storage and rarely used, it would be nice to have some cheap spinning 
disks that could hold the data. Read latencies would be less of a concern due 
to the infrequency of reads. 
