I have been thinking about tiered storage wherein infrequently used data
can be moved off to slow (cold) storage (like S3).  I think that CEP-17 in
conjunction with CEP-21 provides an opportunity for an interesting approach.

As I understand it, CEP-17 clarified the SSTable interface(s) so that
alternative implementations are possible, most notably CEP-25 (trie format
sstables).  CEP-21 provides a mechanism by which specific primary key
blocks can be assigned to specific servers.

It seems to me that we could implement an SSTable format that reads/writes
S3 storage and then use CEP-21 to direct specific keys to servers that
implement that storage.
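
To make the routing idea concrete, here is a minimal sketch in Python
rather than Java, just to keep it short.  Every name in it (TierMap,
RangeAssignment, the "hot"/"cold" labels, integer tokens) is hypothetical
and is not part of CEP-17, CEP-21, or any Cassandra API.  It only
illustrates looking up which storage tier owns a given key range.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeAssignment:
    """Hypothetical record: keys from start_token upward belong to tier."""
    start_token: int  # inclusive lower bound of the key range
    tier: str         # e.g. "hot" (local disk) or "cold" (S3-backed format)


class TierMap:
    """Hypothetical lookup table from key token to storage tier."""

    def __init__(self, assignments):
        # Sort once so that lookups can use binary search.
        self._assignments = sorted(assignments, key=lambda a: a.start_token)
        self._starts = [a.start_token for a in self._assignments]

    def tier_for(self, token):
        # Find the last range whose start_token is <= token.
        idx = bisect_right(self._starts, token) - 1
        if idx < 0:
            raise KeyError(f"no range covers token {token}")
        return self._assignments[idx].tier


# Assumed layout: tokens 0..999 stay hot, tokens 1000 and above go cold.
tiers = TierMap([
    RangeAssignment(start_token=0, tier="hot"),
    RangeAssignment(start_token=1000, tier="cold"),
])
print(tiers.tier_for(42))    # range starting at 0
print(tiers.tier_for(5000))  # range starting at 1000
```

In a real design the assignments would presumably come from the cluster
metadata rather than a hard-coded list; that part is left open here.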

I say primary key because I don't think we can reasonably partition the
records onto cold storage by any other method.

I think that records on cold storage may be deleted and may be updated,
but both operations may take significant time and would require compaction
to be run at some point.  I expect that compaction would be very slow.

I am certain there are issues with this approach and am looking for
feedback before progressing an architecture proposal.

Thanks,
Claude
