I can't comment on the technical question. However, one thing I learnt from managing the growth of data is that the $/GB of storage tends to drop at a rate that can absorb a moderate proportion of the increase in cost due to the increase in size of data. I'd recommend having a wet-finger-in-the-air stab at projecting the growth in data sizes against the historical trend in the decrease in cost of storage.
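As a rough illustration of that kind of projection, here is a minimal Python sketch. It assumes data size compounds at a fixed annual growth rate while $/GB falls at a fixed annual rate. The function name and every number in it are hypothetical placeholders, not figures from this thread, so substitute your own estimates.

```python
def project_storage_cost(initial_gb, annual_growth, cost_per_gb,
                         annual_price_decline, years):
    """Project storage spend, assuming compound growth and compound price decline.

    All rates are fractions per year (0.6 means 60%). Returns a list of
    (year, size_gb, cost_per_gb, total_cost) tuples for year 0..years.
    """
    rows = []
    for year in range(years + 1):
        size_gb = initial_gb * (1 + annual_growth) ** year
        price = cost_per_gb * (1 - annual_price_decline) ** year
        rows.append((year, size_gb, price, size_gb * price))
    return rows


# Hypothetical inputs: 2 TB today, 60% yearly data growth,
# $0.80/GB today, 25% yearly drop in $/GB, projected over 3 years.
for year, size_gb, price, total in project_storage_cost(2000, 0.60, 0.80, 0.25, 3):
    print(f"year {year}: {size_gb:8.0f} GB at ${price:.3f}/GB = ${total:,.2f}")
```

Under these assumptions the yearly change in total cost is the factor (1 + growth) * (1 - decline), so spend stays flat only when that product equals 1. Anything above 1 is the part of the growth that falling prices do not absorb.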
cheers

On Fri, Nov 1, 2013 at 7:15 AM, Dave Cowen <d...@luciddg.com> wrote:

> Hi, all -
>
> I'm currently managing a small Cassandra cluster, several nodes with local
> SSD storage.
>
> It's difficult to forecast the growth of the Cassandra data over the
> next couple of years for various reasons, but it is virtually guaranteed to
> grow substantially.
>
> During this time, there may be times where it is desirable to increase the
> amount of storage available to each node, but, assuming we are not I/O
> bound, keep from expanding the cluster horizontally with additional nodes
> that have local storage. In addition, expanding with local SSDs is costly.
>
> My colleagues and I have had several discussions of a couple of other
> options that don't involve scaling horizontally or adding SSDs:
>
> 1) Move to larger, cheaper spinning-platter disks. However, when
> monitoring the performance of our cluster, we see sustained periods -
> especially during repair/compaction/cleanup - of several hours where there
> are >2000 IOPS. It will be hard to get to that level of performance in each
> node with spinning platter disks, and we'd prefer not to take that kind of
> performance hit during maintenance operations.
>
> 2) Move some nodes to a SAN solution, ensuring that there is a mix of
> storage, drives, LUNs and RAIDs so that there isn't a single point of
> failure. While we're aware that this is frowned on in the Cassandra
> community due to Cassandra's design, a SAN seems like the obvious way of
> being able to quickly add storage to a cluster without having to juggle
> local drives, and provides a level of performance between local spinning
> platter drives and local SSDs.
>
> So, the questions:
>
> 1) Has anyone moved from SSDs to spinning-platter disks, or managed a
> cluster that contained both? Do the numbers we're seeing exaggerate the
> performance hit we'd see if we moved to spinners?
>
> 2) Have you successfully used a SAN or a hybrid SAN solution (some local,
> some SAN-based) to dynamically add storage to the cluster? What type of SAN
> have you used, and what issues have you run into?
>
> 3) Am I missing a way of economically scaling storage?
>
> Thanks for any insight.
>
> Dave
>

--
*Franc Carter* | Systems architect | Sirca Ltd
<marc.zianideferra...@sirca.org.au>
franc.car...@sirca.org.au | www.sirca.org.au
Tel: +61 2 8355 2514
Level 4, 55 Harrington St, The Rocks NSW 2000
PO Box H58, Australia Square, Sydney NSW 1215