Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread AJ
On 6/6/2011 11:25 PM, Benjamin Coverston wrote: Currently, my data dir has about 16 sets. I thought that compaction (with nodetool) would clean up these files, but it doesn't. Neither does cleanup or repair. You're not even talking about snapshots using nodetool snapshot yet. Also nodetool

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Maki Watanabe
You can find useful information in: http://www.datastax.com/docs/0.8/operations/scheduled_tasks sstables are immutable. Once an sstable is written to disk, it won't be updated. When you take a snapshot, the tool makes hard links to the sstable files. After a certain time, you will have some times of memtable
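The hard-link behaviour described above can be illustrated with a minimal Python sketch. It is not Cassandra's snapshot code: the helper names `snapshot_by_hard_link` and `demo`, the `snapshots/demo` directory, and the reuse of the file name `Users-e-1-Filter.db` are assumptions made only for illustration. It shows why a hard link to an immutable file costs no extra data space and why the snapshot copy survives when the original directory entry is later removed.

```python
import os
import tempfile


def snapshot_by_hard_link(data_file, snapshot_dir):
    """Hard-link data_file into snapshot_dir and return the new path.

    Hypothetical helper for illustration only; not Cassandra's implementation.
    """
    os.makedirs(snapshot_dir, exist_ok=True)
    target = os.path.join(snapshot_dir, os.path.basename(data_file))
    os.link(data_file, target)  # new directory entry, same inode, no data copied
    return target


def demo():
    with tempfile.TemporaryDirectory() as root:
        # Illustrative stand-in for an sstable component file.
        data_file = os.path.join(root, "Users-e-1-Filter.db")
        with open(data_file, "wb") as f:
            f.write(b"immutable sstable contents")

        snap = snapshot_by_hard_link(data_file, os.path.join(root, "snapshots", "demo"))

        # Both names refer to the same inode, so the link count is 2.
        same_inode = os.stat(data_file).st_ino == os.stat(snap).st_ino
        link_count = os.stat(snap).st_nlink

        # Removing the original name (as happens once a compacted file is
        # deleted) leaves the data reachable through the snapshot link.
        os.remove(data_file)
        with open(snap, "rb") as f:
            survived = f.read()

        return same_inode, link_count, survived


if __name__ == "__main__":
    print(demo())
```

Because the linked files are never modified in place, a snapshot taken this way stays consistent; the disk space is only released once every link to the file, including the one under the snapshot directory, has been removed.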

RE: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Jeremiah Jordan
On 6/7/2011 2:29 AM, Maki Watanabe wrote: You can find useful information in: http://www.datastax.com/docs/0.8/operations/scheduled_tasks sstables are immutable. Once an sstable is written to disk, it won't be updated. When you take a snapshot, the tool

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Benjamin Coverston
Hi AJ, Unfortunately, for storage capacity planning it's a bit of a guessing game. Until you run your load against it and profile the usage you just are not going to know for sure. I have seen cases where planning to have 50% excess capacity/node was plenty, and I have seen other extreme

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread AJ
Thanks to everyone who responded thus far. On 6/7/2011 10:16 AM, Benjamin Coverston wrote: snip Not to say that there aren't workloads where having many TB/Node doesn't work, but if you're planning to read from the data you're writing you do want to ensure that your working set is stored in

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread aaron morton
I'd also say consider what happens during maintenance and failure scenarios. Moving tens of TB around takes a lot longer than moving hundreds of GB. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 8 Jun 2011, at 06:40, AJ wrote: Thanks to

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-07 Thread Benjamin Coverston
Aaron makes a good point: the happiest customers, in my opinion, are the ones that choose nodes on the smaller side, and more of them. Regarding the working set, I am referring to the OS cache. On Linux, with JNA, Cassandra utilizes, to great effectiveness, memory mapped files and this is where

Backups, Snapshots, SSTable Data Files, Compaction

2011-06-06 Thread AJ
Hi, I am working on a backup strategy and am trying to understand what is going on in the data directory. I notice that after a write to a CF and then a flush, a new set of data files is created with an index number incremented in their names, such as: Initially: Users-e-1-Filter.db

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-06 Thread Benjamin Coverston
Hi AJ, inline: On 6/6/11 11:03 PM, AJ wrote: Hi, I am working on a backup strategy and am trying to understand what is going on in the data directory. I notice that after a write to a CF and then a flush, a new set of data files is created with an index number incremented in their names,