Re: quick question about data layout on disk

2012-08-11 Thread Aaron Turner
So how does that work? An sstable is for a single CF, but it can and likely will have multiple rows. There is no read before a write and, as I understand it, writes are append operations. So if you have an sstable with, say, 26 different rows (A-Z) already in it with a bunch of columns and you add a new

Re: quick question about data layout on disk

2012-08-11 Thread Edward Capriolo
Aaron, I have not dug deep into the data files in a while, but this is how I understand it. http://wiki.apache.org/cassandra/ArchitectureSSTable There is no need to store the row key each time with the column. RowKey to columns is a one-to-many relationship. This would be a diagram of a physical

Re: quick question about data layout on disk

2012-08-11 Thread Aaron Turner
Thanks Russell, that's the info I was looking for! On Sat, Aug 11, 2012 at 11:23 AM, Russell Haering russellhaer...@gmail.com wrote: Your update doesn't go directly to an sstable (sstables are immutable); it is first merged into an in-memory table. Eventually the memtable is flushed to a new

quick question about data layout on disk

2012-08-10 Thread Aaron Turner
Curious, but does Cassandra store the rowkey along with every column/value pair on disk (pre-compaction) like HBase does? If so (which makes the most sense), I assume that's something that is optimized during compaction? -- Aaron Turner http://synfin.net/ Twitter: @synfinatic

Re: quick question about data layout on disk

2012-08-10 Thread Terje Marthinussen
Rowkey is stored only once in any sstable file. That is, in the special case where you get one sstable file per column/value, you are correct, but normally, I guess most of us are storing more per key. Regards, Terje On 11 Aug 2012, at 10:34, Aaron Turner synfina...@gmail.com wrote: Curious, but
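
To make the replies in this thread concrete, here is a minimal Python sketch of the behaviour they describe: writes are merged into an in-memory table, each flush produces a new immutable sstable, and a row key appears once per sstable with its columns grouped under it. This is a toy model, not Cassandra's on-disk format. The names `Memtable`, `flush`, and `read_row` are hypothetical, and letting the newer sstable win on read is a simplifying assumption (Cassandra itself resolves conflicts by column timestamp).

```python
from collections import defaultdict


class Memtable:
    """Toy in-memory table: row key -> {column name: value}."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def write(self, row_key, column, value):
        # A write only touches memory; no existing sstable is read or modified.
        self.rows[row_key][column] = value

    def flush(self):
        # Emit an immutable "sstable": rows sorted by key, and each row key
        # stored once, followed by all of its columns.
        sstable = tuple(
            (row_key, tuple(sorted(columns.items())))
            for row_key, columns in sorted(self.rows.items())
        )
        self.rows = defaultdict(dict)
        return sstable


def read_row(sstables, row_key):
    """Merge one row's columns across sstables, oldest first.

    Assumption for this sketch: later sstables simply overwrite earlier ones.
    """
    merged = {}
    for sstable in sstables:
        for key, columns in sstable:
            if key == row_key:
                merged.update(columns)
    return merged


memtable = Memtable()
memtable.write("A", "col1", "v1")
memtable.write("A", "col2", "v2")
memtable.write("B", "col1", "v3")
first_sstable = memtable.flush()

# A later write to row "A" lands in a second sstable; the first is untouched.
memtable.write("A", "col3", "v4")
second_sstable = memtable.flush()

row_a = read_row([first_sstable, second_sstable], "A")
```

In this model row "A" ends up spread over two sstables, each holding the key once, until something like compaction merges them. The only case where the key is repeated for every column/value pair is the one Terje mentions, where every flush contains a single column for that key.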