Re: Would deleted columns slow down reads?

2010-02-25 Thread Jonathan Ellis
Yes, that's going to hurt forward scans with no start column.
(Reverse scans, or scans that start with a known live column, will
still be fast b/c of the per-row column indexes.)
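
A toy Python model can make the read-path behavior above concrete. It is a simplified sketch under stated assumptions, not Cassandra's code. `ToyRow`, `Column`, `BLOCK` and `slice` are hypothetical names. The sparse index, which records the first column name of each fixed-size block, is only a stand-in for the per-row column index mentioned above. In the model, a forward slice with no start column has to read every tombstone before it reaches a live column. A slice seeded with a known live column, or a reverse slice from the end of the row, touches only a few columns.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Column:
    name: int       # columns in a row are kept sorted by name
    value: str
    deleted: bool   # True means this column is a tombstone


class ToyRow:
    """Hypothetical, simplified model of one row: sorted columns plus a
    sparse index recording the first column name of every BLOCK columns.
    This is not Cassandra's actual on-disk format."""

    BLOCK = 4  # assumed index granularity, chosen only for illustration

    def __init__(self, columns):
        self.columns = sorted(columns, key=lambda c: c.name)
        self.index = [c.name for c in self.columns[::self.BLOCK]]

    def _seek(self, start):
        # Binary-search the sparse index for the last block whose first
        # name is <= start; return the position of that block's first column.
        block = max(bisect_right(self.index, start) - 1, 0)
        return block * self.BLOCK

    def slice(self, count, start=None, reverse=False):
        """Return (up to `count` live column names, columns examined).
        In this sketch `start` applies to forward slices only; a reverse
        slice always begins at the last column of the row."""
        if reverse:
            positions = range(len(self.columns) - 1, -1, -1)
        else:
            first = 0 if start is None else self._seek(start)
            positions = range(first, len(self.columns))
        live, examined = [], 0
        for i in positions:
            col = self.columns[i]
            examined += 1
            if not reverse and start is not None and col.name < start:
                continue  # inside the seeked block but before `start`
            if col.deleted:
                continue  # a tombstone must still be read and skipped
            live.append(col.name)
            if len(live) == count:
                break
        return live, examined


# 1000 deleted columns (names 0-999) followed by 10 live ones (1000-1009).
row = ToyRow([Column(n, "x", deleted=n < 1000) for n in range(1010)])
print(row.slice(3))                # ([1000, 1001, 1002], 1003)
print(row.slice(3, start=1000))    # ([1000, 1001, 1002], 3)
print(row.slice(3, reverse=True))  # ([1009, 1008, 1007], 3)
```

The examined count is the point of the model. The first call walks all 1000 tombstones. The seeded and reverse calls each read only three columns.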

On Thu, Feb 25, 2010 at 8:56 PM, Edmond Lau edm...@ooyala.com wrote:
 Given that Cassandra needs to maintain tombstones to handle
 distributed deletes, does the existence of deleted columns slow down
 slices?

 To be more concrete, suppose I used a row as a queue.  I keep adding
 columns to the end of the sort order of a column family, and I keep
 deleting columns from the start of the sort order.  After some time,
 the row would have a large number of deleted columns followed by a
 number of undeleted columns in the column family.  Does slicing for
 the first N columns from the row now require scanning over all the
 initial deleted columns (meaning reads would get more expensive as
 time goes on), or are the deleted columns stored separately to enable
 Cassandra to skip over deleted columns when processing reads?

 Edmond
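
The queue scenario can be simulated directly. In this hedged sketch, `ToyQueueRow`, `push` and `pop` are hypothetical names. Deletes are modeled as tombstones that stay in the row. Seeking to a start column is assumed to be cheap, standing in for the per-row column index, and is not counted. A consumer that always slices from the start of the row examines more columns on every dequeue. A consumer that remembers where it stopped, and starts the next slice just past the last column it consumed, keeps a constant cost. This is one way to use the "known live column" case from Jonathan's reply, not a documented Cassandra recipe.

```python
from bisect import bisect_left


class ToyQueueRow:
    """Hypothetical model of one row used as a queue. Column names are
    increasing integers; deleting a column only marks it with a tombstone,
    it is not physically removed."""

    def __init__(self):
        self.names = []         # column names, always in sorted order
        self.tombstones = set()
        self.next_name = 0

    def push(self, n):
        # Append n new columns at the end of the sort order.
        for _ in range(n):
            self.names.append(self.next_name)
            self.next_name += 1

    def pop(self, count, start=None):
        """Read up to `count` live columns from the front, then delete them.
        Returns (names read, columns examined). With start=None the scan
        begins at the first column; otherwise it seeks straight to `start`
        (the seek is assumed cheap and is not counted)."""
        i = 0 if start is None else bisect_left(self.names, start)
        live, examined = [], 0
        while i < len(self.names) and len(live) < count:
            examined += 1
            if self.names[i] not in self.tombstones:
                live.append(self.names[i])
            i += 1
        self.tombstones.update(live)  # dequeue = delete, leaving tombstones
        return live, examined


# Naive consumer: always slices from the start of the row.
naive = ToyQueueRow()
naive.push(100)
naive_costs = [naive.pop(10)[1] for _ in range(5)]
print(naive_costs)   # [10, 20, 30, 40, 50]

# Cursor consumer: starts each slice just past the last consumed column.
cursor_q = ToyQueueRow()
cursor_q.push(100)
cursor, cursor_costs = None, []
for _ in range(5):
    items, examined = cursor_q.pop(10, start=cursor)
    cursor_costs.append(examined)
    cursor = items[-1] + 1   # integer names: the next possible column name
print(cursor_costs)  # [10, 10, 10, 10, 10]
```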



Re: Would deleted columns slow down reads?

2010-02-25 Thread Edmond Lau
Thanks for the confirmation - that's what I suspected.

Edmond

On Thu, Feb 25, 2010 at 7:00 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Yes, that's going to hurt forward scans with no start column.
 (Reverse scans, or scans that start with a known live column, will
 still be fast b/c of the per-row column indexes.)