First: 

Compaction controls how sstables are combined but not how they’re read. The 
read path (with one tiny exception) doesn’t know or care which compaction 
strategy you’re using. 

A few more notes inline. 

> On Jan 8, 2019, at 3:04 AM, Jinhua Luo <luajit...@gmail.com> wrote:
> 
> Hi All,
> 
> The compaction would organize the sstables, e.g. with LCS, the
> sstables would be categorized into levels, and the read path should
> read sstables level by level until the read is fulfilled, correct?

LCS levels exist to minimize the number of sstables scanned (at most one per 
level above L0, since sstables within a level don’t overlap), but there’s no 
attempt to satisfy the read from the lower levels first, beyond the filtering 
done by timestamp.
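
To make the "at most one per level" point concrete, here is a toy sketch in Python. It is not Cassandra code: the `levels` layout and the `candidates_for_key` helper are made up for illustration, and it assumes each level above L0 holds sstables with non-overlapping key ranges.

```python
from bisect import bisect_right

# Hypothetical layout: each level is a list of (first_key, last_key) ranges,
# sorted by first_key and non-overlapping within the level.
levels = {
    1: [(0, 9), (10, 19), (20, 29)],
    2: [(0, 4), (5, 9), (12, 17), (18, 29)],
}

def candidates_for_key(levels, key):
    """Return {level: range} with at most one candidate range per level."""
    found = {}
    for level, ranges in levels.items():
        starts = [first for first, _ in ranges]
        # Index of the last range that starts at or before the key.
        i = bisect_right(starts, key) - 1
        if i >= 0 and ranges[i][1] >= key:
            found[level] = ranges[i]
    return found

print(candidates_for_key(levels, 13))
print(candidates_for_key(levels, 11))
```

Because the ranges in a level don't overlap, a key can fall into at most one of them, so the number of candidates is bounded by the number of levels. Nothing here stops early once a level has produced data.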

> 
> For STCS, it would search sstables in buckets from smallest to largest?

Nope. No attempt to do this. 

> 
> What about other compaction cases? They would iterate all sstables?

In all cases, we’ll use a combination of bloom filters and sstable metadata and 
indices to include / exclude sstables. If the bloom filter hits, we’ll consider 
things like timestamps and whether or not the min/max clustering of the sstable 
matches the slice we care about. We don’t consult the compaction strategy, 
though the compaction strategy may have (in the case of LCS or TWCS) placed the 
sstables into a state that makes this read less expensive.
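
A rough sketch of that include / exclude logic, in Python rather than the real Java. Everything here is hypothetical: `SSTableMeta`, `might_contain` and `select_sstables` are illustrative names, a plain set stands in for the bloom filter (so no false positives), and clustering values are simplified to integers.

```python
from dataclasses import dataclass

@dataclass
class SSTableMeta:
    name: str
    keys: frozenset        # stand-in for the bloom filter
    min_clustering: int    # smallest clustering value in the sstable
    max_clustering: int    # largest clustering value in the sstable

    def might_contain(self, partition_key):
        return partition_key in self.keys

def select_sstables(sstables, partition_key, slice_start, slice_end):
    """Keep sstables that may hold the partition and overlap the slice."""
    selected = []
    for s in sstables:
        if not s.might_contain(partition_key):
            continue  # bloom filter says the partition is not here
        if s.max_clustering < slice_start or s.min_clustering > slice_end:
            continue  # clustering range does not intersect the slice
        selected.append(s.name)
    return selected

sstables = [
    SSTableMeta("a", frozenset({"k1", "k2"}), 0, 50),
    SSTableMeta("b", frozenset({"k1"}), 100, 200),
    SSTableMeta("c", frozenset({"k3"}), 0, 300),
]
print(select_sstables(sstables, "k1", 10, 20))
```

Note that the compaction strategy never appears in the selection function. It only influences which sstables exist and what their metadata looks like.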
 
> 
> But in the codes, I'm confused a lot:
> In 
> org.apache.cassandra.db.SinglePartitionReadCommand#queryMemtableAndDiskInternal,
> it seems that no matter whether the selected columns (except the
> collection/cdt and counter cases, let's assume here the selected
> columns are simple cell) are collected and satisfied, it would search
> both memtable and all sstables, regardless of the compaction strategy.

There’s another read path that takes timestamps into account and does some 
smart-ish exclusion of sstables that aren’t needed for the read command.
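
The general idea behind timestamp-based exclusion can be sketched like this. It is a simplified illustration, not the actual Cassandra logic: the dict layout and `read_in_timestamp_order` are invented, only simple cells are modeled, and tombstones and deletions are ignored entirely.

```python
def read_in_timestamp_order(sstables, columns):
    """Read newest-first and stop once older sstables cannot matter.

    sstables: list of dicts with 'name', 'max_ts' and 'cells' {col: (ts, value)}.
    Returns (values by column, names of the sstables actually read).
    """
    result = {}
    read = []
    for s in sorted(sstables, key=lambda t: t["max_ts"], reverse=True):
        # Every requested column already has a cell newer than anything
        # this sstable (or any later one) could contain: stop here.
        if all(c in result and result[c][0] > s["max_ts"] for c in columns):
            break
        read.append(s["name"])
        for col, (ts, value) in s["cells"].items():
            if col in columns and (col not in result or ts > result[col][0]):
                result[col] = (ts, value)
    return {c: v for c, (_, v) in result.items()}, read

sstables = [
    {"name": "a", "max_ts": 10, "cells": {"x": (10, "new")}},
    {"name": "b", "max_ts": 5, "cells": {"x": (5, "old"), "y": (4, "y1")}},
    {"name": "c", "max_ts": 3, "cells": {"y": (3, "y0")}},
]
print(read_in_timestamp_order(sstables, {"x"}))
```

In this toy model a read of column `x` touches only the newest sstable, while a read of `x` and `y` has to go one sstable further before it can stop.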

> 
> Why?
> 
> Moreover, for collection/cdt (non-frozen) and counter types, it would
> need to iterate all sstable to ensure the whole set of the fields are
> collected, correct? If so, such multi-cell or counter types are
> heavyweight in performance, correct?
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
> 
