Re: A few stupid questions...

Tyler Hobbs Tue, 26 May 2015 13:46:36 -0700

On Tue, May 26, 2015 at 2:00 PM, Eax Melanhovich <m...@eax.me> wrote:


>
> First. Let's say I have a table (field1, field2, field3, field4), where
> (field1, field2) is the primary key and field1 is the partition key.
> There is a secondary index on the field3 column. Do I understand
> correctly that in this case a query like:
>
> select ... from my_table where field1 = 123 and field3 > '...';
>
> ... would be quite efficient, i.e. the request would be sent only to
> one node, not the whole cluster?
>

You are correct that it would only query one node (or one set of replicas,
if RF > 1 and CL > 1) due to the partition key being restricted.  However,
using '>' for the operator on the indexed column forces Cassandra to scan
the partition instead of using the index, because secondary indexes only
support '=' operations.  If you care about performance, you're probably
better off creating a dedicated table to serve this type of query, as
described here:
http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
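Here is a rough sketch of such a dedicated table. The table name `my_table_by_field3` and the column types are assumptions, because the question doesn't give them. Making field3 a clustering column stores the rows inside each field1 partition sorted by field3. The range condition then becomes a contiguous slice of one partition instead of an index lookup.

```cql
-- Hypothetical query table holding the same data as my_table.
-- Column types are assumed: field1 int (from "field1 = 123"), the rest text.
CREATE TABLE my_table_by_field3 (
    field1 int,
    field2 text,
    field3 text,
    field4 text,
    -- field1: partition key (one replica set serves the query)
    -- field3: clustering column (rows sorted by it within the partition)
    -- field2: kept in the key so rows sharing a field3 value stay distinct
    PRIMARY KEY ((field1), field3, field2)
);

-- The range on a clustering column inside one partition needs no index
-- and no ALLOW FILTERING.
SELECT field2, field3, field4
FROM my_table_by_field3
WHERE field1 = 123 AND field3 > '...';
```

This design has two costs. First, the application has to write every row to both `my_table` and `my_table_by_field3`, and a logged BATCH is one common way to keep the two tables in step. Second, field3 is now part of the primary key, so changing it means deleting the old row and inserting a new one rather than updating the row in place.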


>
> Second. Let's say there is some data that almost never changes but is
> read all the time, e.g. information about smiles in a social network,
> or current sessions. In this case, would Cassandra cache "hot" data in
> the memtable? Or should such data be stored somewhere else, e.g. Redis
> or Couchbase?


Memtables are only used for buffering writes, not for caching read data.
Cassandra does have several layers of caching though.  Frequently read data
will end up in the key cache and the OS page cache, making reads quite
efficient.  Optionally, you can also enable the row cache.  Since you're
almost never modifying the data, the row cache is actually a decent fit,
although I recommend testing it heavily with your use case for stability.
The best way to find out if your performance is good enough is to benchmark
it with your own use case.
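Enabling the row cache takes two steps. First, each node has to give it memory. Second, the table has to opt in. Below is a minimal sketch for the Cassandra 2.1 syntax. The table name `hot_data` is hypothetical, and 2.0 releases use a different, string-valued `caching` option.

```cql
-- Step 1 (per node, in cassandra.yaml rather than CQL):
-- row_cache_size_in_mb defaults to 0, which disables the row cache.
-- Set it to a nonzero size and restart the node.

-- Step 2: opt the hypothetical hot_data table into the key and row caches.
-- 'rows_per_partition' may also be a number, to cache only the first N
-- rows of each partition.
ALTER TABLE hot_data
  WITH caching = {'keys': 'ALL', 'rows_per_partition': 'ALL'};
```

Once the cache is enabled, `nodetool info` reports key cache and row cache hit rates. Watching those numbers during a benchmark is one way to check whether the row cache is actually helping.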


-- 
Tyler Hobbs
DataStax <http://datastax.com/>
