Alex Dejanovski wrote a good post on how the LIMIT clause works and why it doesn’t (until 3.4) work the way you think it would.
http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html

> On Apr 7, 2017, at 7:23 AM, Jerry Lam <[email protected]> wrote:
>
> Hi Jan,
>
> Thank you for the clarification and knowledge sharing.
>
> A follow-up question is:
>
> Does Cassandra need to read all sstables for customer_id = 1L if my query is:
>
> select view_id from customer_view where customer_id = 1L limit 1
>
> Since I have date_day as the clustering key and it is sorted in
> descending order, I'm assuming that the above query will return the latest
> view_id for customer_id 1L.
>
> Since I'm using TWCS, is Cassandra smart enough to just query the latest
> sstable that matches the partition key (customer_id = 1L), or does it have
> to go through the entire merge process?
>
> Thank you,
>
> Jerry
>
>
> On Fri, Apr 7, 2017 at 2:08 AM, <[email protected]
> <mailto:[email protected]>> wrote:
> Hi Jerry,
>
> The compaction strategy just tells Cassandra how to compact your sstables
> and, with TWCS, when to stop compacting further. But of course your data can,
> and most likely will, live in multiple sstables.
>
> The magic that happens is that the coordinator node for your request will
> merge the data for you on the fly. It is an easy job, as your data per
> sstable is already sorted.
>
> But be careful about the worst case. If a customer_id is inserted every
> hour, you can end up reading many sstables, which decreases read performance
> if the data is to be kept for a year or so.
>
> Jan
>
> Sent from my Windows 10 Phone
>
> From: Jerry Lam <mailto:[email protected]>
> Sent: Friday, April 7, 2017 00:30
> To: [email protected] <mailto:[email protected]>
> Subject: How does clustering key works with TimeWindowCompactionStrategy
> (TWCS)
>
> Hi guys,
>
> I'm a new and happy user of Cassandra. We are using Cassandra for time
> series data, so we chose TWCS because of its predictability and its ease of
> configuration.
>
> My question is we have a table with the following schema:
>
> CREATE TABLE IF NOT EXISTS customer_view (
>     customer_id bigint,
>     date_day Timestamp,
>     view_id bigint,
>     PRIMARY KEY (customer_id, date_day)
> ) WITH CLUSTERING ORDER BY (date_day DESC)
>
> What I understand is that the data will be ordered by date_day within the
> partition using the clustering key. However, the same customer_id can be
> inserted into this partition several times during the day, and TWCS says it
> will only compact the sstables within the window interval set in the
> configuration (in our case, 1 hour).
>
> How does Cassandra guarantee the clustering key order when the same
> customer_id appears in several sstables? Does it need to do a merge and then
> sort to find out the latest view_id for the customer_id? Or is there some
> magic happening behind the scenes?
>
> Best Regards,
>
> Jerry
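
For illustration, the on-the-fly merge described in the thread can be sketched in Python. This is only a toy model under stated assumptions, not Cassandra's actual read path: each "sstable" is a plain list of hypothetical (date_day, view_id) rows for a single customer_id, already sorted by date_day descending, and no clustering key appears in more than one list. The function name and sample data are made up for the example.

```python
import heapq
from itertools import islice


def read_partition(sstables, limit=None):
    """Lazily merge runs that are each sorted by date_day DESC.

    heapq.merge never re-sorts the data: it only compares the current
    head row of every run, which is why pre-sorted sstables make the
    merge cheap. It still has to look at the head of every run.
    """
    merged = heapq.merge(*sstables, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))  # limit=None returns every row


# Hypothetical rows for one customer_id, one list per hourly TWCS window.
sstable_hour_1 = [("2017-04-06T10:40", 101), ("2017-04-06T10:05", 100)]
sstable_hour_2 = [("2017-04-06T11:50", 103), ("2017-04-06T11:10", 102)]
sstable_hour_3 = [("2017-04-06T12:30", 104)]

sstables = [sstable_hour_1, sstable_hour_2, sstable_hour_3]

latest = read_partition(sstables, limit=1)  # analogous to "limit 1"
all_rows = read_partition(sstables)         # the full merged partition
```

Even with limit=1, the merge has to open every run to find the largest head row, which matches Jan's warning that a partition spread over many sstables costs more to read.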
