Alex Dejanovski wrote a good post on how the LIMIT clause works and why it 
doesn’t (until 3.4) work the way you think it would.

http://thelastpickle.com/blog/2017/03/07/The-limit-clause-in-cassandra-might-not-work-as-you-think.html
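
As a rough illustration of the on-the-fly merge Jan describes below, here is a toy Python sketch. It is only a model under stated assumptions, not Cassandra's actual read path: the `sstables` list, the sample rows, and the `read_partition` helper are all hypothetical, and it ignores bloom filters, tombstones, and timestamp reconciliation of overwritten rows. It shows why a `LIMIT 1` read can return after emitting one row yet still has to look at the head of every sorted run that may hold the partition.

```python
import heapq
from itertools import islice

# Toy model: each "sstable" holds rows for one partition as
# (date_day, view_id) tuples, already sorted by date_day descending,
# mirroring CLUSTERING ORDER BY (date_day DESC). All data is made up.
sstables = [
    [("2017-04-07T09:00", 903), ("2017-04-07T08:00", 802)],
    [("2017-04-06T23:00", 701), ("2017-04-06T22:00", 600)],
    [("2017-04-07T10:00", 999)],  # a later flush that is not compacted yet
]

def read_partition(sstables, limit):
    """Lazily merge the per-sstable sorted runs and stop after `limit` rows."""
    # heapq.merge pulls the first row of every run before yielding anything,
    # so each run is consulted even when limit is 1.
    merged = heapq.merge(*sstables, key=lambda row: row[0], reverse=True)
    return list(islice(merged, limit))

latest = read_partition(sstables, limit=1)
print(latest)
```

The merge itself is cheap because every input is already sorted; the cost in this model grows with the number of runs that have to be opened, which is the worst case Jan warns about.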

> On Apr 7, 2017, at 7:23 AM, Jerry Lam <[email protected]> wrote:
> 
> Hi Jan,
> 
> Thank you for the clarification and knowledge sharing. 
> 
> A follow-up question is:
> 
> Does Cassandra need to read all sstables for customer_id = 1L if my query is:
> 
> select view_id from customer_view where customer_id = 1L limit 1
> 
> Since date_day is the clustering key and is sorted in descending order, I'm 
> assuming that the above query will return the latest view_id for 
> customer_id 1L. 
> 
> Since I'm using TWCS, is Cassandra smart enough to query only the latest 
> sstable that matches the partition key (customer_id = 1L), or does it have 
> to go through the entire merge process?
> 
> Thank you,
> 
> Jerry
> 
> 
> On Fri, Apr 7, 2017 at 2:08 AM, <[email protected] 
> <mailto:[email protected]>> wrote:
> Hi Jerry,
> 
>  
> 
> the compaction strategy just tells Cassandra how to compact your sstables and 
> with TWCS when to stop compacting further. But of course your data can and 
> most likely will live in multiple sstables.
> 
>  
> 
> The magic that happens is that the coordinator node for your request will 
> merge the data for you on the fly. It is an easy job, as the data in each 
> sstable is already sorted.
> 
>  
> 
> But be careful about the worst case: if a customer_id is inserted every 
> hour and the data has to be kept for a year or so, a read can end up 
> touching many sstables, which degrades read performance.
> 
>  
> 
> Jan
> 
>  
> 
> Sent from my Windows 10 Phone
> 
>  
> 
> From: Jerry Lam <mailto:[email protected]>
> Sent: Friday, April 7, 2017 00:30
> To: [email protected] <mailto:[email protected]>
> Subject: How does clustering key works with TimeWindowCompactionStrategy 
> (TWCS)
> 
>  
> 
> Hi guys,
> 
>  
> 
> I'm a new and happy user of Cassandra. We are using Cassandra for time series 
> data, so we chose TWCS because of its predictability and its ease of 
> configuration.
> 
>  
> 
> My question concerns a table with the following schema:
> 
>  
> 
> CREATE TABLE IF NOT EXISTS customer_view (
>     customer_id bigint,
>     date_day timestamp,
>     view_id bigint,
>     PRIMARY KEY (customer_id, date_day)
> ) WITH CLUSTERING ORDER BY (date_day DESC)
> 
>  
> 
> What I understand is that the data will be ordered by date_day within the 
> partition using the clustering key. However, rows for the same customer_id 
> can be inserted into this partition several times during the day, and TWCS 
> says it will only compact the sstables within the window interval set in the 
> configuration (in our case, 1 hour). 
> 
>  
> 
> How does Cassandra guarantee the clustering key order when the same 
> customer_id appears in several sstables? Does it need to do a merge and then 
> sort to find the latest view_id for the customer_id? Or is there some magic 
> happening behind the scenes?
> 
>  
> 
> Best Regards,
> 
>  
> 
> Jerry
> 
>  
> 
> 
