Re: Is SASI index in Cassandra efficient for high cardinality columns?

2016-10-21 Thread DuyHai Doan
If you read my blog post about the 2nd index deep dive, you'll get all the answers. On 21 Oct 2016 10:20, "Kant Kodali" wrote: > Why can't a secondary index be broken down into token ranges like the primary > index, at least for exact matches? That way we don't need to scan the whole >

Re: Is SASI index in Cassandra efficient for high cardinality columns?

2016-10-21 Thread Kant Kodali
Why can't a secondary index be broken down into token ranges like the primary index, at least for exact matches? That way we don't need to scan the whole cluster, at least for exact matches. I understand that for a substring search there will be 2^n substrings, which equates to 2^n hashes/tokens, which can be

Re: Is SASI index in Cassandra efficient for high cardinality columns?

2016-10-15 Thread DuyHai Doan
If each indexed value has very few matching rows, then querying using SASI (or any implementation of a secondary index) may scan the whole cluster. This is because the indexes are "distributed", i.e. the indexed values stay on the same nodes as the base data. And even SASI with its own data structure will not
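To make the point above concrete, here is a minimal toy sketch (not Cassandra code) of why a node-local index forces a cluster-wide fan-out. Everything in it is an assumption for illustration: the `owner` function stands in for the partitioner, `NUM_NODES` is an arbitrary cluster size, and the model ignores replication, paging, and any early termination a real coordinator may apply.

```python
import hashlib

NUM_NODES = 8  # hypothetical cluster size, chosen only for illustration


def owner(partition_key: str) -> int:
    """Toy stand-in for the partitioner: map a partition key to its owning node."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_NODES


class Node:
    def __init__(self):
        self.rows = {}         # partition key -> indexed column value
        self.local_index = {}  # indexed value -> partition keys stored on THIS node

    def insert(self, key, value):
        self.rows[key] = value
        # The index entry lives on the same node as the base row.
        self.local_index.setdefault(value, set()).add(key)


def build_cluster(rows):
    nodes = [Node() for _ in range(NUM_NODES)]
    for key, value in rows.items():
        nodes[owner(key)].insert(key, value)
    return nodes


def query_by_key(nodes, key):
    """The key hashes to exactly one node, so only that node is contacted."""
    return nodes[owner(key)].rows.get(key), 1


def query_by_indexed_value(nodes, value):
    """The value says nothing about which node holds the row, so ask every node."""
    matches, contacted = set(), 0
    for node in nodes:
        contacted += 1
        matches |= node.local_index.get(value, set())
    return matches, contacted


# One matching row per indexed value, as in the case discussed in this thread.
rows = {f"user{i}": f"email{i}@example.com" for i in range(1000)}
nodes = build_cluster(rows)
```

In this model, `query_by_key(nodes, "user42")` touches one node, while `query_by_indexed_value(nodes, "email42@example.com")` touches all `NUM_NODES` nodes to return a single row.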

Re: Is SASI index in Cassandra efficient for high cardinality columns?

2016-10-15 Thread Kant Kodali
Well, I went with the definition from Wikipedia, and that definition rules out #1, so it is #2, and it is just one matching row in my case. On Sat, Oct 15, 2016 at 2:40 AM, DuyHai Doan wrote: > Define precisely what you mean by "high cardinality columns". Do you mean: > > 1)

Re: Is SASI index in Cassandra efficient for high cardinality columns?

2016-10-15 Thread DuyHai Doan
Define precisely what you mean by "high cardinality columns". Do you mean: 1) a single indexed value is present in a lot of rows 2) a single indexed value has only a few (if not just one) matching rows On Sat, Oct 15, 2016 at 8:37 AM, Kant Kodali wrote: > I understand
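The two readings above can be told apart by counting rows per distinct indexed value. The sketch below is only an illustration; the `country` and `email` columns are hypothetical sample data, not anything from the thread.

```python
from collections import Counter


def rows_per_value(values):
    """Average number of rows sharing one indexed value."""
    counts = Counter(values)
    return len(values) / len(counts)


# Hypothetical column where each value appears in many rows (reading 1).
country = ["FR", "US", "IN", "DE"] * 250

# Hypothetical column where each value appears in exactly one row (reading 2).
email = [f"user{i}@example.com" for i in range(1000)]

many_rows_per_value = rows_per_value(country)
one_row_per_value = rows_per_value(email)
```

Here `many_rows_per_value` is 250.0 and `one_row_per_value` is 1.0; the second case is the one described later in the thread as "just one matching row".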

Is SASI index in Cassandra efficient for high cardinality columns?

2016-10-15 Thread Kant Kodali
I understand secondary indexes in general are inefficient on high cardinality columns, but since SASI is built from scratch, I wonder if the same argument applies there? If not, why? Because I believe primary keys in Cassandra are indeed indexed, and since the primary key is supposed to be the column