Is SASI index in Cassandra efficient for high cardinality columns?
I understand secondary indexes in general are inefficient on high cardinality columns, but since SASI is built from scratch, I wonder whether the same argument applies there. If not, why? I believe primary keys in Cassandra are indeed indexed, and since the primary key is supposed to be the column with the highest cardinality, why not do the same for secondary indexes?
Re: Is SASI index in Cassandra efficient for high cardinality columns?
Define precisely what you mean by "high cardinality columns". Do you mean:

1) a single indexed value is present in a lot of rows
2) a single indexed value has only a few (if not just one) matching row

On Sat, Oct 15, 2016 at 8:37 AM, Kant Kodali wrote:

> I understand Secondary Indexes in general are inefficient on high
> cardinality columns but since SASI is built from scratch I wonder if the
> same argument applies there? If not, Why? Because I believe primary keys in
> Cassandra are indeed indexed and since Primary key is supposed to be the
> column with highest cardinality why not do the same for secondary indexes?
Re: Is SASI index in Cassandra efficient for high cardinality columns?
Well, I went with the definition from Wikipedia, and that definition rules out #1, so it is #2. In my case there is just one matching row per indexed value.

On Sat, Oct 15, 2016 at 2:40 AM, DuyHai Doan wrote:

> Define precisely what you mean by "high cardinality columns". Do you mean:
>
> 1) a single indexed value is present in a lot of rows
> 2) a single indexed value has only a few (if not just one) matching row
Re: Is SASI index in Cassandra efficient for high cardinality columns?
If each indexed value has very few matching rows, then querying using SASI (or any implementation of secondary index) may scan the whole cluster. This is because the indexes are "distributed", i.e. the indexed values stay on the same nodes as the base data. Even SASI, with its own data structure, will not help much here.

One should understand that a secondary index query has to deal with 2 layers:

1) The cluster layer, which is common to any implementation of secondary index. Read my blog post here: http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/
2) The local read path, which depends on the implementation of the secondary index. Some use the Lucene library, like the Stratio implementation; some roll their own data structures, like SASI.

If you have a 1-to-1 relationship between the indexed value and the matching row (or 1-to-a-few), I would recommend using materialized views instead: http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/25

Materialized views guarantee that for each searched indexed value, you only hit a single node (or N replicas, depending on the consistency level used).

However, materialized views have their own drawbacks (weaker consistency guarantees), and you can't use range queries (<, >, ≤, ≥) or full text search on the indexed value.

On Sat, Oct 15, 2016 at 11:55 AM, Kant Kodali wrote:

> Well I went with the definition from wikipedia and that definition rules
> out #1 so it is #2 and it is just one matching row in my case.
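To make the cluster-layer argument above concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions, not Cassandra's actual read path: the 8-node cluster, the CRC32 "partitioner", the `ToyCluster` class and the `user_id`/`email` columns are all hypothetical, and replication, consistency levels and query limits are ignored. It shows only why a node-local index may have to ask every node for a value with a single matching row, while a view partitioned by the indexed value can go straight to the owning node.

```python
import zlib

NUM_NODES = 8  # hypothetical cluster size; replication is not modeled


def owner(key: str) -> int:
    """Toy partitioner: map a partition key to the node that stores it."""
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES


class ToyCluster:
    """Toy model of a node-local secondary index and a materialized view."""

    def __init__(self):
        # One local index per node: indexed value -> primary keys stored on that node.
        self.local_index = [dict() for _ in range(NUM_NODES)]
        # One view shard per node: the view is partitioned by the indexed value itself.
        self.view = [dict() for _ in range(NUM_NODES)]

    def insert(self, user_id: str, email: str) -> None:
        # The index entry stays on the node that owns the base row (by primary key).
        self.local_index[owner(user_id)].setdefault(email, []).append(user_id)
        # The view row goes to the node that owns the indexed value.
        self.view[owner(email)].setdefault(email, []).append(user_id)

    def query_by_index(self, email: str):
        """Return (matches, nodes_contacted). The coordinator cannot tell which
        node holds the value, so this toy model asks every node."""
        matches, contacted = [], 0
        for node_index in self.local_index:
            contacted += 1
            matches.extend(node_index.get(email, []))
        return matches, contacted

    def query_by_view(self, email: str):
        """Return (matches, nodes_contacted). The indexed value is the view's
        partition key, so exactly one node is asked."""
        node = owner(email)
        return list(self.view[node].get(email, [])), 1


if __name__ == "__main__":
    cluster = ToyCluster()
    for i in range(100):
        cluster.insert(f"user{i}", f"user{i}@example.com")
    print("index lookup:", cluster.query_by_index("user42@example.com"))
    print("view lookup: ", cluster.query_by_view("user42@example.com"))
```

In this model both lookups return the same single row, but the index lookup contacts all 8 nodes while the view lookup contacts 1. The gap grows with cluster size, which is the point made above about the cluster layer being independent of the local index data structure.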
[GitHub] cassandra issue #76: CASSANDRA-12541, CASSANDRA-12542, CASSANDRA-12543 and C...
Github user doanduyhai commented on the issue:

    https://github.com/apache/cassandra/pull/76

    Can you give some description of the issue?

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] cassandra issue #76: CASSANDRA-12541, CASSANDRA-12542, CASSANDRA-12543 and C...
Github user deshpamit commented on the issue:

    https://github.com/apache/cassandra/pull/76

    HP Fortify Analysis flagged Portability Flaw: Locale Dependent Comparison
    https://issues.apache.org/jira/browse/CASSANDRA-12541
    Same issue for all defects.
[GitHub] cassandra pull request #76: CASSANDRA-12541, CASSANDRA-12542, CASSANDRA-1254...
GitHub user deshpamit opened a pull request:

    https://github.com/apache/cassandra/pull/76

    CASSANDRA-12541, CASSANDRA-12542, CASSANDRA-12543 and CASSANDRA-12545

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/deshpamit/cassandra trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cassandra/pull/76.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #76

commit cb47a3937bab99980d4fec481bfb2b163535a2f0
Author: Amit Deshpande
Date: 2016-10-15T16:23:50Z

    CASSANDRA-12541, CASSANDRA-12542, CASSANDRA-12543 and CASSANDRA-12545
[GitHub] cassandra issue #76: CASSANDRA-12541, CASSANDRA-12542, CASSANDRA-12543 and C...
Github user edwardcapriolo commented on the issue:

    https://github.com/apache/cassandra/pull/76

    +1