Hi,

As a general rule of thumb I would steer clear of secondary indexes; this is also the official stance that DataStax takes (see p. 5 of their best practices doc: http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf).

“It is best to avoid using Cassandra's built-in secondary indexes where possible. Instead, it is recommended to denormalize data and manually maintain a dynamic table as a form of an index instead of using a secondary index. If and when secondary indexes are to be used, they should be created only on columns containing low-cardinality data (for example: fields with less than 1000 states).”

Mark

On 22 Aug 2014, at 15:58, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello Eric
>
> "Under the hood what is the difference of the both solutions?"
>
> 1. Cassandra secondary index: a distributed index that copes better with
> high volumes of data; because the index itself is distributed, there is no
> bottleneck. The tradeoff is that, depending on the cardinality of the data
> having the same "bucketname+tenantID", performance may drop sharply.
> Please read this:
> http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__when-no-index
> There are several restrictions on secondary indexes.
>
> 2. Manual index: easy to design, but potentially a wide row and not well
> balanced if the data having the same "bucketname+tenantID" is very large.
> Furthermore, you need to manage index consistency manually so that it
> stays in sync with updates to the source data.
>
> The best thing to do is to benchmark both solutions and take the approach
> that gives you the best results. Be careful with benchmarks: they should be
> representative of the data pattern you are likely to have in production.
>
> On Fri, Aug 22, 2014 at 7:47 AM, Leleu Eric <eric.le...@worldline.com> wrote:
> > Hi,
> >
> > I'm new to Cassandra and I am wondering what the best design is for my
> > case.
> >
> > I have a set of buckets, each containing from one to thousands of
> > contents.
> >
> > Here is my Content CF:
> >
> > CREATE TABLE IF NOT EXISTS contents (tenantID varchar,
> >     key varchar,
> >     type varchar,
> >     bucket varchar,
> >     owner varchar,
> >     workspace varchar,
> >     public_read boolean,
> >     PRIMARY KEY ((key, tenantID), type, workspace));
> >
> > To retrieve all contents that belong to a bucket, I have created an
> > index on the bucket column.
> >
> > CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);
> >
> > The value of the "bucket" column is concatenated with the tenantID
> > (bucket = bucketname+tenantID) so that my application does not have to
> > filter on the tenantID.
> >
> > Is this the right way to do it, or should I create another column family
> > to link each content to its bucket?
> >
> > CREATE TABLE IF NOT EXISTS bucket_to_contents (tenantID varchar,
> >     key varchar,
> >     type varchar,
> >     bucket varchar,
> >     owner varchar,
> >     workspace varchar,
> >     public_read boolean,
> >     PRIMARY KEY ((bucket, tenantID), key));
> >
> > Under the hood what is the difference of the both solutions?
> >
> > According to my understanding, the result will be the same. Both will
> > have a row key equal to the "bucketname" and the "tenantID", except that
> > the secondary index can have a replication delay…
> >
> > Can you help me on this point?
> >
> > Regards,
> > Eric
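
For reference, a minimal sketch of the manual-index option discussed in this thread, assuming both tables from Eric's message (contents and bucket_to_contents) already exist. The literal values ('content-1', 'tenant-a', and so on) are hypothetical. A logged batch is one way, not the only one, to keep the index table in step with the source table: if any statement in the batch is applied, all of them eventually are, although the two writes are not isolated from concurrent readers.

```cql
-- Write the content and its index entry together.
-- All values are made up for illustration.
BEGIN BATCH
  INSERT INTO contents (key, tenantID, type, workspace, bucket, owner, public_read)
  VALUES ('content-1', 'tenant-a', 'image', 'ws-1', 'bucket-1', 'eric', false);

  INSERT INTO bucket_to_contents (bucket, tenantID, key, type, workspace, owner, public_read)
  VALUES ('bucket-1', 'tenant-a', 'content-1', 'image', 'ws-1', 'eric', false);
APPLY BATCH;

-- List the contents of one bucket: a single-partition read on the index table,
-- because (bucket, tenantID) is its partition key.
SELECT key, type, workspace, owner, public_read
FROM bucket_to_contents
WHERE bucket = 'bucket-1' AND tenantID = 'tenant-a';
```

Updates and deletes need the same treatment: when a content moves to another bucket, the old row in bucket_to_contents has to be deleted and the new one inserted in the same batch. Note also that with PRIMARY KEY ((bucket, tenantID), key), two contents that share a key but differ in type or workspace would overwrite each other in the index table; adding type and workspace as clustering columns would avoid that, if such rows can occur in your data.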