Re: Secondary index or dedicated CF?

Mark Reddy Fri, 22 Aug 2014 08:08:50 -0700

Hi,

As a general rule of thumb, I would steer clear of secondary indexes. This is 
also the official stance that DataStax takes (see p. 5 of their best practices 
doc: 
http://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf).


“It is best to avoid using Cassandra's built-in secondary indexes where 
possible. Instead, it is recommended to denormalize data and manually maintain 
a dynamic table as a form of an index instead of using a secondary index. If 
and when secondary indexes are to be used, they should be created only on 
columns containing low-cardinality data (for example: fields with less than 
1000 states).”
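
As a rough sketch of what manually maintaining such a table could look like 
with the two tables from the original question (contents and 
bucket_to_contents): every write goes to both tables in one logged batch, so 
that they do not drift apart. The literal values are made up for illustration, 
and bucket is assumed to hold the plain bucket name here, since tenantID is 
already part of the partition key of bucket_to_contents.

```cql
-- Hypothetical values; table and column names are from the original question.
-- BEGIN BATCH is a logged batch by default: either all statements are
-- eventually applied or none are.
BEGIN BATCH
  -- source table, keyed by (key, tenantID)
  INSERT INTO contents (key, tenantID, type, workspace, bucket, owner, public_read)
  VALUES ('doc-1', 'tenantA', 'file', 'ws1', 'photos', 'alice', false);

  -- manual index table, keyed by (bucket, tenantID)
  INSERT INTO bucket_to_contents (bucket, tenantID, key, type, workspace, owner, public_read)
  VALUES ('photos', 'tenantA', 'doc-1', 'file', 'ws1', 'alice', false);
APPLY BATCH;
```

Updates and deletes would need the same treatment. The batch guarantees that 
both writes are eventually applied, not that readers see them at the same 
instant.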

Mark

On 22 Aug 2014, at 15:58, DuyHai Doan <doanduy...@gmail.com> wrote:

> Hello Eric
> 
> "Under the hood, what is the difference between the two solutions?" 
> 
> 1. Cassandra secondary index: a distributed index that copes better with 
> high volumes of data; because the index itself is distributed, there is no 
> bottleneck. The tradeoff is that, depending on the cardinality of the data 
> sharing the same "bucketname+tenantID", performance may drop sharply. Please 
> read this: 
> http://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_when_use_index_c.html?scroll=concept_ds_sgh_yzz_zj__when-no-index
> There are several restrictions on secondary indexes.
> 
> 2. Manual index: easy to design, but it can produce a wide row and an 
> unbalanced cluster if the amount of data sharing the same 
> "bucketname+tenantID" is very large. Furthermore, you need to maintain index 
> consistency manually so that it stays in sync with updates to the source 
> data. 
> 
> The best thing to do is to benchmark both solutions and take the approach 
> that gives you the best results. Be careful with benchmarks: they should be 
> representative of the data pattern you are likely to have in production.
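> 
> For reference, a sketch of the read that each solution implies, which is 
> also what a benchmark would need to exercise. Table and column names come 
> from the original question below; the literal values are hypothetical.
> 
> ```cql
> -- Solution 1: secondary index on contents(bucket). No partition key is
> -- given, so the coordinator may have to contact several nodes.
> SELECT key, type, workspace FROM contents
>  WHERE bucket = 'photostenantA';
> 
> -- Solution 2: dedicated table. Both partition key columns are given,
> -- so the read targets a single partition.
> SELECT key, type, workspace FROM bucket_to_contents
>  WHERE bucket = 'photos' AND tenantID = 'tenantA';
> ```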
> 
> 
> On Fri, Aug 22, 2014 at 7:47 AM, Leleu Eric <eric.le...@worldline.com> wrote:
> Hi,
> 
> I'm new to Cassandra and I am wondering what the best design is for my case.
> 
> I have a set of buckets, each containing anywhere from one to thousands of 
> contents.
> 
> Here is my Content CF:
> 
>     CREATE TABLE IF NOT EXISTS contents (
>         tenantID varchar,
>         key varchar,
>         type varchar,
>         bucket varchar,
>         owner varchar,
>         workspace varchar,
>         public_read boolean,
>         PRIMARY KEY ((key, tenantID), type, workspace)
>     );
> 
> To retrieve all contents that belong to a bucket, I have created an index on 
> the bucket column.
> 
>     CREATE INDEX IF NOT EXISTS bucket_to_contents ON contents (bucket);
> 
> The value of the “bucket” column is the bucket name concatenated with the 
> tenantID (bucket = bucketname+tenantID), to avoid filtering on the tenantID 
> in my application.
> 
> Is this the right way to do it, or should I create another column family to 
> link each content to its bucket?
> 
>     CREATE TABLE IF NOT EXISTS bucket_to_contents (
>         tenantID varchar,
>         key varchar,
>         type varchar,
>         bucket varchar,
>         owner varchar,
>         workspace varchar,
>         public_read boolean,
>         PRIMARY KEY ((bucket, tenantID), key)
>     );
> 
> Under the hood, what is the difference between the two solutions?
> 
> As I understand it, the result will be the same: both will have a row key 
> made up of the “bucketname” and the “tenantID”, except that the secondary 
> index can have a replication delay…
> 
> Can you help me on this point?
> 
> Regards,
> 
> Eric
> 
>  
> 
> 
> 
