Re: Using Cassandra for my usecase

Erick Ramirez Sun, 11 Jun 2017 22:15:07 -0700

>
> *Given my use case, is Cassandra the best-suited one, or is there any other
> database which suits my requirement better?*

Probably not the right forum for that question. It's like walking into a
Ford dealership and asking if the Mustang is the best car for you. 😄

In any case, you would choose Cassandra if you require:
- high availability
- very fast reads
- no single-point-of-failure
- no downtime
- scalability (i.e. you have a scale problem)
- etc.

> *What would be the best way to implement multi-tenancy?*

The "best" way is whatever works for your use case, based on testing you've
done. As you already noted in your example, adding the tenant identifier as
a partition key column could lead to very large partitions for your biggest
tenants, so you need to be careful about how you model your data.
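One common way to stop a big tenant from producing oversized partitions is to add more components to the partition key, for example a time bucket plus a small number of shards, so each tenant's data is spread over many partitions. The sketch below only computes such a key. The function name, the one-day bucket and the 16 shards are illustrative assumptions, not anything Cassandra prescribes; you would tune them from testing.

```python
import hashlib
from datetime import datetime, timezone
from typing import Tuple

# Illustrative assumptions, not Cassandra defaults: one-day time buckets
# and 16 shards per tenant per day.
BUCKET_SECONDS = 86_400
SHARDS_PER_BUCKET = 16


def partition_key(tenant_id: str, event_id: str, ts: datetime) -> Tuple[str, int, int]:
    """Return a hypothetical (tenant_id, day_bucket, shard) partition key.

    The day bucket stops a partition from growing forever, and the shard,
    derived from a stable hash of the event id, splits each tenant-day
    across SHARDS_PER_BUCKET partitions.
    """
    day_bucket = int(ts.timestamp()) // BUCKET_SECONDS
    # md5 is used only for even, deterministic spreading (not security);
    # Python's built-in hash() is randomised per process for strings.
    digest = hashlib.md5(event_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % SHARDS_PER_BUCKET
    return tenant_id, day_bucket, shard


if __name__ == "__main__":
    ts = datetime(2017, 6, 11, 12, 0, tzinfo=timezone.utc)
    print(partition_key("tenant-a", "event-42", ts))
```

The trade-off is on the read side: a query for one tenant and day now has to read all 16 shard partitions (for example in parallel) and merge the results, which also makes sorting and paging more work for the application.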

Some implementations completely side-step this by putting each tenant in its
own keyspace, but that may not suit your needs, since the number of tables
then grows with every tenant you add.

> *Given that I need to query by multiple dimensions, would denormalized
> tables work better or should I be using materialized views?*

With denormalised tables, your application has to write each record to
every table itself and implement the logic for batching those updates
together so the tables stay in sync.
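As a rough illustration of that application-side logic, the sketch below fans one incoming record out to several query tables and wraps the inserts in a single CQL logged batch (`BEGIN BATCH ... APPLY BATCH`), which guarantees that either all of the inserts are eventually applied or none are. The table and column names are hypothetical, and a real application would use its driver's prepared statements and batch support rather than building CQL strings by hand.

```python
from typing import Dict, List, Sequence

# Hypothetical query tables: each stores the same record but is
# partitioned by a different dimension.
QUERY_TABLES: Dict[str, Sequence[str]] = {
    "events_by_tenant_day": ("tenant_id", "day", "event_id", "payload"),
    "events_by_device": ("device_id", "event_id", "payload"),
    "events_by_type": ("event_type", "event_id", "payload"),
}


def cql_literal(value: object) -> str:
    """Render a string, bool or number as a CQL literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # CQL escapes a single quote inside a string by doubling it.
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def denormalized_batch(record: Dict[str, object]) -> str:
    """Build one logged batch that writes `record` to every query table."""
    lines: List[str] = ["BEGIN BATCH"]
    for table, columns in QUERY_TABLES.items():
        cols = ", ".join(columns)
        vals = ", ".join(cql_literal(record[c]) for c in columns)
        lines.append(f"  INSERT INTO {table} ({cols}) VALUES ({vals});")
    lines.append("APPLY BATCH;")
    return "\n".join(lines)
```

Note that a batch spanning several partitions, as this one does, goes through the batchlog, which adds coordinator overhead: it is there to keep the copies consistent, not to make writes faster.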

With materialised views, that complexity is managed for you by C*, but you
need to be aware of the performance impact associated with it. For example,
with RF=3 on the base table, each MV is an additional table that is also
replicated with RF=3, so every write becomes 3+3 replica writes. A second MV
makes it 3+3+3, and so on.
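The arithmetic above can be written as a small helper. It assumes the base table and every view use the same replication factor; the function names are hypothetical.

```python
def replica_writes_per_insert(rf: int, num_views: int) -> int:
    """Replica writes caused by one insert into a base table with num_views MVs.

    Assumes the base table and every view share the same replication factor,
    so each of the (1 + num_views) tables is written rf times.
    """
    return rf * (1 + num_views)


def replica_writes_per_second(inserts_per_second: int, rf: int, num_views: int) -> int:
    """Cluster-wide replica writes per second for a given client insert rate."""
    return inserts_per_second * replica_writes_per_insert(rf, num_views)


if __name__ == "__main__":
    for views in (0, 1, 2, 7):
        print(views, replica_writes_per_insert(3, views))
    # Assumption: the 8 query tables from the question become 1 base table + 7 MVs.
    print(replica_writes_per_second(100_000, 3, 7))
```

With the numbers from the question (100k inserts/s, RF=3, and assuming the 8 query tables become one base table plus 7 views), that is 24 replica writes per insert, or 2.4 million per second across the cluster. Maintaining eight denormalised tables yourself gives the same 8 × 3 = 24; on top of that, views read the base table before each write so they can update the view correctly, which is part of the performance impact mentioned above.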

> *Anything else that I need to consider based on your experiences with
> Cassandra?*

Multi-tenancy can be difficult, particularly for complex use cases. Test,
test and test. And make sure you always size your cluster correctly, with
enough nodes to handle your load.
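As a first step in sizing, it helps to work out how much raw data the cluster will hold. The sketch below multiplies out the figures from the question; it ignores compression, compaction headroom and other overheads, treats 1 KB as 1,000 bytes, and the function name is hypothetical, so treat the result as an order-of-magnitude starting point only.

```python
SECONDS_PER_DAY = 86_400


def raw_bytes_stored(msgs_per_sec: int, msg_bytes: int, tables: int, rf: int,
                     retention_days: int) -> int:
    """Uncompressed data held cluster-wide once the retention period is reached.

    Ignores compression, compaction headroom and other overheads.
    """
    bytes_per_day = msgs_per_sec * msg_bytes * SECONDS_PER_DAY
    return bytes_per_day * tables * rf * retention_days
```

With 100k messages/s of 1 KB, 8 tables and RF=3, each day of retention adds about 207 TB (207,360,000,000,000 bytes) of raw data, so a retention of a few weeks multiplies that accordingly before any compression.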

You need to limit the number of tables to about 200 at most across the
cluster (regardless of the number of keyspaces). Every table has a memory
overhead on each node even when it holds little data, so having too many
tables puts pressure on the heap of each node.
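To see how quickly that limit bites with per-tenant tables (appending the tenant identifier to the table names, as described in the question), here is the arithmetic: with 8 query tables per tenant and a budget of about 200 tables, only about 25 tenants fit. The helper and its `shared_tables` parameter (for other application tables that are not per tenant) are hypothetical.

```python
def max_tenants(table_budget: int, tables_per_tenant: int, shared_tables: int = 0) -> int:
    """How many tenants fit if every tenant gets its own copy of each query table."""
    if tables_per_tenant <= 0:
        raise ValueError("tables_per_tenant must be positive")
    return max(0, (table_budget - shared_tables) // tables_per_tenant)


if __name__ == "__main__":
    print(max_tenants(200, 8))
    print(max_tenants(200, 8, shared_tables=10))
```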

Good luck!

On Sun, Jun 11, 2017 at 2:07 AM, Govindarajan Srinivasaraghavan <
govindragh...@gmail.com> wrote:

> Hi All,
>
> Just to give some background, I'm working on a project where I need to store
> fast-incoming time series data and have REST APIs to query and serve the
> data to users when needed. Each record is a single JSON document about 1 KB
> in size, and the data has to be purged after a specific time period (say, a few
> weeks or months). The incoming rate would be approximately 100k messages
> per second, and the biggest challenge is that the data should be queryable by
> multiple dimensions with sorting, paging and data dump options.
>
> I started looking into database options and felt like Cassandra might be a
> good choice for my use case since the requirement calls for fast writes. In
> order to query by multiple dimensions, I had to insert the same record into
> multiple denormalized tables (around 8 tables). Now I need to implement
> multi-tenancy, and adding an extra column to the partition key to query by
> tenant will not work, since some tenants will have far more data than
> the rest. My other option is to append the tenant identifier to the table
> names so that I can perform per-tenant queries easily.
>
> Here are my questions for which I need some help.
> - Given my use case, is Cassandra the best-suited one, or is there any other
> database which suits my requirement better?
> - What would be the best way to implement multi-tenancy?
> - Given that I need to query by multiple dimensions, would denormalized
> tables work better or should I be using materialized views?
> - Anything else that I need to consider based on your experiences with
> Cassandra?
>
> Thanks
>
