Re: Schema question : Query to support "Find which all of these 500 email ids have been registered"

Roshni Rajagopal Thu, 26 Jul 2012 22:59:49 -0700

In general I believe wide rows (many columns) are preferable to skinny
rows (many rows), so that you can get all the information in one go.
A single row can hold up to 2 billion columns.

However, on what basis would you store the 500 email ids in one row? What
would the row key be?
For example, if the query you want to answer with this column family is
'how many email addresses are registered in this application?', then the
application id can be the row key, and the 500 email ids can be stored as
columns. Each other application would be another row. Since you want to
search by application, this may be the best approach.
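
To make the wide-row layout concrete, here is a minimal in-memory sketch.
It does not talk to Cassandra: a plain Python dict stands in for the column
family, and the names (registrations, registered_in_app, the app ids and
addresses) are made up for illustration. Against a real cluster the lookup
would be a by-name column read on the application's row.

```python
# In-memory stand-in for a column family: row key -> {column name: value}.
# Assumption: one row per application, one column per registered email id.
registrations = {
    "app-1": {"alice@example.com": "", "bob@example.com": ""},
    "app-2": {"carol@example.com": ""},
}


def registered_in_app(column_family, app_id, candidate_emails):
    """Return the candidates that exist as columns in the application's row."""
    row = column_family.get(app_id, {})
    # Keep the caller's order; a column that is present means "registered".
    return [email for email in candidate_emails if email in row]


found = registered_in_app(
    registrations, "app-1", ["alice@example.com", "dave@example.com"]
)
print(found)
```

The column values are left empty because only the presence of the column
name matters for this query.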

If your information doesn't fit neatly into the model above, you can use
an email id as the row key, and the list of applications as columns.

Reading 500 rows does not seem like a big task. I doubt it would be a
performance issue given Cassandra's powers.
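
As a rough sketch of the one-row-per-email layout, the lookup for a batch of
ids is a multi-key read that returns only the rows that exist. The code
below imitates that with a plain Python dict rather than a real Cassandra
client; registered, multiget_registered and the sample data are hypothetical
names used only for illustration.

```python
# In-memory stand-in for a column family: row key -> {column name: value}.
# Assumption: row key = email id, columns = applications it is registered on.
registered = {
    "alice@example.com": {"app-1": ""},
    "bob@example.com": {"app-1": "", "app-2": ""},
}


def multiget_registered(column_family, email_ids):
    """Mimic a multi-key read: return only the rows that actually exist."""
    return {
        email: column_family[email]
        for email in email_ids
        if email in column_family
    }


batch = ["alice@example.com", "zed@example.com", "bob@example.com"]
hits = multiget_registered(registered, batch)
print(sorted(hits))
```

With a real client, the 500 ids would go out as one multi-key request (or a
few smaller batches) instead of 500 separate reads.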

On 27/07/12 11:12 AM, "Aklin_81" <asdk...@gmail.com> wrote:

>I need to find out which email ids, among a list of 500 ids passed
>in a single query, have been registered on my app. (Total registered
>email ids may be in the millions.) What is the best way to store this
>kind of data?
>
>Should I store each email id in a separate row? But then I would have
>to read 500 rows at a time! Or if I use a single row or a smaller
>number of rows, then they would get too heavy.
>
>By the way, would it be really bad if I read 500 rows at a time?
>They'll be just single-column rows, never modified once written.

