In general I believe wide rows (many columns) are preferable to skinny rows
(many rows), because you can get all the information in one read.
A single row can hold up to 2 billion columns.
However, on what basis would you store the 500 email ids in 1 row? What
can be the row key?
For example, if the query you want to answer with this column family is 'how
many email addresses are registered in this application?', then the
application id can be the row key, and the 500 email ids can be stored as
columns. Each other application would be another row. Since you want to
search by application, this may be the best approach.
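As a rough illustration of this wide-row layout, here is a minimal Python sketch that models the column family as an in-memory dictionary. It does not talk to Cassandra at all; the names `registrations`, `register` and `count_emails` are made up for the example, and the empty column value is an assumption.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family:
# row key = application id, column name = email id, column value = empty.
registrations = defaultdict(dict)

def register(app_id, email):
    """Add one email id as a column in the application's row."""
    registrations[app_id][email] = ""

def count_emails(app_id):
    """Answer 'how many email addresses are registered in this application?'."""
    return len(registrations.get(app_id, {}))

register("app-1", "alice@example.com")
register("app-1", "bob@example.com")
register("app-2", "carol@example.com")
print(count_emails("app-1"))
```

The point is only that one row key (the application id) gives you every email id for that application in a single lookup.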
If your information doesn't fit neatly into the model above, you can use
an email id as the row key and the list of applications as columns.
Reading 500 rows does not seem a big task - I doubt it would be a
performance issue given Cassandra's capabilities.
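The email-id-as-row-key layout, and the 500-key lookup against it, can be sketched the same way. Again this is only an in-memory model under assumed names (`rows_by_email`, `multiget`, `registered_on`), not a real client call; with an actual Cassandra client the lookup would be a multi-row read by key.

```python
# Hypothetical in-memory stand-in for a column family:
# row key = email id, column names = ids of applications it is registered on.
rows_by_email = {
    "alice@example.com": {"app-1": ""},
    "bob@example.com": {"app-1": "", "app-2": ""},
}

def multiget(keys):
    """Return only the rows that exist, like a multi-row read by key."""
    return {k: rows_by_email[k] for k in keys if k in rows_by_email}

def registered_on(app_id, candidates):
    """Which of the candidate email ids are registered on app_id?"""
    found = multiget(candidates)
    return [email for email in candidates if app_id in found.get(email, {})]

batch = ["alice@example.com", "nobody@example.com", "bob@example.com"]
print(registered_on("app-1", batch))
```

Each candidate email id costs one key lookup, so a batch of 500 ids is 500 small single-row reads rather than a scan of one very wide row.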
On 27/07/12 11:12 AM, Aklin_81 asdk...@gmail.com wrote:
I need to find out which email ids, among a list of 500 ids passed
in a single query, have been registered on my app. (Total registered
email ids may be in the millions.) What is the best way to store this kind
of data?
Should I store each email id in a separate row? But then I would have
to read 500 rows at a time! Or if I use a single row, or a small number
of rows, then they would get too heavy.
By the way, would it be really bad if I read 500 rows at a time?
They'll be just single-column rows, never modified once written.