Re: Schema question : Query to support Find which all of these 500 email ids have been registered

2012-07-27 Thread Aklin_81
Sorry for the confusion. I need to store the emails registered for
just a single application, so my data model would fit into a single
row. But is storing a hundred million columns (column name size = 8
bytes; column value size = 4 bytes) in a single row a good idea? I am
very tempted to store everything in a single row, but I have also
heard that it is recommended to keep a row within tens of MB for
optimal performance.
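
As a rough back-of-the-envelope check of the figures above, here is a small sketch. It counts only column names and values and ignores any per-column storage overhead (timestamps and the like), so the real row would be larger than this estimate.

```python
# Raw payload of one wide row, using the sizes quoted in the message.
# Per-column storage overhead is deliberately ignored (an assumption),
# so this is a lower bound rather than an exact on-disk size.
NUM_COLUMNS = 100_000_000   # "a hundred million columns"
NAME_BYTES = 8              # column name size
VALUE_BYTES = 4             # column value size

row_bytes = NUM_COLUMNS * (NAME_BYTES + VALUE_BYTES)
row_mb = row_bytes / 10**6

print(f"raw row size: {row_mb:,.0f} MB")  # raw row size: 1,200 MB
```

Even this lower bound is well above the "tens of MB" guideline mentioned above.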


Re: Schema question : Query to support Find which all of these 500 email ids have been registered

2012-07-27 Thread Aklin_81
What if I spread these columns across 20 rows? Then I would have to
query each of these 20 rows for the 500 columns, but this still seems
a better solution than either one row for all columns or a separate
row for each email id.
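
One possible way to spread the columns is sketched below. The message does not say how the 20 rows would be chosen, so the hash-based bucketing, the lowercase normalization, and the names `bucket_for` and `group_by_bucket` are all assumptions made for illustration. If the bucket is derived deterministically from the email id, each row only needs to be asked for the ids that map to it, rather than for all 500.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 20  # number of rows the columns are spread across


def bucket_for(email: str) -> int:
    """Map an email id to one of NUM_BUCKETS rows (hypothetical scheme)."""
    normalized = email.strip().lower()  # assumption: ids compare case-insensitively
    digest = hashlib.md5(normalized.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then pick a bucket.
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def group_by_bucket(emails):
    """Group the queried email ids by the row that would hold them."""
    groups = defaultdict(list)
    for email in emails:
        groups[bucket_for(email)].append(email)
    return dict(groups)


# Example: 500 made-up ids, grouped into at most 20 per-row lookups.
emails = [f"user{i}@example.com" for i in range(500)]
groups = group_by_bucket(emails)
for bucket, names in sorted(groups.items()):
    print(bucket, len(names))
```

With this kind of scheme, a query for 500 ids becomes at most 20 reads of roughly 25 column names each.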



Re: Schema question : Query to support Find which all of these 500 email ids have been registered

2012-07-26 Thread Roshni Rajagopal
In general I believe wide rows (many columns) are preferable to skinny
rows (many rows), so that you can get all the information in one go.
One can store up to 2 billion columns in a row.

However, on what basis would you store the 500 email ids in one row?
What would the row key be?
For example, if the query you want to answer with this column family is
'how many email addresses are registered in this application?', then
the application id can be the row key, and the 500 email ids can be
stored as columns. Each other application would be another row. Since
you want to search by application, this may be the best approach.

If your information doesn't fit neatly into the model above, you can
use an email id as the row key and the list of applications as columns.

Reading 500 rows does not seem a big task. I doubt it would be a
performance issue given Cassandra's capabilities.
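
To make the two layouts concrete, here is a minimal in-memory sketch. Plain dicts stand in for column families; this is not a Cassandra client, and the sample data and the names `by_application`, `by_email`, `registered_layout_a`, and `registered_layout_b` are hypothetical.

```python
# In-memory stand-ins for the two layouts (plain dicts, not a real client).

# Layout A: one row per application, one column per registered email id.
by_application = {
    "app1": {"a@example.com": "", "b@example.com": ""},
}

# Layout B: one row per email id, one column per application.
by_email = {
    "a@example.com": {"app1": ""},
    "b@example.com": {"app1": ""},
}


def registered_layout_a(app_id, candidates):
    """One row read: check which candidate column names exist in the row."""
    row = by_application.get(app_id, {})
    return [email for email in candidates if email in row]


def registered_layout_b(app_id, candidates):
    """One row read per candidate: check for the application column."""
    return [email for email in candidates if app_id in by_email.get(email, {})]


query = ["a@example.com", "zz@example.com", "b@example.com"]
print(registered_layout_a("app1", query))
print(registered_layout_b("app1", query))
```

Both functions return the same answer; the difference is whether the lookup touches one wide row or one skinny row per queried id.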

On 27/07/12 11:12 AM, Aklin_81 asdk...@gmail.com wrote:

I need to find out which email ids, among a list of 500 ids passed
in a single query, have been registered on my app. (The total number
of registered email ids may be in the millions.) What is the best way
to store this kind of data?

Should I store each email id in a separate row? But then I would have
to read 500 rows at a time! Or, if I use a single row or a small
number of rows, they would get too heavy.

By the way, would it really be bad to read 500 rows at a time? They
would be single-column rows whose columns are never modified once
written.
