RE: Data Model Design for Login Service

Dan Hendry Fri, 18 Nov 2011 07:14:51 -0800

They are not limited to repeating values, but the DataStax docs [1] certainly
seem to indicate that secondary indexes would be a poor fit for this case (high
read load, many unique values).
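A toy model can make the cardinality concern concrete. It assumes, as I understand Cassandra's built-in secondary indexes to work, that each node indexes only the rows it stores itself. Under that assumption, a read by row key goes to the node that owns the key, while an equality query on an indexed column may have to ask every node. `NUM_NODES`, `ToyCluster`, and the example address are hypothetical. The MD5-based placement only stands in for a partitioner, and replication is ignored. This is a sketch, not Cassandra code.

```python
import hashlib

NUM_NODES = 8  # hypothetical cluster size; replication is ignored


def owner(row_key: str) -> int:
    """Toy partitioner: the node owning a row is chosen by hashing its key."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


class ToyCluster:
    """Rows are spread over nodes by key; each node indexes only its own rows."""

    def __init__(self) -> None:
        self.rows = [{} for _ in range(NUM_NODES)]         # row key -> columns
        self.local_index = [{} for _ in range(NUM_NODES)]  # value -> row keys

    def insert(self, row_key: str, columns: dict, indexed: str) -> None:
        node = owner(row_key)
        self.rows[node][row_key] = columns
        # the index entry lives on the node that holds the row,
        # not on a node chosen by the indexed value
        self.local_index[node].setdefault(columns[indexed], []).append(row_key)

    def get_by_key(self, row_key: str):
        """Read by row key: returns (row, number of nodes contacted)."""
        node = owner(row_key)
        return self.rows[node].get(row_key), 1

    def get_by_indexed_value(self, value: str):
        """Read by indexed value: returns (matching rows, nodes contacted).

        No node knows where a given value is stored, so in the worst case
        every node must be asked; this toy always asks all of them.
        """
        matches = []
        for node in range(NUM_NODES):
            for key in self.local_index[node].get(value, []):
                matches.append(self.rows[node][key])
        return matches, NUM_NODES


cluster = ToyCluster()
cluster.insert("1122", {"alias1": "alfred@example.com", "name": "Alfred Tester"},
               indexed="alias1")
print(cluster.get_by_key("1122")[1])                          # 1
print(cluster.get_by_indexed_value("alfred@example.com")[1])  # 8
```

A login name is unique, so in this model every login read would pay the all-nodes cost to find a single row. That is why a high read load on a high-cardinality column is the bad case.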


 

[1] http://www.datastax.com/docs/1.0/ddl/indexes

 

Dan

 

From: Maciej Miklas [mailto:[email protected]] 
Sent: November-18-11 1:39
To: [email protected]
Subject: Re: Data Model Design for Login Service

 

But secondary indexes are limited to repeating values, like enums. In my case
I would have a performance issue, right?


On 18.11.2011, at 02:08, Maxim Potekhin <[email protected]> wrote:

1122: {
          gender: MALE
          birthdate: 1987.11.09
          name: Alfred Tester
          pwd: e72c504dc16c8fcd2fe8c74bb492affa
          alias1: [email protected]
          alias2: [email protected]
          alias3: [email protected]
         }

...and you can use secondary indexes to query on anything.

Maxim


On 11/17/2011 4:08 PM, Maciej Miklas wrote: 

Hello all,

I need your help designing the data structure for a simple login service. It
holds about 100,000,000 customers, and each one can have about 10 different
logins, which results in about 1,000,000,000 different logins.
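The sizing above can be checked with a little arithmetic. The only added fact is that UTF-8 uses at most 4 bytes per character, so a 20-character login name is at most 80 bytes. The constant names are illustrative, and the total counts the login names only. It ignores the column data and any Cassandra storage overhead.

```python
CUSTOMERS = 100_000_000
LOGINS_PER_CUSTOMER = 10
MAX_LOGIN_CHARS = 20
MAX_UTF8_BYTES_PER_CHAR = 4  # UTF-8 encodes one code point in 1 to 4 bytes

total_logins = CUSTOMERS * LOGINS_PER_CUSTOMER
max_login_bytes = MAX_LOGIN_CHARS * MAX_UTF8_BYTES_PER_CHAR
# upper bound for the login names alone, ignoring column data and overhead
max_key_bytes_total = total_logins * max_login_bytes

print(f"{total_logins:,} logins")                         # 1,000,000,000 logins
print(f"{max_login_bytes} bytes per login name at most")  # 80 bytes per login name at most
print(f"{max_key_bytes_total / 10**9:.0f} GB of login names at most")  # 80 GB of login names at most
```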
    
Each customer has the following data:
- one to many login names, as strings of at most 20 UTF-8 characters
- ID as long; each customer has exactly one ID
- gender
- birth date
- name
- password as an MD5 hash
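A minimal sketch of the per-customer record listed above might look like the following. The Python names (`Customer`, `customer_id`, `pwd_md5`, `md5_hex`) are hypothetical. The checks cover only the stated limits: at least one login name, at most 20 characters each, and a password stored as an MD5 hex digest, which is 32 hex characters like the `pwd` values in the examples.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

MAX_LOGIN_CHARS = 20  # from the requirements: max 20 UTF-8 characters


def md5_hex(password: str) -> str:
    """MD5 digest as 32 lowercase hex characters, the form of the `pwd` values."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


@dataclass
class Customer:
    customer_id: int   # "ID as long": exactly one per customer
    logins: list[str]  # one to many login names
    gender: str        # e.g. "MALE"
    birthdate: date
    name: str
    pwd_md5: str       # MD5 hex digest, never the plain password

    def __post_init__(self) -> None:
        if not self.logins:
            raise ValueError("a customer needs at least one login name")
        for login in self.logins:
            # len() counts characters (code points), not encoded bytes
            if len(login) > MAX_LOGIN_CHARS:
                raise ValueError(f"login name longer than {MAX_LOGIN_CHARS} characters: {login!r}")
        if len(self.pwd_md5) != 32:
            raise ValueError("pwd_md5 must be a 32-character MD5 hex digest")
```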

The login process needs to find a user by login name.
Login data in Cassandra is replicated; this is necessary to obtain all required
login data in a single call. We also usually expect low write traffic and heavy
read traffic, so round trips for reading data should be avoided.
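The single-call requirement can be sketched as a login check that performs exactly one read. `login` and `fetch_row` are hypothetical names: `fetch_row` stands in for one Cassandra read by row key, and a plain dict plays the column family in the usage lines. The sketch assumes the row already carries the `pwd` column, so the password check needs no second round trip.

```python
import hashlib
import hmac
from typing import Callable, Optional

Row = dict[str, str]


def login(login_name: str, password: str,
          fetch_row: Callable[[str], Optional[Row]]) -> Optional[Row]:
    """Authenticate with exactly one read, keyed by login name.

    `fetch_row` is a hypothetical stand-in for a single Cassandra read by row
    key. The row it returns must already hold every column the login needs.
    """
    row = fetch_row(login_name)  # the only read
    if row is None:
        return None  # unknown login name
    candidate = hashlib.md5(password.encode("utf-8")).hexdigest()
    # compare_digest avoids leaking how many leading characters matched
    return row if hmac.compare_digest(candidate, row["pwd"]) else None


# usage with a plain dict standing in for the column family
store = {"alfred@example.com": {"id": "1122", "pwd": hashlib.md5(b"secret").hexdigest()}}
print(login("alfred@example.com", "secret", store.get)["id"])  # 1122
print(login("alfred@example.com", "wrong", store.get))         # None
```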
Below I've described two possible Cassandra data models based on an example: we
have two users; the first user has three logins and the second user has two logins.
   
A) Skinny rows
 - the row key is the login name; this is the main search criterion
 - login data is replicated: each possible login is stored as a single row that
contains all user data, so 10 logins for a single customer create 10 rows, where
each row has a different key and the same content

    // the first 3 rows have different keys and the same replicated data
        [email protected] {
          id: 1122
          gender: MALE
          birthdate: 1987.11.09
          name: Alfred Tester
          pwd: e72c504dc16c8fcd2fe8c74bb492affa  
        },
        [email protected] {
          id: 1122
          gender: MALE
          birthdate: 1987.11.09
          name: Alfred Tester
          pwd: e72c504dc16c8fcd2fe8c74bb492affa  
        },
        [email protected] {
          id: 1122
          gender: MALE
          birthdate: 1987.11.09
          name: Alfred Tester
          pwd: e72c504dc16c8fcd2fe8c74bb492affa  
        },
    
    // the following two rows again have the same data for the second customer
        [email protected] {
          id: 1133
          gender: MALE
          birthdate: 1997.02.01
          name: Manfredus Maximus
          pwd: e44c504ff16c8fcd2fe8c74bb492adda  
        },
        [email protected] {
          id: 1133
          gender: MALE
          birthdate: 1997.02.01
          name: Manfredus Maximus
          pwd: e44c504ff16c8fcd2fe8c74bb492adda  
        }
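Option A can be sketched with plain dicts standing in for the column family. `skinny_rows`, `update_everywhere`, and the example.com login names are hypothetical. The sketch shows both sides of the trade-off: a login is a single lookup by row key, but changing one attribute means rewriting it in every copy. In the rows as described, a customer's other login names are not stored, so the caller must already know them.

```python
Row = dict[str, str]


def skinny_rows(customer_id: int, logins: list[str], data: Row) -> dict[str, Row]:
    """Option A: one row per login name, each a full copy of the customer data."""
    row = {"id": str(customer_id), **data}
    return {name: dict(row) for name in logins}  # dict(row): an independent copy per row


def update_everywhere(cf: dict[str, Row], logins: list[str], column: str, value: str) -> int:
    """Rewrite one column in every copy; returns the number of rows written.

    The caller must know all of the customer's login names, because the rows
    themselves do not list them.
    """
    for name in logins:
        cf[name][column] = value
    return len(logins)


column_family: dict[str, Row] = {}  # stands in for the skinny-row column family
alfred_logins = ["alfred1@example.com", "alfred2@example.com", "alfred3@example.com"]
column_family.update(skinny_rows(1122, alfred_logins, {
    "gender": "MALE", "birthdate": "1987.11.09", "name": "Alfred Tester",
    "pwd": "e72c504dc16c8fcd2fe8c74bb492affa",
}))

print(column_family["alfred2@example.com"]["name"])  # Alfred Tester
print(update_everywhere(column_family, alfred_logins, "name", "Alfred T. Tester"))  # 3
```

With low write traffic, the extra writes are the price paid for single-read logins.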
    
B) Rows grouped by alphabetical prefix
- The number of rows is limited, e.g. the row key is the first letter of the
login name
- Each row contains all logins that begin with its row key: the row with key 'a'
contains all logins that begin with 'a'
- Data might be unbalanced, but we avoid skinny rows; this might have a positive
performance impact (??)
- To avoid super columns, each row contains the columns directly: the column
name is the user login and the column value is the corresponding data in some
serialized form (I would like it to be human readable)

    a {
        [email protected]:"1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa",
        [email protected]@xyz.de:"1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa",
        [email protected]@xyz.de:"1122;MALE;1987.11.09;Alfred Tester;e72c504dc16c8fcd2fe8c74bb492affa"
      },
    m {
        [email protected]:"1133;MALE;1997.02.01;Manfredus Maximus;e44c504ff16c8fcd2fe8c74bb492adda"
      },
    r {
        [email protected]:"1133;MALE;1997.02.01;Manfredus Maximus;e44c504ff16c8fcd2fe8c74bb492adda"
      }
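Option B can be sketched with a dict of rows keyed by first letter. Each row maps a login name (the column name) to a human-readable, semicolon-separated value in the field order of the example. `row_key`, `serialize`, `store`, `lookup`, and the example.com logins are hypothetical. The format also assumes that no field ever contains `;`.

```python
from collections import defaultdict
from typing import Optional

FIELDS = ("id", "gender", "birthdate", "name", "pwd")  # order used in the example values

# row key -> {column name (login) -> serialized value}; stands in for the column family
rows: dict[str, dict[str, str]] = defaultdict(dict)


def row_key(login: str) -> str:
    """Option B: the row key is the first letter of the login name."""
    return login[0].lower()


def serialize(data: dict[str, str]) -> str:
    """Human-readable value such as "1122;MALE;1987.11.09;Alfred Tester;<md5>"."""
    # assumes no field contains ';' (true for the example data)
    return ";".join(data[f] for f in FIELDS)


def deserialize(value: str) -> dict[str, str]:
    return dict(zip(FIELDS, value.split(";")))


def store(login: str, data: dict[str, str]) -> None:
    rows[row_key(login)][login] = serialize(data)


def lookup(login: str) -> Optional[dict[str, str]]:
    """One read: fetch the column named `login` from the row for its first letter."""
    value = rows.get(row_key(login), {}).get(login)
    return None if value is None else deserialize(value)


alfred = {"id": "1122", "gender": "MALE", "birthdate": "1987.11.09",
          "name": "Alfred Tester", "pwd": "e72c504dc16c8fcd2fe8c74bb492affa"}
for name in ["alfred1@example.com", "alfred2@example.com", "alfred3@example.com"]:
    store(name, alfred)

print(lookup("alfred2@example.com")["name"])           # Alfred Tester
print({key: len(cols) for key, cols in rows.items()})  # {'a': 3}
```

The last line hints at the imbalance. With about 1,000,000,000 logins and only a-z as row keys, each row would average roughly 38 million columns (1,000,000,000 / 26), and far more for common letters. Because a Cassandra row is stored on the replicas for its key, all logins starting with a popular letter would be served by the same few nodes.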

Which solution is better, especially for read performance? Do you have a
better idea?

Thanks,
Maciej

 
