Hi all,

I'm modelling a schema for storing and retrieving threaded messages, where, for 
planning purposes:

        - there are many millions of users.
        - a user might have up to 1000 threads.
        - each thread might have up to 50000 messages (with some threads being 
sparse with only a few messages).
        - the Stargate REST interface is used. 

I want to be able to execute the following queries:

        - retrieve x latest active threads, with the latest message.
        - retrieve x latest active threads, with the latest message, offset by 
y.
        - retrieve x latest messages from a thread.
        - retrieve x latest messages from a thread, offset by y.

I've come up with a few possible methods for modelling this. But any insights 
would be greatly appreciated.

Thanks in advance,

Mark


Possible solution 1:

TABLE:          threads
KEY:            userID : threadID
COLUMN:         latest_message

TABLE:          messages
KEY:            userID : threadID : timestamp
COLUMN:         message

Messages are first written to the messages table, and then the threads table's 
thread is updated with the latest message.

To fetch the latest x active threads, with the latest message:
        
        - I retrieve all threads and then sort and reduce the results on the 
client.

A concern with this is the fetching of all threads to sort on each request. 
This could be unwieldy!


Possible solution 2:

TABLE:          threads
KEY:            userID : timestamp : threadID
COLUMN:         latest_message

TABLE:          messages
KEY:            userID : threadID : timestamp
COLUMN:         message

Messages are first written to the messages table, and then the threads table's 
is updated with the latest message. The previous latest message is then deleted 
from the threads table.

To fetch the latest x active threads, with the latest message:
        
        - I scan the threads table until I get x unique threads.

A concern with this could be the issue of keeping the threads table in sync 
with the messages table - especially with the deletion of old latest messages.


Possible solution 3:

TABLE:          messages
KEY:            userID : timestamp : threadID
COLUMN:         message

To fetch the latest x active threads, with the latest message:
        
        - I scan the messages table until I get x unique threads.

One of my concerns with this method is that some threads will be busier than 
others, forcing a scan through nearly all of a user's messages. And there will 
be an ever increasing number of messages. A periodic archiving process - moving 
older messages to another table - might alleviate things here.


Possible solution 4:

Use SOLR/Elastic search or equivalent.




Reply via email to