Hi all,
I'm modelling a schema for storing and retrieving threaded messages, where, for
planning purposes:
- there are many millions of users.
- a user might have up to 1000 threads.
- each thread might have up to 50000 messages (with some threads being
sparse with only a few messages).
- the Stargate REST interface is used.
I want to be able to execute the following queries:
- retrieve x latest active threads, with the latest message.
- retrieve x latest active threads, with the latest message, offset by
y.
- retrieve x latest messages from a thread.
- retrieve x latest messages from a thread, offset by y.
I've come up with a few possible methods for modelling this. But any insights
would be greatly appreciated.
Thanks in advance,
Mark
Possible solution 1:
TABLE: threads
KEY: userID : threadID
COLUMN: latest_message
TABLE: messages
KEY: userID : threadID : timestamp
COLUMN: message
Messages are first written to the messages table, and then the threads table's
thread is updated with the latest message.
To fetch the latest x active threads, with the latest message:
- I retrieve all threads and then sort and reduce the results on the
client.
A concern with this is the fetching of all threads to sort on each request.
This could be unwieldy!
Possible solution 2:
TABLE: threads
KEY: userID : timestamp : threadID
COLUMN: latest_message
TABLE: messages
KEY: userID : threadID : timestamp
COLUMN: message
Messages are first written to the messages table, and then the threads table's
is updated with the latest message. The previous latest message is then deleted
from the threads table.
To fetch the latest x active threads, with the latest message:
- I scan the threads table until I get x unique threads.
A concern with this could be the issue of keeping the threads table in sync
with the messages table - especially with the deletion of old latest messages.
Possible solution 3:
TABLE: messages
KEY: userID : timestamp : threadID
COLUMN: message
To fetch the latest x active threads, with the latest message:
- I scan the messages table until I get x unique threads.
One of my concerns with this method is that some threads will be busier than
others, forcing a scan through nearly all of a user's messages. And there will
be an ever increasing number of messages. A periodic archiving process - moving
older messages to another table - might alleviate things here.
Possible solution 4:
Use SOLR/Elastic search or equivalent.