Not original with me, I have to admit. Some of the ideas are best described in the OpenTSDB descriptions.
On Fri, Apr 1, 2011 at 8:01 PM, M. C. Srivas <[email protected]> wrote: > Ted, this is a pretty clever idea. > > > On Thu, Mar 31, 2011 at 9:27 PM, Ted Dunning <[email protected]>wrote: > >> Solr/Elastic search is a fine solution, but probably won't be quite as >> fast >> as a well-tuned hbase solution. >> >> One key assumption you seem to be making is that you will store messages >> only once. If you are willing to make multiple updates to tables, then >> you >> can arrange the natural ordering of the table to get what you want. For >> instance, you could keep the most recent messages (say the last 10 from >> each >> of the 1000 most recently updated threads) in an in memory table. Then >> you >> could store messages in a thread table indexed by thread:timestamp. >> Finally >> you could store messages in a table indexed by user:thread or >> user:timestamp. This would allow you to display the most recent messages >> or >> thread in near zero time, to display all or the most recent messages from >> a >> particular thread with only one retrieval and all of the messages from a >> particular user in time order in one retrieval. >> >> On Thu, Mar 31, 2011 at 5:56 PM, Mark Jarecki <[email protected] >> >wrote: >> >> > Hi all, >> > >> > I'm modelling a schema for storing and retrieving threaded messages, >> where, >> > for planning purposes: >> > >> > - there are many millions of users. >> > - a user might have up to 1000 threads. >> > - each thread might have up to 50000 messages (with some threads >> > being sparse with only a few messages). >> > - the Stargate REST interface is used. >> > >> > I want to be able to execute the following queries: >> > >> > - retrieve x latest active threads, with the latest message. >> > - retrieve x latest active threads, with the latest message, >> offset >> > by y. >> > - retrieve x latest messages from a thread. >> > - retrieve x latest messages from a thread, offset by y. >> > >> > I've come up with a few possible methods for modelling this. But any >> > insights would be greatly appreciated. >> > >> > Thanks in advance, >> > >> > Mark >> > >> > >> > Possible solution 1: >> > >> > TABLE: threads >> > KEY: userID : threadID >> > COLUMN: latest_message >> > >> > TABLE: messages >> > KEY: userID : threadID : timestamp >> > COLUMN: message >> > >> > Messages are first written to the messages table, and then the threads >> > table's thread is updated with the latest message. >> > >> > To fetch the latest x active threads, with the latest message: >> > >> > - I retrieve all threads and then sort and reduce the results on >> the >> > client. >> > >> > A concern with this is the fetching of all threads to sort on each >> request. >> > This could be unwieldy! >> > >> > >> > Possible solution 2: >> > >> > TABLE: threads >> > KEY: userID : timestamp : threadID >> > COLUMN: latest_message >> > >> > TABLE: messages >> > KEY: userID : threadID : timestamp >> > COLUMN: message >> > >> > Messages are first written to the messages table, and then the threads >> > table's is updated with the latest message. The previous latest message >> is >> > then deleted from the threads table. >> > >> > To fetch the latest x active threads, with the latest message: >> > >> > - I scan the threads table until I get x unique threads. >> > >> > A concern with this could be the issue of keeping the threads table in >> sync >> > with the messages table - especially with the deletion of old latest >> > messages. >> > >> > >> > Possible solution 3: >> > >> > TABLE: messages >> > KEY: userID : timestamp : threadID >> > COLUMN: message >> > >> > To fetch the latest x active threads, with the latest message: >> > >> > - I scan the messages table until I get x unique threads. >> > >> > One of my concerns with this method is that some threads will be busier >> > than others, forcing a scan through nearly all of a user's messages. And >> > there will be an ever increasing number of messages. A periodic >> archiving >> > process - moving older messages to another table - might alleviate >> things >> > here. >> > >> > >> > Possible solution 4: >> > >> > Use SOLR/Elastic search or equivalent. >> > >> > >> > >> > >> > >> > >
