Have you considered using Cassandra? Kevin
On Fri, Apr 1, 2011 at 11:01 PM, M. C. Srivas <[email protected]> wrote: > Ted, this is a pretty clever idea. > > On Thu, Mar 31, 2011 at 9:27 PM, Ted Dunning <[email protected]> > wrote: > > > Solr/Elastic search is a fine solution, but probably won't be quite as > fast > > as a well-tuned hbase solution. > > > > One key assumption you seem to be making is that you will store messages > > only once. If you are willing to make multiple updates to tables, then > you > > can arrange the natural ordering of the table to get what you want. For > > instance, you could keep the most recent messages (say the last 10 from > > each > > of the 1000 most recently updated threads) in an in memory table. Then > you > > could store messages in a thread table indexed by thread:timestamp. > > Finally > > you could store messages in a table indexed by user:thread or > > user:timestamp. This would allow you to display the most recent messages > > or > > thread in near zero time, to display all or the most recent messages from > a > > particular thread with only one retrieval and all of the messages from a > > particular user in time order in one retrieval. > > > > On Thu, Mar 31, 2011 at 5:56 PM, Mark Jarecki <[email protected] > > >wrote: > > > > > Hi all, > > > > > > I'm modelling a schema for storing and retrieving threaded messages, > > where, > > > for planning purposes: > > > > > > - there are many millions of users. > > > - a user might have up to 1000 threads. > > > - each thread might have up to 50000 messages (with some threads > > > being sparse with only a few messages). > > > - the Stargate REST interface is used. > > > > > > I want to be able to execute the following queries: > > > > > > - retrieve x latest active threads, with the latest message. > > > - retrieve x latest active threads, with the latest message, > > offset > > > by y. > > > - retrieve x latest messages from a thread. > > > - retrieve x latest messages from a thread, offset by y. > > > > > > I've come up with a few possible methods for modelling this. But any > > > insights would be greatly appreciated. > > > > > > Thanks in advance, > > > > > > Mark > > > > > > > > > Possible solution 1: > > > > > > TABLE: threads > > > KEY: userID : threadID > > > COLUMN: latest_message > > > > > > TABLE: messages > > > KEY: userID : threadID : timestamp > > > COLUMN: message > > > > > > Messages are first written to the messages table, and then the threads > > > table's thread is updated with the latest message. > > > > > > To fetch the latest x active threads, with the latest message: > > > > > > - I retrieve all threads and then sort and reduce the results on > > the > > > client. > > > > > > A concern with this is the fetching of all threads to sort on each > > request. > > > This could be unwieldy! > > > > > > > > > Possible solution 2: > > > > > > TABLE: threads > > > KEY: userID : timestamp : threadID > > > COLUMN: latest_message > > > > > > TABLE: messages > > > KEY: userID : threadID : timestamp > > > COLUMN: message > > > > > > Messages are first written to the messages table, and then the threads > > > table's is updated with the latest message. The previous latest message > > is > > > then deleted from the threads table. > > > > > > To fetch the latest x active threads, with the latest message: > > > > > > - I scan the threads table until I get x unique threads. > > > > > > A concern with this could be the issue of keeping the threads table in > > sync > > > with the messages table - especially with the deletion of old latest > > > messages. > > > > > > > > > Possible solution 3: > > > > > > TABLE: messages > > > KEY: userID : timestamp : threadID > > > COLUMN: message > > > > > > To fetch the latest x active threads, with the latest message: > > > > > > - I scan the messages table until I get x unique threads. > > > > > > One of my concerns with this method is that some threads will be busier > > > than others, forcing a scan through nearly all of a user's messages. > And > > > there will be an ever increasing number of messages. A periodic > archiving > > > process - moving older messages to another table - might alleviate > things > > > here. > > > > > > > > > Possible solution 4: > > > > > > Use SOLR/Elastic search or equivalent. > > > > > > > > > > > > > > > > > >
