In brief: Take all of your search terms and put them into a hash table that maps each keyword to its subscribers. Tokenize each tweet's text field and look up each token in the hash table, sending the tweet on to every subscriber registered for that token. A tweet that matches several of one subscriber's terms would otherwise be delivered to that subscriber more than once, so each subscriber can do a generational deduplication by storing the status id in the subscriber object.
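A minimal sketch of the scheme described above, in Python. The class and method names (`Subscriber`, `Demultiplexer`, `dispatch`), the tokenizer regex, and the tweet being a dict with `id` and `text` keys are all assumptions made for illustration, not anything taken from the Streaming API client libraries. It assumes dispatch runs on a single thread, so remembering only the most recent status id per subscriber is enough to drop repeats.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, _, # and @.
    return re.findall(r"[a-z0-9_#@]+", text.lower())


class Subscriber:
    def __init__(self, name, terms):
        self.name = name
        # Keep a copy of the terms so the subscriber can be removed later.
        self.terms = {t.lower() for t in terms}
        self.last_status_id = None  # used for deduplication
        self.received = []

    def deliver(self, tweet):
        # Skip the tweet if it was already delivered via another token.
        if tweet["id"] == self.last_status_id:
            return
        self.last_status_id = tweet["id"]
        self.received.append(tweet)


class Demultiplexer:
    def __init__(self):
        # keyword -> set of subscribers interested in that keyword
        self.table = defaultdict(set)

    def subscribe(self, sub):
        for term in sub.terms:
            self.table[term].add(sub)

    def unsubscribe(self, sub):
        for term in sub.terms:
            subs = self.table.get(term)
            if subs is not None:
                subs.discard(sub)
                if not subs:
                    del self.table[term]

    def dispatch(self, tweet):
        # Apply every token of the tweet to the table.
        for token in tokenize(tweet["text"]):
            for sub in self.table.get(token, ()):
                sub.deliver(tweet)
```

Usage would look something like `demux.subscribe(Subscriber("a", ["coffee", "tea"]))` followed by `demux.dispatch(tweet)` for each tweet read from the stream. The cost per tweet is proportional to the number of tokens in it rather than to the number of registered searches.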
If each subscriber keeps a copy of their search terms, you can even remove the subscriber from the hash table when the subscriber stops their query.

You can tokenize on multiple threads, but do the hash table apply and hash table set operations in a single thread. This is plenty of concurrency and leads to a simple programming model, and it is what makes the easy generational deduplication scheme above work.

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.

On Mon, Apr 19, 2010 at 11:28 AM, Jeffrey Greenberg <jeffreygreenb...@gmail.com> wrote:
> I was unable to attend Chirp in person, so I could not hear John
> Kalucki's comments on this... Anyone have any notes on this... John?
>
> j
>
> On Apr 16, 3:36 pm, Jeffrey Greenberg <jeffreygreenb...@gmail.com>
> wrote:
>> So I'm looking at the streaming api (track), and I've got thousands of
>> searches. (http://tweettronics.com) I mainly need it to deal with
>> terms that are very high volume, and to deal with search api rate limiting.
>>
>> The main difficulty I'm thinking about is the best way to de-multiplex
>> the stream back into the individual searches I'm trying to accomplish.
>>
>> 1. How do you handle if the searches are more complex than single
>> terms, but a boolean expression... Do you convert the boolean into
>> something like regex, and then run that regex on every tweet... So if
>> I have several thousand regexes and thousands of tweets, that's a huge
>> amount of processing just to demultiplex... But is that the way to go?
>> 2. And if the search is just a simple expression, do folks
>> simply demultiplex by doing a string search for each word in the search for
>> every received tweet... like above?
>>
>> I'm looking for recommended ways to demultiplex the search stream...
>>
>> Thanks,
>> jeffrey greenberg