I know it's not Web 2.0-cool, but I'm writing to SQL Server 2008 (Standard, x64) and using its fulltext indexing/search from there. On production hardware I see hardly any real latency impact, even on busy predicates. I can't imagine the lighter-weight, more efficient Lucene would add any perceptible latency either.
∞ Andy Badera
∞ +1 518-641-1280 Google Voice
∞ This email is: [ ] bloggable [x] ask first [ ] private
∞ Google me: http://www.google.com/search?q=andrew%20badera

On Fri, Apr 16, 2010 at 6:59 PM, Mark McBride <mmcbr...@twitter.com> wrote:
> One idea off the top of my head: write tweets to something like Lucene, and
> then rely on its more sophisticated query engine to pull tweets. You'll
> sacrifice some latency here, of course.
>
> ---Mark
>
> http://twitter.com/mccv
>
> On Fri, Apr 16, 2010 at 3:47 PM, Jeffrey Greenberg
> <jeffreygreenb...@gmail.com> wrote:
>>
>> So I'm looking at the streaming api (track), and I've got thousands of
>> searches. (http://tweettronics.com) I mainly need it to deal with
>> terms that are very high volume, and to deal with search api rate limiting.
>>
>> The main difficulty I'm thinking about is the best way to demultiplex
>> the stream back into the individual searches I'm trying to run.
>>
>> 1. How do you handle it if a search is not a single term but a boolean
>> expression? Do you convert the boolean into something like a regex, and
>> then run that regex on every tweet? If I have several thousand regexes
>> and thousands of tweets, that's a huge amount of processing just to
>> demultiplex. But is that the way to go?
>> 2. And if the search is just a simple expression, do folks simply
>> demultiplex by doing a string search for each word of the search in
>> every received tweet, like above?
>>
>> I'm looking for recommended ways to demultiplex the search stream...
>>
>> Thanks,
>> jeffrey greenberg
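For what it's worth, one common way to avoid running every search against every tweet is to invert the problem: index the searches by their terms, then use the tweet's tokens to look up only the candidate searches that could possibly match. Below is a minimal sketch of that idea (AND-of-terms searches only, all names hypothetical, not code from anyone in this thread):

```python
from collections import defaultdict
import re

class Demultiplexer:
    """Hypothetical sketch: route incoming tweets to subscribed searches.

    Each search is modeled as a set of required terms (AND semantics).
    An inverted index (term -> search ids) limits the full boolean check
    to searches that share at least one term with the tweet, instead of
    evaluating every search/regex against every tweet.
    """

    def __init__(self):
        self.searches = {}             # search_id -> frozenset of required terms
        self.index = defaultdict(set)  # term -> set of search_ids

    def add_search(self, search_id, terms):
        required = frozenset(t.lower() for t in terms)
        self.searches[search_id] = required
        for term in required:
            self.index[term].add(search_id)

    def match(self, tweet_text):
        tokens = set(re.findall(r"\w+", tweet_text.lower()))
        # Candidate generation: only searches sharing a term are considered.
        candidates = set()
        for token in tokens:
            candidates |= self.index.get(token, set())
        # Full verification: all required terms must be present.
        return [sid for sid in candidates if self.searches[sid] <= tokens]

demux = Demultiplexer()
demux.add_search("s1", ["coffee", "espresso"])
demux.add_search("s2", ["coffee"])
sorted(demux.match("I love coffee and espresso"))  # → ['s1', 's2']
demux.match("espresso shots only")                 # → [] (s1 also needs "coffee")
```

OR-expressions can be handled the same way by indexing each OR-branch as its own entry, and the candidate step stays cheap even with thousands of searches, since cost scales with tokens per tweet rather than with the number of searches.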