[twitter-dev] Recommended ways to demultiplex the search stream with thousands of searches
So I'm looking at the streaming API (track), and I've got thousands of searches ( http://tweettronics.com ). I mainly need it to deal with terms that are very high volume, and to deal with Search API rate limiting. The main difficulty I'm thinking about is the best way to demultiplex the stream back into the individual searches I'm trying to accomplish.

1. How do you handle searches that are more complex than single terms, such as boolean expressions? Do you convert the boolean into something like a regex and then run that regex on every tweet? If I have several thousand regexes and thousands of tweets, that's a huge amount of processing just to demultiplex. But is that the way to go?

2. And if the search is just a simple expression, do folks simply demultiplex by doing a string search for each word in the search on every received tweet, as above?

I'm looking for recommended ways to demultiplex the search stream.

Thanks,
jeffrey greenberg

--
Subscription settings: http://groups.google.com/group/twitter-development-talk/subscribe?hl=en
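One possible alternative to running every regex against every tweet is to index the searches by term, so each tweet is tokenized once and only the searches that share a term with it get evaluated. The sketch below is a minimal illustration, not a recommended implementation. It assumes each search can be written as an OR of AND-groups of plain terms. The `Demultiplexer` class, the `tokenize` helper, and the search ids are all hypothetical names, and the tokenizer is far cruder than whatever matching rules track actually applies.

```python
import re
from collections import defaultdict

# Assumed tokenizer: word characters, optionally prefixed by # or @.
TOKEN_RE = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Lowercase a tweet and return its set of word-like tokens."""
    return set(TOKEN_RE.findall(text.lower()))


class Demultiplexer:
    """Routes a tweet to the searches it satisfies.

    Each search is stored as a list of AND-groups (an OR of ANDs).
    This representation is an assumption made for the sketch.
    """

    def __init__(self):
        self.groups = {}                 # search_id -> list of frozensets
        self.by_term = defaultdict(set)  # term -> ids of searches using it

    def add_search(self, search_id, and_groups):
        groups = [frozenset(t.lower() for t in g) for g in and_groups]
        self.groups[search_id] = groups
        for group in groups:
            for term in group:
                self.by_term[term].add(search_id)

    def match(self, text):
        tokens = tokenize(text)
        # Only searches that share at least one term with the tweet
        # are candidates; everything else is skipped entirely.
        candidates = set()
        for token in tokens:
            candidates |= self.by_term.get(token, set())
        # A search matches if any one of its AND-groups is fully present.
        return sorted(
            sid for sid in candidates
            if any(group <= tokens for group in self.groups[sid])
        )


demux = Demultiplexer()
demux.add_search("coffee", [["coffee"]])
demux.add_search("apple-devices", [["apple", "iphone"], ["apple", "ipad"]])
demux.add_search("nasa", [["#nasa"], ["shuttle", "launch"]])

print(demux.match("Apple announces new iPad pricing"))
print(demux.match("Coffee before the shuttle launch #NASA"))
```

With this layout the per-tweet cost depends on how many searches share a term with the tweet rather than on the total number of searches. Negation, phrases, and stemming are not handled here.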
Re: [twitter-dev] Recommended ways to demultiplex the search stream with thousands of searches
One idea off the top of my head: write tweets to something like Lucene, and then rely on its more sophisticated query engine to pull tweets. You'll sacrifice some latency here of course.

---Mark
http://twitter.com/mccv

On Fri, Apr 16, 2010 at 3:47 PM, Jeffrey Greenberg jeffreygreenb...@gmail.com wrote:
So I'm looking at the streaming api (track), and I've got thousands of searches. [...]
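To make the "write tweets to an index, then query it" idea concrete, here is a small in-memory sketch. It does not use Lucene or its API. A plain Python inverted index stands in for the search engine, and the `TweetIndex` class, its method names, and the sample tweets are all made up for illustration. A real engine would add analysis, scoring, persistence, and a proper query parser.

```python
import re
from collections import defaultdict


class TweetIndex:
    """Toy inverted index: term -> ids of tweets containing that term."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.tweets = {}

    def add(self, tweet_id, text):
        """Store a tweet and index each distinct lowercase token."""
        self.tweets[tweet_id] = text
        for term in set(re.findall(r"[#@]?\w+", text.lower())):
            self.postings[term].add(tweet_id)

    def search_all(self, terms):
        """Ids of tweets containing every term (boolean AND)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        if not sets:
            return []
        return sorted(set.intersection(*sets))

    def search_any(self, terms):
        """Ids of tweets containing at least one term (boolean OR)."""
        found = set()
        for t in terms:
            found |= self.postings.get(t.lower(), set())
        return sorted(found)


index = TweetIndex()
index.add(1, "Lucene makes full-text search easy")
index.add(2, "Streaming API track search tips")
index.add(3, "Lucene and the streaming API")

print(index.search_all(["lucene", "streaming"]))
print(index.search_any(["lucene", "track"]))
```

The latency trade-off mentioned above shows up here as well: tweets are matched when a saved search is run against the index, not at the moment they arrive on the stream.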
Re: [twitter-dev] Recommended ways to demultiplex the search stream with thousands of searches
I know it's not Web 2.0-cool, but I'm writing to SQL Server 2008 (Standard, x64) and using fulltext indexing/searching from there. On production hardware, I hardly see any real impact as far as latency goes, even on busy predicates. I can't imagine that the lighter-weight/more efficient Lucene would have a significantly perceivable impact.

∞ Andy Badera
∞ +1 518-641-1280 Google Voice
∞ This email is: [ ] bloggable [x] ask first [ ] private
∞ Google me: http://www.google.com/search?q=andrew%20badera

On Fri, Apr 16, 2010 at 6:59 PM, Mark McBride mmcbr...@twitter.com wrote:
One idea off the top of my head: write tweets to something like Lucene, and then rely on its more sophisticated query engine to pull tweets. You'll sacrifice some latency here of course.