[twitter-dev] Recommended ways to demultiplex the search stream with thousands of searches

2010-04-16 Thread Jeffrey Greenberg
So I'm looking at the streaming API (track), and I've got thousands of
searches ( http://tweettronics.com ). I mainly need it to handle
terms that are very high volume, and to deal with search API rate limiting.

The main difficulty I'm thinking about is the best way to demultiplex
the stream back into the individual searches I'm trying to run.

1. How do you handle searches that are more complex than a single
term, i.e. a boolean expression? Do you convert the boolean expression into
something like a regex and then run that regex on every tweet? If
I have several thousand regexes and thousands of tweets, that's a huge
amount of processing just to demultiplex. Is that really the way to go?
2. And if the search is just a simple term, do folks simply
demultiplex by doing a string search for each word of the query against
every received tweet, like above? (Sketch of what I mean below.)

I'm looking for recommended ways to demultiplex the search stream...
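
For the simple-term case, here's roughly what I have in mind: a minimal,
untested sketch (Java; all names are made up) that keeps a map from track
term to the searches interested in it, tokenizes each tweet once, and only
considers the searches whose terms actually appear. Boolean searches could
then be evaluated only against tweets that hit at least one of their terms,
instead of running every expression over every tweet.

import java.util.*;

public class TrackDemux {
    // lowercased track term -> ids of the searches that include that term
    private final Map<String, List<String>> termToSearches =
            new HashMap<String, List<String>>();

    public void addSearch(String searchId, Collection<String> terms) {
        for (String term : terms) {
            String key = term.toLowerCase();
            List<String> ids = termToSearches.get(key);
            if (ids == null) {
                ids = new ArrayList<String>();
                termToSearches.put(key, ids);
            }
            ids.add(searchId);
        }
    }

    // returns the ids of searches whose terms appear in this tweet's text
    public Set<String> matches(String tweetText) {
        Set<String> hits = new HashSet<String>();
        for (String token : tweetText.toLowerCase().split("\\W+")) {
            List<String> ids = termToSearches.get(token);
            if (ids != null) {
                hits.addAll(ids);
            }
        }
        return hits;
    }
}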

Thanks,
jeffrey greenberg




Re: [twitter-dev] Recommended ways to demultiplex the search stream with thousands of searches

2010-04-16 Thread Mark McBride
One idea off the top of my head: write tweets to something like Lucene, and
then rely on its more sophisticated query engine to pull tweets.  You'll
sacrifice some latency here of course.
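
Roughly like this (an untested sketch against the Lucene 3.x API of the day;
the field name, sample tweet, and query string are all made up):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class TweetIndexSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // index each tweet as it comes off the stream
        IndexWriter writer = new IndexWriter(dir, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add(new Field("text", "just picked up new nike running shoes",
                Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.commit();

        // periodically replay each stored search as a Lucene boolean query
        IndexSearcher searcher = new IndexSearcher(dir, true);
        Query q = new QueryParser(Version.LUCENE_30, "text", analyzer)
                .parse("(nike OR adidas) AND shoes");
        TopDocs hits = searcher.search(q, 100);
        System.out.println(hits.totalHits + " matching tweets");
    }
}

The win is that boolean logic, phrases, etc. come free from the query
parser; the cost is that you batch tweets into the index and query on a
cycle rather than matching each tweet the instant it arrives.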

  ---Mark

http://twitter.com/mccv


On Fri, Apr 16, 2010 at 3:47 PM, Jeffrey Greenberg 
jeffreygreenb...@gmail.com wrote:

 So I'm looking at the streaming API (track), and I've got thousands of
 searches ( http://tweettronics.com ). I mainly need it to handle
 terms that are very high volume, and to deal with search API rate limiting.

 The main difficulty I'm thinking about is the best way to demultiplex
 the stream back into the individual searches I'm trying to run.

 1. How do you handle searches that are more complex than a single
 term, i.e. a boolean expression? Do you convert the boolean expression into
 something like a regex and then run that regex on every tweet? If
 I have several thousand regexes and thousands of tweets, that's a huge
 amount of processing just to demultiplex. Is that really the way to go?
 2. And if the search is just a simple term, do folks simply
 demultiplex by doing a string search for each word of the query against
 every received tweet, like above?

 I'm looking for recommended ways to demultiplex the search stream...

 Thanks,
 jeffrey greenberg





Re: [twitter-dev] Recommended ways to demultiplex the search stream with thousands of searches

2010-04-16 Thread Andrew Badera
I know it's not Web 2.0-cool, but I'm writing to SQL Server 2008
(Standard, x64) and using fulltext indexing/searching from there. On
production hardware, I hardly see any real impact as far as latency
goes, even on busy predicates. I can't imagine that the
lighter-weight/more efficient Lucene would have a significantly
perceivable impact.
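
For what it's worth, each stored search just becomes a full-text CONTAINS
predicate against the tweet table, something like the following (a sketch
only, in Java/JDBC rather than my actual code; the table, column, and
connection details are invented):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FulltextSketch {
    public static void main(String[] args) throws Exception {
        // placeholder connection string; assumes the SQL Server JDBC
        // driver is on the classpath
        Connection conn = DriverManager.getConnection(
            "jdbc:sqlserver://localhost;databaseName=Tweets;integratedSecurity=true");

        // the CONTAINS search condition supports AND/OR/NOT, so each saved
        // boolean search maps onto a single full-text predicate
        PreparedStatement ps = conn.prepareStatement(
            "SELECT id, text FROM dbo.Tweets WHERE CONTAINS(text, ?)");
        ps.setString(1, "\"nike\" AND \"shoes\"");

        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getLong("id") + ": " + rs.getString("text"));
        }
        rs.close();
        ps.close();
        conn.close();
    }
}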

∞ Andy Badera
∞ +1 518-641-1280 Google Voice
∞ This email is: [ ] bloggable [x] ask first [ ] private
∞ Google me: http://www.google.com/search?q=andrew%20badera



On Fri, Apr 16, 2010 at 6:59 PM, Mark McBride mmcbr...@twitter.com wrote:
 One idea off the top of my head: write tweets to something like Lucene, and
 then rely on its more sophisticated query engine to pull tweets.  You'll
 sacrifice some latency here of course.
   ---Mark

 http://twitter.com/mccv


 On Fri, Apr 16, 2010 at 3:47 PM, Jeffrey Greenberg
 jeffreygreenb...@gmail.com wrote:

 So I'm looking at the streaming API (track), and I've got thousands of
 searches ( http://tweettronics.com ). I mainly need it to handle
 terms that are very high volume, and to deal with search API rate limiting.

 The main difficulty I'm thinking about is the best way to demultiplex
 the stream back into the individual searches I'm trying to run.

 1. How do you handle searches that are more complex than a single
 term, i.e. a boolean expression? Do you convert the boolean expression into
 something like a regex and then run that regex on every tweet? If
 I have several thousand regexes and thousands of tweets, that's a huge
 amount of processing just to demultiplex. Is that really the way to go?
 2. And if the search is just a simple term, do folks simply
 demultiplex by doing a string search for each word of the query against
 every received tweet, like above?

 I'm looking for recommended ways to demultiplex the search stream...

 Thanks,
 jeffrey greenberg

