Thanks Mark, but as I said, we need to fetch more complex feeds too. So we'll use the OR with the simple query, and then query the search API with the complex query to see if a given tweet matches what we need!
Julien

On Dec 8, 12:55 am, Mark McBride <mmcbr...@twitter.com> wrote:
> Note that search API whitelisting is different from regular API
> whitelisting, and getting a 20k/hour limit there is much more
> restrictive.
>
> I still haven't seen a case where you couldn't do the matching on your
> side. As John says, with the streaming API right now you can only
> match simple terms, so the complex terms aren't a factor. In fact the
> track you posted won't actually function as you intend with the
> streaming API. You could track for tweets containing starbucks or
> free. But currently that's it. "starbucks AND free" is something
> you'd have to implement on your side. Same with near.
>
> On Mon, Dec 7, 2009 at 3:45 PM, Julien <julien.genest...@gmail.com> wrote:
> > Hum... ok... sad, but I have an idea. Please tell me if this is
> > stupid.
> >
> > So, for each tweet I receive, I know what searches it _may_ match.
> > Right?
> > So, with all these "candidate" queries, what I can do is perform them
> > against the regular search API (as long as they're complex). If the
> > result from the polling includes the tweet, then I know that the search
> > matches and I don't have to build anything on top of what you built.
> >
> > Let's take an example :
> > - If I have a search for "starbuck AND free near:94123"
> > - I track "starbuck" with the streaming API
> > - Whenever you guys send me a tweet for this track
> > - I check internally all the queries that may match Starbucks
> > - I perform them on your API
> > - if the tweet you sent me is in the results, then I know this tweet
> > is valid,
> > - if not, I discard it.
> >
> > My only concern here is the 20k/hour limit. I think this is still
> > doable, because
> > 1) we will only make queries to the search API when we receive
> > notifications
> > 2) we will only make queries to the search API for complex queries
> > (i.e. AND, +, "" or near:)
> >
> > The pros :
> > - whenever you guys change/add stuff to your search DSL, I don't have to
> > change anything on my side.
> >
> > How does that sound?
> >
> > Thanks John anyway for your great help!
> >
> > Julien
> >
> > On Dec 5, 3:32 pm, John Kalucki <j...@twitter.com> wrote:
> >> This could only make sense if the Streaming API supported "search engine
> >> logic". Currently Streaming only supports keyword matching -- you have to
> >> post-process to add additional predicate operators beyond OR. You can
> >> reproduce the keyword match in a few lines of code, and the rest is
> >> (currently) all up to you anyway. Just remember that a given tweet could
> >> have triggered multiple predicates.
> >>
> >> Beyond being a low priority feature, rendering and delivering custom
> >> responses per user would be a performance risk. We currently can support a
> >> very large number of filter clients per server, and we want to preserve
> >> this performance.
> >>
> >> -John Kalucki
> >> http://twitter.com/jkalucki
> >> Services, Twitter Inc.
> >>
> >> On Sat, Dec 5, 2009 at 3:18 AM, Julien <julien.genest...@gmail.com> wrote:
> >> > Thanks Dave,
> >> >
> >> > I think I get it from your example... yet, in our case, we have
> >> > several thousands of keywords, and many many complex searches (with
> >> > filter:, "and", "or", near: ... and so on).
> >> >
> >> > I keep thinking that instead of re-implementing on my side the search
> >> > engine logic that Twitter has, it would be simpler for them to also
> >> > send the matching keywords. An even more elegant solution (yet
> >> > slightly more complex) would be to be able to pass parameters along
> >> > with the search I give, such as a unique search_id (that I can store
> >> > on my side) and then, instead of giving me the matched keywords/search
> >> > terms, they could just give me back that search_id. That would be
> >> > something like this :
> >> >
> >> > Right now it is :
> >> > POST http://stream.twitter.com/1/statuses/filter.json
> >> > track=paris,twitter+superfeedr,"julien near:france"
> >> >
> >> > It would be awesome if I could do :
> >> > POST http://stream.twitter.com/1/statuses/filter.json
> >> > track={"paris":"my_search_1","twitter+superfeedr":"my_search_2","julien near:france":"my_search_3"}
> >> >
> >> > And then, upon notifications, they would just pass me this search key
> >> > my_search_xx
> >> >
> >> > I know and understand that this implies a little bit of work for Twitter,
> >> > but it also removes the pain from each subscriber to this streaming
> >> > API who has to re-implement again and again the "search engine" from
> >> > Twitter.
> >> >
> >> > On Dec 4, 11:33 am, Dave Sherohman <d...@fishtwits.com> wrote:
> >> > > On Thu, Dec 03, 2009 at 03:12:05PM -0800, Julien wrote:
> >> > > > Well, then I'd need some help with that...
> >> > > >
> >> > > > Again, it's easy with single search keywords, but I haven't found a
> >> > > > solution for combined searches like twitter+stream or photo+Paris...
> >> > > > because I would have to compare each combination of tokens in the
> >> > > > tweet...
> >> > > >
> >> > > > Can someone give more details?
> >> > >
> >> > > I don't mean to be flogging my site today, but take a look
> >> > > at http://fishtwits.com for the results I'm producing (just click the logo
> >> > > at the top of the page to view the full site without logging in): Any
> >> > > tweets from users followed by FishTwits are scanned for fishing-related
> >> > > terms and all such terms found in the tweet are displayed below it. At
> >> > > this moment, for instance, the first displayed tweet shows matches for
> >> > > both "Fly Fishing" and "Sole".
> >> > >
> >> > > This is accomplished with the following Perl code (edited to remove
> >> > > parts which aren't directly relevant):
> >> > >
> >> > > sub load_from_text {
> >> > >   my ($class, $text) = @_;
> >> > >
> >> > >   # Build the combined regex once, on first use.
> >> > >   unless($topic_regex) {
> >> > >     require Regexp::Assemble;
> >> > >     my $ra = Regexp::Assemble->new(
> >> > >       chomp => 0,
> >> > >       anchor_word_begin => 1,
> >> > >       anchor_word_end => 1,
> >> > >     );
> >> > >     for my $topic (@topic_list) {
> >> > >       $ra->add(lc $topic);
> >> > >     }
> >> > >     $topic_regex = $ra->re;
> >> > >   }
> >> > >
> >> > >   # Case-insensitive global match: collect every topic found.
> >> > >   $text = lc $text;
> >> > >   my @topics = $text =~ /$topic_regex/g;
> >> > >
> >> > >   return sort @topics;
> >> > > }
> >> > >
> >> > > It first uses Regexp::Assemble to build a $topic_regex[1] which will
> >> > > match any of the words/phrases found in the topic table, then does a
> >> > > global match of $text (the body of the tweet being examined) against
> >> > > $topic_regex, capturing all matches into the array @topics, which is
> >> > > then sorted and returned to the caller.
> >> > >
> >> > > After the match is performed, @topics contains every search term which
> >> > > is matched, no matter how many there may be, which should fill your
> >> > > requirement for "combined searches", unless I'm misunderstanding it.
> >> > >
> >> > > If you mean you would want that "Fly Fishing", "Sole" tweet to return
> >> > > three hits rather than two ("Fly Fishing", "Sole", "Fly Fishing+Sole"),
> >> > > that's easy enough to create from @topics, just generate every
> >> > > combination of the terms which the individual tweet matched.
> >> > >
> >> > > [1] If you're only dealing with 10 or so keywords, you'd probably be
> >> > > just as well off building the regex by hand. The main reason I'm using
> >> > > Regexp::Assemble to do it on the fly is because manually creating and
> >> > > then maintaining a regex that will efficiently match any of 1300 terms
> >> > > would be a nightmare.
> >> > >
> >> > > --
> >> > > Dave Sherohman
>
> --
> ---Mark
>
> http://twitter.com/mccv
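
For anyone reading this thread later: the client-side matching discussed above (track simple keywords with the Streaming API, then apply the AND logic locally and map each hit back to your own search id) can be sketched roughly as follows. This is only an illustration in Python, not the Perl used earlier in the thread and not anything Twitter provides. The function names, the `queries` mapping and the `my_search_N` ids are hypothetical, keywords are assumed to be plain words or phrases, and operators such as near: are not handled at all.

```python
import re
from itertools import combinations


def build_keyword_regex(keywords):
    """Compile one regex matching any keyword on word boundaries.

    Assumes keywords are plain words/phrases (they start and end with a
    word character); longer alternatives are tried first.
    """
    alternatives = sorted(
        (re.escape(k.lower()) for k in keywords), key=len, reverse=True
    )
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


def matched_keywords(regex, text):
    """Return the sorted, de-duplicated keywords found in the tweet text."""
    return sorted(set(regex.findall(text.lower())))


def matching_queries(queries, found):
    """Return ids of AND-queries whose required keywords were all found.

    `queries` maps a hypothetical search id to the list of keywords that
    must all be present in the tweet.
    """
    found_set = set(found)
    return sorted(
        qid for qid, required in queries.items() if set(required) <= found_set
    )


def keyword_combinations(found):
    """Return every combination of two or more matched keywords, joined by '+'."""
    return [
        "+".join(combo)
        for size in range(2, len(found) + 1)
        for combo in combinations(found, size)
    ]


if __name__ == "__main__":
    regex = build_keyword_regex(["starbucks", "free", "fly fishing", "sole"])
    queries = {
        "my_search_1": ["starbucks", "free"],
        "my_search_2": ["fly fishing"],
    }
    found = matched_keywords(regex, "Free coffee at Starbucks today")
    print(found, matching_queries(queries, found))
```

Because a single tweet can satisfy several tracked terms at once, the lookup returns a list of ids rather than a single one. With a few thousand keywords a single alternation may get slow, which is the problem Regexp::Assemble addresses on the Perl side.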