Re: [twitter-dev] Track streaming : how to match tweets?

2009-12-03 Thread Dave Sherohman
On Wed, Dec 02, 2009 at 03:15:21PM -0800, Julien wrote:
> If I get a tweet, the only way to know what keyword it matches is to
> compare all of its words to the words I'm tracking... (maybe there is
> something easier).
> 
> That's quite "hard" but it becomes harder if I add operands. Say I
> have a search "romeo+juliet". When I get a tweet, I need to compare it
> to all the keywords, plus all the combinations :/ Technically that is
> not even doable if I have more than 10 keywords, since there are a LOT
> of combinations possible.

You are mistaken.  Provided you have appropriate support from your
language or its libraries, accomplishing this is trivial.  Using Perl
and Regexp::Assemble, FishTwits is currently tracking 1,358 words/
phrases and, for each tweet, building a list of which words/phrases
appear in that tweet.  It's very doable (quick, even), despite having
far more than 10 keywords involved.
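
As a rough illustration of the idea (not the FishTwits code, which is Perl
and is not shown in this thread), here is a minimal Python sketch that
compiles all tracked terms into one alternation and lists which of them
appear in a tweet. The term list and the name matched_terms are
hypothetical, and Python's re module stands in for Regexp::Assemble.

```python
import re

# Hypothetical tracked terms; the real FishTwits list is not given in the thread.
TRACKED = ["julien", "superfeedr", "google", "fly fishing"]

# One compiled alternation for all terms. Longest terms go first so that a
# longer phrase is preferred over a shorter term starting at the same position.
_pattern = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(TRACKED, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)

def matched_terms(text):
    """Return the sorted, de-duplicated tracked terms found in text."""
    return sorted({m.group(1).lower() for m in _pattern.finditer(text)})
```

The tweet is scanned once regardless of how many terms are tracked. One
caveat of this sketch: matches do not overlap, so a term contained inside a
longer tracked phrase is not reported separately when the phrase matches.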

-- 
Dave Sherohman


Re: [twitter-dev] Track streaming : how to match tweets?

2009-12-02 Thread John Kalucki
Julien,

Parsing the status text and matching the tokens against your local predicate
set is neither computationally complex nor particularly difficult to code.
In fact, such an implementation is practically indistinguishable from
indicating which predicates matched on our end -- you'd still have to match
keywords to your predicates.
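
A minimal Python sketch of that client-side matching, under the assumption
(taken from Julien's "romeo+juliet" example, not from any API documentation)
that "+" means every word must appear in the tweet. The predicate list and
the name matching_predicates are hypothetical.

```python
import re

# Hypothetical predicate list; "+" is assumed to mean "all words must appear".
PREDICATES = ["julien", "superfeedr", "google", "romeo+juliet"]

# Pre-split each predicate once into the set of words it requires.
_REQUIRED = {p: frozenset(p.lower().split("+")) for p in PREDICATES}

def matching_predicates(status_text):
    """Return the predicates whose required words all occur in status_text."""
    tokens = set(re.findall(r"\w+", status_text.lower()))
    # A predicate matches when its word set is a subset of the tweet's tokens.
    return [p for p, words in _REQUIRED.items() if words <= tokens]
```

Each predicate is checked once with a subset test, so the work grows
linearly with the number of predicates; no combinations of keywords are
ever enumerated.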

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


On Wed, Dec 2, 2009 at 3:15 PM, Julien wrote:

> Hey,
>
> I am pretty sure this is an issue that was raised by several people,
> but I'd love to see if we can find a solution.
>
> Right now, with the streaming API, I can track keywords. The problem,
> when I deal with 5 different keywords, is to identify which keyword a
> tweet matches.
>
> Say, I subscribe to the following keywords : julien,superfeedr,google
>
> If I get a tweet, the only way to know what keyword it matches is to
> compare all of its words to the words I'm tracking... (maybe there is
> something easier).
>
> That's quite "hard" but it becomes harder if I add operands. Say I
> have a search "romeo+juliet". When I get a tweet, I need to compare it
> to all the keywords, plus all the combinations :/ Technically that is
> not even doable if I have more than 10 keywords, since there are a LOT
> of combinations possible.
>
> What I'm suggesting is basically that Twitter would tell me which
> keyword this tweet matches. Twitter has the information, since it
> sends me only this specific tweet, right? That would definitely change
> the schema a little bit, but makes things easier for a lot of people
> and the investment is not so big on your side, I think.
>
> Doable?
>
>