[twitter-dev] Re: Streaming Api - Keywords matched

2009-12-02 Thread Julien
I kind of disagree with you here... not because it's hard to match the users (the algo you offered is what we use) but because you assume that queries will just match 1 single keyword. I think this is not doable if you start introducing things like + or or || or , because you need to compare a
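
To illustrate the concern about queries that combine several keywords, here is a minimal sketch. The query syntax is an assumption made only for this example, not something confirmed in the thread: commas separate alternatives (OR) and "+" joins terms that must all be present (AND). The names parse_query and query_matches are hypothetical.

```python
def parse_query(query):
    """Parse e.g. 'coffee+shop,tea' into [{'coffee', 'shop'}, {'tea'}].

    Assumed syntax (hypothetical): ',' means OR, '+' means AND.
    """
    clauses = [
        {term.strip().lower() for term in clause.split("+") if term.strip()}
        for clause in query.split(",")
    ]
    # Drop empty clauses so a stray '+' or ',' never matches everything.
    return [clause for clause in clauses if clause]


def query_matches(query, tokens):
    """Return True if any AND-clause is fully contained in the status tokens."""
    token_set = {token.lower() for token in tokens}
    return any(required <= token_set for required in parse_query(query))
```

With such queries, a single keyword hit is not enough to decide delivery: the whole clause has to be checked against the status, which is the comparison the message refers to.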

RE: [twitter-dev] Re: Streaming Api - Keywords matched

2009-12-02 Thread Bryan Nehl
Have you researched the Vector Space Model (VSM) and cosine theta calculations or approximations? You could calculate one of the approximations on the incoming stream yourself. Check out this paper: http://www.cse.ust.hk/~dlee/Papers/ir/ieee-sw-rank.pdf Regards, Bryan
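
As a rough illustration of the cosine calculation mentioned above, the sketch below scores a status against a query using raw term counts. It is the plain textbook form, not one of the approximations from the linked paper, and the whitespace tokenization and function names are assumptions made for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector from whitespace-separated, lowercased words."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A status could then be ranked against each query with cosine_similarity(term_vector(status_text), term_vector(query_text)), with higher values meaning a closer match.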

[twitter-dev] Re: Streaming Api - Keywords matched

2009-11-05 Thread Joel Strellner
A little late to this convo, but I disagree with the need for this feature. It adds extra complexity to Twitter that really should be on the application level, and, since the streaming API only returns one tweet, even if it matched two or more keywords that you are watching, it'd add extra load on

[twitter-dev] Re: Streaming Api - Keywords matched

2009-11-03 Thread Adam Green
I agree with the idea, since I too have this need, but I think that you'll still need to check the existence of matches in filtered stream results. The algorithm used by this API doesn't always return what you'd expect or need, such as making sure the matches are separate words, or they are used
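
A client-side check of the kind described above, confirming that each keyword appears as a separate word, might look like the following sketch. The function name and the use of regex word-character boundaries are assumptions for illustration. They are not a description of how the API itself matches.

```python
import re


def matched_keywords(text, keywords):
    """Return the keywords that occur in text as whole words, ignoring case."""
    found = []
    for keyword in keywords:
        # Require that no word character touches the keyword on either side.
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found
```

Lookarounds are used here instead of \b so that keywords starting or ending with a non-word character (a hashtag, for example) are still handled sensibly.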

[twitter-dev] Re: Streaming Api - Keywords matched

2009-11-03 Thread John Kalucki
The assumption is that client services will, in any case, have to parse and route statuses to potentially multiple end-users. Providing this sort of hint wouldn't eliminate the need to parse the status and would likely result in duplicate effort. We're aware that we are, in some use cases,

[twitter-dev] Re: Streaming Api - Keywords matched

2009-11-03 Thread John Kalucki
The Streaming API and the Search indexer both tee off the same point in the new status event pipeline. New statuses are born in the web containers and queued for a cluster of processes that begin the offline processing pipeline. This first process does many things, including routing statuses to

[twitter-dev] Re: Streaming Api - Keywords matched

2009-11-03 Thread Fabien Penso
I agree, however it would help a lot because instead of doing:

  for keyword in all_keywords
    if tweet.match(keyword)
      // matched, notify users
    end
  end

we could do:

  for keyword in keywords_matched
    // same as above
  end

For matching 5,000 keywords, it would bring the first loop from 5,000 to

[twitter-dev] Re: Streaming Api - Keywords matched

2009-11-03 Thread John Kalucki
May I suggest a potentially much more efficient algorithm? Place all keywords in a HashMap that maps keywords to a list of subscribed users. Tokenize the status text, and look up each token in the hash table to deliver the status to each subscribed user. Within the user, apply a generational
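
The hash-table approach suggested above can be sketched as follows. This covers only the lookup step (keyword to subscribers, then one lookup per token). The message is cut off before it finishes describing the per-user step, so that part is not shown. The tokenizer and all names are assumptions made for this example.

```python
import re
from collections import defaultdict


def build_index(subscriptions):
    """Map each lowercased keyword to the set of users tracking it."""
    index = defaultdict(set)
    for user, keywords in subscriptions.items():
        for keyword in keywords:
            index[keyword.lower()].add(user)
    return index


def tokenize(text):
    """Split status text into lowercased tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"[a-z0-9_#@']+", text.lower())


def route(status_text, index):
    """Return {user: set of matched keywords} for one status."""
    matches = defaultdict(set)
    for token in set(tokenize(status_text)):
        # One hash lookup per distinct token, instead of one test per keyword.
        for user in index.get(token, ()):
            matches[user].add(token)
    return dict(matches)
```

The cost per status then depends on the number of tokens in the status rather than on the total number of tracked keywords, which is the point of the suggestion.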