So we have a customer who is searching for, say, hotels.com. We use the Search API and get back from Twitter a tweet that has no such text in it, but it turns out that the URL behind its shortened link contains the string 'hotels.com':
Here's the tweet:

Siam Bayview Hotel Pattaya, Beach Rd. from THB 2,010 incl breakfast Special Rate http://bit.ly/295HOI Thailand hotels

Here's the walked bit.ly URL:

http://www.r24.org/patong-beach-hotels.com/pattaya/siambayview/

In this case, the match isn't good. They don't want r24.org stuff, they want hotels.com stuff. On the other hand, it's great when it really shows hotels.com stuff.

I'm not sure what the 'right' thing to do is at this moment, as I'm reacting to the customer's urgency and their problem with unrelated stuff showing up in their search. I'm not sure how I should address this:

1. Recommend that Twitter make some sort of mod to the Search API (I don't have a good idea at the moment about what you should do: make such URL walking optional? etc.?)
2. Do some sort of processing on our end, and communicate better to our customers about what search does.

So:

a. What are y'all's thoughts on this one?
b. I believe that you (Twitter) walk some shorteners but not all of them, e.g. bit.ly URLs and your own shortener. What is the current list that you do walk?

This is related to the entity parsing discussion here:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/9b869a9fe4d4252e/861a2aa59b563f33?lnk=gst&q=search+url#861a2aa59b563f33

Thanks,
Jeffrey Greenberg
tweettronics.com
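
P.S. To make option 2 concrete, here is a rough sketch of the kind of filtering we could do on our end. It is only an illustration under my own assumptions: the function names are made up, and `expand` is a hypothetical callable that maps a short URL to its final destination (for instance by following HTTP redirects). The idea is to keep a tweet only if the search term appears in the visible text, or if a link really points at that host rather than merely containing the string somewhere in its path.

```python
import re
from urllib.parse import urlparse

# Rough URL matcher; good enough for a sketch, not a full URL grammar.
URL_RE = re.compile(r"https?://\S+")


def host_matches(url, domain):
    """True if the URL's host is `domain` or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def tweet_is_relevant(text, domain, expand):
    """Decide whether a search hit for `domain` is a real match.

    `expand` is a hypothetical callable (short URL -> final URL);
    if it returns nothing, the original URL is checked instead.
    """
    # Case 1: the domain appears in the visible text, links stripped out.
    visible = URL_RE.sub(" ", text).lower()
    if domain.lower() in visible:
        return True
    # Case 2: some link in the tweet actually resolves to that host.
    return any(
        host_matches(expand(url) or url, domain)
        for url in URL_RE.findall(text)
    )
```

With the tweet above, the bit.ly link expands to a page on www.r24.org, so the host check fails and the tweet would be dropped, while a link that resolves to www.hotels.com would be kept.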