On Wed, May 27, 2009 at 5:04 PM, Christian Heilmann < [email protected]> wrote:
> http://isithackday.com/hacks/placemaker/tweet-locations.php?user=codepo8
>
> What do you think?

Hey, nicely done. I like the maps.

Are you sending the raw tweet texts to the Yahoo Placemaker service? Do you try to use the tweet['user']['location'] data at all?

It's interesting to look at the quality level of this yahoo service. Unfortunately, it makes lots of mistakes. I was looking at my own feed (since i know what i was trying to talk about):

http://isithackday.com/hacks/placemaker/tweet-locations.php?user=brendan642

Out of 10 identifications, 5 of them are errors.

- "#scala" != "Monte Scala, Switzerland" - i meant the programming language.
- "middle-of-the-street *valencia* parking" != "valencia, CA" - that's a street name (in san francisco).
- "go easy on the *cancun*" != "cancun, MX" - minor error: name of a (mexican) restaurant.
- "sports, *mission*, *bay* bridge" != "mission bay, SF, CA" - that's a list of several things. the "mission bay" neighborhood is not one of them .. "bay" is part of the multiword "bay bridge".

and most humorously,

- "giant *ec2* nodes" != "EC2 area code, London, England"

...

I haven't used this Yahoo service before, but I bet that, if it's any good at all, it's probably optimized for web pages or big documents, where there are many more context words to help disambiguate and safely identify. There hasn't been a ton of NLP research on really short twitter-length messages, and I suspect the problem is harder, and might require somewhat different algorithms, than document-sized NLP problems.

Are there any applications for this where a 50% error rate is OK?

-Brendan

--
Brendan O'Connor - http://anyall.org
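
For reference, the 50% figure above can be written out as a small Python sketch. The five wrong matches and the total of 10 identifications come from the message; the variable names are made up for illustration, and the five correct identifications are not listed in the message, so only their count is used.

```python
# Hand-labeled wrong matches, copied from the list in the message:
# tweet fragment -> place the service reported.
wrong_matches = {
    "#scala": "Monte Scala, Switzerland",
    "middle-of-the-street valencia parking": "valencia, CA",
    "go easy on the cancun": "cancun, MX",
    "sports, mission, bay bridge": "mission bay, SF, CA",
    "giant ec2 nodes": "EC2 area code, London, England",
}

# Total identifications reported for the feed (stated in the message).
total_identifications = 10

# Share of identifications that were wrong, and the share that were right.
error_rate = len(wrong_matches) / total_identifications
precision = 1 - error_rate

print(f"errors: {len(wrong_matches)}/{total_identifications}")
print(f"error rate: {error_rate:.0%}, precision: {precision:.0%}")
```

This only counts places the service did report; it says nothing about places it missed, since the message does not list those.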
