Hello everyone,

Semiocast has been developing technologies for analyzing micro-messages (such as Twitter and Facebook messages), and today Semiocast is making two of these technologies available to the public as a web service through its API:
- language identification from short text: 61 languages recognized (including languages written in non-Latin scripts, such as Arabic, Chinese, Farsi, Japanese, Korean, Pashto, Russian, Ukrainian…);
- location extraction from short text: city and country recognized from free-form text or GPS coordinates.
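As a rough illustration of what a client might do once messages carry the two outputs listed above (a language code plus, where found, a country and city), here is a small Python sketch that filters a timeline and computes a language breakdown. The `AnnotatedMessage` record, its field names and the helper functions are hypothetical: they do not reflect the actual request or response format of the Semiocast API, and the annotations are assumed to have been obtained beforehand.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class AnnotatedMessage:
    """A message plus language/location annotations.

    Hypothetical record: the field names are illustrative only and are
    not the Semiocast API's response format.
    """
    text: str
    lang: str                      # language code, e.g. "en" or "ja"
    country: Optional[str] = None  # country code, or None if not found
    city: Optional[str] = None     # city name, or None if not found


def filter_messages(messages: Iterable[AnnotatedMessage],
                    languages: Set[str],
                    countries: Optional[Set[str]] = None) -> List[AnnotatedMessage]:
    """Keep messages whose language is in `languages` and, when
    `countries` is given, whose detected country is in `countries`."""
    kept = []
    for m in messages:
        if m.lang not in languages:
            continue
        if countries is not None and m.country not in countries:
            continue
        kept.append(m)
    return kept


def language_breakdown(messages: Iterable[AnnotatedMessage]) -> Dict[str, float]:
    """Fraction of messages per language (fractions sum to 1)."""
    counts = Counter(m.lang for m in messages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


# Hypothetical, hand-annotated timeline used only for illustration.
timeline = [
    AnnotatedMessage("Good morning!", "en", "US", "New York"),
    AnnotatedMessage("おはようございます", "ja", "JP", "Tokyo"),
    AnnotatedMessage("Bom dia!", "pt", "BR", "São Paulo"),
    AnnotatedMessage("Hello from London", "en", "GB", "London"),
]

# Drop messages in languages the reader does not understand.
readable = filter_messages(timeline, languages={"en", "pt"})
print([m.text for m in readable])
print(language_breakdown(timeline))
```

Keeping the filtering and the statistics as separate functions over already-annotated messages means the same timeline can be sliced by language, by country, or both without re-analyzing the text.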
Although these technologies were specifically developed for short free-form texts, the Semiocast API is tailored for analyzing Twitter messages (and messages from other networks such as Facebook, StatusNet…). With the Semiocast API, you can:
- analyze status updates and timelines to extract language and user location;
- filter timelines by language and location;
- prepare annotations, based on semantic analysis, for a message to be posted on Twitter using the forthcoming annotations feature.

We are very excited to see what you will build with this API. For example, you can use the Semiocast API to compute statistics about messages or to filter out messages in languages that end users do not understand. Semiocast used this very same technology for its February and March 2010 studies, which revealed that less than 50% of tweets were in English and gave a breakdown of tweets by language and country.

Future releases will give access to more of our technologies, such as tokenization, sentiment analysis and topic extraction.

For more information, please visit the Semiocast API website: http://developer.semiocast.com/

Let us know what you think!

Jean-Charles