Hi Alex,

As far as I am aware, Zemanta [1] does something similar to OpenCalais, but it is mainly used on blog text, as opposed to news-related text.

It might be worth checking out their stuff, though I could be wrong...

Mischa

[1] http://developer.zemanta.com/

On 2 Jul 2010, at 16:53, Alex McLintock wrote:

> I'm quite interested in OpenCalais - a Reuters/Thomson initiative. It
> is a web service to take your free text and identify important terms
> in it like people, businesses, places, and so on. If you are the
> document owner you can submit your document to their web site and get
> back important tags saying what this document is about. I'd like to
> tag this sort of data and feed it into a Lucene style index so that it
> can be used in searches AND in focussed/topical crawls.
>
> Now, here comes the problem. When we crawl the web we don't own the
> documents we are crawling so we don't really have permission to use
> Reuters' servers to do this analysis. (Maybe we could cut a deal
> though if we were a big enough company).
>
> So has anyone else looked at alternatives to OpenCalais which takes
> free text and tries to understand what it is about? I've been looking
> for software to do this but nothing seems suitable.
>
> Alex

___________________________________
Mischa Tuffield PhD
Email: [email protected]
Homepage - http://mmt.me.uk/
Garlik Limited, 1-3 Halford Road, Richmond, TW10 6AW
+44(0)845 645 2824
http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

