Or you could have a look at the NLP stuff out of Sheffield University in the UK:
GATE: http://gate.ac.uk/

Mischa
*not an NLP expert

On 2 Jul 2010, at 16:58, Kevin Conor wrote:

> That sounds like Named Entity Recognition. It's typically done with a
> Conditional Random Field. You could take a look at
> http://nlp.stanford.edu/software/CRF-NER.shtml.
>
> On Fri, Jul 2, 2010 at 10:53 AM, Alex McLintock
> <[email protected]> wrote:
>
>> I'm quite interested in OpenCalais - a Reuters/Thompson initiative. It
>> is a web service to take your free text and identify important terms
>> in it like people, businesses, places, and so on. If you are the
>> document owner you can submit your document to their web site and get
>> back important tags saying what this document is about. I'd like to
>> tag this sort of data and feed it into a Lucene style index so that it
>> can be used in searches AND in focussed/topical crawls.
>>
>> Now, here comes the problem. When we crawl the web we don't own the
>> documents we are crawling so we don't really have permission to use
>> Reuters' servers to do this analysis. (Maybe we could cut a deal
>> though if we were a big enough company).
>>
>> So has anyone else looked at alternatives to OpenCalais which takes
>> free text and tries to understand what it is about? I've been looking
>> for software to do this but nothing seems suitable.
>>
>> Alex
>>

___________________________________
Mischa Tuffield PhD
Email: [email protected]
Homepage - http://mmt.me.uk/
Garlik Limited, 1-3 Halford Road, Richmond, TW10 6AW
+44(0)845 645 2824 http://www.garlik.com/
Registered in England and Wales 535 7233
VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

