Re: OpenCalais alternatives for use with Nutch?

Mischa Tuffield Fri, 02 Jul 2010 09:00:55 -0700

Or you could have a look at the NLP stuff out of Sheffield University in the UK:

GATE: http://gate.ac.uk/

Mischa *not an NLP expert

On 2 Jul 2010, at 16:58, Kevin Conor wrote:

> That sounds like Named Entity Recognition.  It's typically done with a
> Conditional Random Field.  You could take a look at
> http://nlp.stanford.edu/software/CRF-NER.shtml.
> 
> On Fri, Jul 2, 2010 at 10:53 AM, Alex McLintock <[email protected]> wrote:
> 
>> I'm quite interested in OpenCalais - a Thomson Reuters initiative. It
>> is a web service to take your free text and identify important terms
>> in it like people, businesses, places, and so on. If you are the
>> document owner you can submit your document to their web site and get
>> back important tags saying what this document is about. I'd like to
>> tag this sort of data and feed it into a Lucene style index so that it
>> can be used in searches AND in focussed/topical crawls.
>> 
>> Now, here comes the problem. When we crawl the web we don't own the
>> documents we are crawling so we don't really have permission to use
>> Reuters' servers to do this analysis. (Maybe we could cut a deal
>> though if we were a big enough company).
>> 
>> So has anyone else looked at alternatives to OpenCalais that take
>> free text and try to understand what it is about? I've been looking
>> for software to do this but nothing seems suitable.
>> 
>> Alex
>> 

___________________________________
Mischa Tuffield PhD
Email: [email protected]
Homepage - http://mmt.me.uk/
Garlik Limited, 1-3 Halford Road, Richmond, TW10 6AW
+44(0)845 645 2824  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
