Re: OpenCalais alternatives for use with Nutch?

Kevin Conor Fri, 02 Jul 2010 08:58:37 -0700

That sounds like Named Entity Recognition (NER). It's typically done with a
Conditional Random Field (CRF). You could take a look at Stanford's CRF-based
NER tagger: http://nlp.stanford.edu/software/CRF-NER.shtml
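To give a feel for what the CRF approach involves, here is a minimal Python sketch of the decoding step of a linear-chain CRF. Each token gets per-label scores from weighted features, adjacent labels get transition scores, and the Viterbi algorithm picks the best-scoring label sequence. The feature names, weights, and word lists are hypothetical and hand-set. This is not the Stanford tagger's API, and a real CRF learns its weights from annotated text.

```python
# Toy linear-chain CRF decoder (Viterbi) for named-entity tagging.
# ASSUMPTION: every feature name, weight, and word list below is hand-set
# for illustration; a real CRF such as Stanford's learns its weights from
# labelled training data.

LABELS = ["O", "PER", "ORG", "LOC"]

FIRST_NAMES = {"Alex", "Kevin"}            # hypothetical tiny gazetteer
COMPANY_SUFFIXES = {"Corp", "Inc", "Ltd"}  # hypothetical suffix list

# Emission weights: feature -> {label: weight}, added when the feature fires.
EMISSION_WEIGHTS = {
    "is_capitalized": {"O": -1.0, "PER": 0.5, "ORG": 0.5, "LOC": 0.5},
    "not_capitalized": {"O": 2.0, "PER": -2.0, "ORG": -2.0, "LOC": -2.0},
    "in_first_name_list": {"PER": 3.0},
    "is_company_suffix": {"ORG": 3.0},
    "next_is_company_suffix": {"ORG": 2.0},
    "prev_word_in": {"LOC": 2.0},
}

# Transition weights (previous label, current label); unlisted pairs score 0.
TRANSITION_WEIGHTS = {("PER", "PER"): 1.0, ("ORG", "ORG"): 1.0, ("LOC", "LOC"): 1.0}


def token_features(tokens, i):
    """Return the names of the features that fire on tokens[i]."""
    word = tokens[i]
    feats = ["is_capitalized" if word[:1].isupper() else "not_capitalized"]
    if word in FIRST_NAMES:
        feats.append("in_first_name_list")
    if word in COMPANY_SUFFIXES:
        feats.append("is_company_suffix")
    if i + 1 < len(tokens) and tokens[i + 1] in COMPANY_SUFFIXES:
        feats.append("next_is_company_suffix")
    if i > 0 and tokens[i - 1].lower() == "in":
        feats.append("prev_word_in")
    return feats


def emission_scores(tokens, i):
    """Sum the weights of the firing features, separately for each label."""
    scores = {label: 0.0 for label in LABELS}
    for feat in token_features(tokens, i):
        for label, weight in EMISSION_WEIGHTS.get(feat, {}).items():
            scores[label] += weight
    return scores


def viterbi(tokens):
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not tokens:
        return []
    emissions = [emission_scores(tokens, i) for i in range(len(tokens))]
    best = [emissions[0]]  # best[t][y]: top score of a path ending in y at t
    backptr = [{}]         # backptr[t][y]: previous label on that path
    for t in range(1, len(tokens)):
        best.append({})
        backptr.append({})
        for y in LABELS:
            prev, score = max(
                ((p, best[t - 1][p] + TRANSITION_WEIGHTS.get((p, y), 0.0)) for p in LABELS),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            backptr[t][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1]


tokens = "Alex works for Acme Corp in London".split()
print(list(zip(tokens, viterbi(tokens))))
```

On the sample sentence this tags Alex as PER, Acme Corp as ORG and London as LOC. The transition weights are what let a CRF prefer consistent multi-token spans such as "Acme Corp" over tagging each token in isolation.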


On Fri, Jul 2, 2010 at 10:53 AM, Alex McLintock <[email protected]> wrote:

> I'm quite interested in OpenCalais, a Thomson Reuters initiative. It
> is a web service to take your free text and identify important terms
> in it like people, businesses, places, and so on. If you are the
> document owner you can submit your document to their web site and get
> back important tags saying what this document is about. I'd like to
> tag this sort of data and feed it into a Lucene style index so that it
> can be used in searches AND in focussed/topical crawls.
>
> Now, here comes the problem. When we crawl the web we don't own the
> documents we are crawling so we don't really have permission to use
> Reuters' servers to do this analysis. (Maybe we could cut a deal,
> though, if we were a big enough company.)
>
> So has anyone else looked at alternatives to OpenCalais that take
> free text and try to understand what it is about? I've been looking
> for software to do this, but nothing seems suitable.
>
> Alex
>
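As a rough illustration of the indexing side of the question, here is a Python sketch that stores extracted entity tags in a toy inverted index (entity tag to URLs) and scores pages for a focused crawl by how many topic entities they mention. The sample URLs and tags are made up, and this is a stand-in rather than the Lucene or Nutch API; with Lucene, one option would be to store the tags in a separate field on each document.

```python
from collections import defaultdict

# Hypothetical per-document entity tags, standing in for the output of an
# entity extractor (OpenCalais, a CRF tagger, etc.).
DOC_ENTITIES = {
    "http://example.com/a": {("ORG", "Reuters"), ("LOC", "London")},
    "http://example.com/b": {("PER", "Jane Doe")},
    "http://example.com/c": {("ORG", "Reuters"), ("PER", "Jane Doe")},
}


def build_entity_index(doc_entities):
    """Map each (type, name) entity tag to the sorted list of URLs carrying it."""
    index = defaultdict(set)
    for url, entities in doc_entities.items():
        for entity in entities:
            index[entity].add(url)
    return {entity: sorted(urls) for entity, urls in index.items()}


def search(index, entity):
    """Return the URLs tagged with the given entity (empty list if none)."""
    return index.get(entity, [])


def crawl_priority(page_entities, topic_entities):
    """Fraction of the topic's entities that a fetched page mentions.

    In a focused crawl, outlinks of high-scoring pages would be fetched first,
    since the entities of unfetched pages are not known yet.
    """
    if not topic_entities:
        return 0.0
    return len(page_entities & topic_entities) / len(topic_entities)


index = build_entity_index(DOC_ENTITIES)
topic = {("ORG", "Reuters"), ("PER", "Jane Doe")}
print(search(index, ("ORG", "Reuters")))
print(crawl_priority(DOC_ENTITIES["http://example.com/a"], topic))
```

The same tags serve both uses: exact-match lookups for search, and a cheap relevance signal for ordering the crawl frontier.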
