Hi,

Open source NLP platforms such as GATE (http://gate.ac.uk) or Apache UIMA are
typically used for this kind of task. GATE in particular ships with an
application called ANNIE, which performs Named Entity Recognition (NER).
OpenCalais does that as well and should be easy to embed, but unlike UIMA-
or GATE-based applications it can't be tuned to do more specific things.
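
To give a rough idea of what the simplest form of entity recognition looks
like, here is a minimal Python sketch of a gazetteer (word list) lookup. It
is not how ANNIE or OpenCalais are implemented; the GAZETTEER contents, the
entity type names and the extract_entities function are all made up for
illustration, and a real system adds grammar rules, much larger lists and
disambiguation on top of this.

```python
import re

# Hypothetical gazetteer: entity type -> known names. Real NER systems use
# far larger lists plus rules or statistical models; this is illustration only.
GAZETTEER = {
    "person": ["Charlie Jackson", "Ryan McKinley"],
    "organization": ["FAST", "DigitalPebble"],
}


def extract_entities(text):
    """Return {entity_type: [names found in text]} using exact phrase lookup."""
    found = {}
    for entity_type, names in GAZETTEER.items():
        for name in names:
            # Word boundaries keep "FAST" from matching inside "BREAKFAST".
            # The match is case-sensitive on purpose.
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                found.setdefault(entity_type, []).append(name)
    return found
```

Calling extract_entities on a document body gives one list of names per
entity type, which is the shape you would want before mapping each type to
its own index field.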

Depending on the architecture you have in mind, it could be worth
investigating Nutch and adding the NER as a custom plugin; since NLP is
often a CPU-intensive task, you could leverage the scalability of Hadoop in
Nutch. There is a patch that allows the indexing to be delegated to Solr. As
someone else already said, these named entities could then be used as facets.
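
As a sketch of the faceting side, assuming the extracted entities have been
indexed into their own fields, a facet request is just a normal Solr query
with facet=true and one facet.field parameter per entity field. The field
names ("person", "location"), the base URL and the facet_query_url helper
below are assumptions for the example, not something Solr or Nutch defines
for you.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, entity_fields):
    """Build a Solr select URL that facets on the given entity fields."""
    # rows=0 asks for facet counts only, without returning documents.
    params = [("q", query), ("rows", "0"), ("facet", "true")]
    # Solr accepts facet.field more than once, one per field to facet on.
    params += [("facet.field", field) for field in entity_fields]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local Solr instance and field names.
    print(facet_query_url("http://localhost:8983/solr/select",
                          "entity extraction",
                          ["person", "location"]))
```

The response would then carry, for each field, the entity values and how
many matching documents mention them.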

HTH

Julien
-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2008/10/24 Rogerio Pereira <[EMAIL PROTECTED]>

> I agree, Ryan, and I would like to see a complete integration between
> Solr, Nutch, Tika and Mahout in the future.
>
> 2008/10/24 Ryan McKinley <[EMAIL PROTECTED]>
>
> > This is not something solr does currently...
> >
> > It sounds like something that should be added to Mahout:
> > http://lucene.apache.org/mahout/
> >
> >
> >
> > On Oct 24, 2008, at 4:18 PM, Charlie Jackson wrote:
> >
> >> During a recent sales pitch to my company by FAST, they mentioned entity
> >> extraction. I'd never heard of it before, but they described it as
> >> basically recognizing people/places/things in documents being indexed
> >> and then being able to do faceting on this data at query time. Does
> >> anything like this already exist in SOLR? If not, I'm not opposed to
> >> developing it myself, but I could use some pointers on where to start.
> >>
> >>
> >>
> >> Thanks,
> >>
> >> - Charlie
> >>
> >>
> >
>
>
> --
> Regards,
>
> Rogério (_rogerio_)
>
> [Blog: http://faces.eti.br]  [Sandbox: http://bmobile.dyndns.org]
> [Twitter: http://twitter.com/ararog]
>
> "Make a difference! Help your country grow; don't hold back knowledge,
> share it and learn more."
> (http://faces.eti.br/2006/10/30/conhecimento-e-amadurecimento)
>
