Hi David,
The challenge you described is usually referred to as "(in-document)
coreference resolution". It is closely related to the entity disambiguation
problem, as entity disambiguation can be seen as cross-document coreference
resolution (using identifiers from a pre-established KB). However, I
think it's worth thinking of it separately from (but in close connection
with) the targeted entity disambiguation problem, because there are
many alternatives with different pros and cons, including:
1. clustering mentions at recognition time and then disambiguating,
2. clustering mentions after disambiguating, or
3. jointly disambiguating/clustering.

In DBpedia Spotlight we use a very simple heuristic rule: if a first name
or last name is spotted, we look backwards for a full name and assign
everyone in the chain to the same entity [1]. It is a very crude
assumption, but works quite well in practice.
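
To make the idea concrete, here is a rough sketch (this is not the actual
Scala code linked in [1]; the class, method and types below are made up for
illustration only):

    import java.util.List;
    import java.util.Map;

    class CoreferenceSketch {
        // Walk backwards from a single-token mention such as "Rossi" to an
        // earlier, longer mention such as "Valentino Rossi" and give it the
        // same entity, i.e. everyone in the chain gets the same assignment.
        static void propagateCoreference(List<String> surfaceForms,
                                         Map<Integer, String> entityOf) {
            for (int i = 0; i < surfaceForms.size(); i++) {
                String sf = surfaceForms.get(i);
                if (!sf.contains(" ")) {                      // single token, e.g. "Rossi"
                    for (int j = i - 1; j >= 0; j--) {
                        String earlier = surfaceForms.get(j);
                        if (earlier.contains(" ") && earlier.contains(sf)) {
                            entityOf.put(i, entityOf.get(j)); // same coreference chain
                            break;
                        }
                    }
                }
            }
        }
    }

(Here surfaceForms holds the spotted mentions in document order, and entityOf
maps a mention index to the entity it was disambiguated to.)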

Cheers,
Pablo

[1]
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/filter/annotations/CoreferenceFilter.scala

On Thu, Aug 23, 2012 at 5:27 PM, David Riccitelli <da...@insideout.io> wrote:

> Thanks Kritarth,
>
> Let me discuss another case, with another example: there is a text like
> this: "Valentino Rossi won the MotoGP. Everybody loves Rossi."
>
> Right now the enhancer correctly identifies "Valentino Rossi (racer)" in
> the TextAnnotation "Valentino Rossi", while it makes different suggestions
> for the TextAnnotation "Rossi", sorted by ranking (unfortunately, Valentino
> Rossi is not the first):
>  - "Daniele De Rossi (soccer player)"
>  - "Vasco Rossi (singer)"
>  - "Valentino Rossi (racer)"
>
> In this case would the disambiguation engine boost the score of the
> EntityAnnotation "Valentino Rossi (racer)"?
>
> BR,
> David
>
> On Thu, Aug 23, 2012 at 4:43 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
>
> > Hi David,
> > Thanks for your interest.
> >
> > What would a sentence like this yield, "Paris is not the city in United
> > States" ?
> >
> > It would yield Paris, Texas too. Well, that is one of the reasons the
> > problem is very hard.
> >
> > Kritarth
> >
> > On Thu, Aug 23, 2012 at 7:06 PM, David Riccitelli <da...@insideout.io> wrote:
> >
> > > What would a sentence like this yield, "Paris is not the city in United
> > > States" ?
> > >
> > > On Thu, Aug 23, 2012 at 4:23 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
> > >
> > > > Dear members of Stanbol community,
> > > >
> > > > I would like to discuss the next few iterations of the Disambiguation
> > > > Engine. A few versions of the engine have been prepared, and I would
> > > > like to briefly describe them below. I hope to become a permanent
> > > > committer for Stanbol if my contribution is considered after this GSoC
> > > > period. I will be committing the code versions and applying a patch to
> > > > JIRA soon.
> > > >
> > > > 1. How the disambiguation problem was approached.
> > > > For a given text annotation there might be many entity annotations
> > > > mapped, and it is required to rank them in order of their likelihood.
> > > > Take the sentence: "Paris is a small city in the United States."
> > > >
> > > > a. Consider "Paris" in this sentence without disambiguation (using
> > > > DBpedia as the vocabulary). There are three entity annotations mapped:
> > > > 1. Paris, France, 2. Paris, Texas, 3. Paris, *Something* (the entity
> > > > mapped with the highest fise:confidence is Paris, France).
> > > > b. Now, how would disambiguation by a human take place? On reading the
> > > > line, an individual thinks of the context the text is referring to.
> > > > Doing so, he realizes that since the text talks about Paris and also
> > > > about the United States, the Paris mentioned here is more like Paris,
> > > > Texas (which is in the United States) and therefore must refer to it.
> > > > (A small sketch of this intuition is given after point d below.)
> > > > c. The approach followed in the implementation takes inspiration from
> > > > this example and roughly follows the pseudo-code below.
> > > >     // For each TextAnnotation, re-rank its candidate EntityAnnotations
> > > >     // using the context gathered around the annotation.
> > > >     for (TextAnnotation k : textAnnotations) {
> > > >         List<EntityAnnotation> entityAnnotations = getEntityAnnotationsRelated(k);
> > > >         Context context = getContextInformation(k);
> > > >         // MLT ("More Like This") query against the vocabulary index
> > > >         List<Result> results = queryMLTVocabularies(k, context);
> > > >         // adjust the fise:confidence values according to the new ranking
> > > >         updateConfidences(results, entityAnnotations);
> > > >     }
> > > >
> > > > d. My current approach to disambiguation involved a lot of variations;
> > > > however, for the purpose of simplicity I'll talk only about the
> > > > differences in obtaining the "Context".
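> > > >
> > > > To make the intuition in (b) concrete, a naive context-overlap score
> > > > could look like the sketch below (illustrative only, not part of the
> > > > engine; it assumes java.util.Set):
> > > >
> > > >     // Fraction of a candidate entity's description terms that also occur
> > > >     // in the document context, e.g. {"paris", "small", "city", "united", "states"}.
> > > >     static double contextOverlap(Set<String> contextTerms, Set<String> candidateTerms) {
> > > >         int shared = 0;
> > > >         for (String term : candidateTerms) {
> > > >             if (contextTerms.contains(term)) shared++;
> > > >         }
> > > >         return candidateTerms.isEmpty() ? 0.0 : (double) shared / candidateTerms.size();
> > > >     }
> > > >     // A description of Paris, Texas (with terms like "texas", "city",
> > > >     // "united", "states") overlaps more with this context than one of
> > > >     // Paris, France, so Paris, Texas would be ranked higher.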
> > > >
> > > > 2. The Context Procurement:
> > > > a. All Entity Context: the context is built from all the text
> > > > annotations of the text. It shows good results for shorter texts, but
> > > > introduces a lot of redundant annotations in longer ones, making the
> > > > context less useful.
> > > > b. All Link Context: the context is built from the site or reference
> > > > link associated with the text annotations, which of course can itself
> > > > require disambiguation. So it does not behave very well.
> > > > c. Selection Context: the selection context contains the text one
> > > > sentence before and one sentence after the current one (a rough sketch
> > > > follows after this list). Another version also worked with the text
> > > > annotations in this region of text.
> > > > d. Vicinity Entity Context: the vicinity annotation detection measures
> > > > distance in the neighborhood of the text annotation.
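> > > >
> > > > A rough sketch of the Selection Context in (c), taking the sentence that
> > > > contains the annotation plus one sentence before and one after
> > > > (illustrative only, using java.text.BreakIterator and java.util; the
> > > > engine's actual code may differ):
> > > >
> > > >     static String selectionContext(String text, int annotationOffset) {
> > > >         BreakIterator it = BreakIterator.getSentenceInstance();
> > > >         it.setText(text);
> > > >         List<int[]> sentences = new ArrayList<int[]>();
> > > >         int current = -1;
> > > >         for (int start = it.first(), end = it.next(); end != BreakIterator.DONE;
> > > >                 start = end, end = it.next()) {
> > > >             if (annotationOffset >= start && annotationOffset < end) {
> > > >                 current = sentences.size();    // sentence holding the annotation
> > > >             }
> > > >             sentences.add(new int[] { start, end });
> > > >         }
> > > >         if (current < 0) return text;          // offset not found, fall back
> > > >         int from = sentences.get(Math.max(0, current - 1))[0];
> > > >         int to = sentences.get(Math.min(sentences.size() - 1, current + 1))[1];
> > > >         return text.substring(from, to);       // previous + current + next sentence
> > > >     }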
> > > >
> > > > 3. Future
> > > > a. With a running POC of this engine, it can be used to create a more
> > > > advanced version, e.g. following the Spotlight approach or using the
> > > > Markov Logic Networks discussed earlier.
> > > >
> > >
> > >
> > >
>
>
>
> --
> David Riccitelli
>
>
> ********************************************************************************
> InsideOut10 s.r.l.
> P.IVA: IT-11381771002
> Fax: +39 0110708239
> ---
> LinkedIn: http://it.linkedin.com/in/riccitelli
> Twitter: ziodave
> ---
> Layar Partner Network <http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
>
> ********************************************************************************
>



-- 
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr
